Arrow Research search

Author name cluster

Mingyu Ding

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

40 papers
2 author rows

Possible papers (40)

AAAI Conference 2026 Conference Paper

ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction

  • Pengze Li
  • Jiaqi Liu
  • Junchi Yu
  • Lihao Liu
  • Mingyu Ding
  • Wanli Ouyang
  • Shixiang Tang
  • Xi Chen

Large language models (LLMs) are increasingly used in scientific domains. While they can produce reasoning-like content via methods such as chain-of-thought prompting, these outputs are typically unstructured and informal, obscuring whether models truly understand the fundamental reasoning paradigms that underpin scientific inference. To address this, we introduce a novel task named Latent Reasoning Chain Extraction (ARCHE), in which models must decompose complex reasoning arguments into combinations of standard reasoning paradigms in the form of a Reasoning Logic Tree (RLT). In an RLT, all reasoning steps are explicitly categorized as one of three variants of Peirce’s fundamental inference modes: deduction, induction, or abduction. To facilitate this task, we release ARCHE Bench, a new benchmark derived from 70 Nature Communications articles, including more than 1,900 references and 38,000 viewpoints. We propose two logic-aware evaluation metrics: Entity Coverage (EC) for content completeness and Reasoning Edge Accuracy (REA) for step-by-step logical validity. Evaluations on 10 leading LLMs on ARCHE Bench reveal that models exhibit a trade-off between REA and EC, and none are yet able to extract a complete and standard reasoning chain. These findings highlight a substantial gap between the abilities of current reasoning models and the rigor required for scientific argumentation.
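
The abstract names two metrics, Entity Coverage (EC) and Reasoning Edge Accuracy (REA). As a rough illustration only, the sketch below shows one plausible set-based reading of such metrics over a reasoning tree represented as labeled edges; the exact definitions used in ARCHE Bench may differ.

```python
# Illustrative sketch only: plausible set-based readings of the two metric
# names in the abstract (Entity Coverage, Reasoning Edge Accuracy). These
# are assumptions, not ARCHE Bench's actual definitions.

def entity_coverage(predicted_entities, reference_entities):
    """Fraction of reference entities that appear in the predicted RLT."""
    ref = set(reference_entities)
    if not ref:
        return 1.0
    return len(ref & set(predicted_entities)) / len(ref)

def reasoning_edge_accuracy(predicted_edges, reference_edges):
    """Fraction of reference edges (premise, conclusion, mode) recovered with
    the correct inference mode: 'deduction', 'induction', or 'abduction'."""
    ref = set(reference_edges)
    if not ref:
        return 1.0
    return len(ref & set(predicted_edges)) / len(ref)

ec = entity_coverage(["mass", "velocity"], ["mass", "velocity", "friction"])
rea = reasoning_edge_accuracy(
    [("obs1", "hyp1", "abduction")],
    [("obs1", "hyp1", "abduction"), ("hyp1", "pred1", "deduction")],
)
print(f"EC={ec:.2f}, REA={rea:.2f}")  # EC=0.67, REA=0.50
```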

AAAI Conference 2026 Conference Paper

Unlocking the Power of Large Multimodal Models for Robot Learning: Robustness, Generalization, and Opportunities

  • Mingyu Ding

Large multimodal models (LMMs) have revolutionized AI by demonstrating remarkable capabilities in vision, language, audio, and other domains, particularly in understanding and generalization tasks. Yet, moving beyond passive understanding to active interaction requires embodied agents, such as robots, that can harness the capabilities of AI models to act within the physical world. My core research aims to build embodied agents that reason about and interact with the physical world with human-like commonsense. Specifically, I design algorithms and representations that enable robots to perceive their environment, reason about physical properties, and plan long-horizon actions for both manipulation and locomotion. These advances are grounded in the integration of large-scale AI models with embodied control. I organize this agenda into three stages: (1) injecting actions into LMMs to form vision–language–action (VLA) models; (2) learning from human motion and contact to enrich physical reasoning; and (3) advancing whole-body robot loco-manipulation guided by LMMs toward embodied artificial general intelligence (AGI). The talk details recent advances in leveraging LMMs for robot learning, emphasizing the promise of robust generalization across diverse environments, tasks, and modalities. I will highlight contributions at the intersection of perception, reasoning, and control, and outline open challenges and future opportunities toward enabling humanoid robots that can robustly understand, interact, and collaborate with humans in complex real-world settings.

ICRA Conference 2025 Conference Paper

Embodiment-agnostic Action Planning via Object-Part Scene Flow

  • Weiliang Tang
  • Jia-Hui Pan
  • Wei Zhan
  • Jianshu Zhou
  • Huaxiu Yao
  • Yun-Hui Liu 0001
  • Masayoshi Tomizuka
  • Mingyu Ding

Observing that the key to robotic action planning is understanding the target-object motion when its associated part is manipulated by the end effector, we propose to generate the 3D object-part scene flow and extract its transformations to solve the action trajectories for diverse embodiments. The advantage of our approach is that it derives the robot action explicitly from object motion prediction, yielding a more robust policy grounded in an understanding of object motion. Moreover, unlike policies trained on embodiment-centric data, our method is embodiment-agnostic, generalizes across diverse embodiments, and can learn from human demonstrations. Our method comprises three components: an object-part predictor to locate the part for the end effector to manipulate, an RGBD video generator to predict future RGBD videos, and a trajectory planner to extract embodiment-agnostic transformation sequences and solve the trajectory for diverse embodiments. Trained on videos even without trajectory data, our method still outperforms existing works significantly, by 27.7% and 26.2% on the widely used virtual environments MetaWorld and Franka-Kitchen, respectively. Furthermore, we conducted real-world experiments showing that our policy, trained only with human demonstrations, can be deployed to various embodiments.
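
To make the "extract its transformations" step concrete, here is a minimal sketch of recovering a rigid transform from corresponding points and their flowed positions via the standard Kabsch/Procrustes SVD solution. This is a generic technique for turning scene flow into a transformation, not necessarily the paper's exact planner.

```python
# Generic sketch: rigid transform (R, t) mapping object-part points to their
# flowed positions, via the Kabsch/Procrustes solution. Illustrative only.
import numpy as np

def rigid_transform_from_flow(points, flowed_points):
    """points, flowed_points: (N, 3) arrays of corresponding 3D points."""
    p_mean = points.mean(axis=0)
    q_mean = flowed_points.mean(axis=0)
    H = (points - p_mean).T @ (flowed_points - q_mean)   # 3x3 covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))                # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Toy check: a pure translation should be recovered exactly.
pts = np.random.rand(50, 3)
R, t = rigid_transform_from_flow(pts, pts + np.array([0.1, 0.0, -0.2]))
print(np.allclose(R, np.eye(3)), np.round(t, 2))
```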

NeurIPS Conference 2025 Conference Paper

MJ-Video: Benchmarking and Rewarding Video Generation with Fine-Grained Video Preference

  • Haibo Tong
  • Zhaoyang Wang
  • Zhaorun Chen
  • Haonian Ji
  • Shi Qiu
  • Siwei Han
  • Kexin Geng
  • Zhongkai Xue

Recent advancements in video generation have significantly improved the ability to synthesize videos from text instructions. However, existing models still struggle with key challenges such as instruction misalignment, content hallucination, safety concerns, and generation bias. To address these limitations, we introduce MJ-BENCH-VIDEO, a large-scale video preference benchmark designed to evaluate video generation across five critical aspects: Alignment, Safety, Fineness, Coherence & Consistency, and Bias & Fairness. This benchmark further incorporates 28 fine-grained criteria to provide a comprehensive evaluation of video preference. Building upon this dataset, we propose MJ-VIDEO, a Mixture-of-Experts (MoE)-based video reward model designed to deliver fine-grained rewards. MJ-VIDEO can dynamically select relevant experts to accurately judge the preference based on the input text-video pair. This architecture enables more precise and adaptable preference judgments. Through extensive benchmarking on MJ-BENCH-VIDEO, we analyze the limitations of existing video reward models and demonstrate the superior performance of MJ-VIDEO in video preference assessment, achieving 17.58% and 15.87% improvements in overall and fine-grained preference judgments, respectively. Additionally, MJ-VIDEO is able to improve the alignment performance in video generation via preference fine-tuning.
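
As a rough illustration of the MoE idea described above, the sketch below gates per-aspect expert heads on a pooled text-video feature and combines them into an overall reward. All names and dimensions are assumptions for illustration, not MJ-VIDEO's actual architecture.

```python
# Minimal sketch of MoE gating for a reward model; illustrative assumptions only.
import torch
import torch.nn as nn

class MoERewardHead(nn.Module):
    def __init__(self, feat_dim=768, num_experts=5):
        super().__init__()
        self.gate = nn.Linear(feat_dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Linear(feat_dim, 1) for _ in range(num_experts)]
        )

    def forward(self, fused_features):
        # fused_features: (batch, feat_dim) pooled text-video representation
        weights = torch.softmax(self.gate(fused_features), dim=-1)   # (B, E)
        expert_scores = torch.cat(
            [e(fused_features) for e in self.experts], dim=-1        # (B, E)
        )
        # Fine-grained per-aspect scores plus a gated overall reward.
        overall = (weights * expert_scores).sum(dim=-1)              # (B,)
        return expert_scores, overall

head = MoERewardHead()
scores, reward = head(torch.randn(2, 768))
print(scores.shape, reward.shape)  # torch.Size([2, 5]) torch.Size([2])
```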

ICLR Conference 2025 Conference Paper

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

  • Peng Xia 0005
  • Siwei Han
  • Shi Qiu 0016
  • Yiyang Zhou
  • Zhaoyang Wang 0004
  • Wenhao Zheng
  • Zhaorun Chen
  • Chenhang Cui

Interleaved multimodal comprehension and generation, enabling models to produce and interpret both images and text in arbitrary sequences, have become a pivotal area in multimodal learning. Despite significant advancements, the evaluation of this capability remains insufficient. Existing benchmarks suffer from limitations in data scale, scope, and evaluation depth, while current evaluation metrics are often costly or biased, lacking in reliability for practical applications. To address these challenges, we introduce MMIE, a large-scale knowledge-intensive benchmark for evaluating interleaved multimodal comprehension and generation in Large Vision-Language Models (LVLMs). MMIE comprises 20K meticulously curated multimodal queries, spanning 3 categories, 12 fields, and 102 subfields, including mathematics, coding, physics, literature, health, and arts. It supports both interleaved inputs and outputs, offering a mix of multiple-choice and open-ended question formats to evaluate diverse competencies. Moreover, we propose a reliable automated evaluation metric, leveraging a scoring model fine-tuned with human-annotated data and systematic evaluation criteria, aimed at reducing bias and improving evaluation accuracy. Extensive experiments demonstrate the effectiveness of our benchmark and metrics in providing a comprehensive evaluation of interleaved LVLMs. Specifically, we evaluate eight LVLMs, revealing that even the best models show significant room for improvement, with most achieving only moderate results. We believe MMIE will drive further advancements in the development of interleaved LVLMs.

IROS Conference 2025 Conference Paper

P2 Explore: Efficient Exploration in Unknown Cluttered Environment with Floor Plan Prediction

  • Kun Song
  • Gaoming Chen
  • Masayoshi Tomizuka
  • Wei Zhan
  • Zhenhua Xiong 0001
  • Mingyu Ding

Robot exploration aims to reconstruct unknown environments, and doing so with shorter paths is important. Traditional methods focus on optimizing the visiting order of frontiers based on current observations, which may lead to locally minimal results. Recently, exploration efficiency has been further improved by predicting the structure of the unseen environment. However, in a cluttered environment, the randomness of obstacles weakens this predictive ability, and the resulting inaccuracy limits the improvement in exploration. Therefore, we propose FPUNet, which efficiently predicts the layout of noisy indoor environments. We then extract the segmentation of rooms and construct their topological connectivity based on the predicted map. The visiting order of these predicted rooms is optimized to provide high-level guidance for exploration. FPUNet is compared with other network architectures, demonstrating that it is the state-of-the-art method for this task. Extensive experiments in simulation show that our method can shorten the path length by 2.18% to 34.60% compared to the baselines.

IROS Conference 2025 Conference Paper

PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models

  • Dingkun Guo
  • Yuqi Xiang
  • Shuqi Zhao
  • Xinghao Zhu
  • Masayoshi Tomizuka
  • Mingyu Ding
  • Wei Zhan

Robotic grasping, crucial for robot interaction with objects, still struggles with counter-intuitive or long-tailed scenarios like uncommon materials and shapes. Humans, however, intuitively adjust grasps using their physics-informed interpretations of the object, drawing on visual and linguistic cues. This work introduces PhyGrasp, a large multimodal model and dataset that enhance robotic manipulation by combining natural language and 3D point clouds through a bridge module that integrates the two inputs. The language modality exhibits robust reasoning capabilities concerning the impacts of diverse physical properties on grasping, while the 3D modality comprehends object shapes and parts. With these two capabilities, PhyGrasp is able to accurately assess the physical properties of object parts and determine optimal grasping poses. Additionally, the model's language comprehension enables human instruction interpretation, generating grasping poses that align with human preferences. To train PhyGrasp, we construct a dataset, PhyPartNet, with 195K object instances with varying physical properties and human preferences, alongside their corresponding language descriptions. Extensive experiments conducted in simulation and on real robots demonstrate that PhyGrasp achieves state-of-the-art performance, particularly in long-tailed cases, e.g., about 10% improvement in success rate over GraspNet. More demos and information are available at https://sites.google.com/view/phygrasp.

ICRA Conference 2025 Conference Paper

Physics-Aware Robotic Palletization With Online Masking Inference

  • Tianqi Zhang
  • Zheng Wu 0002
  • Yuxin Chen
  • Yixiao Wang
  • Boyuan Liang
  • Scott Moura
  • Masayoshi Tomizuka
  • Mingyu Ding

The efficient planning of stacking boxes, especially in the online setting where the sequence of item arrivals is unpredictable, remains a critical challenge in modern warehouse and logistics management. Existing solutions often address box size variations but overlook intrinsic physical properties of the boxes, such as density and rigidity, which are crucial for real-world applications. We use reinforcement learning (RL) to solve this problem, employing action space masking to direct the RL policy toward valid actions. Unlike previous methods that rely on heuristic stability assessments, which are difficult to verify in physical scenarios, our framework uses online learning to dynamically train the action space mask, eliminating the need for manual heuristic design. Extensive experiments demonstrate that our proposed method outperforms existing state-of-the-art approaches. Furthermore, we deploy our learned task planner on a real-world robotic palletizer, validating its practical applicability in operational settings. The code is available at https://github.com/tianqi-zh/palletization.
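
For readers unfamiliar with action space masking, the sketch below shows the generic mechanism: infeasible actions have their logits set to negative infinity before sampling, so the policy can only choose valid placements. The mask itself (which the paper learns online) is just an input here, and the code is an illustrative sketch rather than the paper's implementation.

```python
# Generic action-space masking sketch; the learned mask is assumed given.
import torch

def masked_action_distribution(logits, valid_mask):
    """logits: (batch, num_actions); valid_mask: same shape, 1 = feasible."""
    masked_logits = logits.masked_fill(valid_mask == 0, float("-inf"))
    return torch.distributions.Categorical(logits=masked_logits)

logits = torch.randn(1, 4)
mask = torch.tensor([[1, 0, 1, 0]])
dist = masked_action_distribution(logits, mask)
print(dist.probs)  # probability mass only on actions 0 and 2
```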

IROS Conference 2025 Conference Paper

ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis

  • Yu Fang
  • Yue Yang 0024
  • Xinghao Zhu
  • Kaiyuan Zheng
  • Gedas Bertasius
  • Daniel Szafir
  • Mingyu Ding

Vision-language-action (VLA) models present a promising paradigm by training policies directly on real robot datasets like Open X-Embodiment. However, the high cost of real-world data collection hinders further data scaling, thereby restricting the generalizability of VLAs. In this paper, we introduce ReBot, a novel real-to-sim-to-real approach for scaling real robot datasets and adapting VLA models to target domains, which is the last-mile deployment challenge in robot manipulation. Specifically, ReBot replays real-world robot trajectories in simulation to diversify manipulated objects (real-to-sim), and integrates the simulated movements with inpainted real-world background to synthesize physically realistic and temporally consistent robot videos (sim-to-real). Our approach has several advantages: 1) it enjoys the benefit of real data to minimize the sim-to-real gap; 2) it leverages the scalability of simulation; and 3) it can generalize a pretrained VLA to a target domain with fully automated data pipelines. Extensive experiments in both simulation and real-world environments show that ReBot significantly enhances the performance and robustness of VLAs. For example, in SimplerEnv with the WidowX robot, ReBot improved the in-domain performance of Octo by 7.2% and OpenVLA by 21.8%, and out-of-domain generalization by 19.9% and 9.4%, respectively. For real-world evaluation with a Franka robot, ReBot increased the success rates of Octo by 17% and OpenVLA by 20%. More information can be found at our project page.

ICRA Conference 2025 Conference Paper

TrajSSL: Trajectory-Enhanced Semi-Supervised 3D Object Detection

  • Philip L. Jacobson
  • Yichen Xie 0002
  • Mingyu Ding
  • Chenfeng Xu
  • Masayoshi Tomizuka
  • Wei Zhan
  • Ming C. Wu

Semi-supervised 3D object detection is a common strategy employed to circumvent the challenge of manually labeling large-scale autonomous driving perception datasets. Pseudo-labeling approaches to semi-supervised learning adopt a teacher-student framework in which machine-generated pseudo-labels on a large unlabeled dataset are used in combination with a small manually-labeled dataset for training. In this work, we address the problem of improving pseudo-label quality through leveraging long-term temporal information captured in driving scenes. More specifically, we leverage pre-trained motion-forecasting models to generate object trajectories on pseudo-labeled data to further enhance the student model training. Our approach improves pseudo-label quality in two distinct manners: first, we suppress false positive pseudo-labels through establishing consistency across multiple frames of motion forecasting outputs. Second, we compensate for false negative detections by directly inserting predicted object tracks into the pseudo-labeled scene. Experiments on the nuScenes dataset demonstrate the effectiveness of our approach, improving the performance of standard semi-supervised approaches in a variety of settings.

ICML Conference 2025 Conference Paper

WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving

  • Yiheng Li
  • Cunxin Fan
  • Chongjian Ge
  • Seth Z. Zhao
  • Chenran Li
  • Chenfeng Xu
  • Huaxiu Yao
  • Masayoshi Tomizuka

Language models show unprecedented abilities in analyzing driving scenarios, owing to the extensive knowledge accumulated from text-based pre-training. Naturally, they should particularly excel at analyzing rule-based interactions, such as those triggered by traffic laws, which are well documented in text. However, such interaction analysis remains underexplored due to the lack of dedicated language datasets that address it. Therefore, we propose the Waymo Open Motion Dataset-Reasoning (WOMD-Reasoning), a comprehensive large-scale Q&A dataset built on WOMD that focuses on describing and reasoning about traffic-rule-induced interactions in driving scenarios. WOMD-Reasoning is also by far the largest multi-modal Q&A dataset, with 3 million Q&As on real-world driving scenarios, covering a wide range of driving topics from map descriptions and motion status descriptions to narratives and analyses of agents' interactions, behaviors, and intentions. To showcase the applications of WOMD-Reasoning, we design Motion-LLaVA, a motion-language model fine-tuned on WOMD-Reasoning. Quantitative and qualitative evaluations are performed on the WOMD-Reasoning dataset as well as the outputs of Motion-LLaVA, supporting the data quality and wide applicability of WOMD-Reasoning in interaction prediction, traffic-rule-compliant planning, and more. The dataset and its vision modality extension are available at https://waymo.com/open/download/. The code and prompts to build it are available at https://github.com/yhli123/WOMD-Reasoning.

ICLR Conference 2025 Conference Paper

X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios

  • Yichen Xie 0002
  • Chenfeng Xu
  • Chensheng Peng
  • Shuqi Zhao
  • Nhat Ho
  • Alexander T. Pham
  • Mingyu Ding
  • Masayoshi Tomizuka

Recent advancements have exploited diffusion models for the synthesis of either LiDAR point clouds or camera image data in driving scenarios. Despite their success in modeling single-modality data marginal distribution, there is an under-exploration in the mutual reliance between different modalities to describe complex driving scenes. To fill in this gap, we propose a novel framework, X-DRIVE, to model the joint distribution of point clouds and multi-view images via a dual-branch latent diffusion model architecture. Considering the distinct geometrical spaces of the two modalities, X-DRIVE conditions the synthesis of each modality on the corresponding local regions from the other modality, ensuring better alignment and realism. To further handle the spatial ambiguity during denoising, we design the cross-modality condition module based on epipolar lines to adaptively learn the cross-modality local correspondence. Besides, X-DRIVE allows for controllable generation through multi-level input conditions, including text, bounding box, image, and point clouds. Extensive results demonstrate the high-fidelity synthetic results of X-DRIVE for both point clouds and multi-view images, adhering to input conditions while ensuring reliable cross-modality consistency. Our code will be made publicly available at https://github.com/yichen928/X-Drive.

IROS Conference 2024 Conference Paper

DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation

  • Huixin Zhang
  • Guangming Wang 0001
  • Xinrui Wu
  • Chenfeng Xu
  • Mingyu Ding
  • Masayoshi Tomizuka
  • Wei Zhan
  • Hesheng Wang 0001

This paper introduces a 3D point cloud sequence learning model based on inconsistent spatio-temporal propagation for LiDAR odometry, termed DSLO. It consists of a pyramid structure with a spatial information reuse strategy, a sequential pose initialization module, a gated hierarchical pose refinement module, and a temporal feature propagation module. First, spatial features are encoded using a point feature pyramid, with features reused in successive pose estimations to reduce computational overhead. Second, a sequential pose initialization method is introduced, leveraging the high-frequency sampling characteristic of LiDAR to initialize the LiDAR pose. Then, a gated hierarchical pose refinement mechanism refines poses from coarse to fine by selectively retaining or discarding motion information from different layers based on gate estimations. Finally, temporal feature propagation is proposed to incorporate the historical motion information from point cloud sequences, and address the spatial inconsistency issue when transmitting motion information embedded in point clouds between frames. Experimental results on the KITTI odometry dataset and Argoverse dataset demonstrate that DSLO outperforms state-of-the-art methods, achieving at least a 15.67% improvement on RTE and a 12.64% improvement on RRE, while also achieving a 34.69% reduction in runtime compared to baseline methods. Our implementation will be available at https://github.com/IRMVLab/DSLO.

NeurIPS Conference 2024 Conference Paper

Interfacing Foundation Models' Embeddings

  • Xueyan Zou
  • Linjie Li
  • Jianfeng Wang
  • Jianwei Yang
  • Mingyu Ding
  • Junyi Wei
  • Zhengyuan Yang
  • Feng Li

Foundation models possess strong capabilities in reasoning and memorizing across modalities. To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models' embeddings with unified image and dataset-level understanding spanning modality and granularity. As shown in Fig. 1, a lightweight transformer interface without tuning any foundation model weights is enough for segmentation, grounding, and retrieval in an interleaved manner. The proposed interface has the following favorable attributes: (1) Generalizable. It applies to various tasks spanning retrieval, segmentation, etc., under the same architecture and weights. (2) Interleavable. With the benefit of multi-task multi-modal training, the proposed interface creates an interleaved shared embedding space. (3) Extendable. The proposed interface is adaptive to new tasks and new models. In light of the interleaved embedding space, we introduce FIND-Bench, which adds new training and evaluation annotations to the COCO dataset for interleaved segmentation and retrieval. We are the first work to align foundation models' embeddings for interleaved understanding. Meanwhile, our approach achieves state-of-the-art performance on FIND-Bench and competitive performance on standard retrieval and segmentation settings.

NeurIPS Conference 2024 Conference Paper

MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts

  • Jie Zhu
  • Yixiong Chen
  • Mingyu Ding
  • Ping Luo
  • Leye Wang
  • Jingdong Wang

Text-to-image diffusion has attracted vast attention due to its impressive image-generation capabilities. However, when it comes to human-centric text-to-image generation, particularly in the context of faces and hands, the results often fall short of naturalness due to insufficient training priors. We alleviate the issue in this work from two perspectives. 1) From the data aspect, we carefully collect a human-centric dataset comprising over one million high-quality human-in-the-scene images and two specific sets of close-up images of faces and hands. These datasets collectively provide a rich prior knowledge base to enhance the human-centric image generation capabilities of the diffusion model. 2) On the methodological front, we propose a simple yet effective method called Mixture of Low-rank Experts (MoLE) by considering low-rank modules trained on close-up hand and face images respectively as experts. This concept draws inspiration from our observation of low-rank refinement, where a low-rank module trained on a customized close-up dataset has the potential to enhance the corresponding image part when applied at an appropriate scale. To validate the superiority of MoLE in the context of human-centric image generation compared to state-of-the-art methods, we construct two benchmarks and perform evaluations with diverse metrics and human studies. Datasets, models, and code are released at https://sites.google.com/view/mole4diffuser/.
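
As a rough sketch of the "low-rank modules as experts" idea, the code below adds scaled LoRA-style low-rank corrections from several experts on top of a frozen base projection. The gating by fixed learnable scales is an illustrative assumption, not MoLE's exact formulation.

```python
# Illustrative sketch of mixing low-rank experts into a frozen base layer.
import torch
import torch.nn as nn

class LowRankExpert(nn.Module):
    def __init__(self, in_dim, out_dim, rank=4):
        super().__init__()
        self.down = nn.Linear(in_dim, rank, bias=False)
        self.up = nn.Linear(rank, out_dim, bias=False)
        nn.init.zeros_(self.up.weight)   # start as a no-op, as in LoRA

    def forward(self, x):
        return self.up(self.down(x))

class MoLELinear(nn.Module):
    def __init__(self, base: nn.Linear, num_experts=2, rank=4):
        super().__init__()
        self.base = base                 # frozen pretrained projection
        self.experts = nn.ModuleList(
            [LowRankExpert(base.in_features, base.out_features, rank)
             for _ in range(num_experts)]
        )
        self.scales = nn.Parameter(torch.ones(num_experts))

    def forward(self, x):
        out = self.base(x)
        for scale, expert in zip(self.scales, self.experts):
            out = out + scale * expert(x)
        return out

layer = MoLELinear(nn.Linear(320, 320), num_experts=2)
print(layer(torch.randn(1, 77, 320)).shape)  # torch.Size([1, 77, 320])
```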

ICRA Conference 2024 Conference Paper

Open X-Embodiment: Robotic Learning Datasets and RT-X Models: Open X-Embodiment Collaboration

  • Abby O'Neill
  • Abdul Rehman
  • Abhiram Maddukuri
  • Abhishek Gupta 0004
  • Abhishek Padalkar
  • Abraham Lee
  • Acorn Pooley
  • Agrim Gupta

Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a "generalist" X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. The project website is robotics-transformer-x.github.io.

IROS Conference 2024 Conference Paper

Pre-training on Synthetic Driving Data for Trajectory Prediction

  • Yiheng Li
  • Seth Z. Zhao
  • Chenfeng Xu
  • Chen Tang 0001
  • Chenran Li
  • Mingyu Ding
  • Masayoshi Tomizuka
  • Wei Zhan

Accumulating substantial volumes of real-world driving data proves pivotal in the realm of trajectory forecasting for autonomous driving. Given the heavy reliance of current trajectory forecasting models on data-driven methodologies, we aim to tackle the challenge of learning general trajectory forecasting representations under limited data availability. We propose a pipeline-level solution to mitigate the issue of data scarcity in trajectory forecasting. The solution is composed of two parts: first, we adopt HD map augmentation and trajectory synthesis to generate driving data, and then we learn representations by pre-training on them. Specifically, we apply vector transformations to reshape the maps, and then employ a rule-based model to generate trajectories on both original and augmented scenes, thus enlarging the driving data without collecting additional real data. To foster the learning of general representations within this augmented dataset, we comprehensively explore different pre-training strategies, including extending the concept of a Masked AutoEncoder (MAE) to trajectory forecasting. Without bells and whistles, our proposed pipeline-level solution is general, simple, yet effective: we conduct extensive experiments to demonstrate the effectiveness of our data expansion and pre-training strategies, which outperform the baseline prediction model by large margins, e.g., 5.04%, 3.84%, and 8.30% in terms of MR6, minADE6, and minFDE6. The pre-training dataset and the codes for pre-training and fine-tuning are released at https://github.com/yhli123/Pretraining_on_Synthetic_Driving_Data_for_Trajectory_Prediction.

ICML Conference 2024 Conference Paper

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

  • Yao Mu 0001
  • Junting Chen
  • Qinglong Zhang
  • Shoufa Chen
  • Qiaojun Yu
  • Chongjian Ge
  • Runjian Chen
  • Zhixuan Liang

Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI. Despite successes in applying multimodal large language models for high-level understanding, it remains challenging to translate these conceptual understandings into detailed robotic actions while achieving generalization across various scenarios. In this paper, we propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX. RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints, and applies code generation to introduce generalization ability across various robotics platforms. To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning. Extensive experiments demonstrate that RoboCodeX achieves state-of-the-art performance in both simulators and real robots on four different kinds of manipulation tasks and one embodied navigation task.

ICLR Conference 2024 Conference Paper

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models

  • Mengkang Hu
  • Yao Mu 0001
  • Xinmiao Yu
  • Mingyu Ding
  • Shiguang Wu 0004
  • Wenqi Shao
  • Qiguang Chen
  • Bin Wang 0034

This paper studies close-loop task planning, which refers to the process of generating a sequence of skills (a plan) to accomplish a specific goal while adapting the plan based on real-time observations. Recently, prompting Large Language Models (LLMs) to generate actions iteratively has become a prevalent paradigm due to its superior performance and user-friendliness. However, this paradigm is plagued by two inefficiencies: high token consumption and redundant error correction, both of which hinder its scalability for large-scale testing and applications. To address these issues, we propose Tree-Planner, which reframes task planning with LLMs into three distinct phases: plan sampling, action tree construction, and grounded deciding. Tree-Planner starts by using an LLM to sample a set of potential plans before execution, followed by aggregating them into an action tree. Finally, the LLM performs a top-down decision-making process on the tree, taking real-time environmental information into account. Experiments show that Tree-Planner achieves state-of-the-art performance while maintaining high efficiency. By decomposing LLM queries into a single plan-sampling call and multiple grounded-deciding calls, a considerable part of the prompt is less likely to be repeatedly consumed. As a result, token consumption is reduced by 92.2% compared to the previously best-performing model. Additionally, by enabling backtracking on the action tree as needed, the correction process becomes more flexible, leading to a 40.5% decrease in error corrections.
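
The core data structure, an action tree built from sampled plans, can be illustrated with a simple prefix tree: shared plan prefixes are stored once, and at execution time a decision is only needed among the children of the current node. This is a toy sketch of the idea, not the paper's construction or its grounded-deciding procedure.

```python
# Toy sketch: merge sampled plans (skill sequences) into a prefix tree.
def build_action_tree(plans):
    """plans: list of skill sequences, e.g. [['open fridge', 'grab milk'], ...]."""
    root = {}
    for plan in plans:
        node = root
        for action in plan:
            node = node.setdefault(action, {})
    return root

plans = [
    ["walk to kitchen", "open fridge", "grab milk"],
    ["walk to kitchen", "open fridge", "grab juice"],
    ["walk to kitchen", "open cabinet", "grab cup"],
]
tree = build_action_tree(plans)
# The shared prefix "walk to kitchen" appears once; execution only has to
# choose among the children of the current node.
print(list(tree["walk to kitchen"].keys()))  # ['open fridge', 'open cabinet']
```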

ICLR Conference 2024 Conference Paper

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

  • Haoyu Lu
  • Yuqi Huo
  • Guoxing Yang
  • Zhiwu Lu 0001
  • Wei Zhan
  • Masayoshi Tomizuka
  • Mingyu Ding

Large-scale vision-language pre-trained models have shown promising transferability to various downstream tasks. As the size of these foundation models and the number of downstream tasks grow, the standard full fine-tuning paradigm becomes unsustainable due to heavy computational and storage costs. This paper proposes UniAdapter, which unifies unimodal and multimodal adapters for parameter-efficient cross-modal adaptation on pre-trained vision-language models. Specifically, adapters are distributed to different modalities and their interactions, with the total number of tunable parameters reduced by partial weight sharing. The unified and knowledge-sharing design enables powerful cross-modal representations that can benefit various downstream tasks, requiring only 1.0%-2.0% tunable parameters of the pre-trained model. Extensive experiments on 7 cross-modal downstream benchmarks (including video-text retrieval, image-text retrieval, VideoQA, VQA and Caption) show that in most cases, UniAdapter not only outperforms the state of the art, but even beats the full fine-tuning strategy. Particularly, on the MSRVTT retrieval task, UniAdapter achieves 49.7% recall@1 with 2.2% model parameters, outperforming the latest competitors by 2.0%. The code and models are available at https://github.com/RERV/UniAdapter.
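
For context, the sketch below shows a standard bottleneck adapter of the kind such methods insert into frozen transformer layers: down-project, nonlinearity, up-project, residual. The partial weight sharing across modalities described in the abstract is not shown; dimensions are illustrative assumptions.

```python
# Generic bottleneck adapter sketch; not UniAdapter's full design.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)    # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = BottleneckAdapter()
x = torch.randn(2, 16, 768)           # (batch, tokens, dim)
print(torch.allclose(adapter(x), x))  # True at init thanks to zero-init up-proj
```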

ICLR Conference 2024 Conference Paper

VDT: General-purpose Video Diffusion Transformers via Mask Modeling

  • Haoyu Lu
  • Guoxing Yang
  • Nanyi Fei
  • Yuqi Huo
  • Zhiwu Lu 0001
  • Ping Luo 0002
  • Mingyu Ding

This work introduces the Video Diffusion Transformer (VDT), which pioneers the use of transformers in diffusion-based video generation. It features transformer blocks with modularized temporal and spatial attention modules to leverage the rich spatial-temporal representation inherited in transformers. Additionally, we propose a unified spatial-temporal mask modeling mechanism, seamlessly integrated with the model, to cater to diverse video generation scenarios. VDT offers several appealing benefits. (1) It excels at capturing temporal dependencies to produce temporally consistent video frames and even simulate the physics and dynamics of 3D objects over time. (2) It facilitates flexible conditioning information, e.g., simple concatenation in the token space, effectively unifying different token lengths and modalities. (3) Paired with our proposed spatial-temporal mask modeling mechanism, it becomes a general-purpose video diffuser that can harness a range of tasks, including unconditional generation, video prediction, interpolation, animation, and completion. Extensive experiments on these tasks spanning various scenarios, including autonomous driving, natural weather, human action, and physics-based simulation, demonstrate the effectiveness of VDT. Moreover, we provide comprehensive studies on how VDT captures accurate temporal dependencies and handles conditioning information with the mask modeling mechanism, which we believe will benefit future research and advance the field. Codes and models are available at https://VDT-2023.github.io.

ICML Conference 2023 Conference Paper

AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners

  • Zhixuan Liang
  • Yao Mu 0001
  • Mingyu Ding
  • Fei Ni 0001
  • Masayoshi Tomizuka
  • Ping Luo 0002

Diffusion models have demonstrated their powerful generative capability in many tasks, with great potential to serve as a paradigm for offline reinforcement learning. However, the quality of the diffusion model is limited by the insufficient diversity of training data, which hinders the performance of planning and the generalizability to new tasks. This paper introduces AdaptDiffuser, an evolutionary planning method with diffusion that can self-evolve to improve the diffusion model and hence the planner, not only for seen tasks but also adapting to unseen tasks. AdaptDiffuser enables the generation of rich synthetic expert data for goal-conditioned tasks using guidance from reward gradients. It then selects high-quality data via a discriminator to finetune the diffusion model, which improves the generalization ability to unseen tasks. Empirical experiments on two benchmark environments and two carefully designed unseen tasks in KUKA industrial robot arm and Maze2D environments demonstrate the effectiveness of AdaptDiffuser. For example, AdaptDiffuser not only outperforms the previous art Diffuser by 20.8% on Maze2D and 7.5% on MuJoCo locomotion, but also adapts better to new tasks, e.g., KUKA pick-and-place, by 27.9% without requiring additional expert data. More visualization results and demo videos can be found on our project page.

NeurIPS Conference 2023 Conference Paper

Doubly-Robust Self-Training

  • Banghua Zhu
  • Mingyu Ding
  • Philip Jacobson
  • Ming Wu
  • Wei Zhan
  • Michael Jordan
  • Jiantao Jiao

Self-training is a well-established technique in semi-supervised learning, which leverages unlabeled data by generating pseudo-labels and incorporating them with a limited labeled dataset for training. The effectiveness of self-training heavily relies on the accuracy of these pseudo-labels. In this paper, we introduce doubly-robust self-training, an innovative semi-supervised algorithm that provably balances between two extremes. When pseudo-labels are entirely incorrect, our method reduces to a training process solely using labeled data. Conversely, when pseudo-labels are completely accurate, our method transforms into a training process utilizing all pseudo-labeled data and labeled data, thus increasing the effective sample size. Through empirical evaluations on both the ImageNet dataset for image classification and the nuScenes autonomous driving dataset for 3D object detection, we demonstrate the superiority of the doubly-robust loss over the self-training baseline.
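
As a rough illustration of the two limiting behaviors the abstract describes, the sketch below combines per-example losses in a control-variate style. This is an illustrative reading consistent with that description, not necessarily the paper's exact estimator.

```python
# Illustrative doubly-robust-style loss combination; an assumption, not the
# paper's exact formula.
import numpy as np

def doubly_robust_loss(pseudo_losses_all, pseudo_losses_labeled, true_losses_labeled):
    """Each argument is an array of per-example losses under the current model:
    - pseudo_losses_all: loss against pseudo-labels over the full dataset
    - pseudo_losses_labeled: loss against pseudo-labels on the labeled subset
    - true_losses_labeled: loss against ground-truth labels on the labeled subset
    If pseudo-labels are accurate, the last two terms cancel and all data
    contributes; if they are useless, the first two terms cancel in expectation
    and training effectively falls back to the labeled data alone."""
    return (np.mean(pseudo_losses_all)
            - np.mean(pseudo_losses_labeled)
            + np.mean(true_losses_labeled))

print(doubly_robust_loss(np.array([0.5, 0.4, 0.6]),
                         np.array([0.5]),
                         np.array([0.3])))
```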

NeurIPS Conference 2023 Conference Paper

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

  • Yao Mu
  • Qinglong Zhang
  • Mengkang Hu
  • Wenhai Wang
  • Mingyu Ding
  • Jun Jin
  • Bin Wang
  • Jifeng Dai

Embodied AI is a crucial frontier in robotics, capable of planning and executing action sequences for robots to accomplish long-horizon tasks in physical environments. In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities. To achieve this, we have made the following efforts: (i) We craft a large-scale embodied planning dataset, termed EgoCOT. The dataset consists of carefully selected videos from the Ego4D dataset, along with corresponding high-quality language instructions. Specifically, we generate a sequence of sub-goals with the "Chain of Thoughts" mode for effective embodied planning. (ii) We introduce an efficient training approach to EmbodiedGPT for high-quality plan generation, by adapting a 7B large language model (LLM) to the EgoCOT dataset via prefix tuning. (iii) We introduce a paradigm for extracting task-related features from LLM-generated planning queries to form a closed loop between high-level planning and low-level control. Extensive experiments show the effectiveness of EmbodiedGPT on embodied tasks, including embodied planning, embodied control, visual captioning, and visual question answering. Notably, EmbodiedGPT significantly enhances the success rate of the embodied control task by extracting more effective features. It has achieved a remarkable 1.6 times increase in success rate on the Franka Kitchen benchmark and a 1.3 times increase on the Meta-World benchmark, compared to the BLIP-2 baseline fine-tuned with the Ego4D dataset.

NeurIPS Conference 2023 Conference Paper

Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties

  • Hsiao-Yu Tung
  • Mingyu Ding
  • Zhenfang Chen
  • Daniel Bear
  • Chuang Gan
  • Josh Tenenbaum
  • Dan Yamins
  • Judith Fan

General physical scene understanding requires more than simply localizing and recognizing objects -- it requires knowledge that objects can have different latent properties (e.g., mass or elasticity), and that those properties affect the outcome of physical events. While there has been great progress in physical and video prediction models in recent years, benchmarks to test their performance typically do not require an understanding that objects have individual physical properties, or at best test only those properties that are directly observable (e.g., size or color). This work proposes a novel dataset and benchmark, termed Physion++, that rigorously evaluates visual physical prediction in artificial systems under circumstances where those predictions rely on accurate estimates of the latent physical properties of objects in the scene. Specifically, we test scenarios where accurate prediction relies on estimates of properties such as mass, friction, elasticity, and deformability, and where the values of those properties can only be inferred by observing how objects move and interact with other objects or fluids. We evaluate the performance of a number of state-of-the-art prediction models that span a variety of levels of learning vs. built-in knowledge, and compare that performance to a set of human predictions. We find that models that have been trained using standard regimes and datasets do not spontaneously learn to make inferences about latent properties, but also that models that encode objectness and physical states tend to make better predictions. However, there is still a huge gap between all models and human performance, and all models' predictions correlate poorly with those made by humans, suggesting that no state-of-the-art model is learning to make physical predictions in a human-like way. These results show that current deep learning models that succeed in some settings nevertheless fail to achieve human-level physical prediction in other cases, especially those where latent property inference is required. Project page: https://dingmyu.github.io/physion_v2/

ICLR Conference 2023 Conference Paper

Planning with Large Language Models for Code Generation

  • Shun Zhang
  • Zhenfang Chen
  • Yikang Shen
  • Mingyu Ding
  • Joshua B. Tenenbaum
  • Chuang Gan 0001

Existing large language model-based code generation pipelines typically use beam search or sampling algorithms during the decoding process. Although the programs they generate achieve high token-matching-based scores, they often fail to compile or generate incorrect outputs. The main reason is that conventional Transformer decoding algorithms may not be the best choice for code generation. In this work, we propose a novel Transformer decoding algorithm, Planning-Guided Transformer Decoding (PG-TD), that uses a planning algorithm to do lookahead search and guide the Transformer to generate better programs. Specifically, instead of simply optimizing the likelihood of the generated sequences, the Transformer makes use of a planner that generates candidate programs and tests them on public test cases. The Transformer can therefore make more informed decisions and generate tokens that will eventually lead to higher-quality programs. We also design a mechanism that shares information between the Transformer and the planner to make our algorithm computationally efficient. We empirically evaluate our framework with several large language models as backbones on public coding challenge benchmarks, showing that 1) it can generate programs that consistently achieve higher performance compared with competing baseline methods; and 2) it enables controllable code generation, such as concise code and highly commented code, by optimizing a modified objective.
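
A heavily simplified sketch of one ingredient mentioned above, scoring a candidate program by its pass rate on public test cases, is shown below. It illustrates only that scoring signal, not the lookahead search that interleaves it with decoding; the `solve` entry point and test format are assumptions for illustration.

```python
# Simplified sketch: score a candidate program by public-test pass rate.
def pass_rate(program_src, test_cases):
    """test_cases: list of (input_args, expected_output) pairs for a
    hypothetical `solve` entry point defined by the candidate program."""
    namespace = {}
    try:
        exec(program_src, namespace)           # define the candidate's solve()
    except Exception:
        return 0.0
    solve = namespace.get("solve")
    if solve is None:
        return 0.0
    passed = 0
    for args, expected in test_cases:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass                               # runtime errors count as failures
    return passed / len(test_cases)

candidate = "def solve(a, b):\n    return a + b\n"
print(pass_rate(candidate, [((1, 2), 3), ((5, 5), 10)]))  # 1.0
```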

NeurIPS Conference 2023 Conference Paper

Towards Free Data Selection with General-Purpose Models

  • Yichen Xie
  • Mingyu Ding
  • Masayoshi Tomizuka
  • Wei Zhan

A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets. However, current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly. In this paper, we challenge this status quo by designing a distinct data selection pipeline that utilizes existing general-purpose models to select data from various datasets with a single-pass inference without the need for additional training or supervision. A novel free data selection (FreeSel) method is proposed following this new pipeline. Specifically, we define semantic patterns extracted from intermediate features of the general-purpose model to capture subtle local information in each image. We then enable the selection of all data samples in a single pass through distance-based sampling at the fine-grained semantic pattern level. FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods. Extensive experiments verify the effectiveness of FreeSel on various computer vision tasks.
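
To illustrate the generic idea of distance-based selection in feature space, the sketch below runs greedy farthest-point sampling over one feature vector per sample. FreeSel actually operates on fine-grained semantic patterns within each image, so the per-sample vector here is a simplifying assumption.

```python
# Generic farthest-point selection sketch; one feature per sample is assumed.
import numpy as np

def farthest_point_selection(features, budget):
    """features: (N, D) array; returns indices of `budget` selected samples."""
    selected = [0]                                    # arbitrary seed
    dist = np.linalg.norm(features - features[0], axis=1)
    while len(selected) < budget:
        idx = int(dist.argmax())                      # farthest from selected set
        selected.append(idx)
        dist = np.minimum(dist, np.linalg.norm(features - features[idx], axis=1))
    return selected

feats = np.random.rand(1000, 256)
print(farthest_point_selection(feats, budget=5))
```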

TMLR Journal 2023 Journal Article

Understanding Self-Supervised Pretraining with Part-Aware Representation Learning

  • Jie Zhu
  • Jiyang Qi
  • Mingyu Ding
  • Xiaokang Chen
  • Ping Luo
  • Xinggang Wang
  • Wenyu Liu
  • Leye Wang

In this paper, we are interested in understanding self-supervised pretraining through studying the capability of self-supervised methods to learn part-aware representations. The study is mainly motivated by the observation that random views used in contrastive learning and randomly masked (visible) patches used in masked image modeling are often about object parts. We explain that contrastive learning is a part-to-whole task: the projection layer hallucinates the whole object representation from the object part representation learned by the encoder; and that masked image modeling is a part-to-part task: the masked patches of the object are hallucinated from the visible patches. The explanation suggests that the self-supervised pretrained encoder leans toward understanding object parts. We empirically compare off-the-shelf encoders pretrained with several representative methods on object-level recognition and part-level recognition. The results show that the fully-supervised model outperforms self-supervised models for object-level recognition, and most self-supervised contrastive learning and masked image modeling methods outperform the fully-supervised method for part-level recognition. It is also observed that combining contrastive learning and masked image modeling further improves performance.

ICLR Conference 2022 Conference Paper

ComPhy: Compositional Physical Reasoning of Objects and Events from Videos

  • Zhenfang Chen
  • Kexin Yi
  • Yunzhu Li
  • Mingyu Ding
  • Antonio Torralba 0001
  • Joshua B. Tenenbaum
  • Chuang Gan 0001

Objects' motions in nature are governed by complex interactions and their properties. While some properties, such as shape and material, can be identified via the object's visual appearance, others like mass and electric charge are not directly visible. The compositionality between the visible and hidden properties poses unique challenges for AI models to reason about the physical world, whereas humans can effortlessly infer them with limited observations. Existing studies on video reasoning mainly focus on visually observable elements such as object appearance, movement, and contact interaction. In this paper, we take an initial step to highlight the importance of inferring hidden physical properties not directly observable from visual appearances, by introducing the Compositional Physical Reasoning (ComPhy) dataset. For a given set of objects, ComPhy includes a few videos of them moving and interacting under different initial conditions. The model is evaluated based on its capability to unravel the compositional hidden properties, such as mass and charge, and use this knowledge to answer a set of questions posed about one of the videos. Evaluation results of several state-of-the-art video reasoning models on ComPhy show unsatisfactory performance, as they fail to capture these hidden properties. We further propose an oracle neural-symbolic framework named Compositional Physics Learner (CPL), combining visual perception, physical property learning, dynamic prediction, and symbolic execution into a unified framework. CPL can effectively identify objects' physical properties from their interactions and predict their dynamics to answer questions.

ICML Conference 2022 Conference Paper

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer

  • Yao Mu 0001
  • Shoufa Chen
  • Mingyu Ding
  • Jianyu Chen 0002
  • Runjian Chen
  • Ping Luo 0002

Transformer has achieved great successes in learning vision and language representation, which is general across various downstream tasks. In visual control, learning transferable state representation that can transfer between different control tasks is important to reduce the training sample size. However, porting Transformer to sample-efficient visual control remains a challenging and unsolved problem. To this end, we propose a novel Control Transformer (CtrlFormer), possessing many appealing benefits that prior arts do not have. Firstly, CtrlFormer jointly learns self-attention mechanisms between visual tokens and policy tokens among different control tasks, where multitask representation can be learned and transferred without catastrophic forgetting. Secondly, we carefully design a contrastive reinforcement learning paradigm to train CtrlFormer, enabling it to achieve high sample efficiency, which is important in control problems. For example, in the DMControl benchmark, unlike recent advanced methods that failed by producing a zero score in the “Cartpole” task after transfer learning with 100k samples, CtrlFormer can achieve a state-of-the-art score with only 100k samples while maintaining the performance of previous tasks. The code and models are released in our project homepage.

ICLR Conference 2022 Conference Paper

Learning Versatile Neural Architectures by Propagating Network Codes

  • Mingyu Ding
  • Yuqi Huo
  • Haoyu Lu
  • Linjie Yang
  • Zhe Wang 0006
  • Zhiwu Lu 0001
  • Jingdong Wang 0001
  • Ping Luo 0002

This work explores how to design a single neural network capable of adapting to multiple heterogeneous vision tasks, such as image segmentation, 3D detection, and video recognition. This goal is challenging because both network architecture search (NAS) spaces and methods in different tasks are inconsistent. We solve this challenge from both sides. We first introduce a unified design space for multiple tasks and build a multitask NAS benchmark (NAS-Bench-MR) on many widely used datasets, including ImageNet, Cityscapes, KITTI, and HMDB51. We further propose Network Coding Propagation (NCP), which back-propagates gradients of neural predictors to directly update architecture codes along the desired gradient directions to solve various tasks. In this way, optimal architecture configurations can be found by NCP in our large search space in seconds. Unlike prior arts of NAS that typically focus on a single task, NCP has several unique benefits. (1) NCP transforms architecture optimization from data-driven to architecture-driven, enabling the joint search of an architecture across multiple tasks with different data distributions. (2) NCP learns from network codes rather than original data, enabling it to update the architecture efficiently across datasets. (3) In addition to our NAS-Bench-MR, NCP performs well on other NAS benchmarks, such as NAS-Bench-201. (4) Thorough studies of NCP on inter-, cross-, and intra-tasks highlight the importance of cross-task neural architecture design, i.e., multitask neural architectures and architecture transfer between different tasks. Code is available at https://github.com/dingmyu/NCP.
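
The core mechanism, propagating a predictor's gradient back to the architecture code, can be sketched in a few lines: treat the code as a continuous vector, score it with a learned accuracy predictor, and take ascent steps on the code. The predictor below is an untrained stand-in; NCP's actual predictors are trained on NAS-Bench-MR.

```python
# Illustrative sketch of updating an architecture code along a predictor's gradient.
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
arch_code = torch.rand(1, 16, requires_grad=True)   # continuous architecture code

optimizer = torch.optim.SGD([arch_code], lr=0.1)
for _ in range(10):
    optimizer.zero_grad()
    predicted_accuracy = predictor(arch_code)
    (-predicted_accuracy).backward()                 # ascend the predictor's score
    optimizer.step()
    arch_code.data.clamp_(0.0, 1.0)                  # keep the code in a valid range

print(arch_code)
```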

NeurIPS Conference 2022 Conference Paper

LGDN: Language-Guided Denoising Network for Video-Language Modeling

  • Haoyu Lu
  • Mingyu Ding
  • Nanyi Fei
  • Yuqi Huo
  • Zhiwu Lu

Video-language modeling has attracted much attention with the rapid growth of web videos. Most existing methods assume that the video frames and text description are semantically correlated, and focus on video-language modeling at the video level. However, this hypothesis often fails for two reasons: (1) With the rich semantics of video contents, it is difficult to cover all frames with a single video-level description; (2) A raw video typically contains noisy or meaningless information (e.g., scenery shots, transitions, or teasers). Although a number of recent works deploy attention mechanisms to alleviate this problem, the irrelevant or noisy information still makes it very difficult to address. To overcome this challenge, we propose an efficient and effective model, termed Language-Guided Denoising Network (LGDN), for video-language modeling. Different from most existing methods that utilize all extracted video frames, LGDN dynamically filters out the misaligned or redundant frames under language supervision and obtains only 2-4 salient frames per video for cross-modal token-level alignment. Extensive experiments on five public datasets show that our LGDN outperforms the state of the art by large margins. We also provide a detailed ablation study to reveal the critical importance of solving the noise issue, in the hope of inspiring future video-language work.
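
A rough sketch of the frame-selection step reads as follows: score each frame embedding against the text embedding and keep only the top-k most relevant frames. This illustrates language-guided filtering in general, not LGDN's full denoising mechanism; embeddings and dimensions are assumptions.

```python
# Illustrative top-k salient-frame selection by text-frame similarity.
import torch
import torch.nn.functional as F

def select_salient_frames(frame_embeds, text_embed, k=4):
    """frame_embeds: (num_frames, dim); text_embed: (dim,)."""
    sims = F.cosine_similarity(frame_embeds, text_embed.unsqueeze(0), dim=-1)
    topk = sims.topk(k=min(k, frame_embeds.size(0)))
    return topk.indices, topk.values

frames = torch.randn(32, 512)
text = torch.randn(512)
idx, scores = select_salient_frames(frames, text, k=3)
print(idx.tolist(), scores.shape)  # indices of the 3 most text-relevant frames
```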

AAAI Conference 2021 Conference Paper

A Global Occlusion-Aware Approach to Self-Supervised Monocular Visual Odometry

  • Yao Lu
  • Xiaoli Xu
  • Mingyu Ding
  • Zhiwu Lu
  • Tao Xiang

Self-Supervised monocular visual odometry (VO) is often cast into a view synthesis problem based on depth and camera pose estimation. One of the key challenges is to accurately and robustly estimate depth with occlusions and moving objects in the scene. Existing methods simply detect and mask out regions of occlusions locally by several convolutional layers, and then perform only partial view synthesis in the rest of the image. However, occlusion and moving object detection is an unsolved problem itself which requires global layout information. Inaccurate detection inevitably results in incorrect depth as well as pose estimation. In this work, instead of locally detecting and masking out occlusions and moving objects, we propose to alleviate their negative effects on monocular VO implicitly but more effectively from two global perspectives. First, a multi-scale non-local attention module, consisting of both intra-stage augmented attention and cascaded across-stage attention, is proposed for robust depth estimation given occlusions, alleviating the impacts of occlusions via global attention modeling. Second, adversarial learning is introduced in view synthesis for monocular VO. Unlike existing methods that use pixel-level losses on the quality of synthesized views, we enforce the synthetic view to be indistinguishable from the real one at the scene-level. Such a global constraint again helps cope with occluded and moving regions. Extensive experiments on the KITTI dataset show that our approach achieves new state-of-the-art in both pose estimation and depth recovery.

NeurIPS Conference 2021 Conference Paper

Compressed Video Contrastive Learning

  • Yuqi Huo
  • Mingyu Ding
  • Haoyu Lu
  • Nanyi Fei
  • Zhiwu Lu
  • Ji-Rong Wen
  • Ping Luo

This work concerns self-supervised video representation learning (SSVRL), one topic that has received much attention recently. Since videos are storage-intensive and contain a rich source of visual content, models designed for SSVRL are expected to be storage- and computation-efficient, as well as effective. However, most existing methods only focus on one of the two objectives, failing to consider both at the same time. In this work, for the first time, the seemingly contradictory goals are simultaneously achieved by exploiting compressed videos and capturing mutual information between two input streams. Specifically, a novel Motion Vector based Cross Guidance Contrastive learning approach (MVCGC) is proposed. For storage and computation efficiency, we choose to directly decode RGB frames and motion vectors (that resemble low-resolution optical flows) from compressed videos on-the-fly. To enhance the representation ability of the motion vectors, hence the effectiveness of our method, we design a cross guidance contrastive learning algorithm based on multi-instance InfoNCE loss, where motion vectors can take supervision signals from RGB frames and vice versa. Comprehensive experiments on two downstream tasks show that our MVCGC yields new state-of-the-art while being significantly more efficient than its competitors.
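
The cross-guidance idea, each stream supervising the other, can be illustrated with a symmetric InfoNCE loss between RGB-clip features and motion-vector-clip features of the same clips. This is a standard contrastive-learning sketch; the multi-instance variant and compressed-video decoding used in MVCGC are omitted.

```python
# Symmetric InfoNCE between two feature streams; illustrative sketch only.
import torch
import torch.nn.functional as F

def cross_guidance_infonce(rgb_feats, mv_feats, temperature=0.07):
    """rgb_feats, mv_feats: (batch, dim) features of the same clips."""
    rgb = F.normalize(rgb_feats, dim=-1)
    mv = F.normalize(mv_feats, dim=-1)
    logits = rgb @ mv.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(rgb.size(0))            # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = cross_guidance_infonce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```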

NeurIPS Conference 2021 Conference Paper

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language

  • Mingyu Ding
  • Zhenfang Chen
  • Tao Du
  • Ping Luo
  • Josh Tenenbaum
  • Chuang Gan

In this work, we propose a unified framework, called Visual Reasoning with Differentiable Physics (VRDP), that can jointly learn visual concepts and infer physics models of objects and their interactions from videos and language. This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine. The visual perception module parses each video frame into object-centric trajectories and represents them as latent scene representations. The concept learner grounds visual concepts (e.g., color, shape, and material) from these object-centric representations based on the language, thus providing prior knowledge for the physics engine. The differentiable physics model, implemented as an impulse-based differentiable rigid-body simulator, performs differentiable physical simulation based on the grounded concepts to infer physical properties, such as mass, restitution, and velocity, by fitting the simulated trajectories to the video observations. Consequently, these learned concepts and physical models can explain what we have seen and imagine what is about to happen in future and counterfactual scenarios. Integrating differentiable physics into the dynamic reasoning framework offers several appealing benefits. More accurate dynamics prediction in learned physics models enables state-of-the-art performance on both synthetic and real-world benchmarks while maintaining high transparency and interpretability; most notably, VRDP improves the accuracy of predictive and counterfactual questions by 4.5% and 11.5% compared to its best counterpart. VRDP is also highly data-efficient: physical parameters can be optimized from very few videos, and even a single video can be sufficient. Finally, with all physical parameters inferred, VRDP can quickly learn new concepts from a few examples.
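
A toy sketch of the fitting idea (not the paper's rigid-body simulator): physical parameters are declared as learnable tensors and optimized by gradient descent so that a differentiable simulation matches an observed trajectory. Here a 1-D sliding object's initial velocity and deceleration are recovered; all values are illustrative.

import torch

obs_t = torch.linspace(0.0, 1.0, 20)
true_v0, true_mu = 2.0, 0.8
obs_x = true_v0 * obs_t - 0.5 * true_mu * obs_t ** 2    # "observed" positions, standing in for video tracks

v0 = torch.tensor(0.5, requires_grad=True)              # unknown initial velocity
mu = torch.tensor(0.1, requires_grad=True)              # unknown deceleration
opt = torch.optim.Adam([v0, mu], lr=0.05)

for step in range(500):
    sim_x = v0 * obs_t - 0.5 * mu * obs_t ** 2           # differentiable "simulation"
    loss = ((sim_x - obs_x) ** 2).mean()                 # fit simulation to the observations
    opt.zero_grad(); loss.backward(); opt.step()

print(round(v0.item(), 2), round(mu.item(), 2))          # should approach the true values 2.0 and 0.8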

ICLR Conference 2021 Conference Paper

IEPT: Instance-Level and Episode-Level Pretext Tasks for Few-Shot Learning

  • Manli Zhang
  • Jianhong Zhang
  • Zhiwu Lu
  • Tao Xiang
  • Mingyu Ding
  • Songfang Huang

The need to collect large quantities of labeled training data for each new task has limited the usefulness of deep neural networks. Given data from a set of source tasks, this limitation can be overcome using two transfer learning approaches: few-shot learning (FSL) and self-supervised learning (SSL). The former aims to learn 'how to learn' by designing learning episodes using source tasks to simulate the challenge of solving the target new task with few labeled samples. In contrast, the latter exploits an annotation-free pretext task across all source tasks in order to learn generalizable feature representations. In this work, we propose a novel Instance-level and Episode-level Pretext Task (IEPT) framework that seamlessly integrates SSL into FSL. Specifically, given an FSL episode, we first apply geometric transformations to each instance to generate extended episodes. At the instance level, transformation recognition is performed as per standard SSL. Importantly, at the episode level, two SSL-FSL hybrid learning objectives are devised: (1) the consistency across the predictions of an FSL classifier from different extended episodes is maximized as an episode-level pretext task; (2) the features extracted from each instance across different episodes are integrated to construct a single FSL classifier for meta-learning. Extensive experiments show that our proposed model (i.e., FSL with IEPT) achieves a new state of the art.
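
A minimal sketch of the two pretext levels under stated assumptions (rotation as the geometric transformation, random logits standing in for real network outputs; this is not the IEPT implementation):

import torch
import torch.nn.functional as F

def rotate_batch(x, k):
    return torch.rot90(x, k, dims=(2, 3))           # k * 90-degree spatial rotation

x = torch.randn(8, 3, 32, 32)                        # a query set (hypothetical)
rotations = [rotate_batch(x, k) for k in range(4)]   # four extended episodes

# Instance level: a rotation classifier should recover k for each rotated image;
# rotations[k] would be fed to the classifier, random logits stand in here.
rot_logits = torch.randn(8 * 4, 4, requires_grad=True)
rot_labels = torch.arange(4).repeat_interleave(8)
instance_loss = F.cross_entropy(rot_logits, rot_labels)

# Episode level: predictions of the few-shot classifier on each extended episode
# should agree; here consistency is the mean KL divergence to the averaged prediction.
probs = torch.softmax(torch.randn(4, 8, 5), dim=-1)   # 4 episodes, 8 queries, 5 classes
mean_p = probs.mean(dim=0, keepdim=True)
consistency_loss = F.kl_div(probs.log(), mean_p.expand_as(probs), reduction='batchmean')
print(instance_loss.item(), consistency_loss.item())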

IJCAI Conference 2021 Conference Paper

Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw

  • Yuqi Huo
  • Mingyu Ding
  • Haoyu Lu
  • Ziyuan Huang
  • Mingqian Tang
  • Zhiwu Lu
  • Tao Xiang

This paper proposes a novel pretext task for self-supervised video representation learning by exploiting spatiotemporal continuity in videos. It is motivated by the fact that videos are spatiotemporal by nature, and a representation learned by detecting spatiotemporal continuity/discontinuity is thus beneficial for downstream video content analysis tasks. A natural choice of such a pretext task is to construct spatiotemporal (3D) jigsaw puzzles and learn to solve them. However, as we demonstrate in the experiments, this task turns out to be intractable. We thus propose Constrained Spatiotemporal Jigsaw (CSJ), whereby the 3D jigsaws are formed in a constrained manner to ensure that large continuous spatiotemporal cuboids exist. This provides sufficient cues for the model to reason about continuity. Instead of solving the puzzles directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable. The four tasks aim to learn representations sensitive to spatiotemporal continuity at both the local and global levels. Extensive experiments show that our CSJ achieves state-of-the-art results on various benchmarks.
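
To make the constrained jigsaw formation concrete, a toy sketch (illustrative only; the paper's puzzles are 3D and its four surrogate tasks are not reproduced here) keeps large contiguous cuboids by permuting only a few large temporal chunks rather than all cells:

import numpy as np

rng = np.random.default_rng(0)
clip = np.arange(16)[:, None, None] * np.ones((16, 8, 8))   # toy clip: 16 frames of 8x8

def constrained_temporal_jigsaw(clip, num_chunks=4):
    chunks = np.split(clip, num_chunks, axis=0)              # large contiguous cuboids
    order = rng.permutation(num_chunks)                      # permute chunk order only
    return np.concatenate([chunks[i] for i in order], axis=0), order

shuffled, order = constrained_temporal_jigsaw(clip)
# A surrogate task could be to predict `order` (or a coarse property of it)
# rather than solving the full jigsaw, which the abstract notes is intractable.
print(order)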

AAAI Conference 2020 Conference Paper

Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow

  • Mingyu Ding
  • Zhe Wang
  • Bolei Zhou
  • Jianping Shi
  • Zhiwu Lu
  • Ping Luo

A major challenge for video semantic segmentation is the lack of labeled data. In most benchmark datasets, only one frame of a video clip is annotated, which makes most supervised methods unable to utilize information from the rest of the frames. To exploit the spatio-temporal information in videos, many previous works use pre-computed optical flows, which encode the temporal consistency to improve video segmentation. However, video segmentation and optical flow estimation are still treated as two separate tasks. In this paper, we propose a novel framework for joint video semantic segmentation and optical flow estimation. Semantic segmentation brings semantic information to handle occlusion for more robust optical flow estimation, while the non-occluded optical flow provides accurate pixel-level temporal correspondences to guarantee the temporal consistency of the segmentation. Moreover, our framework is able to utilize both labeled and unlabeled frames in the video through joint training, while requiring no additional computation at inference. Extensive experiments show that the proposed model makes video semantic segmentation and optical flow estimation benefit from each other and outperforms existing methods under the same settings in both tasks.
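
One way the two tasks can supervise each other, sketched under simplifying assumptions (hypothetical shapes and mask; not the paper's architecture): optical flow warps the previous frame's segmentation to the current frame, and a consistency loss is applied only on non-occluded pixels.

import torch
import torch.nn.functional as F

def warp(x, flow):
    # x: (B, C, H, W); flow: (B, 2, H, W) in pixels; backward-warp x by flow.
    b, _, h, w = x.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
    grid_x = 2.0 * grid[:, 0] / (w - 1) - 1.0             # normalize to [-1, 1]
    grid_y = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(x, torch.stack((grid_x, grid_y), dim=-1), align_corners=True)

seg_prev = torch.randn(1, 19, 64, 64)      # segmentation logits at t-1 (hypothetical)
seg_curr = torch.randn(1, 19, 64, 64)      # segmentation logits at t
flow = torch.zeros(1, 2, 64, 64)           # flow from frame t to frame t-1
non_occluded = torch.ones(1, 1, 64, 64)    # occlusion mask (1 = visible in both frames)

consistency = (non_occluded * (warp(seg_prev, flow) - seg_curr) ** 2).mean()
print(consistency.item())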

ICRA Conference 2020 Conference Paper

SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud

  • Hongwei Yi
  • Shaoshuai Shi
  • Mingyu Ding
  • Jiankai Sun
  • Kui Xu
  • Hui Zhou
  • Zhe Wang
  • Sheng Li

3D vehicle detection based on point clouds is a challenging task in real-world applications such as autonomous driving. Although significant progress has been made, we observe two aspects that can be further improved. First, the semantic context information in LiDAR is seldom explored in previous works, although it may help identify ambiguous vehicles. Second, the distribution of points on vehicles varies continuously with increasing depth, which may not be well modeled by a single model. In this work, we propose a unified model, SegVoxelNet, to address the above two problems. A semantic context encoder is proposed to leverage the free-of-charge semantic segmentation masks in the bird's eye view: suspicious regions can be highlighted while noisy regions are suppressed by this module. To better deal with vehicles at different depths, a novel depth-aware head is designed to explicitly model the distribution differences, and each part of the depth-aware head is made to focus on its own target detection range. Extensive experiments on the KITTI dataset show that the proposed method outperforms state-of-the-art alternatives in both accuracy and efficiency with point clouds as the only input.
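
A conceptual sketch of the depth-aware head idea (channel sizes, depth bands, and output format are illustrative assumptions, not SegVoxelNet's actual design): separate sub-heads are applied to different depth bands of the bird's-eye-view feature map so that each specializes on its own range.

import torch
import torch.nn as nn

class DepthAwareHead(nn.Module):
    def __init__(self, channels=64, num_ranges=3, out_channels=7):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Conv2d(channels, out_channels, 1) for _ in range(num_ranges)])

    def forward(self, bev):                              # bev: (B, C, depth_bins, width_bins)
        bands = torch.chunk(bev, len(self.heads), dim=2)  # split along the depth axis
        return torch.cat([h(b) for h, b in zip(self.heads, bands)], dim=2)

bev_features = torch.randn(2, 64, 96, 96)     # hypothetical BEV feature map
preds = DepthAwareHead()(bev_features)        # per-location box/score predictions
print(preds.shape)                             # (2, 7, 96, 96)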

NeurIPS Conference 2018 Conference Paper

Domain-Invariant Projection Learning for Zero-Shot Recognition

  • An Zhao
  • Mingyu Ding
  • Jiechao Guan
  • Zhiwu Lu
  • Tao Xiang
  • Ji-Rong Wen

Zero-shot learning (ZSL) aims to recognize unseen object classes without any training samples, which can be regarded as a form of transfer learning from seen classes to unseen ones. This is made possible by learning a projection between a feature space and a semantic space (e.g., an attribute space). Key to ZSL is thus to learn a projection function that is robust against the often large domain gap between the seen and unseen classes. In this paper, we propose a novel ZSL model termed domain-invariant projection learning (DIPL). Our model has two novel components: (1) a domain-invariant feature self-reconstruction task is introduced for the seen/unseen class data, resulting in a simple linear formulation that casts ZSL into a min-min optimization problem; solving the problem is non-trivial, and a novel iterative algorithm is formulated as the solver, with rigorous theoretical analysis provided. (2) To further align the two domains via the learned projection, the shared semantic structure among seen and unseen classes is exploited by forming superclasses in the semantic space. Extensive experiments show that our model outperforms state-of-the-art alternatives by significant margins.
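
A highly simplified sketch of the projection-plus-self-reconstruction flavor described above (hypothetical dimensions; this is not DIPL's min-min formulation or its iterative solver): a linear projection maps visual features to the attribute space while an added term asks the attributes to reconstruct the features.

import torch

X = torch.randn(200, 512)          # visual features of seen-class samples (hypothetical)
S = torch.randn(200, 85)           # their class attribute vectors (hypothetical)
W = torch.zeros(512, 85, requires_grad=True)
opt = torch.optim.Adam([W], lr=1e-2)

for _ in range(300):
    proj_loss = ((X @ W - S) ** 2).mean()          # project features onto attributes
    recon_loss = ((S @ W.t() - X) ** 2).mean()     # reconstruct features from attributes
    loss = proj_loss + 0.1 * recon_loss
    opt.zero_grad(); loss.backward(); opt.step()

# At test time an unseen image would be projected with W and matched to the nearest
# unseen-class attribute vector (nearest-neighbour classification in attribute space).
print(loss.item())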