
Author name cluster

Yian Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers (7)

AAAI Conference 2026 Short Paper

APEX-Q: Arbitrary-dimension Product-EXtension Quantization for Accelerated LLM Deployment (Student Abstract)

  • Yian Wang
  • Ye Qiao
  • Sitao Huang
  • Hyoukjun Kwon

We present APEX-Q, a flexible product quantization framework for compressing large language models. Unlike prior multi-codebook quantization methods with fixed partitions, APEX-Q supports arbitrary-dimensional tensor quantization, better capturing weight redundancy. It achieves performance on par with 4-bit and 8-bit baselines, enables post-training quantization without retraining, and reveals key trade-offs across subvector dimensions, codebook sizes, and hardware efficiency. APEX-Q thus provides a unified, hardware-friendly approach to scalable LLM deployment.
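For intuition, the sketch below shows plain product quantization of a weight matrix with a configurable subvector dimension (the trade-off knob the abstract highlights), using a single shared k-means codebook. It is a minimal illustration with assumed names (product_quantize, d, k), not APEX-Q's actual partitioning scheme or kernels, which the abstract does not specify.

```python
import numpy as np

def _sq_dists(sub, centroids):
    # Squared Euclidean distances, (n, d) x (k, d) -> (n, k).
    return ((sub ** 2).sum(1, keepdims=True)
            - 2.0 * sub @ centroids.T
            + (centroids ** 2).sum(1))

def product_quantize(W, d=4, k=256, iters=10, seed=0):
    """Split W into d-dim subvectors and snap each to its nearest centroid
    of a shared k-means codebook; W.size must be divisible by d."""
    rng = np.random.default_rng(seed)
    sub = W.reshape(-1, d)
    centroids = sub[rng.choice(len(sub), size=k, replace=False)].copy()
    for _ in range(iters):                    # plain Lloyd's k-means
        assign = _sq_dists(sub, centroids).argmin(1)
        for j in range(k):
            members = sub[assign == j]
            if len(members):
                centroids[j] = members.mean(0)
    codes = _sq_dists(sub, centroids).argmin(1)
    return codes.astype(np.uint16), centroids

rng = np.random.default_rng(1)
W = rng.standard_normal((512, 512)).astype(np.float32)
codes, book = product_quantize(W)             # one code per subvector
W_hat = book[codes].reshape(W.shape)          # dequantized approximation
print(float(np.abs(W - W_hat).mean()))
```

In this simplified setting, smaller d gives finer-grained codebooks but more codes per row, while larger k improves fidelity at the cost of codebook storage and lookup bandwidth, which is the kind of trade-off the abstract refers to.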

NeurIPS Conference 2025 Conference Paper

RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills

  • Chunru Lin
  • Haotian Yuan
  • Yian Wang
  • Xiaowen Qiu
  • Tsun-Hsuan Johnson Wang
  • Minghao Guo
  • Bohan Wang
  • Yashraj Narang

Endowing robots with tool design abilities is critical for enabling them to solve complex manipulation tasks that would otherwise be intractable. While recent generative frameworks can automatically synthesize task settings—such as 3D scenes and reward functions—they have not yet addressed the challenge of tool-use scenarios. Simply retrieving human-designed tools might not be ideal since many tools (e.g., a rolling pin) are difficult for robotic manipulators to handle. Furthermore, existing tool design approaches either rely on predefined templates with limited parameter tuning or apply generic 3D generation methods that are not optimized for tool creation. To address these limitations, we propose RobotSmith, an automated pipeline that leverages the implicit physical knowledge embedded in vision-language models (VLMs) alongside the more accurate physics provided by physics simulations to design and use tools for robotic manipulation. Our system (1) iteratively proposes tool designs using collaborative VLM agents, (2) generates low-level robot trajectories for tool use, and (3) jointly optimizes tool geometry and usage for task performance. We evaluate our approach across a wide range of manipulation tasks involving rigid, deformable, and fluid objects. Experiments show that our method consistently outperforms strong baselines in both task success rate and overall performance. Notably, our approach achieves a 50.0% average success rate, significantly surpassing other baselines such as 3D generation (21.4%) and tool retrieval (11.1%). Finally, we deploy our system in real-world settings, demonstrating that the generated tools and their usage plans transfer effectively to physical execution, validating the practicality and generalization capabilities of our approach.
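The abstract's three-step loop can be pictured with the skeleton below. Every component is a stub with hypothetical names (propose_tool, rollout, critique); RobotSmith's actual VLM agents, simulator, and optimizer interfaces are not described here.

```python
from dataclasses import dataclass

@dataclass
class ToolDesign:
    params: dict
    score: float = 0.0

def propose_tool(task, feedback):
    # Placeholder for the designer VLM agent: map the task plus past
    # critiques to tool-generating parameters.
    return ToolDesign(params={"length": 0.20 + 0.02 * len(feedback)})

def rollout(design, task):
    # Placeholder for trajectory generation plus physics simulation,
    # returning a task-performance score in [0, 1].
    return min(1.0, 3.0 * design.params["length"])

def critique(design, score):
    # Placeholder for the reviewer VLM agent.
    return f"score={score:.2f}; consider adjusting the tool geometry"

def design_loop(task, rounds=5):
    feedback, best = [], None
    for _ in range(rounds):
        design = propose_tool(task, feedback)             # (1) propose
        design.score = rollout(design, task)              # (2) simulate use
        feedback.append(critique(design, design.score))   # (3) refine
        if best is None or design.score > best.score:
            best = design
    return best

print(design_loop("scoop water into a bowl").params)
```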

NeurIPS Conference 2024 Conference Paper

Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting

  • Yian Wang
  • Xiaowen Qiu
  • Jiageng Liu
  • Zhehuan Chen
  • Jiting Cai
  • Yufei Wang
  • Tsun-Hsuan Wang
  • Zhou Xian

Creating large-scale interactive 3D environments is essential for the development of Robotics and Embodied AI research. However, generating diverse embodied environments with realistic detail and considerable complexity remains a significant challenge. Current methods, including manual design, procedural generation, diffusion-based scene generation, and large language model (LLM) guided scene design, are hindered by limitations such as excessive human effort, reliance on predefined rules or training datasets, and limited 3D spatial reasoning ability. Since pre-trained 2D image generative models better capture scene and object configuration than LLMs, we address these challenges by introducing Architect, a generative framework that creates complex and realistic 3D embodied environments leveraging diffusion-based 2D image inpainting. In detail, we utilize foundation visual perception models to obtain each generated object from the image and leverage pre-trained depth estimation models to lift the generated 2D image to 3D space. Because the camera parameters and depth scale are still absent from the generated image, we address these problems by controlling the diffusion model through hierarchical inpainting. Specifically, having access to ground-truth depth and camera parameters in simulation, we first render a photo-realistic image of only the background. Then, we inpaint the foreground in this image, so that the background passes its geometric cues, and thereby the camera parameters, to the inpainting model. This process effectively controls the camera parameters and depth scale of the generated image, facilitating the back-projection from the 2D image to 3D point clouds. Our pipeline further extends to a hierarchical and iterative inpainting process that continuously generates the placement of large furniture and small objects to enrich the scene. This iterative structure gives our method the flexibility to generate or refine scenes from various starting points, such as text, floor plans, or pre-arranged environments. Experimental results demonstrate that Architect outperforms existing methods in producing realistic and complex environments, making it highly suitable for Embodied AI and robotics applications.
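The back-projection step the abstract depends on is standard pinhole geometry: once hierarchical inpainting yields an image whose depth and camera parameters are controlled, each pixel lifts to a 3D point. A minimal sketch with illustrative intrinsics (fx, fy, cx, cy):

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to an (H*W, 3) point cloud (camera frame)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth      # pinhole model: u = fx * X / Z + cx
    y = (v - cy) / fy * depth      #                v = fy * Y / Z + cy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

depth = np.full((480, 640), 2.0)   # toy depth map: a flat wall 2 m away
cloud = backproject(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(cloud.shape)                 # (307200, 3)
```

Without the controlled camera parameters and depth scale, this mapping is ambiguous, which is precisely the problem the hierarchical inpainting scheme is designed to remove.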

ICRA Conference 2024 Conference Paper

Articulated Object Manipulation with Coarse-to-fine Affordance for Mitigating the Effect of Point Cloud Noise

  • Suhan Ling
  • Yian Wang
  • Ruihai Wu
  • Shiguang Wu 0004
  • Yuzheng Zhuang
  • Tianyi Xu
  • Yu Li 0022
  • Chang Liu 0077

3D articulated objects are inherently challenging to manipulate due to their varied geometries and intricate functionalities. Point-level affordance, which predicts a per-point actionable score and thus proposes the best point to interact with, has demonstrated excellent performance and generalization capabilities in articulated object manipulation. However, a significant challenge remains: while previous works use perfect point clouds generated in simulation, the resulting models cannot be directly applied to the noisy point clouds of the real world. To tackle this challenge, we leverage a property of real-world scanned point clouds: the point cloud becomes less noisy as the camera moves closer to the object. We therefore propose a novel coarse-to-fine affordance learning pipeline that mitigates the effect of point cloud noise in two stages. In the first stage, we learn affordance on the noisy far-view point cloud, which covers the whole object, to propose an approximate place to manipulate. Then, we move the camera in front of the proposed place, scan a less noisy point cloud containing precise local geometry, and learn affordance on this point cloud to propose fine-grained final actions. The proposed method is thoroughly evaluated both on large-scale simulated noisy point clouds that mimic real-world scans and in real-world scenarios, outperforming existing methods and demonstrating its effectiveness in tackling noisy real-world point clouds.
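The two-stage procedure can be outlined roughly as below. All components are placeholders with assumed names (scan, coarse_net, fine_net); the paper's actual networks and camera control are not specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def scan(viewpoint, noise):
    # Placeholder for acquiring a scanned point cloud from a viewpoint;
    # closer viewpoints yield lower noise.
    return rng.normal(size=(1024, 3)) * 0.1 + rng.normal(size=(1024, 3)) * noise

def coarse_net(points):
    # Placeholder per-point affordance on the full, noisy far-view scan.
    return np.exp(-np.linalg.norm(points - points.mean(0), axis=1))

def fine_net(points):
    # Placeholder per-point affordance on the cleaner close-up scan.
    return np.exp(-np.linalg.norm(points - points.mean(0), axis=1))

far_cloud = scan(viewpoint="far", noise=0.02)        # whole object, noisier
region = far_cloud[coarse_net(far_cloud).argmax()]   # approximate place
near_cloud = scan(viewpoint=region, noise=0.005)     # close-up, less noisy
action_point = near_cloud[fine_net(near_cloud).argmax()]
print("act at", np.round(action_point, 3))
```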

ICRA Conference 2024 Conference Paper

Distributionally Robust Chance Constrained Trajectory Optimization for Mobile Robots within Uncertain Safe Corridor

  • Shaohang Xu
  • Haolin Ruan
  • Wentao Zhang 0010
  • Yian Wang
  • Lijun Zhu 0001
  • Chin Pang Ho

Safe corridor-based Trajectory Optimization (TO) presents an appealing approach for collision-free path planning of autonomous robots, because its convex formulation can guarantee global optimality. The safe corridor is constructed from the obstacle map; however, non-ideal perception induces uncertainty, which is rarely considered in the context of trajectory generation. In this paper, we propose Distributionally Robust Safe Corridor Constraints (DRSCCs) to account for the uncertainty of the safe corridor. We then integrate DRSCCs into a trajectory optimization framework using Bernstein basis polynomials. Theoretically, we rigorously prove that the proposed trajectory optimization problem is equivalent to a convex quadratic program, which is computationally efficient to deploy on real robots. Simulation results show that our method enhances navigation safety by significantly reducing infeasible motions compared to the baseline. Moreover, the proposed approach is validated through two robotic applications, a micro Unmanned Aerial Vehicle (UAV) and the Unitree A1 quadruped robot.
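One concrete reason a Bernstein parameterization yields a convex program is the convex hull property: if every control point lies inside a (suitably tightened) corridor box, the whole curve does. The sketch below sets up such a QP with cvxpy; the constant margin merely stands in for the distributionally robust tightening that DRSCCs derive, which is not reproduced here.

```python
import cvxpy as cp
import numpy as np

n, dim = 7, 2                                  # degree-6 Bezier curve in 2D
P = cp.Variable((n, dim))                      # Bernstein control points
lo, hi = np.array([0.0, 0.0]), np.array([4.0, 2.0])   # one corridor box
margin = 0.3                                   # stand-in for DR tightening

constraints = [
    P >= np.tile(lo + margin, (n, 1)),         # convex hull property:
    P <= np.tile(hi - margin, (n, 1)),         # box the control points
    P[0] == np.array([0.5, 0.5]),              # start position
    P[-1] == np.array([3.5, 1.5]),             # goal position
]
# Penalize second differences of control points as a smoothness surrogate.
objective = cp.Minimize(cp.sum_squares(P[2:] - 2 * P[1:-1] + P[:-2]))
cp.Problem(objective, constraints).solve()
print(np.round(P.value, 3))
```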

ICRA Conference 2024 Conference Paper

Observer-based Distributed MPC for Collaborative Quadrotor-Quadruped Manipulation of a Cable-Towed Load

  • Shaohang Xu
  • Yian Wang
  • Wentao Zhang 0010
  • Chin Pang Ho
  • Lijun Zhu 0001

This paper presents a collaborative quadrotor-quadruped robot system for the manipulation of a cable-towed payload. In particular, we aim to address the challenge posed by the unknown dynamics of the cable-towed payload. To this end, we first propose novel dynamic models for both the quadrotor and the quadruped robot, taking into account the nonlinear robot dynamics and the uncertainties associated with the cable-towed load. Moreover, we design observers for the hybrid interaction between the robots and the payload. Theoretically, the convergence of these observers is analyzed using Lyapunov functions under mild technical assumptions. Finally, we seamlessly integrate the dynamic models and the observers into a distributed Model Predictive Control (MPC) framework with kinematic limitations and collision avoidance constraints. The proposed system is validated through challenging field experiments in indoor and outdoor environments, involving push disturbances, varying and unknown payloads, uneven terrain, etc.
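The observer idea can be illustrated on a one-dimensional toy: treat the unknown cable force as a lumped disturbance d in m * a = u + d and correct an estimate from the velocity prediction error. The model and gain below are illustrative, not the paper's observer design.

```python
# Toy disturbance observer for a point mass: m * a = u + d_true.
m, dt, L = 1.0, 0.01, 5.0
d_true = 2.0                      # unknown constant cable force
v, d_hat = 0.0, 0.0               # measured velocity, disturbance estimate

for k in range(1000):
    u = -d_hat                               # stand-in for the MPC action
    v_pred = v + dt * (u + d_hat) / m        # model-predicted velocity
    v = v + dt * (u + d_true) / m            # "measured" true velocity
    d_hat += L * (v - v_pred)                # observer correction step

print(f"estimated disturbance: {d_hat:.3f}")  # converges toward 2.0
```

The correction step contracts the estimation error by a constant factor each iteration, which is the discrete-time analogue of the Lyapunov-based convergence argument the abstract mentions.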

ICLR Conference 2022 Conference Paper

VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects

  • Ruihai Wu
  • Yan Shen 0035
  • Kaichun Mo
  • Zizheng Guo
  • Yian Wang
  • Tianhao Wu 0001
  • Qingnan Fan
  • Xuelin Chen

Perceiving and manipulating 3D articulated objects (e.g., cabinets, doors) in human environments is an important yet challenging task for future home-assistant robots. The space of 3D articulated objects is exceptionally rich in its myriad semantic categories, diverse shape geometries, and complicated part functionalities. Previous works mostly abstract the kinematic structure, with estimated joint parameters and part poses as the visual representations for manipulating 3D articulated objects. In this paper, we propose object-centric actionable visual priors as a novel perception-interaction handshaking point at which the perception system outputs more actionable guidance than kinematic structure estimation, by predicting dense geometry-aware, interaction-aware, and task-aware visual action affordance and trajectory proposals. We design an interaction-for-perception framework, VAT-Mart, to learn such actionable visual representations by simultaneously training a curiosity-driven reinforcement learning policy that explores diverse interaction trajectories and a perception module that summarizes and generalizes the explored knowledge for pointwise predictions across diverse shapes. Experiments on the large-scale PartNet-Mobility dataset in the SAPIEN environment demonstrate the effectiveness of the proposed approach and show promising generalization capabilities to novel test shapes, unseen object categories, and real-world data.
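The interaction-for-perception loop can be caricatured as below: an exploration policy collects interaction trajectories, and their outcomes supervise a per-point perception module. All components are stubs with hypothetical names; VAT-Mart's curiosity reward and network architectures are not reproduced.

```python
import random

def explore_policy(shape):
    # Stand-in for the curiosity-driven RL policy: pick a contact point
    # and a short open-loop action trajectory.
    point = random.choice(shape["points"])
    trajectory = [("pull", 0.1)] * 3
    return point, trajectory

def simulate(shape, point, trajectory):
    # Stand-in for a SAPIEN rollout: did the articulated part move?
    return random.random() < 0.5

shape = {"points": [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6), (0.7, 0.8, 0.9)]}
dataset = []
for episode in range(100):
    point, trajectory = explore_policy(shape)
    success = simulate(shape, point, trajectory)
    dataset.append((point, trajectory, success))   # supervision signal

# A perception module would then regress per-point affordance scores and
# trajectory proposals from `dataset`, generalizing across shapes.
print(sum(s for *_, s in dataset), "successful interactions collected")
```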