Arrow Research

Author name cluster

Baoquan Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers (10)

AAAI Conference 2026 · Conference Paper

Spatial-Spectral Homogeneous Attacks on Physical-World Large Vision-Language Models

  • Daizong Liu
  • Baoquan Chen
  • Wei Hu

Although large vision-language models (LVLMs) have demonstrated promising, versatile capabilities on various downstream tasks, they are susceptible to adversarial examples. Existing LVLM attacks craft adversarial patterns under impractical settings: (i) they add global digital perturbations to the entire input image; (ii) they require prior knowledge of the LVLM for optimization; and (iii) they do not account for realistic transformations. These choices make them difficult to deploy in physical-world attack scenarios. Motivated by this gap, this paper proposes the first practical LVLM attack method based on a novel adversarial patch design, which works in both physical and digital attack settings without using any LVLM details. In particular, we introduce adversarial homogeneity constraints in both the spatial and spectral domains to improve the patch's stealthiness against potential real-world defenses. We also develop a new technique for synthesizing realistic transformations that capture the patch appearance variations expected in daily life. Extensive experiments verify the strong adversarial capabilities of the proposed attack against prevalent LVLMs across a spectrum of tasks.
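As a way to picture the two homogeneity constraints, the sketch below uses total variation as a stand-in for the spatial term and a high-frequency Fourier penalty as a stand-in for the spectral term, folded into a single gradient step on the patch. All function names, loss weights, and the surrogate `attack_loss` are hypothetical; this is a minimal illustration, not the paper's method.

```python
import torch

def spatial_tv(patch):
    # Total variation: a common stand-in for spatial smoothness/homogeneity.
    dh = (patch[:, 1:, :] - patch[:, :-1, :]).abs().mean()
    dw = (patch[:, :, 1:] - patch[:, :, :-1]).abs().mean()
    return dh + dw

def spectral_highfreq(patch):
    # Penalize high-frequency energy in the 2D Fourier spectrum,
    # a stand-in for a spectral-homogeneity constraint.
    spec = torch.fft.fftshift(torch.fft.fft2(patch), dim=(-2, -1)).abs()
    c, h, w = patch.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    radius = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2).sqrt()
    return (spec * (radius > min(h, w) / 8)).mean()  # mask keeps high bands only

def patch_step(patch, attack_loss, lr=0.01, lam_tv=0.1, lam_spec=0.1):
    # One gradient step; attack_loss(patch) scores attack success
    # (lower = more adversarial), e.g. via a surrogate model, since the
    # target LVLM itself is treated as a black box.
    patch = patch.detach().requires_grad_(True)
    loss = (attack_loss(patch)
            + lam_tv * spatial_tv(patch)
            + lam_spec * spectral_highfreq(patch))
    loss.backward()
    return (patch - lr * patch.grad).clamp(0, 1).detach()
```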

ICLR Conference 2023 · Conference Paper

A Laplace-inspired Distribution on SO(3) for Probabilistic Rotation Estimation

  • Yingda Yin
  • Yang Wang
  • He Wang 0010
  • Baoquan Chen

Estimating the 3DoF rotation from a single RGB image is an important yet challenging problem. Probabilistic rotation regression has attracted increasing attention because it expresses uncertainty alongside the prediction. Although modeling noise with the Gaussian-like Bingham and matrix Fisher distributions is natural, these distributions are sensitive to outliers because they penalize deviations quadratically. In this paper, we draw inspiration from the multivariate Laplace distribution and propose a novel Rotation Laplace distribution on SO(3). The Rotation Laplace distribution is robust to outliers and maintains strong gradients in the low-error region, resulting in better convergence. Our extensive experiments show that the proposed distribution achieves state-of-the-art performance on rotation regression tasks over both probabilistic and non-probabilistic baselines. Our project page is at pku-epic.github.io/RotationLaplace.
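The key intuition is the penalty shape: a Gaussian-style density punishes rotation error quadratically, while a Laplace-style one grows linearly. The sketch below contrasts the two on the geodesic angle between rotations; it illustrates the intuition only and is not the paper's exact density on SO(3).

```python
import torch

def geodesic_angle(R_pred, R_gt):
    # Rotation error as the geodesic distance on SO(3), in radians.
    rel_trace = (R_pred.transpose(-1, -2) @ R_gt).diagonal(dim1=-2, dim2=-1).sum(-1)
    cos = (rel_trace - 1) / 2
    return torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))

def gaussian_style_loss(R_pred, R_gt):
    return geodesic_angle(R_pred, R_gt) ** 2   # quadratic: outliers dominate

def laplace_style_loss(R_pred, R_gt):
    return geodesic_angle(R_pred, R_gt)        # linear: bounded outlier influence
```

Near zero error the quadratic loss has a vanishing gradient while the linear loss keeps a constant-magnitude one, and for outliers the quadratic loss explodes while the linear loss stays bounded in influence, matching the robustness and convergence claims above.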

ICML Conference 2021 · Conference Paper

Unsupervised Co-part Segmentation through Assembly

  • Qingzhe Gao
  • Bin Wang 0021
  • Libin Liu 0002
  • Baoquan Chen

Co-part segmentation is an important problem in computer vision with rich applications. We propose an unsupervised learning approach for co-part segmentation from images. During training, we leverage motion information embedded in videos and explicitly extract latent representations to segment meaningful object parts. More importantly, we introduce a dual part-assembly procedure that forms a closed loop with part segmentation, enabling effective self-supervision. We demonstrate the effectiveness of our approach with extensive experiments covering human bodies, hands, quadrupeds, and robot arms, showing that it achieves meaningful and compact part segmentations and outperforms state-of-the-art approaches on diverse benchmarks.
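The closed loop described above can be pictured as segment-then-reassemble self-supervision. Below is a minimal sketch assuming placeholder `segmenter` and `assembler` networks and a single L1 reconstruction loss; the paper's actual architecture and losses differ.

```python
import torch

def copart_training_step(segmenter, assembler, src_frame, tgt_frame, optimizer):
    """One self-supervised step: segment parts, then reassemble them to
    reconstruct a target frame. Both networks and the reconstruction loss
    are illustrative placeholders, not the paper's exact design."""
    masks_src = segmenter(src_frame)   # (B, K, H, W) soft part masks
    masks_tgt = segmenter(tgt_frame)
    # Dual procedure: assemble the source parts into the target pose.
    recon = assembler(src_frame, masks_src, masks_tgt)
    loss = torch.nn.functional.l1_loss(recon, tgt_frame)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```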

AAAI Conference 2020 · Conference Paper

AutoRemover: Automatic Object Removal for Autonomous Driving Videos

  • Rong Zhang
  • Wei Li
  • Peng Wang
  • Chenye Guan
  • Jin Fang
  • Yuhang Song
  • Jinhui Yu
  • Baoquan Chen

Motivated by the need for photo-realistic simulation in autonomous driving, we present AutoRemover, a video inpainting algorithm designed specifically for generating street-view videos without any moving objects. Our setting poses two challenges: first, shadows, which are usually unlabeled yet tightly coupled with the moving objects; second, the large ego-motion in the videos. To deal with shadows, we build an autonomous-driving shadow dataset and design a deep neural network to detect shadows automatically. To deal with large ego-motion, we take advantage of the multi-source data available in autonomous driving, in particular the 3D data. More specifically, the geometric relationship between frames is incorporated into an inpainting deep neural network to produce high-quality, structurally consistent video output. Experiments show that our method outperforms other state-of-the-art (SOTA) object removal algorithms, reducing the RMSE by over 19%.
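Incorporating the geometric relationship between frames amounts to reprojecting pixels from one frame into another using depth and relative camera pose, so background content from other frames can fill removed regions. A minimal sketch of that reprojection, with hypothetical argument names, follows.

```python
import numpy as np

def warp_pixels(depth, K, T_src_to_tgt):
    """Reproject every source pixel into the target frame using depth and
    relative camera pose: an illustrative sketch of how 3D data can relate
    frames for inpainting.
    depth: (H, W) metric depth; K: (3, 3) intrinsics; T: (4, 4) pose."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, HW)
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)    # back-project to 3D
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])   # homogeneous coords
    proj = K @ (T_src_to_tgt @ pts_h)[:3]                  # transform + project
    return (proj[:2] / np.clip(proj[2], 1e-6, None)).T.reshape(H, W, 2)
```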

NeurIPS Conference 2020 · Conference Paper

Generative 3D Part Assembly via Dynamic Graph Learning

  • Jialei Huang
  • Guanqi Zhan
  • Qingnan Fan
  • Kaichun Mo
  • Lin Shao
  • Baoquan Chen
  • Leonidas J. Guibas
  • Hao Dong

Autonomous part assembly is a challenging yet crucial task in 3D computer vision and robotics. Analogous to assembling IKEA furniture, given a set of 3D parts that can assemble into a single shape, an intelligent agent needs to perceive the 3D part geometry, reason about pose estimates for the input parts, and finally invoke robotic planning and control routines for actuation. In this paper, we focus on the pose estimation subproblem on the vision side, which involves geometric and relational reasoning over the input part geometry. Essentially, the task of generative 3D part assembly is to predict a 6-DoF part pose, comprising a rigid rotation and translation, for each input part so that the parts assemble into a single 3D shape as the final output. To tackle this problem, we propose an assembly-oriented dynamic graph learning framework that uses an iterative graph neural network as its backbone. It explicitly conducts sequential part-assembly refinement in a coarse-to-fine manner and exploits a pair of modules, for part-relation reasoning and part aggregation, to dynamically adjust both part features and their relations in the part graph. We conduct extensive experiments and quantitative comparisons against three strong baseline methods, demonstrating the effectiveness of the proposed approach.
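A minimal sketch of the backbone idea, an iterative graph network that refines per-part features by message passing and reads out a 6-DoF pose (quaternion plus translation) per part, is shown below. The layer sizes, fully connected part graph, and iteration count are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class IterativePoseGNN(nn.Module):
    """Illustrative sketch of iterative graph-based pose refinement."""
    def __init__(self, feat_dim=128, iters=3):
        super().__init__()
        self.iters = iters
        self.edge_mlp = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
        self.node_mlp = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
        self.pose_head = nn.Linear(feat_dim, 7)  # 4 quaternion + 3 translation

    def forward(self, part_feats):                      # (N, feat_dim)
        h = part_feats
        for _ in range(self.iters):                     # iterative refinement
            n = h.shape[0]
            pair = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                              h.unsqueeze(0).expand(n, n, -1)], dim=-1)
            msgs = self.edge_mlp(pair).mean(dim=1)      # aggregate over the graph
            h = self.node_mlp(torch.cat([h, msgs], dim=-1))
        pose = self.pose_head(h)
        quat = torch.nn.functional.normalize(pose[:, :4], dim=-1)
        return quat, pose[:, 4:]                        # rotation, translation
```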

ICLR Conference 2020 · Conference Paper

Unpaired Point Cloud Completion on Real Scans using Adversarial Training

  • Xuelin Chen
  • Baoquan Chen
  • Niloy J. Mitra

As 3D scanning solutions become increasingly popular, several deep learning setups have been developed for the task of scan completion, i.e., plausibly filling in regions that were missed in the raw scans. These methods, however, largely rely on supervision in the form of paired training data, i.e., partial scans with corresponding desired complete scans. While such methods have been successfully demonstrated on synthetic data, they cannot be used directly on real scans in the absence of suitable paired training data. We develop the first approach that works directly on input point clouds, does not require paired training data, and hence can be applied directly to real scans for scan completion. We evaluate the approach qualitatively on several real-world datasets (ScanNet, Matterport3D, KITTI), quantitatively on the 3D-EPN shape completion benchmark, and demonstrate realistic completions under varying levels of incompleteness.
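One way to train without pairs is adversarial matching in a learned latent space: a generator maps codes of partial scans toward the distribution of codes of complete shapes, and a discriminator tells the two apart. The sketch below assumes pretrained, frozen encoders; all module names are hypothetical, and the paper's exact recipe may differ.

```python
import torch

def unpaired_gan_step(gen, disc, enc_partial, enc_complete,
                      partial_pc, complete_pc, opt_g, opt_d):
    """One adversarial step in latent space: an illustrative sketch."""
    bce = torch.nn.BCEWithLogitsLoss()
    z_real = enc_complete(complete_pc).detach()     # codes of complete shapes
    z_fake = gen(enc_partial(partial_pc).detach())  # completed partial codes

    # Discriminator: separate real complete-shape codes from generated ones.
    real_logit, fake_logit = disc(z_real), disc(z_fake.detach())
    d_loss = (bce(real_logit, torch.ones_like(real_logit))
              + bce(fake_logit, torch.zeros_like(fake_logit)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: make completed codes indistinguishable from real ones.
    g_logit = disc(z_fake)
    g_loss = bce(g_logit, torch.ones_like(g_logit))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```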

ICRA Conference 2018 · Conference Paper

Caging Loops in Shape Embedding Space: Theory and Computation

  • Jian Liu
  • Shiqing Xin
  • Zengfu Gao
  • Kai Xu 0004
  • Changhe Tu
  • Baoquan Chen

We propose to synthesize feasible caging grasps for a target object by computing Caging Loops, closed curves defined in the shape embedding space of the object. Unlike traditional methods, our approach decouples caging loops from the surface geometry of the target object by working in the embedding space. This enables us to synthesize caging loops encompassing multiple topological holes, instead of being tied to one specific handle that could be too small for the robot gripper to grasp. Our method extracts caging loops through a topological analysis of the distance field defined for the target surface in the embedding space, based on a rigorous theoretical study of the relation between caging loops and the field topology. Thanks to this decoupling, our method tolerates the incomplete and noisy surface geometry of an unknown target object captured on the fly. We implemented our method on a robotic gripper and demonstrate through extensive experiments that it synthesizes reliable grasps for objects with complex surface geometry and topology, at various scales.
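The computational starting point is a distance field over the free space around the object. The sketch below computes such a field on a voxel grid and flags saddle-like voxels with a naive one-axis-max, one-axis-min test; this test is a rough heuristic for illustration only, not the paper's rigorous topological criterion.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def naive_saddle_candidates(occupancy):
    """occupancy: boolean (D, H, W) voxel grid of the object.
    Returns the free-space distance field and a heuristic saddle mask."""
    dist = distance_transform_edt(~occupancy)  # distance to the object surface
    pad = np.pad(dist, 1, mode="edge")
    core = pad[1:-1, 1:-1, 1:-1]
    nbrs = [(pad[:-2, 1:-1, 1:-1], pad[2:, 1:-1, 1:-1]),   # neighbors along z
            (pad[1:-1, :-2, 1:-1], pad[1:-1, 2:, 1:-1]),   # along y
            (pad[1:-1, 1:-1, :-2], pad[1:-1, 1:-1, 2:])]   # along x
    is_max = [(core >= lo) & (core >= hi) for lo, hi in nbrs]
    is_min = [(core <= lo) & (core <= hi) for lo, hi in nbrs]
    saddle = np.zeros(core.shape, dtype=bool)
    for i in range(3):
        for j in range(3):
            if i != j:                # max along one axis, min along another
                saddle |= is_max[i] & is_min[j]
    return dist, saddle & (dist > 0)
```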

NeurIPS Conference 2018 · Conference Paper

DifNet: Semantic Segmentation by Diffusion Networks

  • Peng Jiang
  • Fanglin Gu
  • Yunhai Wang
  • Changhe Tu
  • Baoquan Chen

Deep Neural Networks (DNNs) have recently shown state-of-the-art performance on semantic segmentation tasks; however, they still suffer from poor boundary localization and spatially fragmented predictions. The difficulty lies in making dense predictions from a long-path model all at once, since details are hard to preserve as data passes through deeper layers. Instead, in this work we decompose this difficult task into two relatively simple sub-tasks: seed detection, which produces initial predictions without requiring completeness or precision, and similarity estimation, which measures the likelihood that any two nodes belong to the same class without knowing which class that is. We use one branch network for each sub-task and apply a cascade of random walks, based on hierarchical semantics, to approximate a complex diffusion process that propagates seed information to the whole image according to the estimated similarities. The proposed DifNet consistently improves over baseline models of the same depth and with an equivalent number of parameters, and also achieves promising performance on the Pascal VOC and Pascal Context datasets. Our DifNet is trained end-to-end without complex loss functions.
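The cascade of random walks can be written compactly: build a row-stochastic transition matrix from pairwise similarities and repeatedly multiply it with the seed predictions. In the sketch below, a Gaussian affinity over feature distances stands in for the learned similarity branch; the function names and temperature are hypothetical.

```python
import numpy as np

def diffuse_seeds(seed_logits, features, steps=3, tau=0.1):
    """Propagate seed predictions via a cascade of random walks.
    seed_logits: (N, C) per-node class scores; features: (N, D)."""
    # Pairwise affinities -> row-stochastic transition matrix W.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / tau)
    W /= W.sum(axis=1, keepdims=True)
    out = seed_logits
    for _ in range(steps):   # cascade of random walks
        out = W @ out        # each node absorbs similar nodes' predictions
    return out
```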

NeurIPS Conference 2018 · Conference Paper

PointCNN: Convolution On X-Transformed Points

  • Yangyan Li
  • Rui Bu
  • Mingchao Sun
  • Wei Wu
  • Xinhan Di
  • Baoquan Chen

We present a simple and general framework for feature learning from point clouds. The key to the success of CNNs is the convolution operator, which is capable of leveraging spatially local correlation in data represented densely in grids (e.g., images). However, point clouds are irregular and unordered, so directly convolving kernels with the features associated with the points discards shape information and is variant to point ordering. To address these problems, we propose to learn an X-transformation from the input points, which is used to simultaneously weight the input features associated with the points and permute them into a latent, potentially canonical order. The element-wise product and sum operations of a typical convolution operator are then applied to the X-transformed features. The proposed method is a generalization of typical CNNs to feature learning from point clouds, hence the name PointCNN. Experiments show that PointCNN achieves performance on par with or better than state-of-the-art methods on multiple challenging benchmark datasets and tasks.
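The core operator can be sketched as: lift neighbor coordinates to features, predict a K x K matrix from those coordinates, multiply it with the neighbor features (weighting and permuting them at once), then apply a dense layer as the "convolution". The layer sizes below are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class XConv(nn.Module):
    """Illustrative sketch of an X-Conv-style operator."""
    def __init__(self, k, c_in, c_lift, c_out):
        super().__init__()
        self.k = k
        self.lift = nn.Sequential(nn.Linear(3, c_lift), nn.ReLU())
        self.x_mlp = nn.Sequential(nn.Linear(3 * k, k * k), nn.ReLU(),
                                   nn.Linear(k * k, k * k))
        self.conv = nn.Linear(k * (c_in + c_lift), c_out)

    def forward(self, rel_coords, neighbor_feats):
        # rel_coords: (B, K, 3) neighbor coordinates relative to the
        # representative point; neighbor_feats: (B, K, c_in).
        b = rel_coords.shape[0]
        lifted = self.lift(rel_coords)                       # (B, K, c_lift)
        feats = torch.cat([lifted, neighbor_feats], dim=-1)  # (B, K, C)
        X = self.x_mlp(rel_coords.reshape(b, -1)).view(b, self.k, self.k)
        weighted = X @ feats         # weight and permute simultaneously
        return self.conv(weighted.reshape(b, -1))            # (B, c_out)
```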

ECAI Conference 2016 · Conference Paper

ShapeLearner: Towards Shape-Based Visual Knowledge Harvesting

  • Huayong Xu
  • Yafang Wang
  • Kang Feng
  • Gerard de Melo
  • Wei Wu
  • Andrei Sharf
  • Baoquan Chen

The deluge of images on the Web has led to a number of efforts to organize images semantically and mine visual knowledge. Despite enormous progress on categorizing entire images or bounding boxes, only a few studies have targeted fine-grained image understanding at the level of specific shape contours. For instance, beyond recognizing that an image portrays a cat, we may wish to distinguish its legs, head, tail, and so on. To this end, we present ShapeLearner, a system that acquires such visual knowledge about object shapes and their parts in a semantic taxonomy, and can then exploit this hierarchy to analyze new kinds of objects it has not observed before. ShapeLearner jointly learns this knowledge from sets of segmented images. The space of label and segmentation hypotheses is pruned and then evaluated using Integer Linear Programming. Experiments on a variety of shape classes show the accuracy and effectiveness of our method.
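The hypothesis-selection step has the flavor of a small combinatorial optimization: choose a subset of scored (segments, label) hypotheses that explains each image segment at most once while maximizing total score. The brute-force sketch below stands in for the Integer Linear Programming formulation; the hypothesis format is hypothetical.

```python
from itertools import combinations

def select_hypotheses(hyps, max_picks=3):
    """Pick a consistent subset of (score, segment_ids, label) hypotheses
    maximizing total score, with each segment explained at most once.
    A brute-force stand-in for the ILP step."""
    best, best_score = (), float("-inf")
    for r in range(1, max_picks + 1):
        for combo in combinations(hyps, r):
            used = [s for _, segs, _ in combo for s in segs]
            if len(used) != len(set(used)):  # overlapping segments: infeasible
                continue
            score = sum(s for s, _, _ in combo)
            if score > best_score:
                best, best_score = combo, score
    return list(best)

# e.g. select_hypotheses([(0.9, {1, 2}, "head"), (0.7, {2, 3}, "leg"),
#                         (0.5, {3}, "tail")]) keeps "head" and "tail".
```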