Author name cluster

Xiaofeng Ren

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers

2 author rows

IJCAI Conference 2022 Conference Paper

SCMT: Self-Correction Mean Teacher for Semi-supervised Object Detection

Feng Xiong
Jiayi Tian
Zhihui Hao
Yulin He
Xiaofeng Ren

Semi-Supervised Object Detection (SSOD) aims to improve performance by leveraging a large amount of unlabeled data. Existing works usually adopt the teacher-student framework, in which the student is trained to make predictions consistent with the pseudo-labels generated by the teacher. However, the performance of the student model is limited because noise inherently exists in the pseudo-labels. In this paper, we investigate the causes and effects of noisy pseudo-labels and propose a simple yet effective approach, denoted Self-Correction Mean Teacher (SCMT), to reduce the adverse effects. Specifically, we propose to dynamically re-weight the unsupervised loss of each student proposal with additional supervision information from the teacher model, and assign smaller loss weights to possibly noisy proposals. Extensive experiments on the MS-COCO benchmark show the superiority of the proposed SCMT, which improves the supervised baseline by more than 11% mAP under all 1%, 5% and 10% COCO-standard settings and surpasses state-of-the-art methods by about 1.5% mAP. Even under the challenging COCO-additional setting, SCMT still improves the supervised baseline by 4.9% mAP and outperforms previous methods by 1.2% mAP, achieving a new state-of-the-art performance.
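The re-weighting idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so the example below simply assumes that each proposal carries a teacher-derived confidence in [0, 1] and uses it as the loss weight. The function name `reweighted_unsup_loss` and its arguments are hypothetical and are not taken from the paper.

```python
import math


def reweighted_unsup_loss(student_probs, pseudo_labels, teacher_scores):
    """Weighted mean cross-entropy over proposals (illustrative only).

    student_probs  : per-proposal class probabilities predicted by the student
    pseudo_labels  : per-proposal class index assigned from the teacher
    teacher_scores : per-proposal weight in [0, 1]; ASSUMED here to be a
                     teacher confidence, so likely-noisy proposals get a
                     small weight
    """
    total = 0.0
    weight_sum = 0.0
    for probs, label, weight in zip(student_probs, pseudo_labels, teacher_scores):
        # Cross-entropy of the student prediction against the pseudo-label.
        ce = -math.log(max(probs[label], 1e-12))
        total += weight * ce
        weight_sum += weight
    # Normalise by the total weight; return 0 if every proposal was discarded.
    return total / weight_sum if weight_sum > 0 else 0.0


# The second proposal disagrees with its pseudo-label; giving it a low
# weight keeps it from dominating the unsupervised loss.
probs = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 0]
down_weighted = reweighted_unsup_loss(probs, labels, [1.0, 0.1])
uniform = reweighted_unsup_loss(probs, labels, [1.0, 1.0])
```

With uniform weights the disagreeing proposal contributes half of the loss; with a small weight its contribution shrinks, which is the effect the abstract describes.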


ICRA Conference 2013 Conference Paper

RGB-D flow: Dense 3-D motion estimation using color and depth

Evan Herbst
Xiaofeng Ren
Dieter Fox

3-D motion estimation is a fundamental problem that has far-reaching implications in robotics. A scene flow formulation is attractive as it makes no assumptions about scene complexity, object rigidity, or camera motion. RGB-D cameras provide new information useful for computing dense 3-D flow in challenging scenes. In this work we show how to generalize two-frame variational 2-D flow algorithms to 3-D. We show that scene flow can be reliably computed using RGB-D data, overcoming depth noise and outperforming previous results on a variety of scenes. We apply dense 3-D flow to rigid motion segmentation.


ICRA Conference 2012 Conference Paper

Detection-based object labeling in 3D scenes

Kevin Lai 0001
Liefeng Bo
Xiaofeng Ren
Dieter Fox

We propose a view-based approach for labeling objects in 3D scenes reconstructed from RGB-D (color+depth) videos. We utilize sliding window detectors trained from object views to assign class probabilities to pixels in every RGB-D frame. These probabilities are projected into the reconstructed 3D scene and integrated using a voxel representation. We perform efficient inference on a Markov Random Field over the voxels, combining cues from view-based detection and 3D shape, to label the scene. Our detection-based approach produces accurate scene labeling on the RGB-D Scenes Dataset and improves the robustness of object detection.


ICRA Conference 2011 Conference Paper

A large-scale hierarchical multi-view RGB-D object dataset

Kevin Lai 0001
Liefeng Bo
Xiaofeng Ren
Dieter Fox

Over the last decade, the availability of public image repositories and recognition benchmarks has enabled rapid progress in visual object category and instance detection. Today we are witnessing the birth of a new generation of sensing technologies capable of providing high quality synchronized videos of both color and depth, the RGB-D (Kinect-style) camera. With its advanced sensing capabilities and the potential for mass adoption, this technology represents an opportunity to dramatically increase robotic object recognition, manipulation, navigation, and interaction capabilities. In this paper, we introduce a large-scale, hierarchical multi-view object dataset collected using an RGB-D camera. The dataset contains 300 objects organized into 51 categories and has been made publicly available to the research community so as to enable rapid progress based on this promising technology. This paper describes the dataset collection procedure and introduces techniques for RGB-D based object recognition and detection, demonstrating that combining color and depth information substantially improves quality of results.


AAAI Conference 2011 Conference Paper

A Scalable Tree-Based Approach for Joint Object and Pose Recognition

Kevin Lai
Liefeng Bo
Xiaofeng Ren
Dieter Fox

Recognizing possibly thousands of objects is a crucial capability for an autonomous agent to understand and interact with everyday environments. Practical object recognition comes in multiple forms: Is this a coffee mug? (category recognition). Is this Alice’s coffee mug? (instance recognition). Is the mug with the handle facing left or right? (pose recognition). We present a scalable framework, Object-Pose Tree, which efficiently organizes data into a semantically structured tree. The tree structure enables both scalable training and testing, allowing us to solve recognition over thousands of object poses in near real-time. Moreover, by simultaneously optimizing all three tasks, our approach outperforms standard nearest neighbor and 1-vs-all classifications, with large improvements on pose recognition. We evaluate the proposed technique on a dataset of 300 household objects collected using a Kinect-style 3D camera. Experiments demonstrate that our system achieves robust and efficient object category, instance, and pose recognition on challenging everyday objects.


IROS Conference 2011 Conference Paper

Depth kernel descriptors for object recognition

Liefeng Bo
Xiaofeng Ren
Dieter Fox

Consumer depth cameras, such as the Microsoft Kinect, are capable of providing frames of dense depth values in real time. One fundamental question in utilizing depth cameras is how to best extract features from depth frames. Motivated by local descriptors on images, in particular kernel descriptors, we develop a set of kernel features on depth images that model size, 3D shape, and depth edges in a single framework. Through extensive experiments on object recognition, we show that (1) our local features capture different aspects of cues from a depth frame/view that complement one another; (2) our kernel features significantly outperform traditional 3D features (e.g., Spin images); and (3) we significantly improve the capabilities of depth and RGB-D (color+depth) recognition, achieving 10–15% improvement in accuracy over the state of the art.


NeurIPS Conference 2011 Conference Paper

Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms

Liefeng Bo
Xiaofeng Ren
Dieter Fox

Extracting good representations from images is essential for many computer vision tasks. In this paper, we propose hierarchical matching pursuit (HMP), which builds a feature hierarchy layer-by-layer using an efficient matching pursuit encoder. It includes three modules: batch (tree) orthogonal matching pursuit, spatial pyramid max pooling, and contrast normalization. We investigate the architecture of HMP, and show that all three components are critical for good performance. To speed up the orthogonal matching pursuit, we propose a batch tree orthogonal matching pursuit that is particularly suitable to encode a large number of observations that share the same large dictionary. HMP is scalable and can efficiently handle full-size images. In addition, HMP enables linear support vector machines (SVM) to match the performance of nonlinear SVM while being scalable to large datasets. We compare HMP with many state-of-the-art algorithms including convolutional deep belief networks, SIFT based single layer sparse coding, and kernel based feature learning. HMP consistently yields superior accuracy on three types of image classification problems: object recognition (Caltech-101), scene recognition (MIT-Scene), and static event recognition (UIUC-Sports).
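The encoder named in this abstract builds on orthogonal matching pursuit (OMP). The sketch below shows plain OMP only, not the batch tree variant the paper proposes, and it omits the pooling and normalization modules. It is written in pure Python for small inputs; the helper names `dot`, `solve`, and `omp` are illustrative, and the atoms are assumed to be unit-norm and linearly independent.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve a small square system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column up.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def omp(atoms, signal, sparsity):
    """Plain OMP: greedily select atoms, re-fitting all coefficients each step.

    atoms    : list of dictionary vectors (ASSUMED unit-norm, independent)
    signal   : vector to encode
    sparsity : maximum number of atoms to select
    Returns a dict mapping selected atom index to its coefficient.
    """
    residual = list(signal)
    chosen, coeffs = [], []
    for _ in range(min(sparsity, len(atoms))):
        # Select the unused atom most correlated with the current residual.
        candidates = [i for i in range(len(atoms)) if i not in chosen]
        best = max(candidates, key=lambda i: abs(dot(atoms[i], residual)))
        chosen.append(best)
        # Orthogonal step: least-squares fit over all chosen atoms,
        # solved through the normal equations G c = rhs.
        G = [[dot(atoms[i], atoms[j]) for j in chosen] for i in chosen]
        rhs = [dot(atoms[i], signal) for i in chosen]
        coeffs = solve(G, rhs)
        # Residual is the part of the signal not yet explained.
        residual = [
            s - sum(c * atoms[i][k] for c, i in zip(coeffs, chosen))
            for k, s in enumerate(signal)
        ]
    return dict(zip(chosen, coeffs))


# With an identity dictionary, OMP keeps the two largest-magnitude entries.
code = omp([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [3, 0, -2], sparsity=2)
```

The re-fit over all chosen atoms at every step is what separates OMP from plain matching pursuit, which would update only the newest coefficient.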


IROS Conference 2011 Conference Paper

RGB-D object discovery via multi-scene analysis

Evan Herbst
Xiaofeng Ren
Dieter Fox

We introduce an algorithm for object discovery from RGB-D (color plus depth) data, building on recent progress in using RGB-D cameras for 3-D reconstruction. A set of 3-D maps is built from multiple visits to the same scene. We introduce a multi-scene MRF model to detect objects that moved between visits, combining shape, visibility, and color cues. We measure similarities between candidate objects using both 2-D and 3-D matching, and apply spectral clustering to infer object clusters from noisy links. Our approach can robustly detect objects and their motion between scenes even when objects are textureless or have the same shape as other objects.


ICRA Conference 2011 Conference Paper

Sparse distance learning for object recognition combining RGB and depth information

Kevin Lai 0001
Liefeng Bo
Xiaofeng Ren
Dieter Fox

In this work we address joint object category and instance recognition in the context of RGB-D (depth) cameras. Motivated by local distance learning, where a novel view of an object is compared to individual views of previously seen objects, we define a view-to-object distance where a novel view is compared simultaneously to all views of a previous object. This novel distance is based on a weighted combination of feature differences between views. We show, through jointly learning per-view weights, that this measure leads to superior classification performance on object category and instance recognition. More importantly, the proposed distance allows us to find a sparse solution via Group-Lasso regularization, where a small subset of representative views of an object is identified and used, with the rest discarded. This significantly reduces computational cost without compromising recognition accuracy. We evaluate the proposed technique, Instance Distance Learning (IDL), on the RGB-D Object Dataset, which consists of 300 object instances in 51 everyday categories and about 250,000 views of objects with both RGB color and depth. We empirically compare IDL to several alternative state-of-the-art approaches and also validate the use of visual and shape cues and their combination.


ICRA Conference 2011 Conference Paper

Toward object discovery and modeling via 3-D scene comparison

Evan Herbst
Peter Henry
Xiaofeng Ren
Dieter Fox

The performance of indoor robots that stay in a single environment can be enhanced by gathering detailed knowledge of objects that frequently occur in that environment. We use an inexpensive sensor providing dense color and depth, and fuse information from multiple sensing modalities to detect changes between two 3-D maps. We adapt a recent SLAM technique to align maps. A probabilistic model of sensor readings lets us reason about movement of surfaces. Our method handles arbitrary shapes and motions, and is robust to lack of texture. We demonstrate the ability to find whole objects in complex scenes by regularizing over surface patches.


NeurIPS Conference 2010 Conference Paper

Kernel Descriptors for Visual Recognition

Liefeng Bo
Xiaofeng Ren
Dieter Fox

The design of low-level image features is critical for computer vision algorithms. Orientation histograms, such as those in SIFT (Lowe, 2004) and HOG (Dalal and Triggs, 2005), are the most successful and popular features for visual object and scene recognition. We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches. This novel view allows us to design a family of kernel descriptors which provide a unified and principled framework to turn pixel attributes (gradient, color, local binary pattern, etc.) into compact patch-level features. In particular, we introduce three types of match kernels to measure similarities between image patches, and construct compact low-dimensional kernel descriptors from these match kernels using kernel principal component analysis (KPCA) (Schölkopf et al., 1998). Kernel descriptors are easy to design and can turn any type of pixel attribute into patch-level features. They outperform carefully tuned and sophisticated features including SIFT and deep belief networks. We report superior performance on standard image classification benchmarks: Scene-15, Caltech-101, CIFAR10 and CIFAR10-ImageNet.
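The equivalence between orientation histograms and match kernels mentioned in this abstract can be checked numerically in its simplest form. The sketch below assumes hard binning, where each pixel is a (gradient magnitude, orientation bin) pair; the function names are illustrative and not taken from the paper. It compares the inner product of two magnitude-weighted histograms with a sum over all pixel pairs that counts only pairs falling in the same bin.

```python
def orientation_histogram(patch, n_bins):
    """Magnitude-weighted orientation histogram of a patch.

    patch : list of (magnitude, bin_index) pairs, one per pixel
            (hard binning is an ASSUMPTION of this sketch)
    """
    hist = [0.0] * n_bins
    for magnitude, bin_index in patch:
        hist[bin_index] += magnitude
    return hist


def histogram_kernel(patch_p, patch_q, n_bins):
    """Linear kernel between two patches: inner product of their histograms."""
    hist_p = orientation_histogram(patch_p, n_bins)
    hist_q = orientation_histogram(patch_q, n_bins)
    return sum(a * b for a, b in zip(hist_p, hist_q))


def match_kernel(patch_p, patch_q):
    """Sum over all pixel pairs of the magnitude product, counting a pair
    only when both pixels fall in the same orientation bin."""
    return sum(
        mag_p * mag_q
        for mag_p, bin_p in patch_p
        for mag_q, bin_q in patch_q
        if bin_p == bin_q
    )


p = [(1.0, 0), (2.0, 1), (0.5, 1)]
q = [(3.0, 1), (1.0, 2)]
# Both expressions compute the same similarity between the two patches.
same = abs(histogram_kernel(p, q, n_bins=3) - match_kernel(p, q)) < 1e-9
```

Replacing the hard same-bin indicator with a smooth similarity between orientations is one way to read the kernel view that the abstract refers to.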


NeurIPS Conference 2005 Conference Paper

Cue Integration for Figure/Ground Labeling

Xiaofeng Ren
Jitendra Malik
Charless Fowlkes

We present a model of edge and region grouping using a conditional random field built over a scale-invariant representation of images to integrate multiple cues. Our model includes potentials that capture low-level similarity, mid-level curvilinear continuity and high-level object shape. Maximum likelihood parameters for the model are learned from human-labeled ground truth on a large collection of horse images using belief propagation. Using held-out test data, we quantify the information gained by incorporating generic mid-level cues and high-level shape.
