Arrow Research search

Author name cluster

Xiaobai Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
1 author row

Possible papers

9

TIST Journal 2018 Journal Article

High-Precision Camera Localization in Scenes with Repetitive Patterns

  • Xiaobai Liu
  • Qian Xu
  • Yadong Mu
  • Jiadi Yang
  • Liang Lin
  • Shuicheng Yan

This article presents a high-precision multi-modal approach for localizing moving cameras with monocular videos, which has wide potential in many intelligent applications, including robotics and autonomous vehicles. Existing visual odometry methods often suffer from symmetric or repetitive scene patterns, e.g., windows on buildings or parking stalls. To address this issue, we introduce a robust camera localization method that contributes in two aspects. First, we formulate feature tracking, the critical step of visual odometry, as a hierarchical min-cost network flow optimization task, and we regularize the formulation with flow constraints, cross-scale consistencies, and motion heuristics. The proposed regularized formulation can adaptively select distinctive features or feature combinations, which is more effective than traditional methods that detect and group repetitive patterns in a separate step. Second, we develop a joint formulation for integrating dense visual odometry and sparse GPS readings in a common reference coordinate frame. The fusion process is guided by high-order statistics to suppress the impact of noise, clutter, and model drift. We evaluate the proposed camera localization method on both public video datasets and a newly created dataset that includes scenes full of repetitive patterns. Results with comparisons show that our method achieves performance comparable to state-of-the-art methods and is particularly effective at addressing repetitive pattern issues.
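The association step behind feature tracking can be illustrated with a toy min-cost matching. This is a minimal sketch only: it uses brute-force enumeration over permutations rather than the paper's hierarchical network-flow formulation, and the squared-displacement cost is an illustrative assumption.

```python
from itertools import permutations

def match_features(prev_pts, curr_pts):
    """Brute-force min-cost assignment of features between two frames.

    A toy stand-in for a network-flow association step: each pairing
    is charged its squared displacement, and the cheapest one-to-one
    assignment wins. Feasible only for small feature sets.
    """
    def cost(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    best, best_cost = None, float("inf")
    for perm in permutations(range(len(curr_pts)), len(prev_pts)):
        total = sum(cost(prev_pts[i], curr_pts[j]) for i, j in enumerate(perm))
        if total < best_cost:
            best, best_cost = list(enumerate(perm)), total
    return best, best_cost

matches, c = match_features([(0, 0), (10, 0)], [(11, 1), (1, 0)])
print(matches)  # [(0, 1), (1, 0)]: each feature pairs with its nearest successor
```

The paper's contribution is precisely that such greedy geometric costs fail on repetitive patterns, which motivates its extra flow, cross-scale, and motion regularizers.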

AAAI Conference 2018 Conference Paper

Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation

  • Hao-Shu Fang
  • Yuanlu Xu
  • Wenguan Wang
  • Xiaobai Liu
  • Song-Chun Zhu

In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation. Our model directly takes 2D pose as input and learns a generalized 2D-3D mapping function. The proposed model consists of a base network that efficiently captures pose-aligned features and a hierarchy of Bi-directional RNNs (BRNNs) on top that explicitly incorporates a set of knowledge regarding human body configuration (i.e., kinematics, symmetry, motor coordination). The model thus enforces high-level constraints over human poses. For learning, we develop a pose sample simulator to augment training samples in virtual camera views, which further improves the generalizability of our model. We validate our method on public 3D human pose benchmarks and propose a new evaluation protocol for the cross-view setting to verify the generalization capability of different methods. We empirically observe that most state-of-the-art methods encounter difficulty in this setting, while our method handles such challenges well.
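One of the body-configuration priors named above, symmetry, can be sketched as a simple penalty on left/right limb-length mismatch. The joint names and limb pairs below are hypothetical placeholders; real skeleton layouts vary per dataset, and the paper encodes these constraints inside learned BRNNs rather than as an explicit penalty.

```python
import math

# Hypothetical joint layout; real skeleton definitions differ per dataset.
LIMB_PAIRS = [  # (left limb, right limb), each limb = (joint_a, joint_b)
    (("l_shoulder", "l_elbow"), ("r_shoulder", "r_elbow")),
    (("l_hip", "l_knee"), ("r_hip", "r_knee")),
]

def limb_len(pose, a, b):
    return math.dist(pose[a], pose[b])

def symmetry_penalty(pose):
    """Sum of |left - right| limb-length differences: a toy version of
    the symmetry knowledge a pose grammar could encode."""
    return sum(abs(limb_len(pose, *l) - limb_len(pose, *r))
               for l, r in LIMB_PAIRS)

pose = {"l_shoulder": (0, 0, 0), "l_elbow": (0, -3, 0),
        "r_shoulder": (2, 0, 0), "r_elbow": (2, -3, 0),
        "l_hip": (0, -6, 0), "l_knee": (0, -9, 0),
        "r_hip": (2, -6, 0), "r_knee": (2, -10, 0)}
print(symmetry_penalty(pose))  # 1.0: the right lower leg is one unit longer
```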

IJCAI Conference 2018 Conference Paper

Unsupervised Learning based Jump-Diffusion Process for Object Tracking in Video Surveillance

  • Xiaobai Liu
  • Donovan Lo
  • Chau Thuan

This paper presents a principled way of dealing with occlusions in visual tracking, a long-standing issue in computer vision that largely remains unsolved. As the major innovation, we develop a learning-based jump-diffusion process to jointly track object locations and estimate their visibility statuses over time. Our method employs a set of jump dynamics to change objects' visibility statuses and a set of diffusion dynamics to track objects in videos. Different from the traditional jump-diffusion process, which generates dynamics stochastically, we utilize deep policy functions to determine the best dynamics at each step and learn the optimal policies from raw videos using reinforcement learning methods. Our method is capable of tracking objects with severe occlusions in crowded scenes and thus recovers the complete trajectories of objects that undergo multiple interactions with others. We evaluate the proposed method on challenging video sequences and compare it to alternative methods. Significant improvements are obtained, particularly for videos that include frequent interactions or occlusions.
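The jump-versus-diffusion control loop can be sketched with a toy rule-based policy. Everything here is an illustrative assumption: the paper learns a deep policy with reinforcement learning, whereas this stand-in just thresholds a detector score, and the 0.5 blending in the diffusion step is an arbitrary smoothing choice.

```python
def policy(track, observation):
    """Stand-in for a learned policy: choose 'jump' when the detector
    score contradicts the current visibility status."""
    score = observation["det_score"]
    if track["visible"] and score < 0.2:
        return "jump"          # likely occluded: flip visibility off
    if not track["visible"] and score > 0.8:
        return "jump"          # target reappeared: flip visibility on
    return "diffuse"

def step(track, observation):
    move = policy(track, observation)
    if move == "jump":
        track["visible"] = not track["visible"]
    elif track["visible"]:
        # Diffusion dynamics: drift the state towards the detection.
        tx, ty = track["pos"]; ox, oy = observation["pos"]
        track["pos"] = (0.5 * (tx + ox), 0.5 * (ty + oy))
    return track

track = {"pos": (0.0, 0.0), "visible": True}
obs_seq = [{"pos": (2.0, 0.0), "det_score": 0.9},
           {"pos": (0.0, 0.0), "det_score": 0.1},   # occlusion begins
           {"pos": (4.0, 0.0), "det_score": 0.95}]  # target reappears
for obs in obs_seq:
    step(track, obs)
print(track["visible"], track["pos"])  # True (1.0, 0.0)
```

Note how the position is frozen while the track is invisible, which is what lets the method bridge occlusion gaps and recover complete trajectories.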

AAAI Conference 2017 Conference Paper

Cross-View People Tracking by Scene-Centered Spatio-Temporal Parsing

  • Yuanlu Xu
  • Xiaobai Liu
  • Lei Qin
  • Song-Chun Zhu

In this paper, we propose a Spatio-temporal Attributed Parse Graph (ST-APG) to integrate semantic attributes with trajectories for cross-view people tracking. Given videos from multiple cameras with overlapping fields of view (FOV), our goal is to parse the videos and organize the trajectories of all targets into a scene-centered representation. We leverage rich semantic attributes of humans, e.g., facing directions, postures, and actions, to enhance cross-view tracklet associations, in addition to the appearance and geometry features frequently used in the literature. In particular, the facing direction of a human in 3D, once detected, often coincides with his/her moving direction or trajectory. Similarly, the actions of humans, once recognized, provide strong cues for distinguishing one subject from the others. The inference is solved by iteratively grouping tracklets with cluster sampling and estimating people's semantic attributes by dynamic programming. In experiments, we validate our method on one public dataset and create another new dataset that records people's daily life in public spaces, e.g., a food court, an office reception, and a plaza, each of which includes 3-4 cameras. We evaluate the proposed method on these challenging videos and achieve promising multi-view tracking results.
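The facing-direction cue described above can be sketched as a consistency score between a tracklet's detected facing vector and its motion direction. This is a minimal illustration of one attribute cue, not the paper's full parse-graph inference; the tracklet dictionary layout is a hypothetical convenience.

```python
import math

def direction(tracklet):
    """Unit motion direction from the first to the last position."""
    (x0, y0), (x1, y1) = tracklet["positions"][0], tracklet["positions"][-1]
    n = math.hypot(x1 - x0, y1 - y0) or 1.0
    return ((x1 - x0) / n, (y1 - y0) / n)

def facing_consistency(tracklet):
    """Cosine between the detected facing direction and the motion
    direction: high values support a coherent tracklet hypothesis."""
    fx, fy = tracklet["facing"]
    mx, my = direction(tracklet)
    return fx * mx + fy * my

walker = {"positions": [(0, 0), (1, 0), (2, 0)], "facing": (1.0, 0.0)}
print(round(facing_consistency(walker), 2))  # 1.0: facing matches motion
```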

IJCAI Conference 2017 Conference Paper

Single-Image 3D Scene Parsing Using Geometric Commonsense

  • Chengcheng Yu
  • Xiaobai Liu
  • Song-Chun Zhu

This paper presents a unified grammatical framework capable of reconstructing a variety of scene types (e.g., urban, campus, county) from a single input image. The key idea of our approach is a novel commonsense reasoning framework that mainly exploits two types of prior knowledge: (i) prior distributions over a single dimension of objects, e.g., that the length of a sedan is about 4.5 meters; (ii) pair-wise relationships between the dimensions of scene entities, e.g., that a sedan is shorter than a bus. This unary and relative geometric knowledge, once extracted, is fairly stable across different types of natural scenes and is informative for enhancing the understanding of various scenes in both 2D images and the 3D world. Methodologically, we construct a hierarchical graph as a unified representation of the input image and the related geometric knowledge. We formulate these objectives with a unified probabilistic formula and develop a data-driven Monte Carlo method to infer the optimal solution with both bottom-up and top-down computations. Results with comparisons on public datasets show that our method clearly outperforms the alternative methods.
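The two kinds of prior knowledge can be sketched as scoring functions over a scene hypothesis: a Gaussian over a single object dimension, and a hard ordering check between pairs. The prior means and standard deviations below are illustrative placeholders (only the 4.5 m sedan length comes from the abstract), not the paper's learned distributions.

```python
import math

# Hypothetical priors; only the sedan mean echoes the abstract.
LENGTH_PRIOR = {"sedan": (4.5, 0.3), "bus": (12.0, 1.5)}  # (mean, std) in meters
SHORTER_THAN = [("sedan", "bus")]

def log_unary(obj, length):
    """Log of a Gaussian prior over a single object dimension."""
    mu, sigma = LENGTH_PRIOR[obj]
    return (-0.5 * ((length - mu) / sigma) ** 2
            - math.log(sigma * math.sqrt(2 * math.pi)))

def pairwise_ok(lengths):
    """Check the relative-size relations between scene entities."""
    return all(lengths[a] < lengths[b] for a, b in SHORTER_THAN)

hypothesis = {"sedan": 4.6, "bus": 11.0}
print(pairwise_ok(hypothesis), round(log_unary("sedan", hypothesis["sedan"]), 3))
```

A Monte Carlo inference procedure would combine such unary and pairwise terms into a posterior over the hierarchical graph and sample scene hypotheses against it.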

IJCAI Conference 2016 Conference Paper

A Stochastic Image Grammar for Fine-Grained 3D Scene Reconstruction

  • Xiaobai Liu
  • Yadong Mu
  • Liang Lin

This paper presents a stochastic grammar for fine-grained 3D scene reconstruction from a single image. At the heart of our approach is a small number of grammar rules that describe the most common geometric structures, e.g., two straight lines being collinear or orthogonal, or a line lying on a planar region. With these grammar rules, we re-frame the single-view 3D reconstruction problem as jointly solving two coupled sub-tasks: i) segmenting image entities, e.g., planar regions and straight edge segments, and ii) optimizing a pixel-wise 3D scene model through the application of grammar rules over image entities. To reconstruct a new image, we design an efficient hybrid Monte Carlo (HMC) algorithm to simulate a Markov chain walking towards the posterior distribution. Our algorithm utilizes two iterative dynamics: i) Hamiltonian dynamics, which make proposals along the gradient direction to search the continuous pixel-wise 3D scene model; and ii) cluster dynamics, which flip the labels of pixel clusters to form the planar region partition. Following the Metropolis-Hastings principle, these dynamics not only make distant proposals but also guarantee detailed balance and fast convergence. Results with comparisons on a public image dataset show that our method clearly outperforms alternative state-of-the-art single-view reconstruction methods.
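The accept/reject skeleton underlying such samplers can be sketched with plain random-walk Metropolis-Hastings on a toy one-dimensional target. This is only a sketch of the Metropolis-Hastings principle the abstract invokes; the paper's algorithm replaces the random-walk proposal with gradient-driven Hamiltonian proposals and label-flipping cluster proposals, and its target is the scene posterior, not a standard normal.

```python
import math, random

random.seed(42)

def log_post(x):
    """Toy log-posterior: a standard normal stands in for the
    pixel-wise 3D scene posterior."""
    return -0.5 * x * x

def metropolis(steps=20000, step_size=1.0):
    """Random-walk Metropolis-Hastings accept/reject loop."""
    x, samples = 0.0, []
    for _ in range(steps):
        prop = x + random.uniform(-step_size, step_size)
        if math.log(random.random()) < log_post(prop) - log_post(x):
            x = prop              # accept: move the chain
        samples.append(x)         # a rejection keeps the current state
    return samples

samples = metropolis()
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # both should land near 0 and 1 for a standard normal
```

Because the symmetric proposal satisfies detailed balance, the chain's sample mean and variance converge to those of the target.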

IJCAI Conference 2016 Conference Paper

Geometric Scene Parsing with Hierarchical LSTM

  • Zhanglin Peng
  • Ruimao Zhang
  • Xiaodan Liang
  • Xiaobai Liu
  • Liang Lin

This paper addresses the problem of geometric scene parsing, i.e., simultaneously labeling geometric surfaces (e.g., sky, ground, and vertical planes) and determining the interaction relations (e.g., layering, supporting, siding, and affinity) between main regions. This problem is more challenging than traditional semantic scene labeling, as recovering geometric structures necessarily requires rich and diverse contextual information. To achieve these goals, we propose a novel recurrent neural network model, named Hierarchical Long Short-Term Memory (H-LSTM). It contains two coupled sub-networks: the Pixel LSTM (P-LSTM) and the Multi-scale Super-pixel LSTM (MS-LSTM), which handle surface labeling and relation prediction, respectively. The two sub-networks provide complementary information to each other to exploit hierarchical scene contexts, and they are jointly optimized to boost performance. Our extensive experiments show that our model is capable of parsing scene geometric structures and outperforms several state-of-the-art methods by large margins. In addition, we show promising 3D reconstruction results from still images based on the geometric parsing.

IJCAI Conference 2016 Conference Paper

Learning Compact Visual Representation with Canonical Views for Robust Mobile Landmark Search

  • Lei Zhu
  • Jialie Shen
  • Xiaobai Liu
  • Liang Xie
  • Liqiang Nie

Mobile Landmark Search (MLS) has recently received increasing attention. However, it still remains unsolved due to two important issues. One is the high bandwidth consumption of query transmission, and the other is the huge visual variation of query images. This paper proposes a Canonical View based Compact Visual Representation (2CVR) to handle these problems via a novel three-stage learning scheme. First, a submodular function is designed to measure the visual representativeness and redundancy of a view set. With it, canonical views, which capture the key visual appearances of a landmark with limited redundancy, are efficiently discovered with an iterative mining strategy. Second, multimodal sparse coding is applied to transform multiple visual features into an intermediate representation that can robustly characterize the visual contents of varied landmark images with only a fixed set of canonical views. Finally, compact binary codes are learned on the intermediate representation within a tailored binary embedding model that preserves the visual relations of images measured with canonical views and removes noise. With 2CVR, robust visual query processing, low-cost query transmission, and a fast search process are simultaneously supported. Experiments demonstrate the superior performance of 2CVR over several state-of-the-art methods.
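The canonical-view mining step can be sketched with greedy maximization of a submodular coverage function. The facility-location objective below is a common textbook stand-in, assumed here for illustration; it is not necessarily the exact representativeness-versus-redundancy function the paper designs.

```python
def coverage(selected, similarity):
    """Facility-location style score: how well the selected views
    represent every image (max similarity to any selected view)."""
    return sum(max((similarity[i][j] for j in selected), default=0.0)
               for i in range(len(similarity)))

def greedy_canonical_views(similarity, k):
    """Greedy maximization of the submodular coverage objective:
    each round adds the view with the largest marginal gain, so
    near-duplicates of an already-selected view are skipped."""
    selected = []
    for _ in range(k):
        best = max((j for j in range(len(similarity)) if j not in selected),
                   key=lambda j: coverage(selected + [j], similarity))
        selected.append(best)
    return selected

# 4 images; images 0/1 are near-duplicates, 2 and 3 are distinct views.
sim = [[1.0, 0.9, 0.1, 0.0],
       [0.9, 1.0, 0.1, 0.0],
       [0.1, 0.1, 1.0, 0.2],
       [0.0, 0.0, 0.2, 1.0]]
print(greedy_canonical_views(sim, 2))  # [0, 2]: one duplicate plus a distinct view
```

The submodularity of the coverage function is what makes this greedy loop a principled choice: diminishing marginal gains give the greedy solution a constant-factor approximation guarantee.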

AAAI Conference 2016 Conference Paper

Multi-View 3D Human Tracking in Crowded Scenes

  • Xiaobai Liu

This paper presents a robust multi-view method for tracking people in crowded 3D scenes. Our method distinguishes itself from previous works in two aspects. First, we define a set of binary spatial relationships for individual subjects or pairs of subjects that appear at the same time, e.g., being to the left or right of each other, or being closer to or further from the camera. These binary relationships directly reflect the relative positions of subjects in the 3D scene and thus should be preserved during inference. Second, we introduce a unified probabilistic framework that exploits binary spatial constraints for simultaneous 3D localization and cross-view human tracking. We develop a cluster Markov Chain Monte Carlo method to search for the optimal solution. We evaluate our method on both public video benchmarks and a newly built multi-view video dataset. Results with comparisons show that our method achieves state-of-the-art tracking results and meter-level 3D localization on challenging videos.
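The binary spatial relations described above can be sketched as predicates extracted per frame and checked for consistency across frames. The (x, depth) state layout is a simplifying assumption for illustration; the paper works with full 3D localization inside an MCMC framework rather than a hard consistency test.

```python
def relations(positions):
    """Extract binary left-of and closer-to-camera relations between
    subjects from their (x, depth) positions in one frame."""
    rel = {}
    ids = sorted(positions)
    for i in ids:
        for j in ids:
            if i < j:
                xi, di = positions[i]
                xj, dj = positions[j]
                rel[(i, j)] = (xi < xj, di < dj)  # (i left of j, i closer)
    return rel

def consistent(frame_a, frame_b):
    """A tracking hypothesis is suspect when a relation flips between
    nearby frames, since relative positions should persist."""
    ra, rb = relations(frame_a), relations(frame_b)
    return all(ra[k] == rb[k] for k in ra)

frame1 = {"A": (0.0, 2.0), "B": (3.0, 5.0)}
frame2 = {"A": (0.5, 2.2), "B": (3.1, 5.1)}   # same ordering: consistent
frame3 = {"A": (4.0, 2.0), "B": (3.0, 5.0)}   # A jumped to the right of B
print(consistent(frame1, frame2), consistent(frame1, frame3))  # True False
```

In the paper such violated relations would lower the posterior of an identity-assignment hypothesis instead of rejecting it outright.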