Author name cluster

Jianping Shi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

1 author row

AAAI Conference 2020 Conference Paper

Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow

Mingyu Ding
Zhe Wang
Bolei Zhou
Jianping Shi
Zhiwu Lu
Ping Luo

A major challenge for video semantic segmentation is the lack of labeled data. In most benchmark datasets, only one frame of a video clip is annotated, which makes most supervised methods fail to utilize information from the rest of the frames. To exploit the spatio-temporal information in videos, many previous works use pre-computed optical flows, which encode the temporal consistency to improve the video segmentation. However, the video segmentation and optical flow estimation are still considered as two separate tasks. In this paper, we propose a novel framework for joint video semantic segmentation and optical flow estimation. Semantic segmentation brings semantic information to handle occlusion for more robust optical flow estimation, while the non-occluded optical flow provides accurate pixel-level temporal correspondences to guarantee the temporal consistency of the segmentation. Moreover, our framework is able to utilize both labeled and unlabeled frames in the video through joint training, while no additional calculation is required in inference. Extensive experiments show that the proposed model makes the video semantic segmentation and optical flow estimation benefit from each other and outperforms existing methods under the same settings in both tasks.

AAAI Conference 2019 Conference Paper

A2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes

Kui Xu
Zhe Wang
Jianping Shi
Hongsheng Li
Qiangfeng Cliff Zhang

Constructing molecular structural models from Cryo-Electron Microscopy (Cryo-EM) density volumes is the critical last step of structure determination by Cryo-EM technologies. Methods have evolved from manual construction by structural biologists to performing 6D translation-rotation searching, which is extremely compute-intensive. In this paper, we propose a learning-based method and formulate this problem as a vision-inspired 3D detection and pose estimation task. We develop a deep learning framework for amino acid determination in a 3D Cryo-EM density volume. We also design a sequence-guided Monte Carlo Tree Search (MCTS) to thread over the candidate amino acids to form the molecular structure. This framework achieves 91% coverage on our newly proposed dataset and takes only a few minutes for a typical structure with a thousand amino acids. Our method is hundreds of times faster and several times more accurate than existing automated solutions without any human intervention.

NeurIPS Conference 2018 Conference Paper

FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction

Shuyang Sun
Jiangmiao Pang
Jianping Shi
Shuai Yi
Wanli Ouyang

The basic principles in designing convolutional neural network (CNN) structures for predicting objects on different levels, e.g., image-level, region-level, and pixel-level, are diverging. Generally, network structures designed specifically for image classification are directly used as the default backbone structure for other tasks including detection and segmentation, but backbone structures are seldom designed with the aim of unifying the advantages of networks designed for pixel-level or region-level prediction tasks, which may require very deep features with high resolution. Towards this goal, we design a fish-like network, called FishNet. In FishNet, the information of all resolutions is preserved and refined for the final task. Besides, we observe that existing works still cannot directly propagate the gradient information from deep layers to shallow layers. Our design can better handle this problem. Extensive experiments have been conducted to demonstrate the remarkable performance of FishNet. In particular, on ImageNet-1k, the accuracy of FishNet is able to surpass the performance of DenseNet and ResNet with fewer parameters. FishNet was applied as one of the modules in the winning entry of the COCO Detection 2018 challenge. The code is available at https://github.com/kevin-ssy/FishNet.

NeurIPS Conference 2018 Conference Paper

Sequential Context Encoding for Duplicate Removal

Lu Qi
Shu Liu
Jianping Shi
Jiaya Jia

Duplicate removal is a critical step to accomplish a reasonable number of predictions in prevalent proposal-based object detection frameworks. Albeit simple and effective, most previous algorithms utilized a greedy process without making sufficient use of properties of input data. In this work, we design a new two-stage framework to effectively select the appropriate proposal candidate for each object. The first stage suppresses most easy negative object proposals, while the second stage selects true positives in the reduced proposal set. These two stages share the same network structure, an encoder and a decoder formed as recurrent neural networks (RNN) with global attention and context gate. The encoder scans proposal candidates in a sequential manner to capture the global context information, which is then fed to the decoder to extract optimal proposals. In our extensive experiments, the proposed method outperforms other alternatives by a large margin.
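
For context, the greedy process that the abstract contrasts against is typically non-maximum suppression (NMS). The following is a minimal sketch of conventional greedy NMS, not the paper's RNN-based two-stage method. The function names `iou` and `greedy_nms`, the `[x1, y1, x2, y2]` box format, and the 0.5 IoU threshold are assumptions chosen for illustration.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        overlaps = iou(boxes[best], boxes[rest])
        # Only boxes that do not overlap the kept box too much survive.
        order = rest[overlaps <= iou_threshold]
    return keep

# Hypothetical toy input: two heavily overlapping boxes and one separate box.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = greedy_nms(boxes, scores)
```

Each decision here depends only on pairwise overlap with an already-kept box, which illustrates the limitation the abstract points to: the greedy pass makes little use of other properties of the proposal set.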

AAAI Conference 2018 Conference Paper

Spatial as Deep: Spatial CNN for Traffic Scene Understanding

Xingang Pan
Jianping Shi
Ping Luo
Xiaogang Wang
Xiaoou Tang

Convolutional neural networks (CNNs) are usually built by stacking convolutional operations layer-by-layer. Although CNN has shown strong capability to extract semantics from raw pixels, its capacity to capture spatial relationships of pixels across rows and columns of an image is not fully explored. These relationships are important to learn semantic objects with strong shape priors but weak appearance coherences, such as traffic lanes, which are often occluded or not even painted on the road surface as shown in Fig. 1 (a). In this paper, we propose Spatial CNN (SCNN), which generalizes traditional deep layer-by-layer convolutions to slice-by-slice convolutions within feature maps, thus enabling message passing between pixels across rows and columns in a layer. Such SCNN is particularly suitable for long continuous shape structures or large objects with strong spatial relationships but fewer appearance clues, such as traffic lanes, poles, and walls. We apply SCNN on a newly released, very challenging traffic lane detection dataset and the Cityscapes dataset. The results show that SCNN could learn the spatial relationship for structure output and significantly improves the performance. We show that SCNN outperforms the recurrent neural network (RNN) based ReNet and MRF+CNN (MRFNet) in the lane detection dataset by 8.7% and 4.6% respectively. Moreover, our SCNN won the 1st place on the TuSimple Benchmark Lane Detection Challenge, with an accuracy of 96.53%.
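
The slice-by-slice idea can be sketched roughly as follows. This is an illustrative NumPy version under stated assumptions, not the authors' implementation: the function name `scnn_downward`, the ReLU nonlinearity, the zero padding, and the kernel layout `(C_out, C_in, width)` are choices made for this example. Only a single top-to-bottom pass is shown; passes in other directions would follow the same pattern.

```python
import numpy as np

def scnn_downward(x, kernel):
    """One top-to-bottom message-passing pass over a feature map.

    x:      feature map of shape (C, H, W)
    kernel: weights of shape (C, C, k) with odd k, shared by all rows
    Each row receives a convolved, rectified message from the row above,
    and the updated row is what gets passed on to the next row.
    """
    c, h, w = x.shape
    k = kernel.shape[2]
    pad = k // 2
    out = x.astype(float)  # astype returns a copy, so the input stays untouched
    for i in range(1, h):
        # Zero-pad the previous (already updated) row along the width axis.
        prev = np.pad(out[:, i - 1, :], ((0, 0), (pad, pad)))
        # windows has shape (C_in, W, k): one length-k window per position.
        windows = np.lib.stride_tricks.sliding_window_view(prev, k, axis=1)
        # 1D convolution across the width, mixing all input channels.
        message = np.einsum("oik,iwk->ow", kernel, windows)
        out[:, i, :] += np.maximum(message, 0.0)
    return out

# Hypothetical toy input: activation only in the top row.
x = np.zeros((2, 4, 5))
x[:, 0, :] = 1.0
# Identity kernel (centre tap, same channel) so the signal is copied downward.
kernel = np.zeros((2, 2, 3))
kernel[0, 0, 1] = 1.0
kernel[1, 1, 1] = 1.0
out = scnn_downward(x, kernel)
```

With the identity kernel the top-row activation reaches every row in a single pass, which is the kind of long-range propagation along a thin structure that a stack of ordinary layers would need many layers to achieve.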

NeurIPS Conference 2018 Conference Paper

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Tianyi Liu
Shiyang Li
Jianping Shi
Enlu Zhou
Tuo Zhao

Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) have been widely used in distributed machine learning, e.g., training large collaborative filtering systems and deep neural networks. Due to current technical limitations, however, establishing convergence properties of Async-MSGD for these highly complicated nonconvex problems is generally infeasible. Therefore, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problem: streaming PCA. This allows us to make progress toward understanding Async-MSGD and gaining new insights for more general problems. Specifically, by exploiting the diffusion approximation of stochastic optimization, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA. Our results indicate a fundamental tradeoff between asynchrony and momentum: to ensure convergence and acceleration through asynchrony, we have to reduce the momentum (compared with Sync-MSGD). To the best of our knowledge, this is the first theoretical attempt on understanding Async-MSGD for distributed nonconvex stochastic optimization. Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD.
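
To make the setting concrete, here is a minimal sketch of momentum SGD with stale gradients applied to an Oja-style streaming PCA update. It is an illustration under assumptions, not the paper's exact algorithm or analysis: the function name `async_msgd_streaming_pca`, the fixed integer `delay` used to simulate asynchrony, and the step size, momentum, and covariance values are all hypothetical.

```python
from collections import deque

import numpy as np

def async_msgd_streaming_pca(samples, eta=0.002, mu=0.5, delay=2, seed=0):
    """Estimate the top principal direction from a stream of samples.

    The stochastic gradient is evaluated at an iterate that is `delay`
    steps old (simulated asynchrony), and a heavy-ball momentum term
    mu * (v_k - v_{k-1}) is added to each update.
    """
    rng = np.random.default_rng(seed)
    d = samples.shape[1]
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)
    v_prev = v.copy()
    # history[0] is the iterate from `delay` steps ago, history[-1] the current one.
    history = deque([v.copy()] * (delay + 1), maxlen=delay + 1)
    for x in samples:
        stale = history[0]
        proj = x @ stale
        # Oja-style direction (I - v v^T) x x^T v, evaluated at the stale iterate.
        grad = proj * x - (proj ** 2) * stale
        v_next = v + mu * (v - v_prev) + eta * grad
        v_prev, v = v, v_next
        history.append(v.copy())
    return v / np.linalg.norm(v)

# Hypothetical stream: zero-mean Gaussian with variances 4, 1, 0.5 along the
# axes, so the top principal direction is the first coordinate axis.
rng = np.random.default_rng(1)
stds = np.sqrt(np.array([4.0, 1.0, 0.5]))
samples = rng.normal(size=(20000, 3)) * stds
v_hat = async_msgd_streaming_pca(samples)
alignment = abs(v_hat[0])  # |cosine| between the estimate and the true direction
```

Raising `mu` or `delay` in this toy setup is one way to probe the tradeoff the abstract describes, although the small fixed values used here are picked only to keep the example stable.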

IJCAI Conference 2013 Conference Paper

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering

Jianping Shi
Naiyan Wang
Yang Xia
Dit-Yan Yeung
Irwin King
Jiaya Jia

Matrix factorization (MF) is a popular collaborative filtering approach for recommender systems due to its simplicity and effectiveness. Existing MF methods either assume that all latent features are uncorrelated or assume that all are correlated. To address the important issue of what structure should be imposed on the features, we investigate the covariance matrix of the latent features learned from real data. Based on the findings, we propose an MF model with a sparse covariance prior which favors a sparse yet non-diagonal covariance matrix. Not only can this reflect the semantics more faithfully, but imposing sparsity can also have a side effect of preventing overfitting. Starting from a probabilistic generative model with a sparse covariance prior, we formulate the model inference problem as a maximum a posteriori (MAP) estimation problem. The optimization procedure makes use of stochastic gradient descent and majorization-minimization. For empirical validation, we conduct experiments using the MovieLens and Netflix datasets to compare the proposed method with two strong baselines which use different priors. Experimental results show that our sparse covariance prior can lead to performance improvement.
