Arrow Research search

Author name cluster

Yangqing Jia

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

11

ICML Conference 2014 Conference Paper

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

  • Jeff Donahue
  • Yangqing Jia
  • Oriol Vinyals
  • Judy Hoffman
  • Ning Zhang 0014
  • Eric Tzeng
  • Trevor Darrell

We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. Our generic tasks may differ significantly from the originally trained tasks and there may be insufficient labeled or unlabeled data to conventionally train or adapt a deep architecture to the new tasks. We investigate and visualize the semantic clustering of deep convolutional features with respect to a variety of such tasks, including scene recognition, domain adaptation, and fine-grained recognition challenges. We compare the efficacy of relying on various network levels to define a fixed feature, and report novel results that significantly outperform the state-of-the-art on several important vision challenges. We are releasing DeCAF, an open-source implementation of these deep convolutional activation features, along with all associated network parameters, to enable vision researchers to conduct experiments with deep representations across a range of visual concept learning paradigms.
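The re-purposing recipe the abstract describes (freeze a trained network, read off a late-layer activation as a fixed feature, train a simple classifier on top) can be sketched with a toy stand-in; the random-weight "network", Gaussian toy task, and nearest-centroid readout below are illustrative assumptions, not the paper's actual CNN or classifiers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a "pretrained" network: two frozen random ReLU layers.
# (Illustrative only -- the paper uses a large supervised CNN, not random weights.)
W1 = rng.normal(size=(10, 32))
W2 = rng.normal(size=(32, 16))

def deep_features(X):
    """Activations of the last hidden layer, used as a fixed feature vector."""
    h1 = np.maximum(X @ W1, 0.0)
    return np.maximum(h1 @ W2, 0.0)

# A "novel task" the network was never trained for: two Gaussian classes.
X0 = rng.normal(loc=-1.0, size=(50, 10))
X1 = rng.normal(loc=+1.0, size=(50, 10))
F0, F1 = deep_features(X0), deep_features(X1)

# Simple classifier on the frozen features (nearest centroid, standing in
# for the linear classifiers trained on DeCAF features in the paper).
c0, c1 = F0.mean(axis=0), F1.mean(axis=0)

def predict(X):
    F = deep_features(X)
    return (np.linalg.norm(F - c1, axis=1) < np.linalg.norm(F - c0, axis=1)).astype(int)

acc = np.mean(np.r_[predict(X0) == 0, predict(X1) == 1])
```

The point of the sketch is structural: nothing downstream touches the frozen weights, so the feature extractor can be shared across arbitrary new tasks.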

IROS Conference 2013 Conference Paper

Grounding spatial relations for human-robot interaction

  • Sergio Guadarrama
  • Lorenzo Riano
  • Dave Golland
  • Daniel Goehring
  • Yangqing Jia
  • Dan Klein 0001
  • Pieter Abbeel
  • Trevor Darrell

We propose a system for human-robot interaction that learns both models for spatial prepositions and for object recognition. Our system grounds the meaning of an input sentence in terms of visual percepts coming from the robot's sensors in order to send an appropriate command to the PR2 or respond to spatial queries. To perform this grounding, the system recognizes the objects in the scene, determines which spatial relations hold between those objects, and semantically parses the input sentence. The proposed system uses the visual and spatial information in conjunction with the semantic parse to interpret statements that refer to objects (nouns), their spatial relationships (prepositions), and to execute commands (actions). The semantic parse is inherently compositional, allowing the robot to understand complex commands that refer to multiple objects and relations such as: “Move the cup close to the robot to the area in front of the plate and behind the tea box”. Our system correctly parses 94% of the 210 online test sentences, correctly interprets 91% of the correctly parsed sentences, and correctly executes 89% of the correctly interpreted sentences.

ICML Conference 2013 Conference Paper

On Compact Codes for Spatially Pooled Features

  • Yangqing Jia
  • Oriol Vinyals
  • Trevor Darrell

Feature encoding with an overcomplete dictionary has demonstrated good performance in many applications, especially computer vision. In this paper we analyze the classification accuracy with respect to dictionary size by linking the encoding stage to kernel methods and Nyström sampling, and obtain useful bounds on accuracy as a function of size. The Nyström method also inspires us to revisit dictionary learning from local patches, and we propose to learn the dictionary in an end-to-end fashion taking into account pooling, a common computational layer in vision. We validate our contribution by showing how the derived bounds are able to explain the observed behavior of multiple datasets, and show that the pooling-aware method efficiently reduces the dictionary size by a factor of two for a given accuracy.
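The Nyström connection can be illustrated numerically: sampling m landmark points yields a low-rank approximation K ≈ C W⁺ Cᵀ of the full kernel matrix, which is the mechanism that ties dictionary size to accuracy in the analysis above. The RBF kernel, bandwidth, and matrix sizes below are arbitrary choices for the sketch, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

def rbf(A, B, gamma=0.05):
    """RBF kernel matrix between row sets A and B (bandwidth chosen arbitrarily)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

K = rbf(X, X)                              # exact 200 x 200 kernel matrix

# Nystrom: pick m landmark points and approximate K from their columns alone.
m = 50
idx = rng.choice(len(X), size=m, replace=False)
C = rbf(X, X[idx])                         # n x m slice of K
W = C[idx]                                 # m x m landmark block
K_approx = C @ np.linalg.pinv(W) @ C.T     # low-rank approximation of K

rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

Growing m shrinks the approximation error, which is the same trade-off the paper studies between dictionary size and classification accuracy.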

NeurIPS Conference 2013 Conference Paper

Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

  • Yangqing Jia
  • Joshua Abbott
  • Joseph Austerweil
  • Tom Griffiths
  • Trevor Darrell

Learning a visual concept from a small number of positive examples is a significant challenge for machine learning algorithms. Current methods typically fail to find the appropriate level of generalization in a concept hierarchy for a given set of visual examples. Recent work in cognitive science on Bayesian models of generalization addresses this challenge, but prior results assumed that objects were perfectly recognized. We present an algorithm for learning visual concepts directly from images, using probabilistic predictions generated by visual classifiers as the input to a Bayesian generalization model. As no existing challenge data tests this paradigm, we collect and make available a new, large-scale dataset for visual concept learning using the ImageNet hierarchy as the source of possible concepts, with human annotators to provide ground truth labels as to whether a new image is an instance of each concept using a paradigm similar to that used in experiments studying word learning in children. We compare the performance of our system to several baseline algorithms, and show a significant advantage results from combining visual classifiers with the ability to identify an appropriate level of abstraction using Bayesian generalization.
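The Bayesian generalization model the abstract builds on can be illustrated with the standard size principle over a toy concept hierarchy. The hypothesis names and member sets below are invented for illustration; the paper works over ImageNet concepts and feeds in probabilistic classifier outputs rather than perfectly observed examples.

```python
# Hypotheses: nested concepts in a tiny hierarchy, each a set of objects.
# (Hypothetical toy hierarchy standing in for the ImageNet concept tree.)
hypotheses = {
    "dalmatian": {"dal1", "dal2"},
    "dog": {"dal1", "dal2", "poodle1", "poodle2"},
    "animal": {"dal1", "dal2", "poodle1", "poodle2", "cat1", "cat2", "bird1", "bird2"},
}
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}

def posterior(examples):
    """Bayesian generalization with the size principle: n examples drawn
    from concept h have likelihood (1/|h|)**n if h contains them all."""
    scores = {}
    for h, members in hypotheses.items():
        if all(e in members for e in examples):
            scores[h] = prior[h] * (1.0 / len(members)) ** len(examples)
        else:
            scores[h] = 0.0
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}

# Repeated dalmatian examples concentrate belief on the narrowest concept,
# picking the level of abstraction automatically.
p = posterior(["dal1", "dal2", "dal1"])
```

This is why the model finds "the appropriate level of generalization": broader concepts are penalized for every example they could have generated but did not.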

UAI Conference 2012 Conference Paper

Factorized Multi-Modal Topic Model

  • Seppo Virtanen
  • Yangqing Jia
  • Arto Klami
  • Trevor Darrell

Multi-modal data collections, such as corpora of paired images and text snippets, require analysis methods beyond single-view component and topic models. For continuous observations the current dominant approach is based on extensions of canonical correlation analysis, factorizing the variation into components shared by the different modalities and those private to each of them. For count data, multiple variants of topic models attempting to tie the modalities together have been presented. All of these, however, lack the ability to learn components private to one modality, and consequently will try to force dependencies even between minimally correlating modalities. In this work we combine the two approaches by presenting a novel HDP-based topic model that automatically learns both shared and private topics. The model is shown to be especially useful for querying the contents of one domain given samples of the other.

NeurIPS Conference 2012 Conference Paper

Learning with Recursive Perceptual Representations

  • Oriol Vinyals
  • Yangqing Jia
  • Li Deng
  • Trevor Darrell

Linear Support Vector Machines (SVMs) have become very popular in vision as part of state-of-the-art object recognition and other classification tasks but require high dimensional feature spaces for good performance. Deep learning methods can find more compact representations but current methods employ multilayer perceptrons that require solving a difficult, non-convex optimization problem. We propose a deep non-linear classifier whose layers are SVMs and which incorporates random projection as its core stacking element. Our method learns layers of linear SVMs recursively transforming the original data manifold through a random projection of the weak prediction computed from each layer. Our method scales as linear SVMs, does not rely on any kernel computations or nonconvex optimization, and exhibits better generalization ability than kernel-based SVMs. This is especially true when the number of training samples is smaller than the dimensionality of data, a common scenario in many real-world applications. The use of random projections is key to our method, as we show in the experiments section, in which we observe a consistent improvement over previous (often more complicated) methods on several vision and speech benchmarks.

NeurIPS Conference 2011 Conference Paper

Heavy-tailed Distances for Gradient Based Image Descriptors

  • Yangqing Jia
  • Trevor Darrell

Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gamma-compound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost.
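The shift from a Gaussian to a heavy-tailed noise model changes the induced distance. As a simplified stand-in for the paper's Gamma-compound-Laplace model, an i.i.d. Laplace model turns the negative log-likelihood into an L1 distance, which is visibly less dominated by a single corrupted dimension than the squared error implied by a Gaussian:

```python
import numpy as np

def gaussian_distance(a, b):
    """Negative log-likelihood under i.i.d. Gaussian noise ~ squared Euclidean."""
    return ((a - b) ** 2).sum()

def laplace_distance(a, b):
    """Negative log-likelihood under i.i.d. Laplace noise ~ L1 distance.
    A simple heavy-tailed stand-in for the Gamma-compound-Laplace model."""
    return np.abs(a - b).sum()

rng = np.random.default_rng(0)
a = rng.random(128)                        # toy 128-d SIFT-like descriptor
b = a + rng.laplace(scale=0.05, size=128)  # genuinely matching, noisy copy
c = a.copy()
c[0] += 5.0                                # same descriptor, one corrupted dimension

# Ratio of "corrupted" to "matching" distance: the squared-error model is
# blown up by the single outlier far more than the heavy-tailed one.
gauss_ratio = gaussian_distance(a, c) / gaussian_distance(a, b)
laplace_ratio = laplace_distance(a, c) / laplace_distance(a, b)
```

The Laplace choice here is only illustrative; the paper fits the empirical descriptor statistics with a heavier-tailed compound distribution and derives the matching distance from the corresponding likelihood ratio test.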

IROS Conference 2011 Conference Paper

Practical 3-D object detection using category and instance-level appearance models

  • Kate Saenko
  • Sergey Karayev
  • Yangqing Jia
  • Alex Shyr
  • Allison Janoch
  • Jonathan Long
  • Mario Fritz
  • Trevor Darrell


NeurIPS Conference 2010 Conference Paper

Factorized Latent Spaces with Structured Sparsity

  • Yangqing Jia
  • Mathieu Salzmann
  • Trevor Darrell

Recent approaches to multi-view learning have shown that factorizing the information into parts that are shared across all views and parts that are private to each view could effectively account for the dependencies and independencies between the different input modalities. Unfortunately, these approaches involve minimizing non-convex objective functions. In this paper, we propose an approach to learning such factorized representations inspired by sparse coding techniques. In particular, we show that structured sparsity allows us to address the multi-view learning problem by alternately solving two convex optimization problems. Furthermore, the resulting factorized latent spaces generalize over existing approaches in that they allow having latent dimensions shared between any subset of the views, rather than only between all of them. We show that our approach outperforms state-of-the-art methods on the task of human pose estimation.

IJCAI Conference 2009 Conference Paper

Semi-supervised Classification on Evolutionary Data

  • Yangqing Jia
  • Shuicheng Yan
  • Changshui Zhang

In this paper, we consider semi-supervised classification on evolutionary data, where the distribution of the data and the underlying concept that we aim to learn change over time due to short-term noises and long-term drifting, making a single aggregated classifier inapplicable for long-term classification. The drift is smooth if we take a localized view over the time dimension, which enables us to impose a temporal smoothness assumption for the learning algorithm. We first discuss how to encode such an assumption using temporal regularizers defined in a structural way with respect to the Hilbert space, and then derive an online algorithm that efficiently finds the closed-form solution to the classification functions. Experimental results on real-world evolutionary mailing list data demonstrate that our algorithm outperforms classical semi-supervised learning algorithms in both algorithmic stability and classification accuracy.

AAAI Conference 2008 Conference Paper

Instance-level Semisupervised Multiple Instance Learning

  • Yangqing Jia

Multiple instance learning (MIL) is a branch of machine learning that attempts to learn information from bags of instances. Many real-world applications such as localized content-based image retrieval and text categorization can be viewed as MIL problems. In this paper, we propose a new graph-based semi-supervised learning approach for multiple instance learning. By defining an instance-level graph on the data, we first propose a new approach to construct an optimization framework for multiple instance semi-supervised learning, and derive an efficient way to overcome the non-convexity of MIL. We empirically show that our method outperforms state-of-the-art MIL algorithms on several real-world data sets.
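The instance-level graph idea can be illustrated with classic graph-based label propagation in its closed form, a standard building block of this family of methods rather than the paper's exact MIL optimization; the toy clusters, RBF affinity, and parameter values below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy clusters of instances; only one instance per cluster is labeled.
X = np.r_[rng.normal(-2, 0.4, size=(20, 2)), rng.normal(2, 0.4, size=(20, 2))]
Y = np.zeros((40, 2))
Y[0, 0] = 1.0           # one instance labeled class 0
Y[20, 1] = 1.0          # one instance labeled class 1

# Instance-level graph: RBF affinities, symmetrically normalized.
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
W = np.exp(-sq / 2.0)
np.fill_diagonal(W, 0.0)
D = W.sum(1)
S = W / np.sqrt(np.outer(D, D))

# Closed-form label propagation: F = (I - alpha*S)^(-1) Y.
alpha = 0.9
F = np.linalg.solve(np.eye(40) - alpha * S, Y)
pred = F.argmax(1)
```

Labels spread along graph edges until every instance inherits the label of its cluster; the paper builds a comparable instance graph but adds MIL bag constraints and a scheme for handling the resulting non-convexity.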