Author name cluster

Baoxin Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers

2 author rows

AAAI Conference 2024 Conference Paper

Transformer-Based Selective Super-resolution for Efficient Image Refinement

Tianyi Zhang
Kishore Kasichainula
Yaoxin Zhuo
Baoxin Li
Jae-Sun Seo
Yu Cao

Conventional super-resolution methods suffer from two drawbacks: substantial computational cost in upscaling an entire large image, and the introduction of extraneous or potentially detrimental information for downstream computer vision tasks during the refinement of the background. To solve these issues, we propose a novel transformer-based algorithm, Selective Super-Resolution (SSR), which partitions images into non-overlapping tiles, selects tiles of interest at various scales with a pyramid architecture, and exclusively reconstructs these selected tiles with deep features. Experimental results on three datasets demonstrate the efficiency and robust performance of our approach for super-resolution. Compared to the state-of-the-art methods, the FID score is reduced from 26.78 to 10.41 with a 40% reduction in computation cost on the BDD100K dataset.
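
The tile partitioning step mentioned in the abstract can be illustrated with a minimal sketch. It only shows how an image might be split into non-overlapping tiles and how tiles could be filtered by a score. The function names, the clipping of edge tiles, and the threshold-based selection are assumptions made for illustration; the abstract states that SSR selects tiles with a transformer-based pyramid architecture, which is not reproduced here.

```python
from typing import List, Sequence, Tuple

# (top, left, bottom, right), with bottom and right exclusive.
Box = Tuple[int, int, int, int]


def partition_into_tiles(height: int, width: int, tile: int) -> List[Box]:
    """Split an image of the given size into non-overlapping tile boxes.

    Edge tiles are clipped when the image size is not a multiple of `tile`
    (an assumption; the abstract does not say how borders are handled).
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    boxes: List[Box] = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            boxes.append(
                (top, left, min(top + tile, height), min(left + tile, width))
            )
    return boxes


def select_tiles(
    boxes: Sequence[Box], scores: Sequence[float], threshold: float
) -> List[Box]:
    """Keep the boxes whose hypothetical interest score reaches the threshold."""
    return [box for box, score in zip(boxes, scores) if score >= threshold]
```

Under this sketch, only the boxes returned by `select_tiles` would be passed to a reconstruction model, which is where a saving in computation would come from.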

ICML Conference 2021 Conference Paper

Deep Latent Graph Matching

Tianshu Yu 0001
Runzhong Wang
Junchi Yan
Baoxin Li

Deep learning for graph matching (GM) has emerged as an important research topic due to its superior performance over traditional methods and the insights it provides for solving other combinatorial problems on graphs. While recent deep methods for GM have extensively investigated effective node/edge feature learning or downstream GM solvers given such learned features, there is little existing work questioning whether the fixed connectivity/topology typically constructed using heuristics (e.g., Delaunay or k-nearest) is indeed suitable for GM. From a learning perspective, we argue that the fixed topology may restrict the model capacity and thus potentially hinder the performance. To address this, we propose to learn the (distribution of) latent topology, which can better support the downstream GM task. We devise two latent graph generation procedures, one deterministic and one generative. In particular, the generative procedure emphasizes across-graph consistency and thus can be viewed as a matching-guided co-generative model. Our methods deliver superior performance over previous state-of-the-art methods on public benchmarks, hence supporting our hypothesis.

ICLR Conference 2020 Conference Paper

Deep Learning of Determinantal Point Processes via Proper Spectral Sub-gradient

Tianshu Yu 0001
Yikang Li 0001
Baoxin Li

Determinantal point processes (DPPs) are an effective tool for delivering diversity in many machine learning and computer vision tasks. Under a deep learning framework, a DPP is typically optimized via approximation, which is not straightforward and conflicts to some extent with the diversity requirement. We note, however, that there has been no deep learning paradigm for optimizing a DPP directly, since doing so involves matrix inversion, which may result in high computational instability. This fact greatly hinders the wide use of DPPs on specific objectives where a DPP serves as a term to measure feature diversity. In this paper, we devise a simple but effective algorithm that addresses this issue by optimizing the DPP term directly, expressed as an L-ensemble in the spectral domain over the Gram matrix, which is more flexible than learning on parametric kernels. By further taking into account some geometric constraints, our algorithm seeks to generate valid sub-gradients of the DPP term when the DPP Gram matrix is not invertible (no gradients exist in this case). In this sense, our algorithm can be easily incorporated into multiple deep learning tasks. Experiments show the effectiveness of our algorithm, indicating promising performance for practical learning problems.
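
For readers unfamiliar with the L-ensemble form named in the abstract, the standard textbook definition is sketched below. The symbols $L$, $Y$, and $L_Y$ are notation assumed here, not taken from the abstract, and the block shows only the usual definition, not the paper's sub-gradient construction.

```latex
% Standard L-ensemble DPP over a ground set of N items (notation assumed):
% L is an N x N positive semidefinite kernel (e.g. a Gram matrix of features),
% Y is a subset of items, and L_Y is the submatrix of L indexed by Y.
\[
  P_L(Y) \;=\; \frac{\det(L_Y)}{\det(L + I)},
  \qquad
  \log P_L(Y) \;=\; \log\det(L_Y) \;-\; \log\det(L + I).
\]
% When L_Y is singular, \det(L_Y) = 0, so \log\det(L_Y) is undefined and its
% usual gradient, which involves L_Y^{-1}, does not exist.
```

This is the setting in which the abstract notes that matrix inversion causes instability and that no gradient exists when the Gram matrix is not invertible.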

ICLR Conference 2020 Conference Paper

Learning deep graph matching with channel-independent embedding and Hungarian attention

Tianshu Yu 0001
Runzhong Wang
Junchi Yan
Baoxin Li

Graph matching aims to establish node-wise correspondence between two graphs, which is a classic combinatorial problem and in general NP-complete. Only very recently have graph matching methods started to resort to deep networks to achieve unprecedented matching accuracy. Along this direction, this paper makes two complementary contributions, which can also be reused as plugins in existing works: i) a novel node and edge embedding strategy which stimulates the multi-head strategy in attention models and allows the information in each channel to be merged independently. In contrast, only node embedding is accounted for in previous works; ii) a general masking mechanism over the loss function, devised to improve the smoothness of objective learning for graph matching. Using the Hungarian algorithm, it dynamically constructs a structured and sparsely connected layer, taking into account the most contributing matching pairs as hard attention. Our approach performs competitively, and can also improve state-of-the-art methods as a plugin, in terms of matching accuracy on three public benchmarks.

NeurIPS Conference 2018 Conference Paper

Generalizing Graph Matching beyond Quadratic Assignment Model

Tianshu Yu
Junchi Yan
Yilin Wang
Wei Liu
Baoxin Li

Graph matching, which can be formulated as a quadratic assignment problem (QAP), has received persistent attention over decades. We show that a large family of functions, which we define as Separable Functions, can approximate discrete graph matching in the continuous domain asymptotically by varying the approximation controlling parameters. We also study the properties of global optimality and devise convex/concave-preserving extensions to the widely used Lawler's QAP form. Our theoretical findings show the potential for deriving new algorithms and techniques for graph matching. We deliver solvers based on two specific instances of Separable Functions, and the state-of-the-art performance of our method is verified on popular benchmarks.
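
As background for the Lawler's QAP form mentioned in the abstract, the commonly used statement of graph matching as a QAP is sketched below. The notation ($X$, $K$, $n_1$, $n_2$) is assumed here for illustration; the abstract does not fix any symbols, and the Separable Functions relaxation itself is not shown.

```latex
% Graph matching in Lawler's QAP form (notation assumed):
% X is an n_1 x n_2 assignment matrix between the nodes of the two graphs,
% K is an (n_1 n_2) x (n_1 n_2) affinity matrix whose diagonal holds
% node-to-node affinities and whose off-diagonal entries hold
% edge-to-edge affinities.
\[
  \max_{X} \; \operatorname{vec}(X)^{\top} K \, \operatorname{vec}(X)
  \quad \text{s.t.} \quad
  X \in \{0,1\}^{n_1 \times n_2}, \;\;
  X \mathbf{1}_{n_2} = \mathbf{1}_{n_1}, \;\;
  X^{\top} \mathbf{1}_{n_1} \le \mathbf{1}_{n_2}.
\]
```

The binary constraint on $X$ is what makes the problem discrete; continuous approximations such as the one described in the abstract replace it with a relaxed domain.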

AAMAS Conference 2018 Conference Paper

Recognizing Plans by Learning Embeddings from Observed Action Distributions

Yantian Zha
Yikang Li
Sriram Gopalakrishnan
Baoxin Li
Subbarao Kambhampati

Automated video surveillance requires the recognition of agent plans from videos. One promising direction for plan recognition involves learning shallow action affinity models from plan traces. Extracting such traces from raw video involves uncertainty about the actions. One solution is to represent traces as sequences of action distributions. To use such a representation in approximate plan recognition, we need embeddings of these action distributions. To address this problem, we propose a distribution to vector (Distr2Vec) model, which learns embeddings of action distributions using KL-divergence as the loss function.
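
The KL-divergence named in the abstract as the loss function can be illustrated with a minimal sketch for two discrete action distributions. The function name, the `eps` clamp, and the example action sets are assumptions made for illustration; the abstract does not describe how Distr2Vec computes or applies the loss during training.

```python
import math
from typing import Sequence


def kl_divergence(
    p: Sequence[float], q: Sequence[float], eps: float = 1e-12
) -> float:
    """Return KL(p || q) in nats for two discrete distributions.

    Terms with p_i == 0 contribute nothing. q_i is clamped to `eps` to
    avoid division by zero (an assumption made for numerical safety).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


# Hypothetical distributions over three actions at one step of a trace.
observed = [0.7, 0.2, 0.1]
predicted = [0.1, 0.2, 0.7]
loss = kl_divergence(observed, predicted)
```

The divergence is zero when the two distributions agree and grows as the predicted distribution places little mass on actions that the observed distribution considers likely.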

AAAI Conference 2017 Conference Paper

CLARE: A Joint Approach to Label Classification and Tag Recommendation

Yilin Wang
Suhang Wang
Jiliang Tang
Guojun Qi
Huan Liu
Baoxin Li

Data classification and tag recommendation are both important and challenging tasks in social media. These two tasks are often considered independently and most efforts have been made to tackle them separately. However, labels in data classification and tags in tag recommendation are inherently related. For example, a YouTube video annotated with NCAA, stadium, pac12 is likely to be labeled as football, while a video/image with the class label of coast is likely to be tagged with beach, sea, water and sand. The existence of relations between labels and tags motivates us to jointly perform classification and tag recommendation for social media data in this paper. In particular, we provide a principled way to capture the relations between labels and tags, and propose a novel framework CLARE, which fuses data CLAssification and tag REcommendation into a coherent model. With experiments on three social media datasets, we demonstrate that the proposed framework CLARE achieves superior performance on both tasks compared to the state-of-the-art methods.

IJCAI Conference 2016 Conference Paper

Clustering-Based Joint Feature Selection for Semantic Attribute Prediction

Lin Chen
Baoxin Li

Semantic attributes have been proposed to bridge the semantic gap between low-level feature representation and high-level semantic understanding of visual objects. Obtaining a good representation of semantic attributes usually requires learning from high-dimensional low-level features, which not only significantly increases the time and space requirements but also degrades the performance due to numerous irrelevant features. Since multi-attribute prediction can be generalized as a multi-task learning problem, sparse-based multi-task feature selection approaches have been introduced, utilizing the relatedness among multiple attributes. However, such approaches either do not investigate the pattern of the relatedness among attributes, or require prior knowledge about the pattern. In this paper, we propose a novel feature selection approach which embeds attribute correlation modeling in multi-attribute joint feature selection. Experiments on both a synthetic dataset and multiple public benchmark datasets demonstrate that the proposed approach effectively captures the correlation among multiple attributes and significantly outperforms the state-of-the-art approaches.

IJCAI Conference 2015 Conference Paper

Unsupervised Sentiment Analysis for Social Media Images

Yilin Wang
Suhang Wang
Jiliang Tang
Huan Liu
Baoxin Li

Recently, text-based sentiment prediction has been extensively studied, while image-centric sentiment analysis has received much less attention. In this paper, we study the problem of understanding human sentiments from large-scale social media images, considering both visual content and contextual information, such as comments on the images, captions, etc. The challenge of this problem lies in the “semantic gap” between low-level visual features and higher-level image sentiments. Moreover, the lack of proper annotations/labels in the majority of social media images presents another challenge. To address these two challenges, we propose a novel Unsupervised SEntiment Analysis (USEA) framework for social media images. Our approach exploits relations among visual content and relevant contextual information to bridge the “semantic gap” in prediction of image sentiments. With experiments on two large-scale datasets, we show that the proposed method is effective in addressing the two challenges.

IJCAI Conference 2009 Conference Paper

Zheshen Wang
Baoxin Li

Automatic recognition of human activities is among the key capabilities of many intelligent systems with vision/perception. Most existing approaches to this problem require sophisticated feature extraction before classification can be performed. This paper presents a novel approach for human action recognition using only simple low-level visual features: motion captured from direct frame differencing. A codebook of key poses is first created from the training data through unsupervised clustering. Videos of actions are then coded as sequences of super-frames, defined as the key poses augmented with discriminative attributes. A weighted-sequence distance is proposed for comparing two super-frame sequences, which is further wrapped as a kernel embedded in an SVM classifier for the final classification. Compared with conventional methods, our approach provides a flexible non-parametric sequential structure with a corresponding distance measure for human action representation and classification without requiring complex feature extraction. The effectiveness of our approach is demonstrated with the widely used KTH human activity dataset, for which the proposed method outperforms the existing state-of-the-art.

ICRA Conference 2006 Conference Paper

Homography-based Ground Detection for a Mobile Robot Platform using a Single Camera

Jin Zhou 0005
Baoxin Li

This paper presents a practical approach to ground detection in mobile robot applications based on a monocular sequence captured by an on-board camera. We formulate the problem of ground plane detection as one of estimating the dominant homography between two frames taken from the sequence, and then design an efficient algorithm for the estimation. In particular, we analyze a problem inherent to any homography-based approach to the given task, and show how the proposed approach can address this problem to a large degree. Although not explicitly discussed, the proposed method can be used to guide the maneuver of the robot, as the detected ground plane can in turn be used in obstacle avoidance.
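
The role of a homography in ground detection can be illustrated with a minimal sketch. Given a 3x3 homography `H` between two frames, matched points that lie on the ground plane should map onto their counterparts with a small transfer error. The function names, the pixel tolerance, and the inlier test are assumptions made for illustration; the abstract does not describe the paper's estimation algorithm, and `H` is simply taken as given here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]  # 3x3 homography as nested rows


def apply_homography(H: Matrix3, point: Point) -> Point:
    """Map a 2D point through H using homogeneous coordinates."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under H")
    return (xh / w, yh / w)


def transfer_error(H: Matrix3, p: Point, q: Point) -> float:
    """Euclidean distance between H applied to p and its match q."""
    px, py = apply_homography(H, p)
    return math.hypot(px - q[0], py - q[1])


def ground_inliers(
    H: Matrix3, matches: Sequence[Tuple[Point, Point]], tol: float = 2.0
) -> List[Tuple[Point, Point]]:
    """Keep matches consistent with H within `tol` pixels (hypothetical threshold)."""
    return [(p, q) for p, q in matches if transfer_error(H, p, q) <= tol]
```

Matches rejected by `ground_inliers` would, under this assumption, belong to structures off the dominant plane, which is the kind of information an obstacle-avoidance step could use.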
