Arrow Research search

Author name cluster

Fuxin Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers

NeurIPS 2025 · Conference Paper

Convex Potential Mirror Langevin Algorithm for Efficient Sampling of Energy-Based Models

  • Zitao Yang
  • Amin Ullah
  • Shuai Li
  • Fuxin Li
  • Jun Li

This paper introduces the Convex Potential Mirror Langevin Algorithm (CPMLA), a novel method to improve sampling efficiency for Energy-Based Models (EBMs). CPMLA uses mirror Langevin dynamics with a convex potential flow as a dynamic mirror map for EBM sampling. This dynamic mirror map enables targeted geometric exploration on the data manifold, accelerating convergence to the target distribution. Theoretical analysis proves that CPMLA achieves exponential convergence with vanishing bias under relaxed log-concave conditions, supporting its efficiency in adapting to complex data distributions. Experiments on benchmarks like CIFAR-10, SVHN, and CelebA demonstrate CPMLA's improved sampling quality and inference efficiency over existing techniques.
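As a rough illustration of the mirror Langevin step underlying this family of samplers, the sketch below runs mirror Langevin dynamics with a fixed entropic mirror map on the positive reals to sample a Gamma target. This is a generic textbook-style example under stated assumptions, not CPMLA itself (the paper learns the mirror map as a convex potential flow); all names are illustrative.

```python
import numpy as np

def mirror_langevin_gamma(n_steps=2000, n_chains=500, a=3.0, step=1e-2, seed=0):
    """One-dimensional mirror Langevin sampling on the positive reals.

    Mirror map: phi(x) = x*log(x) - x, so grad phi(x) = log(x),
    its inverse is exp, and hess phi(x) = 1/x.
    Target: Gamma(a, 1), i.e. U(x) = x - (a - 1) * log(x).
    """
    rng = np.random.default_rng(seed)
    x = np.ones(n_chains)           # start at x = 1 (interior of the domain)
    y = np.log(x)                   # dual iterate y = grad phi(x)
    for _ in range(n_steps):
        grad_U = 1.0 - (a - 1.0) / x
        noise = rng.standard_normal(n_chains)
        # dual-space update with metric sqrt(hess phi(x)) = 1/sqrt(x)
        y = y - step * grad_U + np.sqrt(2.0 * step) * noise / np.sqrt(x)
        x = np.exp(y)               # map back via the inverse mirror map
    return x

samples = mirror_langevin_gamma()
```

Because the primal iterate is recovered through `exp`, every sample stays in the target's support by construction, which is the practical appeal of mirror (as opposed to plain) Langevin dynamics on constrained domains.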

AAAI 2025 · Conference Paper

Data Augmentation Approaches for Satellite Imagery

  • Laurel M. Hopkins
  • Weng-Keen Wong
  • Hannah Kerner
  • Fuxin Li
  • Rebecca A. Hutchinson

Deep learning models commonly benefit from data augmentation techniques to diversify the set of training images. When working with satellite imagery, it is common for practitioners to apply a limited set of transformations developed for natural images (e.g., flip and rotate) to expand the training set without overly modifying the satellite images. There are many techniques for natural image data augmentation, but given the differences between the two domains, it is not clear whether data augmentation methods developed for natural images are well suited for satellite imagery. This paper presents an extensive experimental study on three classification and three regression tasks over four satellite image datasets. We compare common computer vision data augmentation techniques and propose three novel satellite-specific data augmentation strategies. Across tasks and datasets, we find that geometric transformations are beneficial for satellite imagery while color transformations generally are not. Additionally, our novel Sat-SlideMix, Sat-CutMix, and Sat-Trivial methods all exhibit strong performance across all tasks and datasets.
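The patch-mixing family that Sat-CutMix builds on can be illustrated with a generic CutMix-style operation: paste a random patch from one image into another and mix the labels in proportion to the pasted area. This is a sketch of the generic technique only, not the paper's satellite-specific variants; `cutmix_pair` is a hypothetical helper name.

```python
import numpy as np

def cutmix_pair(img_a, img_b, label_a, label_b, rng):
    """Paste a random square patch from img_b into img_a and mix the
    labels in proportion to the pasted area (generic CutMix-style mixing).

    img_a, img_b: (H, W, C) arrays of the same shape.
    label_a, label_b: scalar regression targets or one-hot vectors.
    """
    h, w = img_a.shape[:2]
    # patch side drawn as a fraction of the image side
    side = int(rng.uniform(0.2, 0.5) * min(h, w))
    top = rng.integers(0, h - side + 1)
    left = rng.integers(0, w - side + 1)
    mixed = img_a.copy()
    mixed[top:top + side, left:left + side] = \
        img_b[top:top + side, left:left + side]
    lam = 1.0 - (side * side) / (h * w)      # fraction of img_a remaining
    mixed_label = lam * np.asarray(label_a) + (1.0 - lam) * np.asarray(label_b)
    return mixed, mixed_label

rng = np.random.default_rng(0)
a = np.zeros((64, 64, 3))
b = np.ones((64, 64, 3))
img, lab = cutmix_pair(a, b, 0.0, 1.0, rng)
```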

ICLR 2025 · Conference Paper

Point-based Instance Completion with Scene Constraints

  • Wesley Khademi
  • Fuxin Li

Recent point-based object completion methods have demonstrated the ability to accurately recover the missing geometry of partially observed objects. However, these approaches are not well-suited for completing objects within a scene, as they do not consider known scene constraints (e.g., other observed surfaces) in their completions and further expect the partial input to be in a canonical coordinate system, which does not hold for objects within scenes. While instance scene completion methods have been proposed for completing objects within a scene, they lag behind point-based object completion methods in terms of object completion quality and still do not consider known scene constraints during completion. To overcome these limitations, we propose a point cloud-based instance completion model that can robustly complete objects at arbitrary scales and poses in the scene. To enable reasoning at the scene level, we introduce a sparse set of scene constraints represented as point clouds and integrate them into our completion model via a cross-attention mechanism. To evaluate the instance scene completion task on indoor scenes, we further build a new dataset called ScanWCF, which contains labeled partial scans as well as aligned ground truth scene completions that are watertight and collision-free. Through several experiments, we demonstrate that our method achieves improved fidelity to partial scans, higher completion quality, and greater plausibility over existing state-of-the-art methods.

ICRA 2024 · Conference Paper

CVAE-SM: A Conditional Variational Autoencoder with Style Modulation for Efficient Uncertainty Quantification

  • Amin Ullah
  • Taiqing Yan
  • Fuxin Li

Deep learning has brought transformative advancements to object segmentation, especially in marine robotics contexts such as waste management and subaquatic infrastructure oversight. However, a central challenge persists: calibrating the prediction confidence of the model to ensure robust and reliable outcomes, especially within the demanding underwater environment. Existing solutions for estimating uncertainty are often computationally intensive and have largely centered around Bayesian neural networks or ensemble methods. In this paper, we present a Conditional Variational Autoencoder-based framework (CVAE-SM), which is capable of generating diverse latent codes for improved uncertainty quantification in image segmentation. Our method, enhanced by a style modulator, merges content features and latent codes more effectively, leading to refined prediction of uncertainty levels. We further introduce a dataset of perturbed underwater images to benchmark uncertainty quantification in this domain. The proposed model not only surpasses peers in segmentation metrics but also matches ensemble models in uncertainty predictions, all while being 2.5 times faster.

ICRA 2024 · Conference Paper

Out of Sight, Still in Mind: Reasoning and Planning about Unobserved Objects with Video Tracking Enabled Memory Models

  • Yixuan Huang
  • Jialin Yuan
  • Chanho Kim
  • Pupul Pradhan
  • Bryan Chen
  • Fuxin Li
  • Tucker Hermans

Robots need to have a memory of previously observed, but currently occluded objects to work reliably in realistic environments. We investigate the problem of encoding object-oriented memory into a multi-object manipulation reasoning and planning framework. We propose DOOM and LOOM, which leverage transformer relational dynamics to encode the history of trajectories given partial-view point clouds and an object discovery and tracking engine. Our approaches can perform multiple challenging tasks including reasoning about occluded objects, novel object appearances, and object reappearance. Throughout our extensive simulation and real-world experiments, we find that our approaches perform well across different numbers of objects and different numbers of distractor actions. Furthermore, we show our approaches outperform an implicit memory baseline.

ICRA 2024 · Conference Paper

Point Cloud Models Improve Visual Robustness in Robotic Learners

  • Skand Peri
  • Iain Lee
  • Chanho Kim
  • Fuxin Li
  • Tucker Hermans
  • Stefan Lee

Visual control policies can encounter significant performance degradation when visual conditions like lighting or camera position differ from those seen during training – often exhibiting sharp declines in capability even for minor differences. In this work, we examine robustness to a suite of these types of visual changes for RGB-D and point cloud based visual control policies. To perform these experiments on both model-free and model-based reinforcement learners, we introduce a novel Point Cloud World Model (PCWM) and point cloud based control policies. Our experiments show that policies that explicitly encode point clouds are significantly more robust than their RGB-D counterparts. Further, we find our proposed PCWM significantly outperforms prior works in terms of sample efficiency during training. Taken together, these results suggest reasoning about the 3D scene through point clouds can improve performance, reduce learning time, and increase robustness for robotic learners. Code: https://github.com/pvskand/pcwm

NeurIPS 2023 · Conference Paper

Diverse Shape Completion via Style Modulated Generative Adversarial Networks

  • Wesley Khademi
  • Fuxin Li

Shape completion aims to recover the full 3D geometry of an object from a partial observation. This problem is inherently multi-modal since there can be many ways to plausibly complete the missing regions of a shape. Such diversity would be indicative of the underlying uncertainty of the shape and could be preferable for downstream tasks such as planning. In this paper, we propose a novel conditional generative adversarial network that can produce many diverse plausible completions of a partially observed point cloud. To enable our network to produce multiple completions for the same partial input, we introduce stochasticity into our network via style modulation. By extracting style codes from complete shapes during training, and learning a distribution over them, our style codes can explicitly carry shape category information leading to better completions. We further introduce diversity penalties and discriminators at multiple scales to prevent conditional mode collapse and to train without the need for multiple ground truth completions for each partial input. Evaluations across several synthetic and real datasets demonstrate that our method achieves significant improvements in respecting the partial observations while obtaining greater diversity in completions.

ICRA 2023 · Conference Paper

Real-Time Generative Grasping with Spatio-temporal Sparse Convolution

  • Timothy R. Player
  • Dongsik Chang
  • Fuxin Li
  • Geoffrey A. Hollinger

Robots performing mobile manipulation in unstructured environments must identify grasp affordances quickly and with robustness to perception noise. Yet in domains such as underwater manipulation, where perception noise is severe, computation is constrained, and the environment is dynamic, existing techniques fail. They are too computationally demanding, or too sensitive to noise to allow for closed-loop grasping or dynamic replanning, or do not consider 6-DOF grasps. We present a novel grasp synthesis network, TSGrasp, that uses spatio-temporal sparse convolution to process a streaming point cloud in real time. The network generates 6-DOF grasps at greater speed and with less memory than Contact GraspNet, a state-of-the-art algorithm based on PointNet++. By considering information from multiple successive frames of depth video, TSGrasp boosts robustness to noise or temporary self-occlusion and allows more grasps to be rapidly identified. Our grasp synthesis system was successfully demonstrated in an underwater environment with a Blueprint Labs Bravo robotic arm.

ICML 2021 · Conference Paper

Generative Particle Variational Inference via Estimation of Functional Gradients

  • Neale Ratzlaff
  • Qinxun Bai
  • Fuxin Li
  • Wei Xu 0017

Recently, particle-based variational inference (ParVI) methods have gained interest because they can avoid arbitrary parametric assumptions that are common in variational inference. However, many ParVI approaches do not allow arbitrary sampling from the posterior, and the few that do allow such sampling suffer from suboptimality. This work proposes a new method for learning to approximately sample from the posterior distribution. We construct a neural sampler that is trained with the functional gradient of the KL-divergence between the empirical sampling distribution and the target distribution, assuming the gradient resides within a reproducing kernel Hilbert space. Our generative ParVI (GPVI) approach maintains the asymptotic performance of ParVI methods while offering the flexibility of a generative sampler. Through carefully constructed experiments, we show that GPVI outperforms previous generative ParVI methods such as amortized SVGD, and is competitive with ParVI as well as gold-standard approaches like Hamiltonian Monte Carlo for fitting both exactly known and intractable target distributions.
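The particle update that GPVI's functional gradient generalizes is the SVGD direction: a kernel-smoothed score term plus a repulsive kernel-gradient term that keeps particles spread out. Below is a minimal 1-D SVGD sketch of that standard update, not the paper's neural sampler; all names are illustrative.

```python
import numpy as np

def svgd_1d(particles, grad_logp, n_iters=500, step=0.1, h=0.5):
    """Stein Variational Gradient Descent on a 1-D target.

    Each iteration moves every particle along the RKHS functional
    gradient of KL(q || p): a kernel-weighted average of the score
    plus a repulsive kernel-gradient term.
    """
    x = particles.astype(float).copy()
    for _ in range(n_iters):
        diff = x[:, None] - x[None, :]          # diff[i, j] = x_i - x_j
        k = np.exp(-diff**2 / (2 * h**2))       # RBF kernel matrix
        score = grad_logp(x)                    # d/dx log p at each particle
        # attractive term: kernel-smoothed score; repulsive term: grad of k
        phi = (k @ score + (diff * k).sum(axis=1) / h**2) / len(x)
        x += step * phi
    return x

rng = np.random.default_rng(0)
init = rng.standard_normal(100) - 5.0           # start far from the target
final = svgd_1d(init, lambda x: -(x - 2.0))     # target N(2, 1)
```

The repulsive term is what distinguishes SVGD from running independent gradient ascent on log p: without it, all particles would collapse to the mode.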

NeurIPS 2021 · Conference Paper

One Explanation is Not Enough: Structured Attention Graphs for Image Classification

  • Vivswan Shitole
  • Fuxin Li
  • Minsuk Kahng
  • Prasad Tadepalli
  • Alan Fern

Attention maps are popular tools for explaining the decisions of convolutional neural networks (CNNs) for image classification. Typically, for each image of interest, a single attention map is produced, which assigns weights to pixels based on their importance to the classification. We argue that a single attention map provides an incomplete understanding since there are often many other maps that explain a classification equally well. In this paper, we propose to utilize a beam search algorithm to systematically search for multiple explanations for each image. Results show that there are indeed multiple relatively localized explanations for many images. However, naively showing multiple explanations to users can be overwhelming and does not reveal their common and distinct structures. We introduce structured attention graphs (SAGs), which compactly represent sets of attention maps for an image by visualizing how different combinations of image regions impact the confidence of a classifier. An approach to computing a compact and representative SAG for visualization is proposed via diverse sampling. We conduct a user study comparing the use of SAGs to traditional attention maps for answering comparative counterfactual questions about image classifications. Our results show that the users are significantly more accurate when presented with SAGs compared to standard attention map baselines.
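The beam search over region subsets can be sketched generically: grow candidate region sets one region at a time, keep the top-scoring sets at each level, and record any set whose score already clears a threshold as a standalone explanation. Here `score` is a hypothetical stand-in for the classifier's confidence when only those regions are visible; the toy version and its names are illustrative, not the paper's implementation.

```python
def beam_search_subsets(regions, score, beam_width=3, threshold=0.9, max_size=3):
    """Beam search for multiple small region sets that each suffice
    to explain a classification (score(subset) >= threshold)."""
    beam = [frozenset()]
    explanations = []
    for _ in range(max_size):
        candidates = set()
        for s in beam:
            for r in regions:
                grown = s | {r}
                # skip supersets of an already-found explanation
                if r not in s and not any(e <= grown for e in explanations):
                    candidates.add(grown)
        ranked = sorted(candidates, key=score, reverse=True)
        explanations.extend(s for s in ranked if score(s) >= threshold)
        beam = [s for s in ranked if score(s) < threshold][:beam_width]
    return explanations

# toy confidence: either region 0 alone, or regions {1, 2} together, suffice
def score(s):
    return 1.0 if (0 in s) or {1, 2} <= s else 0.1 * len(s)

found = beam_search_subsets(range(4), score)
```

On this toy score the search returns two distinct minimal explanations, which is exactly the multiplicity the paper argues a single attention map hides.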

ICLR 2021 · Conference Paper

Topology-Aware Segmentation Using Discrete Morse Theory

  • Xiaoling Hu 0002
  • Yusu Wang 0001
  • Fuxin Li
  • Dimitris Samaras
  • Chao Chen 0012

In the segmentation of fine-scale structures from natural and biomedical images, per-pixel accuracy is not the only metric of concern. Topological correctness, such as vessel connectivity and membrane closure, is crucial for downstream analysis tasks. In this paper, we propose a new approach to train deep image segmentation networks for better topological accuracy. In particular, leveraging the power of discrete Morse theory (DMT), we identify global structures, including 1D skeletons and 2D patches, which are important for topological accuracy. Trained with a novel loss based on these global structures, the network performance is significantly improved especially near topologically challenging locations (such as weak spots of connections and membranes). On diverse datasets, our method achieves superior performance on both the DICE score and topological metrics.

NeurIPS 2020 · Conference Paper

Deep Variational Instance Segmentation

  • Jialin Yuan
  • Chao Chen
  • Fuxin Li

Instance segmentation, which seeks to obtain both class and instance labels for each pixel in the input image, is a challenging task in computer vision. State-of-the-art algorithms often employ a search-based strategy, which first divides the output image with a regular grid and generates proposals at each grid cell; the proposals are then classified and their boundaries refined. In this paper, we propose a novel algorithm that directly utilizes a fully convolutional network (FCN) to predict instance labels. Specifically, we propose a variational relaxation of instance segmentation as minimizing an optimization functional for a piecewise-constant segmentation problem, which can be used to train an FCN end-to-end. It extends the classical Mumford-Shah variational segmentation algorithm to be able to handle the permutation-invariant ground truth in instance segmentation. Experiments on PASCAL VOC 2012 and the MSCOCO 2017 dataset show that the proposed approach efficiently tackles the instance segmentation task.
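For reference, the classical functional being relaxed is the Mumford-Shah energy (standard form from the variational segmentation literature; notation may differ from the paper's):

```latex
E(u, C) \;=\; \int_{\Omega} (u - f)^2 \, dx
\;+\; \mu \int_{\Omega \setminus C} \lvert \nabla u \rvert^2 \, dx
\;+\; \nu \, \operatorname{length}(C)
```

where $f$ is the input image, $u$ its piecewise-smooth approximation, and $C$ the discontinuity set. The piecewise-constant restriction takes $u = c_i$ on each region, and instance segmentation additionally requires the loss to be invariant to permutations of the instance ids, which is the extension the paper supplies.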

ICLR 2020 · Conference Paper

Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform

  • Jun Li 0098
  • Fuxin Li
  • Sinisa Todorovic

Strictly enforcing orthonormality constraints on parameter matrices has been shown advantageous in deep learning. This amounts to Riemannian optimization on the Stiefel manifold, which, however, is computationally expensive. To address this challenge, we present two main contributions: (1) A new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) An implicit vector transport mechanism based on the combination of a projection of the momentum and the Cayley transform on the Stiefel manifold. We specify two new optimization algorithms: Cayley SGD with momentum, and Cayley ADAM on the Stiefel manifold. Convergence of Cayley SGD is theoretically analyzed. Our experiments for CNN training demonstrate that both algorithms: (a) Use less running time per iteration relative to existing approaches that enforce orthonormality of CNN parameters; and (b) Achieve faster convergence rates than the baseline SGD and ADAM algorithms without compromising the performance of the CNN. Cayley SGD and Cayley ADAM are also shown to reduce the training time for optimizing the unitary transition matrices in RNNs.
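The closed-form Cayley retraction that the paper's iterative scheme approximates can be sketched directly: build a skew-symmetric matrix W from the gradient, then apply the Cayley transform, which keeps the iterate exactly on the Stiefel manifold. A minimal sketch, assuming the closed form with a linear solve (the paper replaces the inverse with a cheaper fixed-point iteration); names are illustrative.

```python
import numpy as np

def cayley_update(X, G, alpha=0.1):
    """One Cayley-transform update on the Stiefel manifold St(n, p).

    X: (n, p) with orthonormal columns; G: Euclidean gradient at X.
    Builds skew-symmetric W = G X^T - X G^T and applies
    X <- (I + a/2 W)^{-1} (I - a/2 W) X.  Because the Cayley transform
    of a skew-symmetric matrix is orthogonal, X^T X = I is preserved
    exactly (up to floating point).
    """
    n = X.shape[0]
    A = G @ X.T
    W = A - A.T                                   # skew-symmetric: W^T = -W
    I = np.eye(n)
    return np.linalg.solve(I + 0.5 * alpha * W,
                           (I - 0.5 * alpha * W) @ X)

rng = np.random.default_rng(1)
X0, _ = np.linalg.qr(rng.standard_normal((6, 3)))  # random Stiefel point
G = rng.standard_normal((6, 3))                    # arbitrary gradient
X1 = cayley_update(X0, G)
```

The appeal over a generic retraction is exactly this algebraic guarantee: no re-orthonormalization (QR or SVD) is needed after the step.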

ICML 2020 · Conference Paper

Implicit Generative Modeling for Efficient Exploration

  • Neale Ratzlaff
  • Qinxun Bai
  • Fuxin Li
  • Wei Xu 0017

Efficient exploration remains a challenging problem in reinforcement learning, especially for those tasks where rewards from environments are sparse. In this work, we introduce an exploration approach based on a novel implicit generative modeling algorithm to estimate a Bayesian uncertainty of the agent’s belief of the environment dynamics. Each random draw from our generative model is a neural network that instantiates the dynamic function, hence multiple draws would approximate the posterior, and the variance in the predictions based on this posterior is used as an intrinsic reward for exploration. We design a training algorithm for our generative model based on the amortized Stein Variational Gradient Descent. In experiments, we demonstrate the effectiveness of this exploration algorithm in both pure exploration tasks and a downstream task, comparing with state-of-the-art intrinsic reward-based exploration approaches, including two recent approaches based on an ensemble of dynamic models. In challenging exploration tasks, our implicit generative model consistently outperforms competing approaches regarding data efficiency in exploration.
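The intrinsic reward can be illustrated with a finite ensemble standing in for draws from the implicit generative model: disagreement (prediction variance) among sampled dynamics models serves as the exploration bonus. This is a toy sketch with random untrained models; every name here is hypothetical.

```python
import numpy as np

def make_ensemble(n_models, in_dim, out_dim, hidden=32, seed=0):
    """A toy ensemble of random two-layer dynamics models f(s, a) -> s',
    standing in for posterior draws from an implicit generative model."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        W1 = rng.standard_normal((in_dim, hidden)) / np.sqrt(in_dim)
        W2 = rng.standard_normal((hidden, out_dim)) / np.sqrt(hidden)
        models.append((W1, W2))
    return models

def intrinsic_reward(models, x):
    """Variance across ensemble predictions, averaged over output dims.
    High disagreement signals high epistemic uncertainty, so the agent
    is rewarded for visiting such states."""
    preds = np.stack([np.tanh(x @ W1) @ W2 for W1, W2 in models])
    return preds.var(axis=0).mean()

models = make_ensemble(8, in_dim=4, out_dim=3)
r = intrinsic_reward(models, np.ones(4))
```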

AAAI 2020 · Conference Paper

ScaleNet – Improve CNNs through Recursively Rescaling Objects

  • Xingyi Li
  • Zhongang Qi
  • Xiaoli Fern
  • Fuxin Li

Deep networks are often not scale-invariant; hence their performance can vary wildly if recognizable objects are at an unseen scale occurring only at testing time. In this paper, we propose ScaleNet, which recursively predicts object scale in a deep learning framework. With an explicit objective to predict the scale of objects in images, ScaleNet enables pretrained deep learning models to identify objects in the scales that are not present in their training sets. By recursively calling ScaleNet, one can generalize to very large scale changes unseen in the training set. To demonstrate the robustness of our proposed framework, we conduct experiments with pretrained as well as fine-tuned classification and detection frameworks on MNIST, CIFAR-10, and MS COCO datasets and results reveal that our proposed framework significantly boosts the performance of deep networks.

IROS 2019 · Conference Paper

ElevateNet: A Convolutional Neural Network for Estimating the Missing Dimension in 2D Underwater Sonar Images

  • Robert DeBortoli
  • Fuxin Li
  • Geoffrey A. Hollinger

In this work we address the challenge of predicting the missing dimension (elevation angle) from 2D underwater sonar images. The high noise levels in these images, from phenomena such as non-diffuse reflections, frequently limit the usefulness of physical models. We thus propose the utilization of Convolutional Neural Networks (CNNs) as a powerful method to extract meaningful information without being misled by noisy data. We also introduce a self-supervised method that uses the physics of the sonar sensor to train the network on real data without ground-truth elevation maps. Our method can produce accurate elevation angle estimates given only a single image. Finally, we demonstrate that our method produces more accurate 3D reconstructions than competing methods, both in simulation and on real data.

ICML 2019 · Conference Paper

HyperGAN: A Generative Model for Diverse, Performant Neural Networks

  • Neale Ratzlaff
  • Fuxin Li

We introduce HyperGAN, a generative model that learns to generate all the parameters of a deep neural network. HyperGAN first transforms low dimensional noise into a latent space, which can be sampled from to obtain diverse, performant sets of parameters for a target architecture. We utilize an architecture that bears resemblance to generative adversarial networks, but we evaluate the likelihood of generated samples with a classification loss. This is equivalent to minimizing the KL-divergence between the distribution of generated parameters and the unknown true parameter distribution. We apply HyperGAN to classification, showing that HyperGAN can learn to generate parameters which solve the MNIST and CIFAR-10 datasets with competitive performance to fully supervised learning, while also generating a rich distribution of effective parameters. We also show that HyperGAN can provide better uncertainty estimates than standard ensembles. This is evidenced by the ability of HyperGAN-generated ensembles to detect out of distribution data as well as adversarial examples.

NeurIPS 2019 · Conference Paper

Topology-Preserving Deep Image Segmentation

  • Xiaoling Hu
  • Fuxin Li
  • Dimitris Samaras
  • Chao Chen

Segmentation algorithms are prone to make topological errors on fine-scale structures, e.g., broken connections. We propose a novel method that learns to segment with correct topology. In particular, we design a continuous-valued loss function that enforces a segmentation to have the same topology as the ground truth, i.e., having the same Betti number. The proposed topology-preserving loss function is differentiable and can be incorporated into end-to-end training of a deep neural network. Our method achieves much better performance on the Betti number error, which directly accounts for the topological correctness. It also performs better on other topology-relevant metrics, e.g., the Adjusted Rand Index and the Variation of Information, without sacrificing per-pixel accuracy. We illustrate the effectiveness of the proposed method on a broad spectrum of natural and biomedical datasets.
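For 0-dimensional topology, the Betti number error reduces to comparing connected-component counts between prediction and ground truth, which a short sketch makes concrete. The paper's differentiable loss works through persistent homology; this only illustrates the evaluation metric, with illustrative names.

```python
import numpy as np
from collections import deque

def betti0(mask):
    """Number of 4-connected foreground components in a binary mask
    (the 0-th Betti number of the foreground), via BFS flood fill."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    count = 0
    for i, j in zip(*np.nonzero(mask)):
        if seen[i, j]:
            continue
        count += 1                      # new, unseen component
        q = deque([(i, j)])
        seen[i, j] = True
        while q:
            y, x = q.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    q.append((ny, nx))
    return count

gt   = np.array([[1, 1, 1, 1, 1]])      # one connected vessel
pred = np.array([[1, 1, 0, 1, 1]])      # broken connection
betti_error = abs(betti0(pred) - betti0(gt))
```

A single dropped pixel here leaves per-pixel accuracy at 80% yet doubles the component count, which is exactly the kind of error the topological metrics penalize and plain pixel losses barely notice.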

ICRA 2018 · Conference Paper

Real-Time Underwater 3D Reconstruction Using Global Context and Active Labeling

  • Robert DeBortoli
  • Austin Nicolai
  • Fuxin Li
  • Geoffrey A. Hollinger

In this work we develop a novel framework that enables the real-time 3D reconstruction of underwater environments using features from 2D sonar images. Due to noisy and low-resolution imagery as compared with standard cameras, automatic feature extractors for sonar images are not reliable in many scenarios. Thus, a human often needs to hand-select features in sonar imagery for environment reconstructions. Given the high data capture rates of standard imaging sonars (on the order of 20Hz), hand-annotating the features in every frame cannot be done in real-time. To address this we use a Convolutional Neural Network (CNN) that analyzes incoming imagery in real-time and proposes only a small subset of high-quality frames to the user for feature annotation. We demonstrate that our approach provides real-time reconstruction capability without loss in classification performance on datasets captured onboard our underwater vehicle while operating in a variety of environments.

NeurIPS 2015 · Conference Paper

Efficient Learning of Continuous-Time Hidden Markov Models for Disease Progression

  • Yu-Ying Liu
  • Shuang Li
  • Fuxin Li
  • Le Song
  • James Rehg

The Continuous-Time Hidden Markov Model (CT-HMM) is an attractive approach to modeling disease progression due to its ability to describe noisy observations arriving irregularly in time. However, the lack of an efficient parameter learning algorithm for CT-HMM restricts its use to very small models or requires unrealistic constraints on the state transitions. In this paper, we present the first complete characterization of efficient EM-based learning methods for CT-HMM models. We demonstrate that the learning problem consists of two challenges: the estimation of posterior state probabilities and the computation of end-state conditioned statistics. We solve the first challenge by reformulating the estimation problem in terms of an equivalent discrete time-inhomogeneous hidden Markov model. The second challenge is addressed by adapting three approaches from the continuous time Markov chain literature to the CT-HMM domain. We demonstrate the use of CT-HMMs with more than 100 states to visualize and predict disease progression using a glaucoma dataset and an Alzheimer's disease dataset.
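The basic continuous-time ingredient behind CT-HMMs is the transition matrix P(t) = exp(Qt) for a rate matrix Q, which gives state-transition probabilities for arbitrary, irregular time gaps. A sketch with a truncated Taylor series, for illustration only (production code would use a scaling-and-squaring expm such as scipy.linalg.expm; the model and rates below are made up):

```python
import numpy as np

def transition_matrix(Q, t, terms=30):
    """P(t) = expm(Q * t) for a CTMC rate matrix Q, via a truncated
    Taylor series (adequate for small, well-scaled Q * t)."""
    A = Q * t
    P = np.eye(Q.shape[0])
    term = np.eye(Q.shape[0])
    for k in range(1, terms):
        term = term @ A / k          # A^k / k!
        P = P + term
    return P

# toy 2-state disease-progression model:
# rate 0.3 for healthy -> sick, rate 0.1 for sick -> healthy
Q = np.array([[-0.3,  0.3],
              [ 0.1, -0.1]])
P = transition_matrix(Q, t=2.0)      # transition probabilities over t = 2
```

Each row of P(t) is a valid probability distribution for any t, which is what lets the CT-HMM handle observations arriving at irregular intervals without discretizing time.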

NeurIPS 2010 · Conference Paper

Convex Multiple-Instance Learning by Estimating Likelihood Ratio

  • Fuxin Li
  • Cristian Sminchisescu

Multiple-instance learning has long been known as a hard non-convex problem. In this work, we propose an approach that recasts it as a convex likelihood ratio estimation problem. Firstly, the constraint in multiple-instance learning is reformulated into a convex constraint on the likelihood ratio. Then we show that a joint estimation of a likelihood ratio function and the likelihood on training instances can be learned convexly. Theoretically, we prove a quantitative relationship between the risk estimated under the 0-1 classification loss, and under a loss function for likelihood ratio estimation. It is shown that our likelihood ratio estimation is generally a good surrogate for the 0-1 loss, and separates positive and negative instances well. However, with the joint estimation, it tends to underestimate the likelihood that an example is positive. We propose to use these likelihood ratio estimates as features, and learn a linear combination on them to classify the bags. Experiments on synthetic and real datasets show the superiority of the approach.