Arrow Research search

Author name cluster

Dilin Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
2 author rows

Possible papers (15)

NeurIPS Conference 2025 Conference Paper

AutoPartGen: Autoregressive 3D Part Generation and Discovery

  • Minghao Chen
  • Jianyuan Wang
  • Roman Shapovalov
  • Tom Monnier
  • Hyunyoung Jung
  • Dilin Wang
  • Rakesh Ranjan
  • Iro Laina

We introduce AutoPartGen, a model that generates objects composed of 3D parts in an autoregressive manner. This model can take as input an image of an object, 2D masks of the object's parts, or an existing 3D object, and generate a corresponding compositional 3D reconstruction. Our approach builds upon 3DShape2VecSet, a recent latent 3D representation with powerful geometric expressiveness. We observe that this latent space exhibits strong compositional properties, making it particularly well-suited for part-based generation tasks. Specifically, AutoPartGen generates object parts autoregressively, predicting one part at a time while conditioning on previously generated parts and additional inputs, such as 2D images, masks, or 3D objects. This process continues until the model decides that all parts have been generated, thus determining automatically the type and number of parts. The resulting parts can be seamlessly assembled into coherent objects or scenes without requiring additional optimization. We evaluate both the overall 3D generation capabilities and the part-level generation quality of AutoPartGen, demonstrating that it achieves state-of-the-art performance in 3D part generation.
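
The autoregressive loop the abstract describes can be summarized schematically as follows (a Python sketch; model.next_part and model.is_done are hypothetical interfaces standing in for the paper's latent-space generator and stopping decision, not the actual API):

    def generate_parts(model, image, masks=None, max_parts=64):
        # Schematic autoregressive part generation: emit one part at a time,
        # conditioning on the inputs and all previously generated parts, and
        # stop once the model decides the object is complete.
        parts = []
        for _ in range(max_parts):
            part = model.next_part(image=image, masks=masks, previous_parts=parts)  # hypothetical
            parts.append(part)
            if model.is_done(image=image, previous_parts=parts):                    # hypothetical
                break
        return parts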

NeurIPS Conference 2025 Conference Paper

DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

  • Kairun Wen
  • Runyu Chen
  • Hui Zheng
  • Yunlong Lin
  • Panwang Pan
  • Chenxin Li
  • Wenyan Cong
  • Jian Zhang

Understanding the dynamic physical world, characterized by its evolving 3D structure, real-world motion, and semantic content with textual descriptions, is crucial for human-agent interaction and enables embodied agents to perceive and act within real environments with human-like capabilities. However, existing datasets are often derived from limited simulators or rely on traditional Structure-from-Motion for up-to-scale annotation, and they offer only limited descriptive captioning, which restricts the capacity of foundation models to accurately interpret real-world dynamics from monocular videos, commonly sourced from the internet. To bridge these gaps, we introduce DynamicVerse, a physical-scale, multimodal 4D world modeling framework for dynamic real-world video. We employ large vision, geometric, and multimodal models to interpret metric-scale static geometry, real-world dynamic motion, instance-level masks, and holistic descriptive captions. By integrating window-based Bundle Adjustment with global optimization, our method converts long real-world video sequences into a comprehensive 4D multimodal format. DynamicVerse delivers a large-scale dataset consisting of 100K+ videos with 800K+ annotated masks and 10M+ frames from internet videos. Experimental evaluations on three benchmark tasks, namely video depth estimation, camera pose estimation, and camera intrinsics estimation, demonstrate that our 4D modeling achieves superior performance in capturing physical-scale measurements with greater global accuracy than existing methods.

ICML Conference 2024 Conference Paper

Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once

  • Zhangheng Li
  • Shiwei Liu 0003
  • Tianlong Chen 0001
  • Ajay Kumar Jaiswal
  • Zhenyu Zhang 0015
  • Dilin Wang
  • Raghuraman Krishnamoorthi
  • Shiyu Chang

Sparse Neural Networks (SNNs) have received considerable attention for mitigating the explosion in computational costs and memory footprints of modern deep neural networks. Despite their popularity, most state-of-the-art training approaches seek to find a single high-quality sparse subnetwork with a preset sparsity pattern and ratio, making them inadequate for accommodating platform and resource variability. Recently proposed approaches attempt to jointly train multiple subnetworks (which we term "sparse co-training") with a fixed sparsity pattern, to allow switching sparsity ratios subject to resource requirements. In this work, we take one step further and expand the scope of sparse co-training to cover diverse sparsity patterns and multiple sparsity ratios at once. We introduce Sparse Cocktail, the first sparse co-training framework that co-trains a suite of sparsity patterns simultaneously, each with multiple sparsity ratios, facilitating a harmonious switch across sparsity patterns and ratios at inference time depending on hardware availability. More specifically, Sparse Cocktail alternately trains subnetworks generated from different sparsity patterns, with a gradual increase in sparsity ratios across patterns, and relies on a unified mask generation process and Dense Pivot Co-training to ensure that the subnetworks of different patterns orchestrate their shared parameters without canceling each other's performance. Experimental results on image classification, object detection, and instance segmentation illustrate the favorable effectiveness and flexibility of Sparse Cocktail, pointing to a promising direction for sparse co-training. Code will be released.

ICLR Conference 2022 Conference Paper

NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training

  • Chengyue Gong
  • Dilin Wang
  • Meng Li 0004
  • Xinlei Chen
  • Zhicheng Yan 0001
  • Yuandong Tian
  • Qiang Liu 0001
  • Vikas Chandra

Designing accurate and efficient vision transformers (ViTs) is a highly important but challenging task. Supernet-based one-shot neural architecture search (NAS) enables fast architecture optimization and has achieved state-of-the-art (SOTA) results on convolutional neural networks (CNNs). However, directly applying supernet-based NAS to optimize ViTs leads to poor performance, even worse than training single ViTs. In this work, we observe that the poor performance is due to a gradient conflict issue: the gradients of different sub-networks conflict with that of the supernet more severely in ViTs than in CNNs, which leads to early saturation in training and inferior convergence. To alleviate this issue, we propose a series of techniques, including a gradient projection algorithm, a switchable layer scaling design, and a simplified data augmentation and regularization training recipe. The proposed techniques significantly improve the convergence and the performance of all sub-networks. Our discovered hybrid ViT model family, dubbed NASViT, achieves top-1 accuracy from 78.2% to 81.8% on ImageNet at 200M to 800M FLOPs, and outperforms prior-art CNNs and ViTs, including AlphaNet and LeViT. When transferred to semantic segmentation, NASViT also outperforms previous backbones on both the Cityscapes and ADE20K datasets, achieving 73.2% and 37.9% mIoU, respectively, with only 5G FLOPs. Code is available at https://github.com/facebookresearch/NASViT.
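
The gradient projection technique can be illustrated with a PCGrad-style rule: when a sub-network gradient conflicts with a reference gradient (for instance, the supernet's), drop the conflicting component before applying the update. A minimal PyTorch sketch on flattened gradient vectors (the paper's exact choice of reference gradient and update schedule may differ):

    import torch

    def project_conflicting(g_sub, g_ref, eps=1e-12):
        # If <g_sub, g_ref> < 0, the two gradients conflict; project g_sub onto the
        # plane orthogonal to g_ref so it no longer undoes the reference update:
        #   g_sub <- g_sub - (<g_sub, g_ref> / ||g_ref||^2) * g_ref
        dot = torch.dot(g_sub, g_ref)
        if dot < 0:
            g_sub = g_sub - dot / (g_ref.norm() ** 2 + eps) * g_ref
        return g_sub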

ICML Conference 2021 Conference Paper

AlphaNet: Improved Training of Supernets with Alpha-Divergence

  • Dilin Wang
  • Chengyue Gong
  • Meng Li 0004
  • Qiang Liu 0001
  • Vikas Chandra

Weight-sharing neural architecture search (NAS) is an effective technique for automating efficient neural architecture design. Weight-sharing NAS builds a supernet that assembles all the architectures as its sub-networks and jointly trains the supernet with the sub-networks. The success of weight-sharing NAS heavily relies on distilling the knowledge of the supernet to the sub-networks. However, we find that the widely used distillation divergence, i.e., the KL divergence, may lead to student sub-networks that over-estimate or under-estimate the uncertainty of the teacher supernet, leading to inferior performance of the sub-networks. In this work, we propose to improve supernet training with a more generalized alpha-divergence. By adaptively selecting the alpha-divergence, we simultaneously prevent the over-estimation and under-estimation of the uncertainty of the teacher model. We apply the proposed alpha-divergence-based supernet training to both slimmable neural networks and weight-sharing NAS, and demonstrate significant improvements. Specifically, our discovered model family, AlphaNet, outperforms prior-art models across a wide range of FLOPs regimes, including BigNAS, Once-for-All networks, and AttentiveNAS. We achieve ImageNet top-1 accuracy of 80.0% with only 444M FLOPs. Our code and pretrained models are available at https://github.com/facebookresearch/AlphaNet.
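
For intuition, the alpha-divergence between the teacher distribution p and student distribution q used for distillation can be computed as below (a PyTorch sketch with an illustrative function name; the adaptive selection between a negative and a positive alpha and the importance-ratio clipping described in the paper are omitted):

    import torch
    import torch.nn.functional as F

    def alpha_divergence(teacher_logits, student_logits, alpha, eps=1e-8):
        # D_alpha(p || q) = (E_p[(p/q)^(alpha - 1)] - 1) / (alpha * (alpha - 1)),
        # which approaches KL(p || q) as alpha -> 1 and KL(q || p) as alpha -> 0.
        assert alpha not in (0.0, 1.0)
        p = F.softmax(teacher_logits, dim=-1).detach()
        log_p = torch.log(p + eps)
        log_q = F.log_softmax(student_logits, dim=-1)
        ratio = torch.exp((alpha - 1.0) * (log_p - log_q))   # (p / q)^(alpha - 1)
        per_sample = ((p * ratio).sum(dim=-1) - 1.0) / (alpha * (alpha - 1.0))
        return per_sample.mean()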

ICML Conference 2019 Conference Paper

Improving Neural Language Modeling via Adversarial Training

  • Dilin Wang
  • Chengyue Gong
  • Qiang Liu 0001

Recently, substantial progress has been made in language modeling by using deep neural networks. However, in practice, large-scale neural language models have been shown to be prone to overfitting. In this paper, we present a simple yet highly effective adversarial training mechanism for regularizing neural language models. The idea is to introduce adversarial noise to the output embedding layer while training the models. We show that the optimal adversarial noise yields a simple closed-form solution, thus allowing us to develop a simple and time-efficient algorithm. Theoretically, we show that our adversarial mechanism effectively encourages the diversity of the embedding vectors, helping to increase the robustness of models. Empirically, we show that our method improves on the single-model state-of-the-art results for language modeling on Penn Treebank (PTB) and Wikitext-2, achieving test perplexity scores of 46.01 and 38.65, respectively. When applied to machine translation, our method improves over various transformer-based translation baselines in BLEU scores on the WMT14 English-German and IWSLT14 German-English tasks.
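
A schematic PyTorch rendering of the idea, under the simplifying assumption that the L2-bounded noise is applied only to the target word's output embedding; in that case the worst-case perturbation of the logit <w_y + delta, h> has the closed form delta* = -eps * h / ||h||, which lowers the target logit by exactly eps * ||h||. Names and the eps value are illustrative, not the paper's exact recipe:

    import torch
    import torch.nn.functional as F

    def adversarial_lm_loss(hidden, out_embed, targets, eps=1.0):
        # hidden: (batch, d) final hidden states; out_embed: (vocab, d) output
        # embedding matrix; targets: (batch,) target word indices.
        logits = hidden @ out_embed.t()
        # Closed-form worst-case noise on the target embedding reduces its logit
        # by eps * ||h||; apply that shift and take the usual cross-entropy.
        adv_logits = logits.clone()
        batch_idx = torch.arange(hidden.size(0))
        adv_logits[batch_idx, targets] -= eps * hidden.norm(dim=-1)
        return F.cross_entropy(adv_logits, targets)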

ICML Conference 2019 Conference Paper

Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models

  • Dilin Wang
  • Qiang Liu 0001

Diversification has been shown to be a powerful mechanism for learning robust models in non-convex settings. A notable example is learning mixture models, in which enforcing diversity between the different mixture components allows us to prevent the model collapsing phenomenon and capture more patterns from the observed data. In this work, we present a variational approach for diversity-promoting learning, which leverages the entropy functional as a natural mechanism for enforcing diversity. We develop a simple and efficient functional gradient-based algorithm for optimizing the variational objective function, which provides a significant generalization of Stein variational gradient descent (SVGD). We test our method on various challenging real world problems, including deep embedded clustering and deep anomaly detection. Empirical results show that our method provides an effective mechanism for diversity-promoting learning, achieving substantial improvement over existing methods.

NeurIPS Conference 2019 Conference Paper

Splitting Steepest Descent for Growing Neural Architectures

  • Lemeng Wu
  • Dilin Wang
  • Qiang Liu

We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons into multiple offspring. By leveraging a functional steepest descent idea, we derive a simple criterion for deciding the best subset of neurons to split and a \emph{splitting gradient} for optimally updating the offspring. Theoretically, our splitting strategy is a second-order functional steepest descent for escaping saddle points in an $L^\infty$-Wasserstein metric space, on which the standard parametric gradient descent is a first-order steepest descent. Our method provides a new computationally efficient approach for optimizing neural network structures, especially for learning lightweight neural architectures in resource-constrained settings.

NeurIPS Conference 2019 Conference Paper

Stein Variational Gradient Descent With Matrix-Valued Kernels

  • Dilin Wang
  • Ziyang Tang
  • Chandrajit Bajaj
  • Qiang Liu

Stein variational gradient descent (SVGD) is a particle-based inference algorithm that leverages gradient information for efficient approximate inference. In this work, we enhance SVGD by leveraging preconditioning matrices, such as the Hessian and Fisher information matrix, to incorporate geometric information into SVGD updates. We achieve this by presenting a generalization of SVGD that replaces the scalar-valued kernels in vanilla SVGD with more general matrix-valued kernels. This yields a significant extension of SVGD and, more importantly, allows us to flexibly incorporate various preconditioning matrices to accelerate the exploration in the probability landscape. Empirical results show that our method outperforms vanilla SVGD and a variety of baseline approaches over a range of real-world Bayesian inference tasks.
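
As a concrete special case, taking the matrix-valued kernel K(x, x') = k(x, x') Q with a single constant preconditioner Q (for example an approximate inverse Hessian or inverse Fisher matrix) turns the SVGD update into a preconditioned one. A NumPy sketch of that special case only (the paper's point-wise and mixture preconditioners are more general):

    import numpy as np

    def matrix_svgd_step(X, score, Q, h=1.0, stepsize=0.1):
        # One SVGD step with K(x, x') = k(x, x') * Q; Q = np.eye(d) recovers
        # vanilla SVGD with a scalar RBF kernel of bandwidth h.
        n = X.shape[0]
        diff = X[:, None, :] - X[None, :, :]               # diff[j, i] = x_j - x_i
        K = np.exp(-np.sum(diff ** 2, axis=-1) / h)        # scalar RBF part
        grad_K = -2.0 / h * diff * K[:, :, None]           # grad_{x_j} k(x_j, x_i)
        phi = ((K @ score(X) + grad_K.sum(axis=0)) @ Q.T) / n
        return X + stepsize * phi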

NeurIPS Conference 2018 Conference Paper

Stein Variational Gradient Descent as Moment Matching

  • Qiang Liu
  • Dilin Wang

Stein variational gradient descent (SVGD) is a non-parametric inference algorithm that evolves a set of particles to fit a given distribution of interest. We analyze the non-asymptotic properties of SVGD, showing that there exists a set of functions, which we call the Stein matching set, whose expectations are exactly estimated by any set of particles that satisfies the fixed-point equation of SVGD. This set is the image of the Stein operator applied to the feature maps of the positive definite kernel used in SVGD. Our results provide a theoretical framework for analyzing the properties of SVGD with different kernels, offering insight into the optimal choice of kernel. In particular, we show that SVGD with linear kernels yields exact estimation of means and variances on Gaussian distributions, while random Fourier features enable probabilistic bounds for distributional approximation. Our results offer a refreshing view of the classical inference problem as fitting Stein's identity or solving the Stein equation, which may motivate more efficient algorithms.
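
Written out using the standard Langevin Stein operator (a paraphrase of the statement, not the paper's exact notation): if the particles satisfy the SVGD fixed-point equation, their empirical averages are exact for every function in the image of the Stein operator over the kernel's RKHS,
\[
\frac{1}{n}\sum_{j=1}^{n}\Big[k(x_j,\cdot)\,\nabla\log p(x_j) + \nabla_{x_j}k(x_j,\cdot)\Big] = 0
\;\;\Longrightarrow\;\;
\frac{1}{n}\sum_{j=1}^{n} h(x_j) = \mathbb{E}_{x\sim p}[h(x)]
\quad \text{for all } h \in \mathcal{F}_p,
\]
\[
\mathcal{F}_p = \Big\{\, \mathcal{A}_p f := \nabla\log p^{\top} f + \nabla\!\cdot\! f \;:\; f \in \mathcal{H}_k^d \,\Big\},
\]
where both sides equal zero for such h, since \(\mathbb{E}_p[\mathcal{A}_p f] = 0\) by Stein's identity.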

ICML Conference 2018 Conference Paper

Stein Variational Message Passing for Continuous Graphical Models

  • Dilin Wang
  • Zhe Zeng 0001
  • Qiang Liu 0001

We propose a novel distributed inference algorithm for continuous graphical models, by extending Stein variational gradient descent (SVGD) to leverage the Markov dependency structure of the distribution of interest. Our approach combines SVGD with a set of structured local kernel functions defined on the Markov blanket of each node, which alleviates the curse of high dimensionality and simultaneously yields a distributed algorithm for decentralized inference tasks. We justify our method with theoretical analysis and show that the use of local kernels can be viewed as a new type of localized approximation that matches the target distribution on the conditional distributions of each node over its Markov blanket. Our empirical results show that our method outperforms a variety of baselines including standard MCMC and particle message passing methods.

NeurIPS Conference 2018 Conference Paper

Variational Inference with Tail-adaptive f-Divergence

  • Dilin Wang
  • Hao Liu
  • Qiang Liu

Variational inference with α-divergences has been widely used in modern probabilistic machine learning. Compared to the Kullback-Leibler (KL) divergence, a major advantage of using α-divergences (with positive α values) is their mass-covering property. However, estimating and optimizing α-divergences requires importance sampling, which can have extremely large or infinite variance due to the heavy tails of the importance weights. In this paper, we propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments while simultaneously achieving mass-covering properties. We test our method on Bayesian neural networks, as well as deep reinforcement learning, in which our method is applied to improve a recent soft actor-critic (SAC) algorithm (Haarnoja et al., 2018). Our results show that our approach yields significant advantages compared with existing methods based on classical KL and α-divergences.

UAI Conference 2017 Conference Paper

Learning to Draw Samples with Amortized Stein Variational Gradient Descent

  • Yihao Feng
  • Dilin Wang
  • Qiang Liu 0001

We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference. Our method is based on iteratively adjusting the neural network parameters so that the output changes along a Stein variational gradient direction (Liu & Wang, 2016) that maximally decreases the KL divergence with the target distribution. Our method works for any target distribution specified by its unnormalized density function, and can train any black-box architecture that is differentiable with respect to the parameters we want to adapt. We demonstrate our method with a number of applications, including variational autoencoders (VAEs) with expressive encoders to model complex latent space structures, and hyper-parameter learning of MCMC samplers that allows Bayesian inference to adaptively improve itself when seeing more data.
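
A minimal PyTorch sketch of one amortized update, assuming an RBF kernel and the simple least-squares variant of the parameter update (take one SVGD step on the sampler's outputs, then regress the outputs toward the adjusted particles); the function name and hyperparameters are illustrative:

    import torch

    def amortized_svgd_step(net, optimizer, score, n=64, noise_dim=16, eps=0.1, h=1.0):
        # net maps noise xi to samples x = net(xi); score(x) returns grad log p(x).
        xi = torch.randn(n, noise_dim)
        x = net(xi)
        with torch.no_grad():
            diff = x.unsqueeze(1) - x.unsqueeze(0)              # diff[j, i] = x_j - x_i
            K = torch.exp(-(diff ** 2).sum(-1) / h)             # RBF kernel matrix
            grad_K = -2.0 / h * diff * K.unsqueeze(-1)
            phi = (K @ score(x) + grad_K.sum(dim=0)) / n        # SVGD direction
            target = x + eps * phi                              # SVGD-adjusted particles
        loss = ((x - target) ** 2).sum(dim=-1).mean()           # pull outputs toward them
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()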

UAI Conference 2016 Conference Paper

Efficient Observation Selection in Probabilistic Graphical Models Using Bayesian Lower Bounds

  • Dilin Wang
  • John W. Fisher III
  • Qiang Liu 0001

Real-world data often includes rich relational information, which can be leveraged to help predict unknown variables from a small number of observed variables via a propagation effect. We consider the problem of selecting the best subset of variables to observe in order to maximize the overall prediction accuracy. Under the Bayesian framework, the optimal subset should be chosen to minimize the Bayesian optimal error rate, which, unfortunately, is critically challenging to calculate when the variables follow complex and high-dimensional probability distributions such as graphical models. In this paper, we propose to use a class of Bayesian lower bounds, including Bayesian Cramér-Rao bounds as well as a novel extension of them to discrete graphical models, as surrogate criteria for optimal subset selection, yielding a set of computationally efficient algorithms. Extensive experiments demonstrate our algorithms on both simulated and real-world datasets.
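
For reference, the continuous-variable Bayesian Cramér-Rao bound used as a surrogate criterion can be stated as the van Trees inequality: for any estimator \(\hat\theta(x)\), under mild regularity conditions,
\[
\mathbb{E}_{p(x,\theta)}\!\left[(\hat\theta(x)-\theta)(\hat\theta(x)-\theta)^{\top}\right] \;\succeq\; J_B^{-1},
\qquad
J_B = \mathbb{E}_{p(x,\theta)}\!\left[\nabla_\theta \log p(x,\theta)\,\nabla_\theta \log p(x,\theta)^{\top}\right],
\]
so a scalarization of \(J_B^{-1}\) gives a computationally tractable surrogate when ranking candidate subsets; the paper's novel extension adapts this to discrete graphical models, where the gradient with respect to \(\theta\) is not defined.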

NeurIPS Conference 2016 Conference Paper

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

  • Qiang Liu
  • Dilin Wang

We propose a general purpose variational inference algorithm that forms a natural counterpart of gradient descent for optimization. Our method iteratively transports a set of particles to match the target distribution, by applying a form of functional gradient descent that minimizes the KL divergence. Empirical studies are performed on various real world models and datasets, on which our method is competitive with existing state-of-the-art methods. The derivation of our method is based on a new theoretical result that connects the derivative of KL divergence under smooth transforms with Stein’s identity and a recently proposed kernelized Stein discrepancy, which is of independent interest.
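
The resulting update has a simple closed form; below is a minimal NumPy sketch of one SVGD iteration with an RBF kernel (fixed bandwidth h rather than the median heuristic commonly used in practice), followed by a toy usage example:

    import numpy as np

    def svgd_step(X, score, h=1.0, stepsize=0.1):
        # One SVGD update on particles X of shape (n, d), where score(X) returns
        # grad log p evaluated row-wise:
        #   phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ]
        n = X.shape[0]
        diff = X[:, None, :] - X[None, :, :]               # diff[j, i] = x_j - x_i
        K = np.exp(-np.sum(diff ** 2, axis=-1) / h)        # RBF kernel matrix
        grad_K = -2.0 / h * diff * K[:, :, None]           # grad_{x_j} k(x_j, x_i)
        phi = (K @ score(X) + grad_K.sum(axis=0)) / n
        return X + stepsize * phi

    # Toy example: transport uniform particles toward a standard 2D Gaussian (score(x) = -x).
    X = np.random.uniform(-5, 5, size=(200, 2))
    for _ in range(500):
        X = svgd_step(X, score=lambda X: -X, h=1.0, stepsize=0.05)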