Arrow Research search

Author name cluster

Wenlin Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

14

ICML Conference 2025 Conference Paper

Progressive Tempering Sampler with Diffusion

  • Severi Rissanen
  • Ruikang Ouyang
  • Jiajun He 0003
  • Wenlin Chen
  • Markus Heinonen
  • Arno Solin
  • José Miguel Hernández-Lobato

Recent research has focused on designing neural samplers that amortize the process of sampling from unnormalized densities. However, despite significant advancements, they still fall short of the state-of-the-art MCMC approach, Parallel Tempering (PT), when it comes to the efficiency of target evaluations. On the other hand, unlike a well-trained neural sampler, PT yields only dependent samples and needs to be rerun—at considerable computational cost—whenever new samples are required. To address these weaknesses, we propose the Progressive Tempering Sampler with Diffusion (PTSD), which trains diffusion models sequentially across temperatures, leveraging the advantages of PT to improve the training of neural samplers. We also introduce a novel method to combine high-temperature diffusion models to generate approximate lower-temperature samples, which are minimally refined using MCMC and used to train the next diffusion model. PTSD enables efficient reuse of sample information across temperature levels while generating well-mixed, uncorrelated samples. Our method significantly improves target evaluation efficiency, outperforming diffusion-based neural samplers.
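For orientation, below is a minimal sketch of the Parallel Tempering machinery the abstract refers to: a ladder of tempered densities p_T(x) ∝ exp(-U(x)/T) and the standard replica-swap acceptance rule. The energy function and temperature ladder are illustrative choices; this is background only, not the PTSD combination rule itself.

```python
import numpy as np

# Background sketch (not the PTSD algorithm): the tempered densities and
# replica-swap move that Parallel Tempering uses, which PTSD builds on.
# U is the target's energy, so p_T(x) is proportional to exp(-U(x) / T).

def U(x):
    # toy bimodal energy with two distant modes (illustrative choice)
    return 0.5 * np.minimum((x - 4.0) ** 2, (x + 4.0) ** 2)

def swap_accept_prob(x_i, x_j, T_i, T_j):
    # standard PT swap acceptance between replicas at temperatures T_i and T_j
    log_a = (1.0 / T_i - 1.0 / T_j) * (U(x_i) - U(x_j))
    return min(1.0, np.exp(log_a))

rng = np.random.default_rng(0)
temps = [1.0, 2.0, 4.0, 8.0]          # geometric-style temperature ladder
x = rng.normal(size=len(temps))       # one replica per temperature
print(swap_accept_prob(x[0], x[1], temps[0], temps[1]))
```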

ICML Conference 2024 Conference Paper

Diffusive Gibbs Sampling

  • Wenlin Chen
  • Mingtian Zhang
  • Brooks Paige
  • José Miguel Hernández-Lobato
  • David Barber

The inadequate mixing of conventional Markov Chain Monte Carlo (MCMC) methods for multi-modal distributions presents a significant challenge in practical applications such as Bayesian inference and molecular dynamics. Addressing this, we propose Diffusive Gibbs Sampling (DiGS), an innovative family of sampling methods designed for effective sampling from distributions characterized by distant and disconnected modes. DiGS integrates recent developments in diffusion models, leveraging Gaussian convolution to create an auxiliary noisy distribution that bridges isolated modes in the original space and applying Gibbs sampling to alternately draw samples from both spaces. A novel Metropolis-within-Gibbs scheme is proposed to enhance mixing in the denoising sampling step. DiGS exhibits a better mixing property for sampling multi-modal distributions than state-of-the-art methods such as parallel tempering, attaining substantially improved performance across various tasks, including mixtures of Gaussians, Bayesian neural networks and molecular dynamics.
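A minimal sketch of the alternating scheme described above, assuming a Gaussian convolution kernel N(x̃; αx, σ²) and a plain random-walk Metropolis-within-Gibbs denoising step; the toy target, α, σ, and step sizes are illustrative choices, and the paper's exact Metropolis-within-Gibbs scheme may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_p(x):
    # toy multi-modal target: mixture of two separated Gaussians
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

# One sketchy Gibbs sweep in the spirit of DiGS (alpha, sigma, n_mh are
# illustrative, not the paper's settings):
# 1) noising step: draw x_tilde ~ N(alpha * x, sigma^2)
# 2) denoising step: Metropolis-within-Gibbs updates targeting
#    p(x | x_tilde) proportional to p(x) * N(x_tilde; alpha * x, sigma^2)
def digs_sweep(x, alpha=0.9, sigma=2.0, n_mh=10, step=1.0):
    x_tilde = alpha * x + sigma * rng.normal()
    def log_post(z):
        return log_p(z) - 0.5 * ((x_tilde - alpha * z) / sigma) ** 2
    z = x_tilde / alpha                      # a convenient initialization
    for _ in range(n_mh):                    # random-walk MH inner updates
        prop = z + step * rng.normal()
        if np.log(rng.uniform()) < log_post(prop) - log_post(z):
            z = prop
    return z

x, samples = 3.0, []
for _ in range(3000):
    x = digs_sweep(x)
    samples.append(x)
# fraction of samples in the right-hand mode; close to 0.5 indicates mixing
print(np.mean(np.array(samples) > 0))
```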

TMLR Journal 2024 Journal Article

Leveraging Task Structures for Improved Identifiability in Neural Network Representations

  • Wenlin Chen
  • Julien Horwood
  • Juyeon Heo
  • José Miguel Hernández-Lobato

This work extends the theory of identifiability in supervised learning by considering the consequences of having access to a distribution of tasks. In such cases, we show that linear identifiability is achievable in the general multi-task regression setting. Furthermore, we show that the existence of a task distribution which defines a conditional prior over latent factors reduces the equivalence class for identifiability to permutations and scaling of the true latent factors, a stronger and more useful result than linear identifiability. Crucially, when we further assume a causal structure over these tasks, our approach enables simple maximum marginal likelihood optimization, and suggests potential downstream applications to causal representation learning. Empirically, we find that this straightforward optimization procedure enables our model to outperform more general unsupervised models in recovering canonical representations for both synthetic data and real-world molecular data.

NeurIPS Conference 2024 Conference Paper

Neural Characteristic Activation Analysis and Geometric Parameterization for ReLU Networks

  • Wenlin Chen
  • Hong Ge

We introduce a novel approach for analyzing the training dynamics of ReLU networks by examining the characteristic activation boundaries of individual ReLU neurons. Our proposed analysis reveals a critical instability in common neural network parameterizations and normalizations during stochastic optimization, which impedes fast convergence and hurts generalization performance. Addressing this, we propose Geometric Parameterization (GmP), a novel neural network parameterization technique that effectively separates the radial and angular components of weights in the hyperspherical coordinate system. We show theoretically that GmP resolves the aforementioned instability issue. We report empirical results on various models and benchmarks to verify GmP's advantages of optimization stability, convergence speed and generalization performance.
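A hedged sketch of the idea as stated in the abstract: each neuron's incoming weight vector is stored as a radius plus hyperspherical angles, w = r · u(θ). The layer below is an illustrative reading, not necessarily the paper's exact parameterization or its treatment of biases and normalization.

```python
import torch

def hyperspherical_to_unit(theta):
    # theta: (..., d-1) angles -> unit vectors of dimension d
    sin_cum = torch.cumprod(torch.sin(theta), dim=-1)          # (..., d-1)
    ones = torch.ones_like(theta[..., :1])
    prefix = torch.cat([ones, sin_cum[..., :-1]], dim=-1)      # product of sines before each coord
    return torch.cat([torch.cos(theta) * prefix, sin_cum[..., -1:]], dim=-1)

class GeomLinear(torch.nn.Module):
    # Sketch of a "geometric" parameterization: each neuron's incoming weight
    # vector is a radius r times a unit direction u(theta), so radial and
    # angular components are separate trainable quantities.
    def __init__(self, in_features, out_features):
        super().__init__()
        self.r = torch.nn.Parameter(torch.ones(out_features))
        self.theta = torch.nn.Parameter(0.1 * torch.randn(out_features, in_features - 1))
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w = self.r.unsqueeze(-1) * hyperspherical_to_unit(self.theta)  # (out, in)
        return torch.relu(x @ w.t() + self.bias)

layer = GeomLinear(8, 16)
print(layer(torch.randn(4, 8)).shape)   # torch.Size([4, 16])
```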

ICML Conference 2024 Conference Paper

Wukong: Towards a Scaling Law for Large-Scale Recommendation

  • Buyun Zhang
  • Liang Luo
  • Yuxin Chen 0001
  • Jade Nie
  • Xi Liu
  • Shen Li
  • Yanli Zhao
  • Yuchen Hao

Scaling laws play an instrumental role in the sustainable improvement in model quality. Unfortunately, recommendation models to date do not exhibit scaling laws similar to those observed in the domain of large language models, due to the inefficiencies of their upscaling mechanisms. This limitation poses significant challenges in adapting these models to increasingly more complex real-world datasets. In this paper, we propose an effective network architecture based purely on stacked factorization machines, and a synergistic upscaling strategy, collectively dubbed Wukong, to establish a scaling law in the domain of recommendation. Wukong’s unique design makes it possible to capture diverse, any-order interactions simply through taller and wider layers. We conducted extensive evaluations on six public datasets, and our results demonstrate that Wukong consistently outperforms state-of-the-art models quality-wise. Further, we assessed Wukong’s scalability on an internal, large-scale dataset. The results show that Wukong retains its superiority in quality over state-of-the-art models, while holding the scaling law across two orders of magnitude in model complexity, extending beyond 100 GFLOP/example, where prior art falls short.
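To make "stacked factorization machines" concrete, here is a hedged sketch of a generic FM-style pairwise-interaction block that can be stacked; it is not the Wukong architecture or its upscaling strategy, just a reference point for the kind of interaction layer involved.

```python
import torch

class FMInteraction(torch.nn.Module):
    # A generic factorization-machine-style interaction block: given per-feature
    # embeddings, form all pairwise dot products and mix them back to the
    # embedding width so blocks can be stacked. Illustrative only; Wukong's
    # actual blocks and compression/upscaling scheme are more involved.
    def __init__(self, num_features, dim):
        super().__init__()
        n_pairs = num_features * (num_features - 1) // 2
        self.mix = torch.nn.Linear(n_pairs, num_features * dim)
        self.num_features, self.dim = num_features, dim

    def forward(self, emb):                       # emb: (batch, num_features, dim)
        gram = emb @ emb.transpose(1, 2)          # (batch, n, n) pairwise dot products
        i, j = torch.triu_indices(self.num_features, self.num_features, offset=1)
        pairs = gram[:, i, j]                     # (batch, n_pairs)
        out = self.mix(pairs).view(-1, self.num_features, self.dim)
        return out + emb                          # residual connection so blocks stack

block = FMInteraction(num_features=10, dim=16)
x = torch.randn(32, 10, 16)
print(block(block(x)).shape)                      # two stacked blocks: torch.Size([32, 10, 16])
```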

ICLR Conference 2023 Conference Paper

Meta-learning Adaptive Deep Kernel Gaussian Processes for Molecular Property Prediction

  • Wenlin Chen
  • Austin Tripp
  • José Miguel Hernández-Lobato

We propose Adaptive Deep Kernel Fitting with Implicit Function Theorem (ADKF-IFT), a novel framework for learning deep kernel Gaussian processes (GPs) by interpolating between meta-learning and conventional deep kernel learning. Our approach employs a bilevel optimization objective where we meta-learn generally useful feature representations across tasks, in the sense that task-specific GP models estimated on top of such features achieve the lowest possible predictive loss on average. We solve the resulting nested optimization problem using the implicit function theorem (IFT). We show that our ADKF-IFT framework contains previously proposed Deep Kernel Learning (DKL) and Deep Kernel Transfer (DKT) as special cases. Although ADKF-IFT is a completely general method, we argue that it is especially well-suited for drug discovery problems and demonstrate that it significantly outperforms previous state-of-the-art methods on a variety of real-world few-shot molecular property prediction tasks and out-of-domain molecular property prediction and optimization tasks.
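A rough sketch of the bilevel structure described above, assuming a shared feature network and a closed-form GP regressor per task. The inner loop that ADKF-IFT solves via the implicit function theorem (fitting task-specific GP hyperparameters) is omitted here, with hyperparameters held fixed, so this collapses to a DKT-style special case rather than the full method.

```python
import torch

# Meta-objective structure: a shared feature network phi, and a per-task GP
# regressor fitted on the task's support set and scored on its query set.
phi = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 8))
log_noise = torch.tensor(-2.0)          # fixed here; ADKF-IFT would adapt this per task

def rbf(a, b, lengthscale=1.0):
    return torch.exp(-0.5 * torch.cdist(a, b) ** 2 / lengthscale ** 2)

def task_query_loss(xs, ys, xq, yq):
    fs, fq = phi(xs), phi(xq)
    K = rbf(fs, fs) + torch.exp(log_noise) * torch.eye(len(xs))
    alpha = torch.linalg.solve(K, ys)               # GP posterior mean weights
    pred = rbf(fq, fs) @ alpha                      # predictive mean on the query set
    return ((pred - yq) ** 2).mean()

opt = torch.optim.Adam(phi.parameters(), lr=1e-3)
for _ in range(10):                                 # meta-training over (placeholder) tasks
    xs, ys = torch.randn(20, 16), torch.randn(20, 1)
    xq, yq = torch.randn(10, 16), torch.randn(10, 1)
    loss = task_query_loss(xs, ys, xq, yq)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```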

TMLR Journal 2022 Journal Article

Optimal Client Sampling for Federated Learning

  • Wenlin Chen
  • Samuel Horváth
  • Peter Richtárik

It is well understood that client-master communication can be a primary bottleneck in federated learning (FL). In this work, we address this issue with a novel client subsampling scheme, where we restrict the number of clients allowed to communicate their updates back to the master node. In each communication round, all participating clients compute their updates, but only the ones with "important" updates communicate back to the master. We show that importance can be measured using only the norm of the update and give a formula for optimal client participation. This formula minimizes the distance between the full update, where all clients participate, and our limited update, where the number of participating clients is restricted. In addition, we provide a simple algorithm that approximates the optimal formula for client participation, which allows for secure aggregation and stateless clients, and thus does not compromise client privacy. We show both theoretically and empirically that for Distributed SGD (DSGD) and Federated Averaging (FedAvg), the performance of our approach can be close to full participation and superior to the baseline where participating clients are sampled uniformly. Moreover, our approach is orthogonal to and compatible with existing methods for reducing communication overhead, such as local methods and communication compression methods.
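A hedged sketch of the norm-proportional sampling rule implied by the abstract: inclusion probabilities p_i = min(1, c · ‖U_i‖), with c chosen so the probabilities sum to the expected number of communicating clients. This is the generic capped-proportional rule, not necessarily the paper's exact optimal formula or its secure approximation.

```python
import numpy as np

def client_probs(update_norms, m):
    # Choose inclusion probabilities p_i = min(1, c * ||U_i||) with sum(p_i) = m,
    # where m is the expected number of clients allowed to communicate.
    # Iteratively cap clients whose proportional probability would exceed 1.
    norms = np.asarray(update_norms, dtype=float)
    capped = np.zeros(len(norms), dtype=bool)
    while True:
        budget = m - capped.sum()
        c = budget / norms[~capped].sum()
        newly_capped = (~capped) & (c * norms >= 1.0)
        if not newly_capped.any():
            break
        capped |= newly_capped
    return np.where(capped, 1.0, c * norms)

norms = np.array([0.1, 0.5, 2.0, 0.3, 4.0])
p = client_probs(norms, m=2)
print(p, p.sum())                      # probabilities sum to m = 2

# For unbiased aggregation, each participating client would send U_i / p_i,
# so the expected aggregate matches the full-participation update.
```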

AAAI Conference 2015 Conference Paper

A Reduction of the Elastic Net to Support Vector Machines with an Application to GPU Computing

  • Quan Zhou
  • Wenlin Chen
  • Shiji Song
  • Jacob Gardner
  • Kilian Weinberger
  • Yixin Chen

Algorithmic reductions are one of the cornerstones of theoretical computer science. Surprisingly, to date, they have only played a limited role in machine learning. In this paper we introduce a formal and practical reduction between two of the most widely used machine learning algorithms: from the Elastic Net (and the Lasso as a special case) to the Support Vector Machine. First, we derive the reduction and summarize it in only 11 lines of MATLAB™. Then, we demonstrate its high-impact potential by translating recent advances in parallelizing SVM solvers directly to the Elastic Net. The resulting algorithm is a parallel solver for the Elastic Net (and Lasso) that naturally utilizes GPUs and multi-core CPUs. We evaluate it on twelve real-world data sets and show that it yields identical results to the popular (and highly optimized) glmnet implementation but is up to two orders of magnitude faster.

ICML Conference 2015 Conference Paper

Compressing Neural Networks with the Hashing Trick

  • Wenlin Chen
  • James T. Wilson
  • Stephen Tyree
  • Kilian Q. Weinberger
  • Yixin Chen 0001

As deep nets are increasingly used in applications suited for mobile devices, a fundamental dilemma becomes apparent: the trend in deep learning is to grow models to absorb ever-increasing data set sizes; however, mobile devices are designed with very little memory and cannot store such large models. We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. HashedNets uses a low-cost hash function to randomly group connection weights into hash buckets, and all connections within the same hash bucket share a single parameter value. These parameters are tuned to adjust to the HashedNets weight-sharing architecture with standard backprop during training. Our hashing procedure introduces no additional memory overhead, and we demonstrate on several benchmark data sets that HashedNets shrink the storage requirements of neural networks substantially while mostly preserving generalization performance.
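A minimal sketch of the hashing trick as described above: every virtual weight w[i, j] is looked up from a small shared parameter vector via a cheap hash of the index pair. The hash and sign functions below are illustrative stand-ins, not the ones used in the paper.

```python
import numpy as np

class HashedLinear:
    # Weight sharing via hashing: every "virtual" weight w[i, j] is an entry of
    # a much smaller parameter vector, selected by a cheap hash of (i, j). All
    # connections that hash to the same bucket share one trainable value, so
    # memory is O(buckets) instead of O(in * out).
    def __init__(self, in_dim, out_dim, n_buckets, seed=0):
        rng = np.random.default_rng(seed)
        self.params = 0.01 * rng.normal(size=n_buckets)
        # simple deterministic stand-ins for the hash function; a per-connection
        # random sign helps reduce the bias introduced by bucket collisions
        i, j = np.meshgrid(np.arange(out_dim), np.arange(in_dim), indexing="ij")
        self.bucket = (i * 2654435761 + j * 40503) % n_buckets
        self.sign = np.where((i * 31 + j * 17) % 2 == 0, 1.0, -1.0)

    def forward(self, x):                          # x: (batch, in_dim)
        w = self.sign * self.params[self.bucket]   # virtual (out_dim, in_dim) matrix
        return x @ w.T

layer = HashedLinear(in_dim=256, out_dim=128, n_buckets=1000)
print(layer.forward(np.random.randn(4, 256)).shape)   # (4, 128)
# Training would accumulate each virtual weight's gradient into its shared bucket.
```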

NeurIPS Conference 2015 Conference Paper

Fast Distributed k-Center Clustering with Outliers on Massive Data

  • Gustavo Malkomes
  • Matt Kusner
  • Wenlin Chen
  • Kilian Weinberger
  • Benjamin Moseley

Clustering large data is a fundamental problem with a vast number of applications. Due to the increasing size of data, practitioners interested in clustering have turned to distributed computation methods. In this work, we consider the widely used k-center clustering problem and its variant used to handle noisy data, k-center with outliers. In the noise-free setting we demonstrate how a previously-proposed distributed method is actually an O(1)-approximation algorithm, which accurately explains its strong empirical performance. Additionally, in the noisy setting, we develop a novel distributed algorithm that is also an O(1)-approximation. These algorithms are highly parallel and lend themselves to virtually any distributed computing framework. We compare both empirically against the best known noisy sequential clustering methods and show that both distributed algorithms are consistently close to their sequential versions. The algorithms are all one can hope for in distributed settings: they are fast, memory efficient and they match their sequential counterparts.
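For context, the classical sequential baseline for k-center is the farthest-first traversal (Gonzalez), a 2-approximation. The distributed and outlier-robust algorithms in the paper are different, but this is the kind of sequential counterpart they are compared against.

```python
import numpy as np

def greedy_k_center(points, k, rng=np.random.default_rng(0)):
    # Farthest-first traversal: start from a random point, then repeatedly add
    # the point farthest from the current centers. Returns the centers and the
    # resulting clustering radius (max distance to the nearest center).
    centers = [points[rng.integers(len(points))]]
    dists = np.linalg.norm(points - centers[0], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dists))                # farthest point becomes a center
        centers.append(points[idx])
        dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
    return np.array(centers), dists.max()

pts = np.random.default_rng(1).normal(size=(1000, 2))
centers, radius = greedy_k_center(pts, k=5)
print(centers.shape, round(float(radius), 3))
```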

AAAI Conference 2014 Conference Paper

Feature-Cost Sensitive Learning with Submodular Trees of Classifiers

  • Matt Kusner
  • Wenlin Chen
  • Quan Zhou
  • Zhixiang (Eddie) Xu
  • Kilian Weinberger
  • Yixin Chen

During the past decade, machine learning algorithms have become commonplace in large-scale real-world industrial applications. In these settings, the computation time to train and test machine learning algorithms is a key consideration. At training-time the algorithms must scale to very large data set sizes. At testing-time, the cost of feature extraction can dominate the CPU runtime. Recently, a promising method was proposed to account for the feature extraction cost at testing time, called Cost-sensitive Tree of Classifiers (CSTC). Although the CSTC problem is NP-hard, the authors suggest an approximation through a mixed-norm relaxation across many classifiers. This relaxation is slow to train and requires involved optimization hyperparameter tuning. We propose a different relaxation using approximate submodularity, called Approximately Submodular Tree of Classifiers (ASTC). ASTC is much simpler to implement, yields equivalent results but requires no optimization hyperparameter tuning and is up to two orders of magnitude faster to train.

AAAI Conference 2013 Conference Paper

Goal-Oriented Euclidean Heuristics with Manifold Learning

  • Wenlin Chen
  • Yixin Chen
  • Kilian Weinberger
  • Qiang Lu
  • Xiaoping Chen

Recently, a Euclidean heuristic (EH) has been proposed for A* search. EH exploits manifold learning methods to construct an embedding of the state space graph, and derives an admissible heuristic distance between two states from the Euclidean distance between their respective embedded points. EH has shown good performance and memory efficiency in comparison to other existing heuristics such as differential heuristics. However, its potential has not been fully explored. In this paper, we propose a number of techniques that can significantly improve the quality of EH. We propose a goal-oriented manifold learning scheme that optimizes the Euclidean distance to goals in the embedding while maintaining admissibility and consistency. We also propose a state heuristic enhancement technique to reduce the gap between heuristic and true distances. The enhanced heuristic is admissible but no longer consistent. We then employ a modified search algorithm, known as the B′ algorithm, that achieves optimality with inconsistent heuristics using consistency checks and propagation. We demonstrate the effectiveness of the above techniques and report unmatched reductions in search costs across several non-trivial benchmark search problems.
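A minimal sketch of how an embedding-based Euclidean heuristic plugs into A*: h(s) = ‖emb(s) − emb(goal)‖, which is admissible when embedded distances lower-bound true graph distances. The graph and embedding below are toy examples; the goal-oriented and enhancement techniques from the paper are not shown.

```python
import heapq
import numpy as np

def a_star(graph, emb, start, goal):
    # A* where the heuristic is the Euclidean distance between embedded states.
    h = lambda s: float(np.linalg.norm(emb[s] - emb[goal]))
    frontier = [(h(start), 0.0, start)]            # (f = g + h, g, state)
    best = {start: 0.0}
    while frontier:
        _, g, s = heapq.heappop(frontier)
        if s == goal:
            return g
        for t, w in graph[s]:
            if g + w < best.get(t, float("inf")):
                best[t] = g + w
                heapq.heappush(frontier, (g + w + h(t), g + w, t))
    return float("inf")

# toy 4-node path graph with a consistent 1-D embedding
graph = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0), (3, 1.0)], 3: [(2, 1.0)]}
emb = {s: np.array([float(s)]) for s in graph}
print(a_star(graph, emb, 0, 3))   # 3.0
```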

ICML Conference 2013 Conference Paper

Maximum Variance Correction with Application to A* Search

  • Wenlin Chen
  • Kilian Q. Weinberger
  • Yixin Chen 0001

In this paper we introduce Maximum Variance Correction (MVC), which finds large-scale feasible solutions to Maximum Variance Unfolding (MVU) by post-processing embeddings from any manifold learning algorithm. It increases the scale of MVU embeddings by several orders of magnitude and is naturally parallel. This unprecedented scalability opens up new avenues of applications for manifold learning, in particular the use of MVU embeddings as effective heuristics to speed up A* search (Rayner et al. 2011). We demonstrate that MVC embeddings lead to unmatched reductions in search time across several non-trivial A* benchmark search problems and bridge the gap between the manifold learning literature and one of its most promising high-impact applications.