Arrow Research search

Author name cluster

Shandian Zhe

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

47 papers
2 author rows

Possible papers

47

AAAI Conference 2026 Conference Paper

ElastoGen: 4D Generative Elastodynamics

  • Yutao Feng
  • Yintong Shang
  • Xiang Feng
  • Lei Lan
  • Shandian Zhe
  • Tianjia Shao
  • Hongzhi Wu
  • Kun Zhou

We present ElastoGen, a knowledge-driven AI model that generates physically accurate 4D elastodynamics. Unlike deep models that learn from video- or image-based observations, ElastoGen leverages the principles of physics and learns from established mathematical and optimization procedures. The core idea of ElastoGen is converting the differential equation, corresponding to the nonlinear force equilibrium, into a series of iterative local convolution-like operations, which naturally fit deep architectures. We carefully build our network module following this overarching design philosophy. ElastoGen is much more lightweight in terms of both training requirements and network scale than deep generative models. Because of its alignment with actual physical procedures, ElastoGen efficiently generates accurate dynamics for a wide range of hyperelastic materials and can be easily integrated with upstream and downstream deep modules to enable end-to-end 4D generation.

TMLR Journal 2026 Journal Article

Kernel Neural Operators (KNOs) for Scalable, Memory-efficient, Geometrically-flexible Operator Learning

  • Matthew Lowery
  • John Turnage
  • Zachary Morrow
  • John Davis Jakeman
  • Akil Narayan
  • Shandian Zhe
  • Varun Shankar

This paper introduces the Kernel Neural Operator (KNO), a provably convergent operator-learning architecture that utilizes compositions of deep kernel-based integral operators for function-space approximation of operators (maps from functions to functions). The KNO decouples the choice of kernel from the numerical integration scheme (quadrature), thereby naturally allowing for operator learning with explicitly-chosen trainable kernels on irregular geometries. On irregular domains, this allows the KNO to utilize domain-specific quadrature rules. To help ameliorate the curse of dimensionality, we also leverage an efficient dimension-wise factorization algorithm on regular domains. More importantly, the ability to explicitly specify kernels also allows the use of highly expressive, non-stationary, neural anisotropic kernels whose parameters are computed by training neural networks. We present universal approximation theorems showing that both the continuous and fully discretized KNO are universal approximators on operator learning problems. Numerical results demonstrate that on existing benchmarks the training and test accuracy of KNOs is comparable to or higher than popular operator learning techniques while typically using an order of magnitude fewer trainable parameters, with the more expressive kernels proving important to attaining high accuracy. KNOs thus facilitate low-memory, geometrically-flexible, deep operator learning, while retaining the implementation simplicity and transparency of traditional kernel methods from both scientific computing and machine learning.

TMLR Journal 2026 Journal Article

StFT: Spatio-temporal Fourier Transformer for Long-term Dynamics Prediction

  • Da Long
  • Shandian Zhe
  • Samuel Williams
  • Leonid Oliker
  • Zhe Bai

Simulating the long-term dynamics of multi-scale and multi-physics systems poses a significant challenge in understanding complex phenomena across science and engineering. The complexity arises from the intricate interactions between scales and the interplay of diverse physical processes, which manifest in PDEs through coupled, nonlinear terms that govern the evolution of multiple physical fields across scales. Neural operators have shown potential in short-term prediction of such complex spatio-temporal dynamics; however, achieving stable high-fidelity predictions and providing robust uncertainty quantification over extended time horizons remains an open and unsolved area of research. These limitations often lead to stability degradation with rapid error accumulation, particularly in long-term forecasting of systems characterized by multi-scale behaviors involving dynamics of different orders. To address these challenges, we propose an autoregressive Spatio-temporal Fourier Transformer (StFT), in which each transformer block is designed to learn the system dynamics at a distinct scale through a dual-path architecture that integrates frequency-domain and spatio-temporal representations. By leveraging a structured hierarchy of StFT blocks, the resulting model explicitly captures the underlying dynamics across both macro- and micro- spatial scales. Furthermore, a generative residual correction mechanism is introduced to learn a probabilistic refinement temporally while simultaneously quantifying prediction uncertainties, enhancing both the accuracy and reliability of long-term probabilistic forecasting. Evaluations conducted on three benchmark datasets (plasma, fluid, and atmospheric dynamics) demonstrate the advantages of our approach over state-of-the-art ML methods.

ICML Conference 2025 Conference Paper

Arbitrarily-Conditioned Multi-Functional Diffusion for Multi-Physics Emulation

  • Da Long
  • Zhitong Xu
  • Guang Yang
  • Akil Narayan 0001
  • Shandian Zhe

Modern physics simulation often involves multiple functions of interests, and traditional numerical approaches are known to be complex and computationally costly. While machine learning-based surrogate models can offer significant cost reductions, most focus on a single task, such as forward prediction, and typically lack uncertainty quantification — an essential component in many applications. To overcome these limitations, we propose Arbitrarily-Conditioned Multi-Functional Diffusion (ACM-FD), a versatile probabilistic surrogate model for multi-physics emulation. ACM-FD can perform a wide range of tasks within a single framework, including forward prediction, various inverse problems, and simulating data for entire systems or subsets of quantities conditioned on others. Specifically, we extend the standard Denoising Diffusion Probabilistic Model (DDPM) for multi-functional generation by modeling noise as Gaussian processes (GP). We propose a random-mask based, zero-regularized denoising loss to achieve flexible and robust conditional generation. We induce a Kronecker product structure in the GP covariance matrix, substantially reducing the computational cost and enabling efficient training and sampling. We demonstrate the effectiveness of ACM-FD across several fundamental multi-physics systems.

TMLR Journal 2025 Journal Article

Fourier PINNs: From Strong Boundary Conditions to Adaptive Fourier Bases

  • Madison Cooley
  • Varun Shankar
  • Mike Kirby
  • Shandian Zhe

Interest is rising in Physics-Informed Neural Networks (PINNs) as a mesh-free alternative to traditional numerical solvers for partial differential equations (PDEs). However, PINNs often struggle to learn high-frequency and multi-scale target solutions. To tackle this problem, we first study a strong Boundary Condition (BC) version of PINNs for Dirichlet BCs and observe a consistent decline in relative error compared to the standard PINNs. We then perform a theoretical analysis based on the Fourier transform and convolution theorem. We find that strong BC PINNs can better learn the amplitudes of high-frequency components of the target solutions. However, constructing the architecture for strong BC PINNs is difficult for many BCs and domain geometries. Enlightened by our theoretical analysis, we propose Fourier PINNs --- a simple, general, yet powerful method that augments PINNs with pre-specified, dense Fourier bases. Our proposed architecture likewise learns high-frequency components better but places no restrictions on the particular BCs or problem domains. We develop an adaptive learning and basis selection algorithm via alternating neural net basis optimization, Fourier and neural net basis coefficient estimation, and coefficient truncation. This scheme can flexibly identify the significant frequencies while weakening the nominal frequencies to better capture the target solution's power spectrum. We show the advantage of our approach through a set of systematic experiments.

TMLR Journal 2025 Journal Article

HyResPINNs: A Hybrid Residual Physics-Informed Neural Network Architecture Designed to Balance Expressiveness and Trainability

  • Madison Cooley
  • Mike Kirby
  • Shandian Zhe
  • Varun Shankar

Physics-informed neural networks (PINNs) have emerged as a powerful approach for solving partial differential equations (PDEs) by training neural networks with loss functions that incorporate physical constraints. In this work, we introduce HyResPINNs, a two-level convex-gated architecture designed to maximize approximation expressiveness for a fixed number of degrees of freedom (DoF). The first level involves a trainable, per-block combination of smooth basis functions with trainable sparsity, and deep neural networks; the second involves the ability to gate entire blocks (much like in ResNets or Highway Nets), allowing for expressivity along the depth dimension of the architecture. Our empirical evaluation on a diverse set of challenging PDE problems demonstrates that HyResPINNs consistently achieve superior accuracy to baseline methods while remaining competitive relative to training times. These results highlight the potential of HyResPINNs to combine desirable features from traditional scientific computing methods and modern machine learning, paving the way for more robust and expressive approaches to physics-informed modeling.

TMLR Journal 2025 Journal Article

Pseudo-Physics-Informed Neural Operators: Enhancing Operator Learning from Limited Data

  • Keyan Chen
  • Yile Li
  • Da Long
  • Zhitong Xu
  • WEI W. XING
  • Jacob Hochhalter
  • Shandian Zhe

Neural operators have shown great potential in surrogate modeling. However, training a well-performing neural operator typically requires a substantial amount of data, which can pose a major challenge in complex applications. In such scenarios, detailed physical knowledge can be unavailable or difficult to obtain, and collecting extensive data is often prohibitively expensive. To mitigate this challenge, we propose the Pseudo Physics-Informed Neural Operator (PPI-NO) framework. PPI-NO constructs a surrogate physics system for the target system using partial differential equations (PDEs) derived from simple, rudimentary physics principles, such as basic differential operators. This surrogate system is coupled with a neural operator model, using an alternating update and learning process to iteratively enhance the model's predictive power. While the physics derived via PPI-NO may not mirror the ground-truth underlying physical laws --- hence the term ``pseudo physics'' --- this approach significantly improves the accuracy of standard operator learning models in data-scarce scenarios, which is evidenced by extensive evaluations across five benchmark tasks and a fatigue modeling application.

ICLR Conference 2025 Conference Paper

Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

  • Zhitong Xu
  • Haitao Wang 0001
  • Jeff M. Phillips
  • Shandian Zhe

A long-standing belief holds that Bayesian Optimization (BO) with standard Gaussian processes (GP) --- referred to as standard BO --- underperforms in high-dimensional optimization problems. While this belief seems plausible, it lacks both robust empirical evidence and theoretical justification. To address this gap, we present a systematic investigation. First, through a comprehensive evaluation across twelve benchmarks, we found that while the popular Square Exponential (SE) kernel often leads to poor performance, using Mat\'ern kernels enables standard BO to consistently achieve top-tier results, frequently surpassing methods specifically designed for high-dimensional optimization. Second, our theoretical analysis reveals that the SE kernel’s failure primarily stems from improper initialization of the length-scale parameters, which are commonly used in practice but can cause gradient vanishing in training. We provide a probabilistic bound to characterize this issue, showing that Mat\'ern kernels are less susceptible and can robustly handle much higher dimensions. Third, we propose a simple robust initialization strategy that dramatically improves the performance of the SE kernel, bringing it close to state-of-the-art methods, without requiring additional priors or regularization. We prove another probabilistic bound that demonstrates how the gradient vanishing issue can be effectively mitigated with our method. Our findings advocate for a re-evaluation of standard BO’s potential in high-dimensional settings.

ICML Conference 2025 Conference Paper

Toward Efficient Kernel-Based Solvers for Nonlinear PDEs

  • Zhitong Xu
  • Da Long
  • Yiming Xu
  • Guang Yang
  • Shandian Zhe
  • Houman Owhadi

We introduce a novel kernel learning framework toward efficiently solving nonlinear partial differential equations (PDEs). In contrast to the state-of-the-art kernel solver that embeds differential operators within kernels, posing challenges with a large number of collocation points, our approach eliminates these operators from the kernel. We model the solution using a standard kernel interpolation form and differentiate the interpolant to compute the derivatives. Our framework obviates the need for complex Gram matrix construction between solutions and their derivatives, allowing for a straightforward implementation and scalable computation. As an instance, we allocate the collocation points on a grid and adopt a product kernel, which yields a Kronecker product structure in the interpolation. This structure enables us to avoid computing the full Gram matrix, reducing costs and scaling efficiently to a large number of collocation points. We provide a proof of the convergence and rate analysis of our method under appropriate regularity assumptions. In numerical experiments, we demonstrate the advantages of our method in solving several benchmark PDEs.

ICML Conference 2024 Conference Paper

BayOTIDE: Bayesian Online Multivariate Time Series Imputation with Functional Decomposition

  • Shikai Fang
  • Qingsong Wen
  • Yingtao Luo
  • Shandian Zhe
  • Liang Sun 0001

In real-world scenarios such as traffic and energy management, we frequently encounter large volumes of time-series data characterized by missing values, noise, and irregular sampling patterns. While numerous imputation methods have been proposed, the majority tend to operate within a local horizon, which involves dividing long sequences into batches of fixed-length segments for model training. This local horizon often leads to the overlooking of global trends and periodic patterns. More importantly, most methods assume the observations are sampled at regular timestamps, and fail to handle complex irregular sampled time series in various applications. Additionally, most existing methods are learned in an offline manner. Thus, it is not suitable for applications with rapidly arriving streaming data. To address these challenges, we propose BayOTIDE: Bayesian Online Multivariate Time series Imputation with functional decomposition. Our method conceptualizes multivariate time series as the weighted combination of groups of low-rank temporal factors with different patterns. We employ a suite of Gaussian Processes (GPs), each with a unique kernel, as functional priors to model these factors. For computational efficiency, we further convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE), and developing a scalable algorithm for online inference. The proposed method can not only handle imputation over arbitrary timestamps, but also offer uncertainty quantification and interpretability for the downstream application. We evaluate our method on both synthetic and real-world datasets. We release the code at https: //github. com/xuangu-fang/BayOTIDE.

ICLR Conference 2024 Conference Paper

Functional Bayesian Tucker Decomposition for Continuous-indexed Tensor Data

  • Shikai Fang
  • Xin Yu 0003
  • Zheng Wang 0042
  • Shibo Li
  • Robert M. Kirby
  • Shandian Zhe

Tucker decomposition is a powerful tensor model to handle multi-aspect data. It demonstrates the low-rank property by decomposing the grid-structured data as interactions between a core tensor and a set of object representations (factors). A fundamental assumption of such decomposition is that there are finite objects in each aspect or mode, corresponding to discrete indexes of data entries. However, real-world data is often not naturally posed in this setting. For example, geographic data is represented as continuous indexes of latitude and longitude coordinates, and cannot fit tensor models directly. To generalize Tucker decomposition to such scenarios, we propose Functional Bayesian Tucker Decomposition (FunBaT). We treat the continuous-indexed data as the interaction between the Tucker core and a group of latent functions. We use Gaussian processes (GP) as functional priors to model the latent functions. Then, we convert each GP into a state-space prior by constructing an equivalent stochastic differential equation (SDE) to reduce computational cost. An efficient inference algorithm is developed for scalable posterior approximation based on advanced message-passing techniques. The advantage of our method is shown in both synthetic data and several real-world applications. We release the code of FunBaT at {https://github.com/xuangu-fang/Functional-Bayesian-Tucker-Decomposition}.

ICLR Conference 2024 Conference Paper

Solving High Frequency and Multi-Scale PDEs with Gaussian Processes

  • Shikai Fang
  • Madison Cooley
  • Da Long
  • Shibo Li
  • Robert M. Kirby
  • Shandian Zhe

Machine learning based solvers have garnered much attention in physical simulation and scientific computing, with a prominent example, physics-informed neural networks (PINNs). However, PINNs often struggle to solve high-frequency and multi-scale PDEs, which can be due to spectral bias during neural network training. To address this problem, we resort to the Gaussian process (GP) framework. To flexibly capture the dominant frequencies, we model the power spectrum of the PDE solution with a student $t$ mixture or Gaussian mixture. We apply the inverse Fourier transform to obtain the covariance function (by Wiener-Khinchin theorem). The covariance derived from the Gaussian mixture spectrum corresponds to the known spectral mixture kernel. Next, we estimate the mixture weights in the log domain, which we show is equivalent to placing a Jeffreys prior. It automatically induces sparsity, prunes excessive frequencies, and adjusts the remaining toward the ground truth. Third, to enable efficient and scalable computation on massive collocation points, which are critical to capture high frequencies, we place the collocation points on a grid, and multiply our covariance function at each input dimension. We use the GP conditional mean to predict the solution and its derivatives so as to fit the boundary condition and the equation itself. As a result, we can derive a Kronecker product structure in the covariance matrix. We use Kronecker product properties and multilinear algebra to promote computational efficiency and scalability, without low-rank approximations. We show the advantage of our method in systematic experiments. The code is released at {https://github.com/xuangu-fang/Gaussian-Process-Slover-for-High-Freq-PDE}.

NeurIPS Conference 2023 Conference Paper

Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes

  • Zheng Wang
  • Shikai Fang
  • Shibo Li
  • Shandian Zhe

Tensor decomposition is an important tool for multiway data analysis. In practice, the data is often sparse yet associated with rich temporal information. Existing methods, however, often under-use the time information and ignore the structural knowledge within the sparsely observed tensor entries. To overcome these limitations and to better capture the underlying temporal structure, we propose Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE). We develop a neural diffusion-reaction process to estimate dynamic embeddings for the entities in each tensor mode. Specifically, based on the observed tensor entries, we build a multi-partite graph to encode the correlation between the entities. We construct a graph diffusion process to co-evolve the embedding trajectories of the correlated entities and use a neural network to construct a reaction process for each individual entity. In this way, our model can capture both the commonalities and personalities during the evolution of the embeddings for different entities. We then use a neural network to model the entry value as a nonlinear function of the embedding trajectories. For model estimation, we combine ODE solvers to develop a stochastic mini-batch learning algorithm. We propose a stratified sampling method to balance the cost of processing each mini-batch so as to improve the overall efficiency. We show the advantage of our approach in both simulation studies and real-world applications. The code is available at https: //github. com/wzhut/Dynamic-Tensor-Decomposition-via-Neural-Diffusion-Reaction-Processes.

ICML Conference 2023 Conference Paper

Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks

  • Shibo Li
  • Michael Penwarden
  • Yiming Xu 0009
  • Conor Tillinghast
  • Akil Narayan 0001
  • Robert M. Kirby
  • Shandian Zhe

Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, apply different PINNs to solve the problem in each subdomain, and stitch the subdomains at the interface. Thereby, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, the performance of multi-domain PINNs is sensitive to the choice of the interface conditions. While quite a few conditions have been proposed, there is no suggestion about how to select the conditions according to specific problems. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine appropriate interface conditions for solving a family of parametric PDEs. Specifically, we develop two contextual multi-arm bandit (MAB) models. The first one applies to the entire training course, and online updates a Gaussian process (GP) reward that given the PDE parameters and interface conditions predicts the performance. We prove a sub-linear regret bound for both UCB and Thompson sampling, which in theory guarantees the effectiveness of our MAB. The second one partitions the training into two stages, one is the stochastic phase and the other deterministic phase; we update a GP reward for each phase to enable different condition selections at the two stages to further bolster the flexibility and performance. We have shown the advantage of METALIC on four bench-mark PDE families.

ICML Conference 2023 Conference Paper

Provably Convergent Schrödinger Bridge with Applications to Probabilistic Time Series Imputation

  • Yu Chen
  • Wei Deng 0002
  • Shikai Fang
  • Fengpei Li
  • Nicole Tianjiao Yang
  • Yikai Zhang 0003
  • Kashif Rasul
  • Shandian Zhe

The Schrödinger bridge problem (SBP) is gaining increasing attention in generative modeling and showing promising potential even in comparison with the score-based generative models (SGMs). SBP can be interpreted as an entropy-regularized optimal transport problem, which conducts projections onto every other marginal alternatingly. However, in practice, only approximated projections are accessible and their convergence is not well understood. To fill this gap, we present a first convergence analysis of the Schrödinger bridge algorithm based on approximated projections. As for its practical applications, we apply SBP to probabilistic time series imputation by generating missing values conditioned on observed data. We show that optimizing the transport cost improves the performance and the proposed algorithm achieves the state-of-the-art result in healthcare and environmental data while exhibiting the advantage of exploring both temporal and feature patterns in probabilistic time series imputation.

NeurIPS Conference 2023 Conference Paper

Streaming Factor Trajectory Learning for Temporal Tensor Decomposition

  • Shikai Fang
  • Xin Yu
  • Shibo Li
  • Zheng Wang
  • Mike Kirby
  • Shandian Zhe

Practical tensor data is often along with time information. Most existing temporal decomposition approaches estimate a set of fixed factors for the objects in each tensor mode, and hence cannot capture the temporal evolution of the objects' representation. More important, we lack an effective approach to capture such evolution from streaming data, which is common in real-world applications. To address these issues, we propose Streaming Factor Trajectory Learning (SFTL) for temporal tensor decomposition. We use Gaussian processes (GPs) to model the trajectory of factors so as to flexibly estimate their temporal evolution. To address the computational challenges in handling streaming data, we convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE). We develop an efficient online filtering algorithm to estimate a decoupled running posterior of the involved factor states upon receiving new data. The decoupled estimation enables us to conduct standard Rauch-Tung-Striebel smoothing to compute the full posterior of all the trajectories in parallel, without the need for revisiting any previous data. We have shown the advantage of SFTL in both synthetic tasks and real-world applications.

ICML Conference 2022 Conference Paper

AutoIP: A United Framework to Integrate Physics into Gaussian Processes

  • Da Long
  • Zheng Wang 0042
  • Aditi S. Krishnapriyan
  • Robert M. Kirby
  • Shandian Zhe
  • Michael W. Mahoney

Physical modeling is critical for many modern science and engineering applications. From a data science or machine learning perspective, where more domain-agnostic, data-driven models are pervasive, physical knowledge {—} often expressed as differential equations {—} is valuable in that it is complementary to data, and it can potentially help overcome issues such as data sparsity, noise, and inaccuracy. In this work, we propose a simple, yet powerful and general framework {—} AutoIP, for Automatically Incorporating Physics {—} that can integrate all kinds of differential equations into Gaussian Processes (GPs) to enhance prediction accuracy and uncertainty quantification. These equations can be linear or nonlinear, spatial, temporal, or spatio-temporal, complete or incomplete with unknown source terms, and so on. Based on kernel differentiation, we construct a GP prior to sample the values of the target function, equation related derivatives, and latent source functions, which are all jointly from a multivariate Gaussian distribution. The sampled values are fed to two likelihoods: one to fit the observations, and the other to conform to the equation. We use the whitening method to evade the strong dependency between the sampled function values and kernel parameters, and we develop a stochastic variational learning algorithm. AutoIP shows improvement upon vanilla GPs in both simulation and several real-world applications, even using rough, incomplete equations.

NeurIPS Conference 2022 Conference Paper

Batch Multi-Fidelity Active Learning with Budget Constraints

  • Shibo Li
  • Jeff M Phillips
  • Xin Yu
  • Robert Kirby
  • Shandian Zhe

Learning functions with high-dimensional outputs is critical in many applications, such as physical simulation and engineering design. However, collecting training examples for these applications is often costly, e. g. , by running numerical solvers. The recent work (Li et al. , 2022) proposes the first multi-fidelity active learning approach for high-dimensional outputs, which can acquire examples at different fidelities to reduce the cost while improving the learning performance. However, this method only queries at one pair of fidelity and input at a time, and hence has a risk of bringing in strongly correlated examples to reduce the learning efficiency. In this paper, we propose Batch Multi-Fidelity Active Learning with Budget Constraints (BMFAL-BC), which can promote the diversity of training examples to improve the benefit-cost ratio, while respecting a given budget constraint for batch queries. Hence, our method can be more practically useful. Specifically, we propose a novel batch acquisition function that measures the mutual information between a batch of multi-fidelity queries and the target function, so as to penalize highly correlated queries and encourages diversity. The optimization of the batch acquisition function is challenging in that it involves a combinatorial search over many fidelities while subject to the budget constraint. To address this challenge, we develop a weighted greedy algorithm that can sequentially identify each (fidelity, input) pair, while achieving a near $(1 - 1/e)$-approximation of the optimum. We show the advantage of our method in several computational physics and engineering applications.

ICML Conference 2022 Conference Paper

Bayesian Continuous-Time Tucker Decomposition

  • Shikai Fang
  • Akil Narayan 0001
  • Robert M. Kirby
  • Shandian Zhe

Tensor decomposition is a dominant framework for multiway data analysis and prediction. Although practical data often contains timestamps for the observed entries, existing tensor decomposition approaches overlook or under-use this valuable time information. They either drop the timestamps or bin them into crude steps and hence ignore the temporal dynamics within each step or use simple parametric time coefficients. To overcome these limitations, we propose Bayesian Continuous-Time Tucker Decomposition. We model the tensor-core of the classical Tucker decomposition as a time-varying function, and place a Gaussian process prior to flexibly estimate all kinds of temporal dynamics. In this way, our model maintains the interpretability while is flexible enough to capture various complex temporal relationships between the tensor nodes. For efficient and high-quality posterior inference, we use the stochastic differential equation (SDE) representation of temporal GPs to build an equivalent state-space prior, which avoids huge kernel matrix computation and sparse/low-rank approximations. We then use Kalman filtering, RTS smoothing, and conditional moment matching to develop a scalable message passing inference algorithm. We show the advantage of our method in simulation and several real-world applications.

ICML Conference 2022 Conference Paper

Decomposing Temporal High-Order Interactions via Latent ODEs

  • Shibo Li
  • Robert M. Kirby
  • Shandian Zhe

High-order interactions between multiple objects are common in real-world applications. Although tensor decomposition is a popular framework for high-order interaction analysis and prediction, most methods cannot well exploit the valuable timestamp information in data. The existent methods either discard the timestamps or convert them into discrete steps or use over-simplistic decomposition models. As a result, these methods might not be capable enough of capturing complex, fine-grained temporal dynamics or making accurate predictions for long-term interaction results. To overcome these limitations, we propose a novel Temporal High-order Interaction decompoSition model based on Ordinary Differential Equations (THIS-ODE). We model the time-varying interaction result with a latent ODE. To capture the complex temporal dynamics, we use a neural network (NN) to learn the time derivative of the ODE state. We use the representation of the interaction objects to model the initial value of the ODE and to constitute a part of the NN input to compute the state. In this way, the temporal relationships of the participant objects can be estimated and encoded into their representations. For tractable and scalable inference, we use forward sensitivity analysis to efficiently compute the gradient of ODE state, based on which we use integral transform to develop a stochastic mini-batch learning algorithm. We demonstrate the advantage of our approach in simulation and four real-world applications.

NeurIPS Conference 2022 Conference Paper

Infinite-Fidelity Coregionalization for Physical Simulation

  • Shibo Li
  • Zheng Wang
  • Robert Kirby
  • Shandian Zhe

Multi-fidelity modeling and learning is important in physical simulation related applications. It can leverage both low-fidelity and high-fidelity examples for training so as to reduce the cost of data generation yet still achieving good performance. While existing approaches only model finite, discrete fidelities, in practice, the feasible fidelity choice is often infinite, which can correspond to a continuous mesh spacing or finite element length. In this paper, we propose Infinite Fidelity Coregionalization (IFC). Given the data, our method can extract and exploit rich information within infinite, continuous fidelities to bolster the prediction accuracy. Our model can interpolate and/or extrapolate the predictions to novel fidelities that are not covered by the training data. Specifically, we introduce a low-dimensional latent output as a continuous function of the fidelity and input, and multiple it with a basis matrix to predict high-dimensional solution outputs. We model the latent output as a neural Ordinary Differential Equation (ODE) to capture the complex relationships within and integrate information throughout the continuous fidelities. We then use Gaussian processes or another ODE to estimate the fidelity-varying bases. For efficient inference, we reorganize the bases as a tensor, and use a tensor-Gaussian variational posterior approximation to develop a scalable inference algorithm for massive outputs. We show the advantage of our method in several benchmark tasks in computational physics.

ICML Conference 2022 Conference Paper

Nonparametric Embeddings of Sparse High-Order Interaction Events

  • Zheng Wang 0042
  • Yiming Xu 0009
  • Conor Tillinghast
  • Shibo Li
  • Akil Narayan 0001
  • Shandian Zhe

High-order interaction events are common in real-world applications. Learning embeddings that encode the complex relationships of the participants from these events is of great importance in knowledge mining and predictive tasks. Despite the success of existing approaches, e. g. Poisson tensor factorization, they ignore the sparse structure underlying the data, namely the occurred interactions are far less than the possible interactions among all the participants. In this paper, we propose Nonparametric Embeddings of Sparse High-order interaction events (NESH). We hybridize a sparse hypergraph (tensor) process and a matrix Gaussian process to capture both the asymptotic structural sparsity within the interactions and nonlinear temporal relationships between the participants. We prove strong asymptotic bounds (including both a lower and an upper bound ) of the sparse ratio, which reveals the asymptotic properties of the sampled structure. We use batch-normalization, stick-breaking construction and sparse variational GP approximations to develop an efficient, scalable model inference algorithm. We demonstrate the advantage of our approach in several real-world applications.

ICML Conference 2022 Conference Paper

Nonparametric Factor Trajectory Learning for Dynamic Tensor Decomposition

  • Zheng Wang 0042
  • Shandian Zhe

Tensor decomposition is a fundamental framework to analyze data that can be represented by multi-dimensional arrays. In practice, tensor data are often accompanied with temporal information, namely the time points when the entry values were generated. This information implies abundant, complex temporal variation patterns. However, current methods always assume the factor representations of the entities in each tensor mode are static, and never consider their temporal evolution. To fill this gap, we propose NONparametric FActor Trajectory learning for dynamic tensor decomposition (NONFAT). We place Gaussian process (GP) priors in the frequency domain and conduct inverse Fourier transform via Gauss-Laguerre quadrature to sample the trajectory functions. In this way, we can overcome data sparsity and obtain robust trajectory estimates across long time horizons. Given the trajectory values at specific time points, we use a second-level GP to sample the entry values and to capture the temporal relationship between the entities. For efficient and scalable inference, we leverage the matrix Gaussian structure in the model, introduce a matrix Gaussian posterior, and develop a nested sparse variational learning algorithm. We have shown the advantage of our method in several real-world applications.

ICML Conference 2022 Conference Paper

Nonparametric Sparse Tensor Factorization with Hierarchical Gamma Processes

  • Conor Tillinghast
  • Zheng Wang 0042
  • Shandian Zhe

We propose a nonparametric factorization approach for sparsely observed tensors. The sparsity does not mean zero-valued entries are massive or dominated. Rather, it implies the observed entries are very few, and even fewer with the growth of the tensor; this is ubiquitous in practice. Compared with the existent works, our model not only leverages the structural information underlying the observed entry indices, but also provides extra interpretability and flexibility {—} it can simultaneously estimate a set of location factors about the intrinsic properties of the tensor nodes, and another set of sociability factors reflecting their extrovert activity in interacting with others; users are free to choose a trade-off between the two types of factors. Specifically, we use hierarchical Gamma processes and Poisson random measures to construct a tensor-valued process, which can freely sample the two types of factors to generate tensors and always guarantees an asymptotic sparsity. We then normalize the tensor process to obtain hierarchical Dirichlet processes to sample each observed entry index, and use a Gaussian process to sample the entry value as a nonlinear function of the factors, so as to capture both the sparse structure properties and complex node relationships. For efficient inference, we use Dirichlet process properties over finite sample partitions, density transformations, and random features to develop a stochastic variational estimation algorithm. We demonstrate the advantage of our method in several benchmark datasets.

NeurIPS Conference 2022 Conference Paper

Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm

  • Aidan Good
  • Jiaqi Lin
  • Xin Yu
  • Hannah Sieg
  • Mikey Fergurson
  • Shandian Zhe
  • Jerzy Wieczorek
  • Thiago Serra

Pruning techniques have been successfully used in neural networks to trade accuracy for sparsity. However, the impact of network pruning is not uniform: prior work has shown that the recall for underrepresented classes in a dataset may be more negatively affected. In this work, we study such relative distortions in recall by hypothesizing an intensification effect that is inherent to the model. Namely, that pruning makes recall relatively worse for a class with recall below accuracy and, conversely, that it makes recall relatively better for a class with recall above accuracy. In addition, we propose a new pruning algorithm aimed at attenuating such effect. Through statistical analysis, we have observed that intensification is less severe with our algorithm but nevertheless more pronounced with relatively more difficult tasks, less complex models, and higher pruning ratios. More surprisingly, we conversely observe a de-intensification effect with lower pruning ratios.

ICML Conference 2022 Conference Paper

The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks

  • Xin Yu 0003
  • Thiago Serra
  • Srikumar Ramalingam
  • Shandian Zhe

Neural networks tend to achieve better accuracy with training if they are larger {—} even if the resulting models are overparameterized. Nevertheless, carefully removing such excess of parameters before, during, or after training may also produce models with similar or even improved accuracy. In many cases, that can be curiously achieved by heuristics as simple as removing a percentage of the weights with the smallest absolute value {—} even though absolute value is not a perfect proxy for weight relevance. With the premise that obtaining significantly better performance from pruning depends on accounting for the combined effect of removing multiple weights, we revisit one of the classic approaches for impact-based pruning: the Optimal Brain Surgeon (OBS). We propose a tractable heuristic for solving the combinatorial extension of OBS, in which we select weights for simultaneous removal, and we combine it with a single-pass systematic update of unpruned weights. Our selection method outperforms other methods for high sparsity, and the single-pass weight update is also advantageous if applied after those methods.

NeurIPS Conference 2021 Conference Paper

Batch Multi-Fidelity Bayesian Optimization with Deep Auto-Regressive Networks

  • Shibo Li
  • Robert Kirby
  • Shandian Zhe

Bayesian optimization (BO) is a powerful approach for optimizing black-box, expensive-to-evaluate functions. To enable a flexible trade-off between the cost and accuracy, many applications allow the function to be evaluated at different fidelities. In order to reduce the optimization cost while maximizing the benefit-cost ratio, in this paper we propose Batch Multi-fidelity Bayesian Optimization with Deep Auto-Regressive Networks (BMBO-DARN). We use a set of Bayesian neural networks to construct a fully auto-regressive model, which is expressive enough to capture strong yet complex relationships across all the fidelities, so as to improve the surrogate learning and optimization performance. Furthermore, to enhance the quality and diversity of queries, we develop a simple yet efficient batch querying method, without any combinatorial search over the fidelities. We propose a batch acquisition function based on Max-value Entropy Search (MES) principle, which penalizes highly correlated queries and encourages diversity. We use posterior samples and moment matching to fulfill efficient computation of the acquisition function, and conduct alternating optimization over every fidelity-input pair, which guarantees an improvement at each step. We demonstrate the advantage of our approach on four real-world hyperparameter optimization applications.

UAI Conference 2021 Conference Paper

Bayesian streaming sparse Tucker decomposition

  • Shikai Fang
  • Robert M. Kirby
  • Shandian Zhe

Tucker decomposition is a classical tensor factorization model. Compared with the most widely used CP decomposition, the Tucker model is much more flexible and interpretable in that it accounts for every possible (multiplicative) interaction between the factors in different modes. However, this also brings in the risk of overfitting and computational challenges, especially in the case of fast streaming data. To address these issues, we develop BASS-Tucker, a BAyesian Streaming Sparse Tucker decomposition method. We place a spike-and-slab prior over the core tensor elements to automatically select meaningful factor interactions so as to prevent overfitting and to further enhance the interpretability. To enable efficient streaming factorization, we use conditional moment matching and Delta’s method to develop one-shot incremental update of the latent factors and core tensor upon receiving each streaming batch. Thereby, we avoid processing the data points one by one as in the standard assumed density filtering, which needs to update the core tensor for each point and is quite inefficient. We explicitly introduce and update a sparse prior approximation in the running posterior to fulfill effective sparse estimation in the streaming inference. We show the advantage of BASS-Tucker in several real-world applications.

NeurIPS Conference 2021 Conference Paper

Characterizing possible failure modes in physics-informed neural networks

  • Aditi Krishnapriyan
  • Amir Gholami
  • Shandian Zhe
  • Robert Kirby
  • Michael W. Mahoney

Recent work in scientific machine learning has developed so-called physics-informed neural network (PINN) models. The typical approach is to incorporate physical domain knowledge as soft constraints on an empirical loss function and use existing machine learning methodologies to train the model. We demonstrate that, while existing PINN methodologies can learn good models for relatively trivial problems, they can easily fail to learn relevant physical phenomena for even slightly more complex problems. In particular, we analyze several distinct situations of widespread physical interest, including learning differential equations with convection, reaction, and diffusion operators. We provide evidence that the soft regularization in PINNs, which involves PDE-based differential operators, can introduce a number of subtle problems, including making the problem more ill-conditioned. Importantly, we show that these possible failure modes are not due to the lack of expressivity in the NN architecture, but that the PINN's setup makes the loss landscape very hard to optimize. We then describe two promising solutions to address these failure modes. The first approach is to use curriculum regularization, where the PINN's loss term starts from a simple PDE regularization, and becomes progressively more complex as the NN gets trained. The second approach is to pose the problem as a sequence-to-sequence learning task, rather than learning to predict the entire space-time at once. Extensive testing shows that we can achieve up to 1-2 orders of magnitude lower error with these methods as compared to regular PINN training.

ICML Conference 2021 Conference Paper

Nonparametric Decomposition of Sparse Tensors

  • Conor Tillinghast
  • Shandian Zhe

Tensor decomposition is a powerful framework for multiway data analysis. Despite the success of existing approaches, they ignore the sparse nature of the tensor data in many real-world applications, explicitly or implicitly assuming dense tensors. To address this model misspecification and to exploit the sparse tensor structures, we propose Nonparametric dEcomposition of Sparse Tensors (\ours), which can capture both the sparse structure properties and complex relationships between the tensor nodes to enhance the embedding estimation. Specifically, we first use completely random measures to construct tensor-valued random processes. We prove that the entry growth is much slower than that of the corresponding tensor size, which implies sparsity. Given finite observations (\ie projections), we then propose two nonparametric decomposition models that couple Dirichlet processes and Gaussian processes to jointly sample the sparse entry indices and the entry values (the latter as a nonlinear mapping of the embeddings), so as to encode both the structure properties and nonlinear relationships of the tensor nodes into the embeddings. Finally, we use the stick-breaking construction and random Fourier features to develop a scalable, stochastic variational learning algorithm. We show the advantage of our approach in sparse tensor generation, and entry index and value prediction in several real-world applications.

NeurIPS Conference 2021 Conference Paper

Self-Adaptable Point Processes with Nonparametric Time Decays

  • Zhimeng Pan
  • Zheng Wang
  • Jeff M Phillips
  • Shandian Zhe

Many applications involve multi-type event data. Understanding the complex influences of the events on each other is critical to discover useful knowledge and to predict future events and their types. Existing methods either ignore or partially account for these influences. Recent works use recurrent neural networks to model the event rate. While being highly expressive, they couple all the temporal dependencies in a black-box and can hardly extract meaningful knowledge. More important, most methods assume an exponential time decay of the influence strength, which is over-simplified and can miss many important strength varying patterns. To overcome these limitations, we propose SPRITE, a $\underline{S}$elf-adaptable $\underline{P}$oint p$\underline{R}$ocess w$\underline{I}$th nonparametric $\underline{T}$ime d$\underline{E}$cays, which can decouple the influences between every pair of the events and capture various time decays of the influence strengths. Specifically, we use an embedding to represent each event type and model the event influence as an unknown function of the embeddings and time span. We derive a general construction that can cover all possible time decaying functions. By placing Gaussian process (GP) priors over the latent functions and using Gauss-Legendre quadrature to obtain the integral in the construction, we can flexibly estimate all kinds of time-decaying influences, without restricting to any specific form or imposing derivative constraints that bring learning difficulties. We then use weight space augmentation of GPs to develop an efficient stochastic variational learning algorithm. We show the advantages of our approach in both the ablation study and real-world applications.

ICML Conference 2021 Conference Paper

Streaming Bayesian Deep Tensor Factorization

  • Shikai Fang
  • Zheng Wang 0042
  • Zhimeng Pan
  • Ji Liu
  • Shandian Zhe

Despite the success of existing tensor factorization methods, most of them conduct a multilinear decomposition, and rarely exploit powerful modeling frameworks, like deep neural networks, to capture a variety of complicated interactions in data. More important, for highly expressive, deep factorization, we lack an effective approach to handle streaming data, which are ubiquitous in real-world applications. To address these issues, we propose SBTD, a Streaming Bayesian Deep Tensor factorization method. We first use Bayesian neural networks (NNs) to build a deep tensor factorization model. We assign a spike-and-slab prior over each NN weight to encourage sparsity and to prevent overfitting. We then use multivariate Delta’s method and moment matching to approximate the posterior of the NN output and calculate the running model evidence, based on which we develop an efficient streaming posterior inference algorithm in the assumed-density-filtering and expectation propagation framework. Our algorithm provides responsive incremental updates for the posterior of the latent factors and NN weights upon receiving newly observed tensor entries, and meanwhile identify and inhibit redundant/useless weights. We show the advantages of our approach in four real-world applications.

AAAI Conference 2020 Conference Paper

Infinite ShapeOdds: Nonparametric Bayesian Models for Shape Representations

  • Wei Xing
  • Shireen Elhabian
  • Robert Kirby
  • Ross T. Whitaker
  • Shandian Zhe

Learning compact representations for shapes (binary images) is important for many applications. Although neural network models are very powerful, they usually involve many parameters, require substantial tuning efforts and easily overfit small datasets, which are common in shape-related applications. The state-of-the-art approach, ShapeOdds, as a latent Gaussian model, can effectively prevent overfitting and is more robust. Nonetheless, it relies on a linear projection assumption and is incapable of capturing intrinsic nonlinear shape variations, hence may leading to inferior representations and structure discovery. To address these issues, we propose In- finite ShapeOdds (InfShapeOdds), a Bayesian nonparametric shape model, which is flexible enough to capture complex shape variations and discover hidden cluster structures, while still avoiding overfitting. Specifically, we use matrix Gaussian priors, nonlinear feature mappings and the kernel trick to generalize ShapeOdds to a shape-variate Gaussian process model, which can grasp various nonlinear correlations among the pixels within and across (different) shapes. To further discover the hidden structures in data, we place a Dirichlet process mixture (DPM) prior over the representations to jointly infer the cluster number and memberships. Finally, we exploit the Kronecker-product structure in our model to develop an efficient, truncated variational expectation-maximization algorithm for model estimation. On synthetic and real-world data, we show the advantage of our method in both representation learning and latent structure discovery.

NeurIPS Conference 2020 Conference Paper

Multi-Fidelity Bayesian Optimization via Deep Neural Networks

  • Shibo Li
  • Wei Xing
  • Robert Kirby
  • Shandian Zhe

Bayesian optimization (BO) is a popular framework for optimizing black-box functions. In many applications, the objective function can be evaluated at multiple fidelities to enable a trade-off between the cost and accuracy. To reduce the optimization cost, many multi-fidelity BO methods have been proposed. Despite their success, these methods either ignore or over-simplify the strong, complex correlations across the fidelities. While the acquisition function is therefore easy and convenient to calculate, these methods can be inefficient in estimating the objective function. To address this issue, we propose Deep Neural Network Multi-Fidelity Bayesian Optimization (DNN-MFBO) that can flexibly capture all kinds of complicated relationships between the fidelities to improve the objective function estimation and hence the optimization performance. We use sequential, fidelity-wise Gauss-Hermite quadrature and moment-matching to compute a mutual information-based acquisition function in a tractable and highly efficient way. We show the advantages of our method in both synthetic benchmark datasets and real-world applications in engineering design.

IJCAI Conference 2020 Conference Paper

Scalable Gaussian Process Regression Networks

  • Shibo Li
  • Wei Xing
  • Robert M. Kirby
  • Shandian Zhe

Gaussian process regression networks (GPRN) are powerful Bayesian models for multi-output regression, but their inference is intractable. To address this issue, existing methods use a fully factorized structure (or a mixture of such structures) over all the outputs and latent functions for posterior approximation, which, however, can miss the strong posterior dependencies among the latent variables and hurt the inference quality. In addition, the updates of the variational parameters are inefficient and can be prohibitively expensive for a large number of outputs. To overcome these limitations, we propose a scalable variational inference algorithm for GPRN, which not only captures the abundant posterior dependencies but also is much more efficient for massive outputs. We tensorize the output space and introduce tensor/matrix-normal variational posteriors to capture the posterior correlations and to reduce the parameters. We jointly optimize all the parameters and exploit the inherent Kronecker product structure in the variational model evidence lower bound to accelerate the computation. We demonstrate the advantages of our method in several real-world applications.

ICML Conference 2020 Conference Paper

Self-Modulating Nonparametric Event-Tensor Factorization

  • Zheng Wang 0042
  • Xinqi Chu
  • Shandian Zhe

Tensor factorization is a fundamental framework to analyze high-order interactions in data. Despite the success of the existing methods, the valuable temporal information are severely underused. The timestamps of the interactions are either ignored or discretized into crude steps. The recent work although formulates event-tensors to keep the timestamps in factorization and can capture mutual excitation effects among the interaction events, it overlooks another important type of temporal influence, inhibition. In addition, it uses a local window to exclude all the long-term dependencies. To overcome these limitations, we propose a self-modulating nonparametric Bayesian factorization model. We use the latent factors to construct mutually governed, general random point processes, which can capture various short-term/long-term, excitation/inhibition effects, so as to encode the complex temporal dependencies into factor representations. In addition, our model couples with a latent Gaussian process to estimate and fuse nonlinear yet static relationships between the entities. For efficient inference, we derive a fully decomposed model evidence lower bound to dispense with the huge kernel matrix and costly summations inside the rate and log rate functions. We then develop an efficient stochastic optimization algorithm. We show the advantage of our method in four real-world applications.

UAI Conference 2020 Conference Paper

Streaming Nonlinear Bayesian Tensor Decomposition

  • Zhimeng Pan
  • Zheng Wang 0042
  • Shandian Zhe

Despite the success of the recent nonlinear tensor decomposition models based on Gaussian processes (GPs), they lack an effective way to deal with streaming data, which are important for many applications. Using the standard streaming variational Bayes framework or the recent streaming sparse GP approximations will lead to intractable model evidence lower bounds; although we can use stochastic gradient descent for incremental updates, they are unreliable and often yield poor estimations. To address this problem, we propose Streaming Nonlinear Bayesian Tensor Decomposition (SNBTD) that can conduct high-quality, closed-form and iteration-free updates upon receiving new tensor entries. Specifically, we use random Fourier features to build a sparse spectrum GP decomposition model to dispense with complex kernel/matrix operations and to ease posterior inference. We then extend the assumed-density-filtering framework by approximating all the data likelihoods in a streaming batch with a single factor to perform one-shot updates. We use conditional moment matching and Taylor approximations to fulfill efficient, analytical factor calculation. We show the advantage of our method on four real-world applications.

UAI Conference 2019 Conference Paper

Conditional Expectation Propagation

  • Zheng Wang 0042
  • Shandian Zhe

Expectation propagation (EP) is a powerful approximate inference algorithm. However, a critical barrier in applying EP is that the moment matching in message updates can be intractable. Handcrafting approximations is usually tricky, and lacks generalizability. Importance sampling is very expensive. While Laplace propagation offers an excellent solution, it has to run numerical optimizations to find Laplace approximations in every update, which is still quite inefficient. To overcome these practical barriers, we propose conditional expectation propagation (CEP) that performs conditional moment matching given the variables outside each message fixed, and then takes expectation w. r. t their approximate posterior. The conditional moments are often analytical and much easier to derive. In the most general case, we can use (fully) factorized messages so that the conditional moments can be represented by quadrature formulas. We then compute the expectation of the conditional moments via Taylor approximations when necessary. In this way, our algorithm can always conduct efficient, analytical fixed point iterations. Experiments on several popular models for which standard EP is available or unavailable demonstrate the advantages of CEP in both inference quality and computational efficiency.

NeurIPS Conference 2018 Conference Paper

Stochastic Nonparametric Event-Tensor Decomposition

  • Shandian Zhe
  • Yishuai Du

Tensor decompositions are fundamental tools for multiway data analysis. Existing approaches, however, ignore the valuable temporal information along with data, or simply discretize them into time steps so that important temporal patterns are easily missed. Moreover, most methods are limited to multilinear decomposition forms, and hence are unable to capture intricate, nonlinear relationships in data. To address these issues, we formulate event-tensors, to preserve the complete temporal information for multiway data, and propose a novel Bayesian nonparametric decomposition model. Our model can (1) fully exploit the time stamps to capture the critical, causal/triggering effects between the interaction events, (2) flexibly estimate the complex relationships between the entities in tensor modes, and (3) uncover hidden structures from their temporal interactions. For scalable inference, we develop a doubly stochastic variational Expectation-Maximization algorithm to conduct an online decomposition. Evaluations on both synthetic and real-world datasets show that our model not only improves upon the predictive performance of existing methods, but also discovers interesting clusters underlying the data.

ICML Conference 2017 Conference Paper

Asynchronous Distributed Variational Gaussian Process for Regression

  • Hao Peng
  • Shandian Zhe
  • Xiao Zhang 0017
  • Yuan (Alan) Qi

Gaussian processes (GPs) are powerful non-parametric function estimators. However, their applications are largely limited by the expensive computational cost of the inference procedures. Existing stochastic or distributed synchronous variational inferences, although have alleviated this issue by scaling up GPs to millions of samples, are still far from satisfactory for real-world large applications, where the data sizes are often orders of magnitudes larger, say, billions. To solve this problem, we propose ADVGP, the first Asynchronous Distributed Variational Gaussian Process inference for regression, on the recent large-scale machine learning platform, PARAMETER SERVER. ADVGP uses a novel, flexible variational framework based on a weight space augmentation, and implements the highly efficient, asynchronous proximal gradient optimization. While maintaining comparable or better predictive performance, ADVGP greatly improves upon the efficiency of the existing variational methods. With ADVGP, we effortlessly scale up GP regression to a real-world application with billions of samples and demonstrate an excellent, superior prediction accuracy to the popular linear models.

AAAI Conference 2017 Short Paper

Scalable Nonparametric Tensor Analysis

  • Shandian Zhe

Multiway data, described by tensors, are common in real-world applications. For example, online advertising click logs can be represented by a three-mode tensor (user, advertisement, context). The analysis of tensors is closely related to many important applications, such as click-through-rate (CTR) prediction, anomaly detection and product recommendation. Despite the success of existing tensor analysis approaches, such as Tucker, CANDECOMP/PARAFAC and infinite Tucker decompositions, they are either not enough powerful to capture complex hidden relationships in data, or not scalable to handle real-world large data. In addition, they may suffer from the extreme sparsity in real data, i.e., when the portion of nonzero entries is extremely low; they lack of principled ways to discover other patterns Ñ such as an unknown number of latent clusters Ñ which are critical for data mining tasks such as anomaly detection and market targeting. To address these challenges, I used nonparametric Bayesian techniques, such as Gaussian processes (GP) and Dirichlet processes (DP), to model highly nonlinear interactions and to extract hidden patterns in tensors; I derived tractable variational evidence lower bounds, based on which I developed scalable, distributed or online approximate inference algorithms. Experiments on both simulation and real-world large data have demonstrated the effect of my propoaed approaches.

JAIR Journal 2016 Journal Article

Association Discovery and Diagnosis of Alzheimer’s Disease with Bayesian Multiview Learning

  • Zenglin Xu
  • Shandian Zhe
  • Yuan Qi
  • Peng Yu

The analysis and diagnosis of Alzheimer’s disease (AD) can be based on genetic variations, e.g., single nucleotide polymorphisms (SNPs) and phenotypic traits, e.g., Magnetic Resonance Imaging (MRI) features. We consider two important and related tasks: i) to select genetic and phenotypical markers for AD diagnosis and ii) to identify associations between genetic and phenotypical data. While previous studies treat these two tasks separately, they are tightly coupled because underlying associations between genetic variations and phenotypical features contain the biological basis for a disease. Here we present a new sparse Bayesian approach for joint association study and disease diagnosis. In this approach, common latent features are extracted from different data sources based on sparse projection matrices and used to predict multiple disease severity levels; in return, the disease status can guide the discovery of relationships between data sources. The sparse projection matrices not only reveal interactions between data sources but also select groups of biomarkers related to the disease. Moreover, to take advantage of the linkage disequilibrium (LD) measuring the non-random association of alleles, we incorporate a graph Laplacian type of prior in the model. To learn the model from data, we develop an efficient variational inference algorithm. Analysis on an imaging genetics dataset for the study of Alzheimer’s Disease (AD) indicates that our model identifies biologically meaningful associations between genetic variations and MRI features, and achieves significantly higher accuracy for predicting ordinal AD stages than the competing methods.

AAAI Conference 2016 Conference Paper

DinTucker: Scaling Up Gaussian Process Models on Large Multidimensional Arrays

  • Shandian Zhe
  • Yuan Qi
  • Youngja Park
  • Zenglin Xu
  • Ian Molloy
  • Suresh Chari

Tensor decomposition methods are effective tools for modelling multidimensional array data (i. e. , tensors). Among them, nonparametric Bayesian models, such as Infinite Tucker Decomposition (InfTucker), are more powerful than multilinear factorization approaches, including Tucker and PARAFAC, and usually achieve better predictive performance. However, they are difficult to handle massive data due to a prohibitively high training cost. To address this limitation, we propose Distributed infinite Tucker (DINTUCKER), a new hierarchical Bayesian model that enables local learning of InfTucker on subarrays and global information integration from local results. We further develop a distributed stochastic gradient descent algorithm, coupled with variational inference for model estimation. In addition, the connection between DINTUCKER and InfTucker is revealed in terms of model evidence. Experiments demonstrate that DINTUCKER maintains the predictive accuracy of InfTucker and is scalable on massive data: On multidimensional arrays with billions of elements from two real-world applications, DINTUCKER achieves significantly higher prediction accuracy with less training time, compared with the state-of-the-art large-scale tensor decomposition method, GigaTensor.

NeurIPS Conference 2016 Conference Paper

Distributed Flexible Nonlinear Tensor Factorization

  • Shandian Zhe
  • Kai Zhang
  • Pengyuan Wang
  • Kuang-chih Lee
  • Zenglin Xu
  • Yuan Qi
  • Zoubin Ghahramani

Tensor factorization is a powerful tool to analyse multi-way data. Recently proposed nonlinear factorization methods, although capable of capturing complex relationships, are computationally quite expensive and may suffer a severe learning bias in case of extreme data sparsity. Therefore, we propose a distributed, flexible nonlinear tensor factorization model, which avoids the expensive computations and structural restrictions of the Kronecker-product in the existing TGP formulations, allowing an arbitrary subset of tensor entries to be selected for training. Meanwhile, we derive a tractable and tight variational evidence lower bound (ELBO) that enables highly decoupled, parallel computations and high-quality inference. Based on the new bound, we develop a distributed, key-value-free inference algorithm in the MapReduce framework, which can fully exploit the memory cache mechanism in fast MapReduce systems such as Spark. Experiments demonstrate the advantages of our method over several state-of-the-art approaches, in terms of both predictive performance and computational efficiency.

IJCAI Conference 2016 Conference Paper

Fast Laplace Approximation for Sparse Bayesian Spike and Slab Models

  • Syed Abbas Z. Naqvi
  • Shandian Zhe
  • Yuan Qi
  • Yifan Yang
  • Jieping Ye

We consider the application of Bayesian spike-and-slab models in high-dimensional feature selection problems. To do so, we propose a simple yet effective fast approximate Bayesian inference algorithm based on Laplace's method. We exploit two efficient optimization methods, GIST and L-BFGS, to obtain the mode of the posterior distribution. Then we propose an ensemble Nystrom based approach to calculate the diagonal of the inverse Hessian over the mode to obtain the approximate posterior marginals in O(knp) time, k ≪ p. Furthermore, we provide the theoretical analysis about the estimation consistency and approximation error bounds. With the posterior marginals of the model weights, we use quadrature integration to estimate the marginal posteriors of selection probabilities and indicator variables for all features, which quantify the selection uncertainty. Our method not only maintains the benefits of the Bayesian treatment (e. g. , uncertainty quantification) but also possesses the computational efficiency, and oracle properties of the frequentist methods. Simulation shows that our method estimates better or comparable selection probabilities and indicator variables than alternative approximate inference methods such as VB and EP, but with less running time. Extensive experiments on large real datasets demonstrate that our method often improves prediction accuracy over Bayesian automatic relevance determination, EP, and frequentist L1 type methods.

AAAI Conference 2015 Conference Paper

Bayesian Maximum Margin Principal Component Analysis

  • Changying Du
  • Shandian Zhe
  • Fuzhen Zhuang
  • Yuan Qi
  • Qing He
  • Zhongzhi Shi

Supervised dimensionality reduction has shown great advantages in finding predictive subspaces. Previous methods rarely consider the popular maximum margin principle and are prone to overfitting to usually small training data, especially for those under the maximum likelihood framework. In this paper, we present a posterior-regularized Bayesian approach to combine Principal Component Analysis (PCA) with the maxmargin learning. Based on the data augmentation idea for max-margin learning and the probabilistic interpretation of PCA, our method can automatically infer the weight and penalty parameter of max-margin learning machine, while finding the most appropriate PCA subspace simultaneously under the Bayesian framework. We develop a fast mean-field variational inference algorithm to approximate the posterior. Experimental results on various classification tasks show that our method outperforms a number of competitors.

AAAI Conference 2015 Conference Paper

Sparse Bayesian Multiview Learning for Simultaneous Association Discovery and Diagnosis of Alzheimer’s Disease

  • Shandian Zhe
  • Zenglin Xu
  • Yuan Qi
  • Peng Yu

In the analysis and diagnosis of many diseases, such as the Alzheimer’s disease (AD), two important and related tasks are usually required: i) selecting genetic and phenotypical markers for diagnosis, and ii) identifying associations between genetic and phenotypical features. While previous studies treat these two tasks separately, they are tightly coupled due to the same underlying biological basis. To harness their potential benefits for each other, we propose a new sparse Bayesian approach to jointly carry out the two important and related tasks. In our approach, we extract common latent features from different data sources by sparse projection matrices and then use the latent features to predict disease severity levels; in return, the disease status can guide the learning of sparse projection matrices, which not only reveal interactions between data sources but also select groups of related biomarkers. In order to boost the learning of sparse projection matrices, we further incorporate graph Laplacian priors encoding the valuable linkage disequilibrium (LD) information. To efficiently estimate the model, we develop a variational inference algorithm. Analysis on an imaging genetics dataset for AD study shows that our model discovers biologically meaningful associations between single nucleotide polymorphisms (SNPs) and magnetic resonance imaging (MRI) features, and achieves significantly higher accuracy for predicting ordinal AD stages than competitive methods.