Arrow Research search

Author name cluster

Jinbo Bi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers
2 author rows

Possible papers

32

AAAI Conference 2026 Conference Paper

VEDA: Generation of 3D Molecules via Variance-Exploding Diffusion with Annealing

  • Peining Zhang
  • Jinbo Bi
  • Minghu Song

Diffusion models show promise for 3D molecular generation, but face a fundamental trade-off between sampling efficiency and conformational accuracy. While flow-based models are fast, they often produce geometrically inaccurate structures, as they have difficulty capturing the multimodal distributions of molecular conformations. In contrast, denoising diffusion models are more accurate but suffer from slow sampling, a limitation attributed to sub-optimal integration between diffusion dynamics and SE(3)-equivariant architectures. To address this, we propose VEDA, a unified SE(3)-equivariant framework that combines variance-exploding diffusion with annealing to efficiently generate conformationally accurate 3D molecular structures. Specifically, our key technical contributions include: (1) a VE schedule that enables noise injection functionally analogous to simulated annealing, improving 3D accuracy and reducing relaxation energy; (2) a novel preconditioning scheme that reconciles the coordinate-predicting nature of SE(3)-equivariant networks with a residual-based diffusion objective; and (3) a new arcsin-based scheduler that concentrates sampling in critical intervals of the logarithmic signal-to-noise ratio. On the QM9 and GEOM-DRUGS datasets, VEDA matches the sampling efficiency of flow-based models, achieving state-of-the-art valency stability and validity with only 100 sampling steps. More importantly, VEDA's generated structures are remarkably stable, as measured by their relaxation energy (ΔE_relax) during GFN2-xTB optimization. The median energy change is only 1.72 kcal/mol, significantly lower than the 32.3 kcal/mol from its architectural baseline, SemlaFlow. Our framework demonstrates that principled integration of VE diffusion with SE(3)-equivariant architectures can achieve both high chemical accuracy and computational efficiency.
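
The VE schedule itself is not given in the abstract; as a rough illustration, the standard variance-exploding noise scale from score-based diffusion (with illustrative `sigma_min`/`sigma_max` values, not VEDA's actual parameters) grows geometrically over the sampling interval:

```python
import numpy as np

def ve_sigma(t, sigma_min=0.01, sigma_max=50.0):
    """Standard variance-exploding noise scale: the noise level grows
    geometrically from sigma_min at t=0 to sigma_max at t=1."""
    return sigma_min * (sigma_max / sigma_min) ** t

ts = np.linspace(0.0, 1.0, 5)
sigmas = ve_sigma(ts)   # monotonically increasing noise levels
```

The geometric growth concentrates most of the noise range late in t, which is one reason a scheduler that re-distributes steps over the log signal-to-noise ratio can matter.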

TIST Journal 2025 Journal Article

Cross-platform Prediction of Depression Treatment Outcome Using Location Sensory Data on Smartphones

  • Soumyashree Sahoo
  • Md. Zakir Hossain
  • Chinmaey Shende
  • Parit Patel
  • Yushuo Niu
  • Reynaldo Morillo
  • Xinyu Wang
  • Shweta Ware

Currently, depression treatment relies on closely monitoring patients’ response to treatment and adjusting the treatment as needed. Using self-reported or physician-administered questionnaires to monitor treatment response is, however, subjective, costly, and suffers from recall bias. In this paper, we explore using location sensory data collected passively on smartphones to predict treatment outcome. To address heterogeneous data collection on Android and iOS phones, the two predominant smartphone platforms, we explore using domain adaptation techniques to map their data to a common feature space, and then use the data jointly to train machine learning models. We further explore integrating contrastive learning with domain adaptation to augment data and learn feature embeddings. These learned embeddings are then used to train machine learning models to predict depression treatment outcomes. Our evaluation shows that using the embeddings learned by jointly integrating contrastive learning and domain adaptation leads to the best prediction accuracy. In addition, our results show that using location features and the baseline self-reported questionnaire score can lead to an F1 score of up to 0.76. This accuracy is comparable to that obtained using periodic self-reported questionnaires, indicating that using location data is a promising direction for predicting depression treatment outcome. Last, when all location and questionnaire data are used together, the F1 score further increases to 0.79.

ICLR Conference 2023 Conference Paper

Auto-Encoding Goodness of Fit

  • Aaron Palmer
  • Zhiyi Chi
  • Derek Aguiar
  • Jinbo Bi

For generative autoencoders to learn a meaningful latent representation for data generation, a careful balance must be achieved between reconstruction error and how close the distribution in the latent space is to the prior. However, this balance is challenging to achieve due to a lack of criteria that work both at the mini-batch (local) and aggregated posterior (global) level. In this work, we develop the Goodness of Fit Autoencoder (GoFAE), which incorporates hypothesis tests at two levels. At the mini-batch level, it uses GoF test statistics as regularization objectives. At a more global level, it selects a regularization coefficient based on higher criticism, i.e., a test on the uniformity of the local GoF p-values. We justify the use of GoF tests by providing a relaxed $L_2$-Wasserstein bound on the distance between the latent distribution and target prior. We propose to use GoF tests and prove that optimization based on these tests can be done with stochastic gradient descent (SGD) on a compact Riemannian manifold. Empirically, we show that our higher criticism parameter selection procedure balances reconstruction and generation using mutual information and uniformity of p-values respectively. Finally, we show that GoFAE achieves comparable FID scores and mean squared errors with competing deep generative models while retaining statistical indistinguishability from Gaussian in the latent space based on a variety of hypothesis tests.
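
The global-level check, that mini-batch GoF p-values should look Uniform(0,1) when the latent distribution matches the prior, can be illustrated with a Kolmogorov-Smirnov test standing in for the higher-criticism statistic (synthetic p-values, not the paper's procedure):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Under a well-matched prior, per-mini-batch GoF p-values are ~Uniform(0,1).
p_matched = rng.uniform(size=500)
# Under a mismatched prior, GoF p-values pile up near zero.
p_mismatched = rng.beta(0.3, 1.0, size=500)

# A KS test against Uniform(0,1) flags the mismatch at the global level.
ks_matched = stats.kstest(p_matched, "uniform").pvalue
ks_mismatched = stats.kstest(p_mismatched, "uniform").pvalue
```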

IJCAI Conference 2023 Conference Paper

Customized Positional Encoding to Combine Static and Time-varying Data in Robust Representation Learning for Crop Yield Prediction

  • Qinqing Liu
  • Fei Dou
  • Meijian Yang
  • Ezana Amdework
  • Guiling Wang
  • Jinbo Bi

Accurate prediction of crop yield under the conditions of climate change is crucial to ensure food security. Transformers have shown remarkable success in modeling sequential data and hold the potential for improving crop yield prediction. To understand how weather and meteorological sequence variables affect crop yield, the positional encoding used in Transformers is typically shared across different sample sequences. We argue that it is necessary and beneficial to differentiate the positional encoding for distinct samples based on time-invariant properties of the sequences. In particular, the sequence variables influencing crop yield vary according to static variables such as geographical location: sample data from southern areas may benefit from a positional encoding tailored differently from that for northern areas. We propose a novel transformer-based architecture for accurate and robust crop yield prediction by introducing a Customized Positional Encoding (CPE) that encodes a sequence adaptively according to the static information associated with it. Empirical studies demonstrate the effectiveness of the proposed architecture and show that partially linearized attention better captures the bias introduced by side information than softmax re-weighting. The resulting crop yield prediction model is robust to climate change, with mean absolute error reduced by up to 26% compared to the best baseline model in extreme drought years.

NeurIPS Conference 2023 Conference Paper

Polyhedron Attention Module: Learning Adaptive-order Interactions

  • Tan Zhu
  • Fei Dou
  • Xinyu Wang
  • Jin Lu
  • Jinbo Bi

Learning feature interactions can be key to multivariate predictive modeling. ReLU-activated neural networks create piecewise linear prediction models, and other nonlinear activation functions lead to models with only high-order feature interactions. Recent methods either incorporate candidate polynomial terms of fixed orders into deep learning, which is subject to combinatorial explosion, or learn orders that are difficult to adapt to different regions of the feature space. We propose a Polyhedron Attention Module (PAM) to create piecewise polynomial models: the input space is split into polyhedrons that define the pieces, and on each piece the hyperplanes defining the polyhedron boundary multiply to form the interaction terms, resulting in interactions whose order adapts to each piece. PAM is interpretable, identifying the interactions that are important in predicting a target. Theoretical analysis shows that PAM has stronger expressive capability than ReLU-activated networks. Extensive experimental results demonstrate the superior classification performance of PAM on massive click-through rate prediction datasets and show that PAM can learn meaningful interaction effects in a medical problem.

IJCAI Conference 2021 Conference Paper

Against Membership Inference Attack: Pruning is All You Need

  • Yijue Wang
  • Chenghong Wang
  • Zigeng Wang
  • Shanglin Zhou
  • Hang Liu
  • Jinbo Bi
  • Caiwen Ding
  • Sanguthevar Rajasekaran

The large model size, high computational cost, and vulnerability to membership inference attacks (MIA) have impeded the popularity of deep learning and deep neural networks (DNNs), especially on mobile devices. To address these challenges, we envision that weight pruning will help defend DNNs against MIA while reducing model storage and computation. In this work, we propose a pruning algorithm, and we show that it can find a subnetwork that prevents privacy leakage from MIA while achieving competitive accuracy with the original DNN. We also verify our theoretical insights with experiments. Our experimental results illustrate that the attack accuracy using model compression is up to 13.6% and 10% lower than that of the baseline and the Min-Max game, respectively.
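
The abstract does not specify the pruning rule; as a generic stand-in (not the paper's algorithm), one-shot magnitude pruning removes the smallest-magnitude fraction of the weights:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return weights * (np.abs(weights) > threshold)

W = np.array([[0.9, -0.01], [0.2, -1.5]])
W_pruned = magnitude_prune(W, 0.5)   # keeps only the two largest magnitudes
```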

AAAI Conference 2021 Conference Paper

An Efficient Algorithm for Deep Stochastic Contextual Bandits

  • Tan Zhu
  • Guannan Liang
  • Chunjiang Zhu
  • Haining Li
  • Jinbo Bi

In stochastic contextual bandit (SCB) problems, an agent selects an action based on certain observed context to maximize the cumulative reward over iterations. Recently there have been a few studies using a deep neural network (DNN) to predict the expected reward for an action, where the DNN is trained by a stochastic gradient based method. However, convergence analysis has largely been neglected, so it remains unclear whether and where these methods converge. In this work, we formulate the SCB that uses a DNN reward function as a non-convex stochastic optimization problem, and design a stage-wise stochastic gradient descent algorithm to optimize the problem and determine the action policy. We prove that with high probability, the action sequence chosen by this algorithm converges to a greedy action policy respecting a local optimal reward function. Extensive experiments have been performed to demonstrate the effectiveness and efficiency of the proposed algorithm on multiple real-world datasets.
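
The greedy action policy that the analysis converges to simply picks the action with the highest predicted reward; a toy sketch with a fixed linear scorer standing in for the trained DNN (all values made up):

```python
import numpy as np

# A fixed linear scorer stands in for the trained DNN reward model.
theta = np.array([[1.0, -0.5],    # per-action weight vectors
                  [0.2,  0.8],    # (3 actions, 2 context features)
                  [-1.0, 0.3]])

def greedy_action(context, theta):
    """Pick the action whose predicted reward is highest for this context."""
    return int(np.argmax(theta @ context))

a = greedy_action(np.array([1.0, 0.0]), theta)   # scores: 1.0, 0.2, -1.0
```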

AAAI Conference 2021 Conference Paper

Differentially Private and Communication Efficient Collaborative Learning

  • Jiahao Ding
  • Guannan Liang
  • Jinbo Bi
  • Miao Pan

Collaborative learning has received huge interest due to its capability of exploiting the collective computing power of the wireless edge devices. However, during the learning process, model updates using local private samples and large-scale parameter exchanges among agents impose severe privacy concerns and communication bottleneck. In this paper, to address these problems, we propose two differentially private (DP) and communication efficient algorithms, called Q-DPSGD-1 and Q-DPSGD-2. In Q-DPSGD-1, each agent first performs local model updates by a DP gradient descent method to provide the DP guarantee and then quantizes the local model before transmitting it to neighbors to improve communication efficiency. In Q-DPSGD-2, each agent injects discrete Gaussian noise to enforce DP guarantee after first quantizing the local model. Moreover, we track the privacy loss of both approaches under the Rényi DP and provide convergence analysis for both convex and non-convex loss functions. The proposed methods are evaluated in extensive experiments on real-world datasets and the empirical results validate our theoretical findings.

ICLR Conference 2021 Conference Paper

Discrete Graph Structure Learning for Forecasting Multiple Time Series

  • Chao Shang
  • Jie Chen 0007
  • Jinbo Bi

Time series forecasting is an extensively studied subject in statistics, economics, and computer science. Exploration of the correlation and causation among the variables in a multivariate time series shows promise in enhancing the performance of a time series model. When using deep neural networks as forecasting models, we hypothesize that exploiting the pairwise information among multiple (multivariate) time series also improves their forecast. If an explicit graph structure is known, graph neural networks (GNNs) have been demonstrated as powerful tools to exploit the structure. In this work, we propose learning the structure simultaneously with the GNN if the graph is unknown. We cast the problem as learning a probabilistic graph model through optimizing the mean performance over the graph distribution. The distribution is parameterized by a neural network so that discrete graphs can be sampled differentiably through reparameterization. Empirical evaluations show that our method is simpler, more efficient, and better performing than a recently proposed bilevel learning approach for graph structure learning, as well as a broad array of forecasting models, either deep or non-deep learning based, and graph or non-graph based.
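
The differentiable sampling of discrete graphs via reparameterization can be sketched with a Gumbel-sigmoid (binary Concrete) relaxation; this is a generic NumPy illustration, not the paper's exact parameterization:

```python
import numpy as np

def sample_soft_adjacency(theta, tau=0.5, seed=None):
    """Relaxed Bernoulli (Gumbel-sigmoid) draw of a soft adjacency matrix.

    Each entry is a differentiable relaxation of a Bernoulli edge variable
    with logit theta[i, j]; gradients can flow back to theta through the
    sigmoid, unlike a hard 0/1 sample.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(1e-9, 1 - 1e-9, size=np.shape(theta))
    logistic = np.log(u) - np.log1p(-u)          # Logistic(0, 1) noise
    return 1.0 / (1.0 + np.exp(-(theta + logistic) / tau))

theta = np.array([[0.0, 3.0], [-3.0, 0.0]])      # illustrative edge logits
A = sample_soft_adjacency(theta, tau=0.5, seed=0)
```

As the temperature `tau` shrinks, entries concentrate near 0 or 1, approaching hard discrete graph samples.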

YNICL Journal 2021 Journal Article

Perceived stress, self-efficacy, and the cerebral morphometric markers in binge-drinking young adults

  • Guangfei Li
  • Thang M. Le
  • Wuyi Wang
  • Simon Zhornitsky
  • Yu Chen
  • Shefali Chaudhary
  • Tan Zhu
  • Sheng Zhang

Studies have identified cerebral morphometric markers of binge drinking and implicated cortical regions in support of self-efficacy and stress regulation. However, it remains unclear how cortical structures of self-control play a role in ameliorating stress and alcohol consumption or how chronic alcohol exposure alters self-control and leads to emotional distress. We examined the data of 180 binge (131 men) and 256 non-binge (83 men) drinkers from the Human Connectome Project. We obtained data on regional cortical thickness from the HCP and derived gray matter volumes (GMVs) with voxel-based morphometry. At a corrected threshold, binge relative to non-binge drinking men showed diminished posterior cingulate cortex (PCC) thickness and dorsomedial prefrontal cortex (dmPFC) GMV. PCC thickness and dmPFC GMVs were positively and negatively correlated with self-efficacy and perceived stress, respectively, as assessed with the NIH Emotion Toolbox. Mediation and path analyses to query the inter-relationships between the neural markers and clinical variables showed a best fit of the model with daily drinks → lower PCC thickness and dmPFC GMV → lower self-efficacy → higher perceived stress in men. In contrast, binge and non-binge drinking women did not show significant differences in regional cortical thickness or GMVs. These findings suggest a pathway whereby chronic alcohol consumption alters cortical structures and self-efficacy mediates the effects of cortical structural deficits on perceived stress in men. The findings also suggest the need to investigate multimodal neural markers underlying the interplay between stress, self-control and alcohol use behavior in women.

ICML Conference 2021 Conference Paper

Spectral vertex sparsifiers and pair-wise spanners over distributed graphs

  • Chun Jiang Zhu
  • Qinqing Liu
  • Jinbo Bi

Graph sparsification is a powerful tool to approximate an arbitrary graph and has been used in machine learning over graphs. As real-world networks are becoming very large and naturally distributed, distributed graph sparsification has drawn considerable attention. In this work, we design communication-efficient distributed algorithms for constructing spectral vertex sparsifiers, which closely preserve effective resistance distances on a subset of vertices of interest in the original graphs, under the well-established message passing communication model. We prove that the communication cost approximates the lower bound with only a small gap. We further provide algorithms for constructing pair-wise spanners which approximate the shortest distances between each pair of vertices in a target set, instead of all pairs, and incur communication costs that are much smaller than those of existing algorithms in the message passing model. Experiments are performed to validate the communication efficiency of the proposed algorithms under the guarantee that the constructed sparsifiers have a good approximation quality.
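
Effective resistance, the quantity a spectral vertex sparsifier preserves, can be computed from the pseudoinverse of the graph Laplacian; a toy single-machine example on a 3-node path graph (the distributed construction is the paper's actual contribution and is not shown here):

```python
import numpy as np

# Laplacian of the 3-node path graph 0 - 1 - 2 (unit edge weights).
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
L_pinv = np.linalg.pinv(L)

def effective_resistance(L_pinv, u, v):
    """R(u, v) = (e_u - e_v)^T L^+ (e_u - e_v)."""
    e = np.zeros(L_pinv.shape[0])
    e[u], e[v] = 1.0, -1.0
    return float(e @ L_pinv @ e)

r_02 = effective_resistance(L_pinv, 0, 2)   # two unit edges in series -> 2
```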

AAAI Conference 2020 Conference Paper

An Effective Hard Thresholding Method Based on Stochastic Variance Reduction for Nonconvex Sparse Learning

  • Guannan Liang
  • Qianqian Tong
  • Chunjiang Zhu
  • Jinbo Bi

We propose a hard thresholding method based on stochastically controlled stochastic gradients (SCSG-HT) to solve a family of sparsity-constrained empirical risk minimization problems. Rather than full gradients, SCSG-HT uses batch gradients, with a batch size predetermined by the desired precision tolerance, to reduce the variance in stochastic gradients. It also employs the geometric distribution to determine the number of inner loops per epoch. We prove that, similar to the latest methods based on stochastic gradient descent or stochastic variance reduction, SCSG-HT enjoys a linear convergence rate, and it has a strong guarantee to recover the optimal sparse estimator. The computational complexity of SCSG-HT is independent of the sample size n for sufficiently large n, which enhances its scalability to massive-scale problems. Empirical results demonstrate that SCSG-HT outperforms several competitors and decreases the objective value the most at the same computational cost.
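
The hard-thresholding step itself, projecting an iterate onto the sparsity constraint, can be sketched as follows (a generic operator, with the SCSG gradient machinery omitted):

```python
import numpy as np

def hard_threshold(w, k):
    """Project w onto the set of k-sparse vectors: keep the k entries of
    largest magnitude and zero out the rest."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]   # indices of the top-k magnitudes
    out[idx] = w[idx]
    return out

w = np.array([0.1, -2.0, 0.5, 3.0, -0.05])
w_sparse = hard_threshold(w, 2)   # keeps -2.0 and 3.0
```

In an iterative hard-thresholding scheme this projection is applied after each (variance-reduced) gradient step to maintain the sparsity constraint.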

AAAI Conference 2019 Conference Paper

Communication-Optimal Distributed Dynamic Graph Clustering

  • Chun Jiang Zhu
  • Tan Zhu
  • Kam-Yiu Lam
  • Song Han
  • Jinbo Bi

We consider the problem of clustering graph nodes over large-scale dynamic graphs, such as citation networks, images and web networks, when graph updates such as node/edge insertions/deletions are observed distributively. We propose communication-efficient algorithms for two well-established communication models namely the message passing and the blackboard models. Given a graph with n nodes that is observed at s remote sites over time [1, t], the two proposed algorithms have communication costs Õ(ns) and Õ(n + s) (Õ hides a polylogarithmic factor), almost matching their lower bounds, Ω(ns) and Ω(n + s), respectively, in the message passing and the blackboard models. More importantly, we prove that at each time point in [1, t] our algorithms generate clustering quality nearly as good as that of centralizing all updates up to that time and then applying a standard centralized clustering algorithm. We conducted extensive experiments on both synthetic and real-life datasets which confirmed the communication efficiency of our approach over baseline algorithms while achieving comparable clustering results.

AAAI Conference 2019 Conference Paper

End-to-End Structure-Aware Convolutional Networks for Knowledge Base Completion

  • Chao Shang
  • Yun Tang
  • Jing Huang
  • Jinbo Bi
  • Xiaodong He
  • Bowen Zhou

Knowledge graph embedding has been an active research topic for knowledge base completion, with progressive improvement from the initial TransE, TransH, DistMult, et al. to the current state-of-the-art ConvE. ConvE uses 2D convolution over embeddings and multiple layers of nonlinear features to model knowledge graphs. The model can be efficiently trained and is scalable to large knowledge graphs. However, there is no structure enforcement in the embedding space of ConvE. The recent graph convolutional network (GCN) provides another way of learning graph node embeddings by successfully utilizing graph connectivity structure. In this work, we propose a novel end-to-end Structure-Aware Convolutional Network (SACN) that combines the benefits of GCN and ConvE. SACN consists of an encoder, a weighted graph convolutional network (WGCN), and a decoder, a convolutional network called Conv-TransE. WGCN utilizes knowledge graph node structure, node attributes, and edge relation types. It has learnable weights that adapt the amount of information from neighbors used in local aggregation, leading to more accurate embeddings of graph nodes. Node attributes in the graph are represented as additional nodes in the WGCN. The decoder Conv-TransE enables the state-of-the-art ConvE to be translational between entities and relations while keeping the same link prediction performance as ConvE. We demonstrate the effectiveness of the proposed SACN on the standard FB15k-237 and WN18RR datasets, where it gives about a 10% relative improvement over the state-of-the-art ConvE in terms of HITS@1, HITS@3, and HITS@10.

ICML Conference 2019 Conference Paper

Improved Dynamic Graph Learning through Fault-Tolerant Sparsification

  • Chun Jiang Zhu
  • Sabine Storandt
  • Kam-yiu Lam
  • Song Han 0002
  • Jinbo Bi

Graph sparsification has been used to improve the computational cost of learning over graphs, e.g., Laplacian-regularized estimation and graph semi-supervised learning (SSL). However, when graphs vary over time, repeated sparsification requires polynomial-order computational cost per update. We propose a new type of graph sparsification, namely fault-tolerant (FT) sparsification, to significantly reduce this cost to only a constant, so that the computational cost of subsequent graph learning tasks can be significantly improved with limited loss in accuracy. In particular, we give a theoretical analysis to upper-bound the loss in accuracy of the subsequent Laplacian-regularized estimation and graph SSL due to FT sparsification. In addition, FT spectral sparsification can be generalized to FT cut sparsification for cut-based graph learning. Extensive experiments have confirmed the computational efficiency and accuracy of the proposed methods for learning on dynamic graphs.

AAAI Conference 2018 Conference Paper

Latent Sparse Modeling of Longitudinal Multi-Dimensional Data

  • Ko-Shin Chen
  • Tingyang Xu
  • Jinbo Bi

We propose a tensor-based approach to analyze multi-dimensional data describing sample subjects. It simultaneously discovers patterns in features and reveals past temporal points that have impact on current outcomes. The model coefficient, a k-mode tensor, is decomposed into a summation of k tensors of the same dimension. To accomplish feature selection, we introduce the tensor ‘latent ℓ_{F,1} norm’ as a grouped penalty in our formulation. Furthermore, the proposed model takes into account within-subject correlations by developing a tensor-based quadratic inference function. We provide an asymptotic analysis of our model as the sample size approaches infinity. To solve the corresponding optimization problem, we develop a linearized block coordinate descent algorithm and prove its convergence for a fixed sample size. Computational results on synthetic datasets and real-life fMRI and EEG problems demonstrate the superior performance of the proposed approach over existing techniques.

UAI Conference 2018 Conference Paper

Reforming Generative Autoencoders via Goodness-of-Fit Hypothesis Testing

  • Aaron Palmer
  • Dipak K. Dey
  • Jinbo Bi

Generative models, while not new, have taken the deep learning field by storm. However, the widely used training methods have not exploited the substantial statistical literature concerning parametric distributional testing. Having sound theoretical foundations, these goodness-of-fit tests enable parts of the black box to be stripped away. In this paper we use the Shapiro-Wilk test, and propose a new multivariate generalization of it, to test for univariate and multivariate normality, respectively, of the code layer of a generative autoencoder. By replacing the discriminator in traditional deep models with these hypothesis tests, we gain several advantages: we can objectively evaluate whether the encoder is actually embedding data onto a normal manifold, accurately define when convergence happens, and explicitly balance reconstruction against encoding during training. Not only does our method produce competitive results, it does so in a fraction of the time. We highlight the fact that the hypothesis tests used in our model asymptotically lead to the same solution as the L2-Wasserstein distance metrics used by several generative models today.
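
The core move, running a normality test on the code layer instead of training a discriminator, can be illustrated with SciPy's Shapiro-Wilk test on synthetic "codes" (the paper's multivariate generalization is not shown here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
codes_normal = rng.normal(size=200)        # code layer matching the prior
codes_skewed = rng.exponential(size=200)   # code layer far from Gaussian

# Shapiro-Wilk: a small p-value rejects normality of the code layer.
_, p_normal = stats.shapiro(codes_normal)
_, p_skewed = stats.shapiro(codes_skewed)
```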

NeurIPS Conference 2016 Conference Paper

A Sparse Interactive Model for Matrix Completion with Side Information

  • Jin Lu
  • Guannan Liang
  • Jiangwen Sun
  • Jinbo Bi

Matrix completion methods can benefit from side information besides the partially observed matrix. The use of side features describing the row and column entities of a matrix has been shown to reduce the sample complexity for completing the matrix. We propose a novel sparse formulation that explicitly models the interaction between the row and column side features to approximate the matrix entries. Unlike earlier methods, this model does not require the low-rank condition on the model parameter matrix. We prove that when the side features can span the latent feature space of the matrix to be recovered, the number of observed entries needed for an exact recovery is $O(\log N)$ where $N$ is the size of the matrix. When the side features are corrupted latent features of the matrix with a small perturbation, our method can achieve an $\epsilon$-recovery with $O(\log N)$ sample complexity, and maintains an $O(N^{3/2})$ rate similar to classic methods with no side information. An efficient linearized Lagrangian algorithm is developed with a strong guarantee of convergence. Empirical results show that our approach outperforms three state-of-the-art methods both in simulations and on real-world datasets.
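
The interaction model can be sketched as X ≈ A M Bᵀ, with row/column side features A, B and an interaction matrix M fit by least squares on the observed entries; a minimal noiseless NumPy example (illustrative sizes, not the paper's sparse formulation or algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_a, d_b = 30, 4, 4
A = rng.normal(size=(n, d_a))           # row-entity side features
B = rng.normal(size=(n, d_b))           # column-entity side features
M_true = rng.normal(size=(d_a, d_b))    # interaction model parameters
X = A @ M_true @ B.T                    # matrix generated by the interactions

# Observe ~30% of entries; each X[i, j] is linear in vec(M) with
# design row equal to the outer product of A[i] and B[j].
mask = rng.random((n, n)) < 0.3
rows, cols = np.nonzero(mask)
design = np.einsum("ka,kb->kab", A[rows], B[cols]).reshape(len(rows), -1)
m_hat, *_ = np.linalg.lstsq(design, X[rows, cols], rcond=None)
M_hat = m_hat.reshape(d_a, d_b)         # exact recovery in the noiseless case
```

Because the side features here span the latent space exactly and there is no noise, far fewer entries than the full matrix suffice to recover M.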

JMLR Journal 2016 Journal Article

Multiplicative Multitask Feature Learning

  • Xin Wang
  • Jinbo Bi
  • Shipeng Yu
  • Jiangwen Sun
  • Minghu Song

We investigate a general framework of multiplicative multitask feature learning which decomposes each task's model parameters into a multiplication of two components. One of the components is used across all tasks and the other component is task-specific. Several previous methods can be proved to be special cases of our framework. We study the theoretical properties of this framework when different regularization conditions are applied to the two decomposed components. We prove that this framework is mathematically equivalent to the widely used multitask feature learning methods that are based on a joint regularization of all model parameters, but with a more general form of regularizers. Further, an analytical formula is derived for the across-task component as related to the task-specific component for all these regularizers, leading to a better understanding of the shrinkage effects of different regularizers. Study of this framework motivates new multitask learning algorithms. We propose two new learning formulations by varying the parameters in the proposed framework. An efficient blockwise coordinate descent algorithm is developed suitable for solving the entire family of formulations with rigorous convergence analysis. Simulation studies have identified the statistical properties of data that would favor the new formulations. Extensive empirical studies on various classification and regression benchmark data sets have revealed the relative advantages of the two new formulations by comparison with the state of the art, providing instructive insights into the feature learning problem with multiple tasks.
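
The multiplicative decomposition itself is easy to write down concretely; in this illustrative sketch (made-up sizes and values), a zero in the shared across-task component switches a feature off for every task at once, which is how the framework performs joint feature selection:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 6, 3                                    # features, tasks (made-up sizes)
c = np.array([1.0, 1.0, 0.0, 1.0, 0.0, 1.0])   # shared across-task component
V = rng.normal(size=(d, T))                    # task-specific components
W = c[:, None] * V                             # per-task weights: w_t = c * v_t
```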

ICML Conference 2015 Conference Paper

Multi-view Sparse Co-clustering via Proximal Alternating Linearized Minimization

  • Jiangwen Sun
  • Jin Lu 0001
  • Tingyang Xu
  • Jinbo Bi

When multiple views of data are available for a set of subjects, co-clustering aims to identify subject clusters that agree across the different views. We explore the problem of co-clustering when the underlying clusters exist in different subspaces of each view. We propose a proximal alternating linearized minimization algorithm that simultaneously decomposes multiple data matrices into sparse row and column vectors. This approach is able to group subjects consistently across the views and simultaneously identify the subset of features in each view that are associated with the clusters. The proposed algorithm can globally converge to a critical point of the problem. A simulation study validates that the proposed algorithm can identify the hypothesized clusters and their associated features. Comparison with several recent multi-view co-clustering methods on benchmark datasets demonstrates the superior performance of the proposed approach.

JBHI Journal 2014 Journal Article

Multiview Comodeling to Improve Subtyping and Genetic Association of Complex Diseases

  • Jiangwen Sun
  • Jinbo Bi
  • Henry R. Kranzler

Genetic association analysis of complex diseases has been limited by heterogeneity in their clinical manifestations and genetic etiology. Research has made it possible to differentiate homogeneous subtypes of the disease phenotype. Currently, the most sophisticated subtyping methods perform unsupervised cluster analysis using only clinical features of a disorder, resulting in subtypes for which genetic association may be limited. In this study, we seek to derive a novel multiview data analytic method that integrates two views of the data: the clinical features and the genetic markers of the same set of patients. Our method is based on multiobjective programming that is capable of clinically categorizing a disease phenotype so as to discover genetically different subtypes. We optimize two objectives jointly: 1) in cluster analysis, the derived clusters should differ significantly in clinical features; 2) these clusters can be well separated using genetic markers by constructed classifiers. Extensive computational experiments with two substance-use disorders using two populations show that the proposed algorithm is superior to existing subtyping methods.

NeurIPS Conference 2014 Conference Paper

On Multiplicative Multitask Feature Learning

  • Xin Wang
  • Jinbo Bi
  • Shipeng Yu
  • Jiangwen Sun

We investigate a general framework of multiplicative multitask feature learning which decomposes each task's model parameters into a multiplication of two components. One of the components is used across all tasks and the other component is task-specific. Several previous methods have been proposed as special cases of our framework. We study the theoretical properties of this framework when different regularization conditions are applied to the two decomposed components. We prove that this framework is mathematically equivalent to the widely used multitask feature learning methods that are based on a joint regularization of all model parameters, but with a more general form of regularizers. Further, an analytical formula is derived for the across-task component as related to the task-specific component for all these regularizers, leading to a better understanding of the shrinkage effect. Study of this framework motivates new multitask learning algorithms. We propose two new learning formulations by varying the parameters in the proposed framework. Empirical studies have revealed the relative advantages of the two new formulations by comparing with the state of the art, which provides instructive insights into the feature learning problem with multiple tasks.

TIST Journal 2013 Journal Article

A machine learning approach to college drinking prediction and risk factor identification

  • Jinbo Bi
  • Jiangwen Sun
  • Yu Wu
  • Howard Tennen
  • Stephen Armeli

Alcohol misuse is one of the most serious public health problems facing adolescents and young adults in the United States. National statistics show that nearly 90% of alcohol consumed by youth under 21 years of age involves binge drinking and 44% of college students engage in high-risk drinking activities. Conventional alcohol intervention programs, which aim at instilling either an alcohol-reduction norm or a prohibition against underage drinking, have yielded little progress in controlling college binge drinking over the years. Existing alcohol studies are deductive: data are collected to investigate a psychological/behavioral hypothesis, and statistical analysis is applied to the data to confirm the hypothesis. Due to this confirmatory manner of analysis, the resulting statistical models are cohort-specific and typically fail to replicate on a different sample. This article presents two machine learning approaches for a secondary analysis of longitudinal data collected in college alcohol studies sponsored by the National Institute on Alcohol Abuse and Alcoholism. Our approach aims to discover knowledge, from multiwave cohort-sequential daily data, which may or may not align with the original hypothesis but quantifies predictive models with higher likelihood to generalize to new samples. We first propose a so-called temporally-correlated support vector machine to construct a classifier as a function of daily moods, stress, and drinking expectancies to distinguish days with nighttime binge drinking from days without for individual students. We then propose a combination of cluster analysis and feature selection, where cluster analysis is used to identify drinking patterns based on averaged daily drinking behavior and feature selection is used to identify risk factors associated with each pattern. We evaluate our methods on two cohorts of 530 total college students recruited during the Spring and Fall semesters, respectively. Cross-validation on these two cohorts, and further on 100 random partitions of the total students, demonstrates that our methods improve model generalizability in comparison with traditional multilevel logistic regression. The discovered risk factors, and the interactions among these factors delineated in our models, can form a basis for and offer insights into a new design of more effective college alcohol interventions.
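The second approach (clustering drinking patterns, then selecting risk factors per pattern) can be caricatured in a few lines; the variable names and the correlation-based ranking below are illustrative stand-ins for the paper's actual feature selection, not its method:

```python
import numpy as np

def drinking_patterns(avg_drinks, k=2, iters=25):
    """1-D k-means over each student's average daily drinks."""
    centers = np.linspace(avg_drinks.min(), avg_drinks.max(), k)
    labels = np.zeros(len(avg_drinks), dtype=int)
    for _ in range(iters):
        labels = np.argmin(np.abs(avg_drinks[:, None] - centers), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = avg_drinks[labels == j].mean()
    return labels

def rank_risk_factors(features, in_pattern):
    """Rank candidate risk factors by |correlation| with pattern membership."""
    y = in_pattern - in_pattern.mean()
    scores = []
    for j in range(features.shape[1]):
        x = features[:, j] - features[:, j].mean()
        scores.append(abs((x * y).sum()) /
                      (np.sqrt((x ** 2).sum() * (y ** 2).sum()) + 1e-12))
    return np.argsort(scores)[::-1]
```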

IJCAI Conference 2007 Conference Paper

Learning Classifiers When the Training Data Is Not IID

  • Murat Dundar
  • Balaji Krishnapuram
  • Jinbo Bi
  • R. Bharat Rao

Most methods for classifier design assume that the training samples are drawn independently and identically from an unknown data generating distribution, although this assumption is violated in several real-life problems. Relaxing this IID assumption, we consider algorithms from the statistics literature for the more realistic situation where batches or sub-groups of training samples may have internal correlations, although the samples from different batches may be considered to be uncorrelated. Next, we propose simpler (more efficient) variants that scale well to large datasets; theoretical results are provided to support their validity. Experimental results from real-life computer aided diagnosis (CAD) problems indicate that relaxing the IID assumption leads to statistically significant improvements in the accuracy of the learned classifier. Surprisingly, the simpler algorithm proposed here is experimentally found to be even more accurate than the original version.

AAAI Conference 2007 Conference Paper

A Mathematical Programming Formulation for Sparse Collaborative Computer Aided Diagnosis

  • Jinbo Bi

A mathematical programming formulation is proposed to eliminate irrelevant and redundant features for collaborative computer aided diagnosis, which requires detecting multiple clinically-related malignant structures from medical images. A probabilistic interpretation is described to justify our formulations. The proposed formulation is optimized through an effective alternating optimization algorithm that is easy to implement and relatively fast to solve. This collaborative prediction approach has been implemented and validated on the automatic detection of solid lung nodules by jointly detecting ground glass opacities.

ICML Conference 2006 Conference Paper

Active learning via transductive experimental design

  • Kai Yu 0001
  • Jinbo Bi
  • Volker Tresp

This paper considers the problem of selecting the most informative experiments x to get measurements y for learning a regression model y = f(x). We propose a novel and simple concept for active learning, transductive experimental design, that explores available unmeasured experiments (i.e., unlabeled data) and has a better scalability in comparison with classic experimental design methods. Our in-depth analysis shows that the new method tends to favor experiments that are on the one side hard-to-predict and on the other side representative for the rest of the experiments. Efficient optimization of the new design problem is achieved through alternating optimization and sequential greedy search. Extensive experimental results on synthetic problems and three real-world tasks, including questionnaire design for preference learning, active learning for text categorization, and spatial sensor placement, highlight the advantages of the proposed approaches.
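The sequential greedy search admits a compact sketch: with a linear kernel, greedily pick the candidate maximizing ‖K·,i‖² / (K_ii + μ) and deflate the kernel. This follows the published sequential algorithm in spirit, but treat the details below as an illustration:

```python
import numpy as np

def greedy_ted(X, m, mu=0.1):
    """Sequential greedy transductive experimental design (linear kernel).
    Selects m points that are both hard to predict and representative
    of the remaining candidate pool."""
    K = X @ X.T                       # kernel over the candidate pool
    selected = []
    for _ in range(m):
        # score_i = ||K[:, i]||^2 / (K[i, i] + mu)
        scores = (K ** 2).sum(axis=0) / (np.diag(K) + mu)
        scores[selected] = -np.inf    # never re-pick a point
        i = int(np.argmax(scores))
        selected.append(i)
        # deflate: remove the component explained by point i
        ki = K[:, i].copy()
        K -= np.outer(ki, ki) / (ki[i] + mu)
    return selected
```

On a pool made of two orthogonal groups of points, the rule picks one representative per group: after the first pick, deflation removes everything that point explains, so the second pick comes from the other group.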

NeurIPS Conference 2004 Conference Paper

Support Vector Classification with Input Data Uncertainty

  • Jinbo Bi
  • Tong Zhang

This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on this statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the new formulation is superior to the standard SVM for problems with noisy input.
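The geometric interpretation (each input known only up to a noise ball) suggests a worst-case hinge loss in which sample i's margin is shrunk by δ_i·‖w‖. A minimal subgradient-descent sketch, assuming L2 noise balls and a linear model; this is an illustration, not the paper's exact algorithm:

```python
import numpy as np

def robust_svm(X, y, deltas, lam=0.01, lr=0.05, epochs=200):
    """Linear SVM whose margin is enlarged by a per-sample noise radius:
    loss_i = max(0, 1 - y_i (w.x_i + b) + delta_i * ||w||), i.e. the worst
    case over an L2 ball of radius delta_i around x_i, plus a ridge penalty
    (lam/2)||w||^2. Trained by plain subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        norm = np.linalg.norm(w) + 1e-12
        margins = y * (X @ w + b) - deltas * norm
        active = margins < 1                       # samples with nonzero loss
        gw = (lam * w
              - (y[active, None] * X[active]).sum(axis=0) / n
              + (deltas[active].sum() / n) * (w / norm))
        gb = -y[active].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b
```

Setting all δ_i = 0 recovers an ordinary soft-margin linear SVM trained by subgradient descent.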

JMLR Journal 2003 Journal Article

Dimensionality Reduction via Sparse Support Vector Machines (Kernel Machines Section)

  • Jinbo Bi
  • Kristin Bennett
  • Mark Embrechts
  • Curt Breneman
  • Minghu Song

We describe a methodology for performing variable ranking and selection using support vector machines (SVMs). The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero weighted variables found by the linear models to produce a final nonlinear model. The method exploits the fact that a linear SVM (no kernels) with ℓ1-norm regularization inherently performs variable selection as a side-effect of minimizing capacity of the SVM model. The distribution of the linear model weights provides a mechanism for ranking and interpreting the effects of variables. Starplots are used to visualize the magnitude and variance of the weights for each variable. We illustrate the effectiveness of the methodology on synthetic data, benchmark problems, and challenging regression problems in drug design. This method can dramatically reduce the number of variables and outperforms SVMs trained using all attributes and using the attributes selected according to correlation coefficients. The visualization of the resulting models is useful for understanding the role of underlying variables.
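The core trick, an ℓ1-penalized linear SVM zeroing out weights, is easy to sketch with proximal gradient descent on the squared hinge loss. This is a stand-in for the paper's sparse linear SVM formulation, not a reproduction of it:

```python
import numpy as np

def l1_svm(X, y, lam=0.1, lr=0.01, epochs=500):
    """L1-regularized linear SVM (squared hinge) via proximal gradient:
    the soft-threshold step drives irrelevant weights exactly to zero."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        m = 1 - y * (X @ w)                               # margins
        grad = -2 * (X.T @ (y * np.maximum(m, 0))) / n    # squared-hinge grad
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft threshold
    return w

def rank_variables(w):
    """Variables ordered by |weight|; zero-weight variables are dropped."""
    return np.argsort(np.abs(w))[::-1]
```

In the paper's pipeline, the nonzero-weight variables surviving a series of such sparse linear fits would then feed a final nonlinear (kernel) model.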