Author name cluster

Alejandro Molina

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

1 author row

AAAI Conference 2019 Conference Paper

Automatic Bayesian Density Analysis

Antonio Vergari
Alejandro Molina
Robert Peharz
Zoubin Ghahramani
Kristian Kersting
Isabel Valera

Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for exploratory data analysis are usually not flexible enough to deal with the uncertainty inherent to real-world data: they are often restricted to fixed latent interaction models and homogeneous likelihoods; they are sensitive to missing, corrupt and anomalous data; moreover, their expressiveness generally comes at the price of intractable inference. As a result, supervision from statisticians is usually needed to find the right model for the data. However, since domain experts are not necessarily also experts in statistics, we propose Automatic Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible at large. Specifically, ABDA allows for automatic and efficient missing value estimation, statistical data type and likelihood discovery, anomaly detection and dependency structure mining, on top of providing accurate density estimation. Extensive empirical evidence shows that ABDA is a suitable tool for automatic exploratory analysis of mixed continuous and discrete tabular data.

AAAI Conference 2018 Conference Paper

Core Dependency Networks

Alejandro Molina
Alexander Munteanu
Kristian Kersting

Many applications infer the structure of a probabilistic graphical model from data to elucidate the relationships between variables. But how can we train graphical models on a massive data set? In this paper, we show how to construct coresets (compressed data sets which can be used as a proxy for the original data and have provably bounded worst-case error) for Gaussian dependency networks (DNs), i.e., cyclic directed graphical models over Gaussians, where the parents of each variable are its Markov blanket. Specifically, we prove that Gaussian DNs admit coresets of size independent of the size of the data set. Unfortunately, this does not extend to DNs over members of the exponential family in general. As we will prove, Poisson DNs do not admit small coresets. Despite this worst-case result, we will provide an argument why our coreset construction for DNs can still work well in practice on count data. To corroborate our theoretical results, we empirically evaluated the resulting Core DNs on real data sets. The results demonstrate significant gains over no or naive sub-sampling, even in the case of count data.

AAAI Conference 2018 Conference Paper

Mixed Sum-Product Networks: A Deep Architecture for Hybrid Domains

Alejandro Molina
Antonio Vergari
Nicola Di Mauro
Sriraam Natarajan
Floriana Esposito
Kristian Kersting

While all kinds of mixed data (from personal data, over panel and scientific data, to public and commercial data) are collected and stored, building probabilistic graphical models for these hybrid domains becomes more difficult. Users spend significant amounts of time identifying the parametric form of the random variables (Gaussian, Poisson, Logit, etc.) involved and learning the mixed models. To make this difficult task easier, we propose the first trainable probabilistic deep architecture for hybrid domains that features tractable queries. It is based on Sum-Product Networks (SPNs) with piecewise polynomial leaf distributions together with novel nonparametric decomposition and conditioning steps using the Hirschfeld-Gebelein-Rényi Maximum Correlation Coefficient. This relieves the user from deciding a priori the parametric form of the random variables but is still expressive enough to effectively approximate any distribution and permits efficient learning and inference. Our experiments show that the architecture, called Mixed SPNs, can indeed capture complex distributions across a wide range of hybrid domains.

AAAI Conference 2018 Conference Paper

Sum-Product Autoencoding: Encoding and Decoding Representations Using Sum-Product Networks

Antonio Vergari
Robert Peharz
Nicola Di Mauro
Alejandro Molina
Kristian Kersting
Floriana Esposito

Sum-Product Networks (SPNs) are a deep probabilistic architecture that up to now has been successfully employed for tractable inference. Here, we extend their scope towards unsupervised representation learning: we encode samples into continuous and categorical embeddings and show that they can also be decoded back into the original input space by leveraging MPE inference. We characterize when this Sum-Product Autoencoding (SPAE) leads to equivalent reconstructions and extend it towards dealing with missing embedding information. Our experimental results on several multilabel classification problems demonstrate that SPAE is competitive with state-of-the-art autoencoder architectures, even if the SPNs were never trained to reconstruct their inputs.

AAAI Conference 2017 Conference Paper

Poisson Sum-Product Networks: A Deep Architecture for Tractable Multivariate Poisson Distributions

Alejandro Molina
Sriraam Natarajan
Kristian Kersting

Multivariate count data are pervasive in science in the form of histograms, contingency tables and others. Previous work on modeling this type of distribution does not allow for fast and tractable inference. In this paper we present a novel Poisson graphical model, the first based on sum-product networks, called PSPN, allowing for positive as well as negative dependencies. We present algorithms for learning tree PSPNs from data as well as for tractable inference via symbolic evaluation. With these, information-theoretic measures such as entropy, mutual information, and distances among count variables can be computed without resorting to approximations. Additionally, we show a connection between PSPNs and LDA, linking the structure of tree PSPNs to a hierarchy of topics. The experimental results on several synthetic and real-world datasets demonstrate that PSPNs often outperform the state of the art while remaining tractable.
