Arrow Research search

Author name cluster

Aram Galstyan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

44 papers
2 author rows

Possible papers

44

NeurIPS Conference 2025 Conference Paper

Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers

  • Woomin Song
  • Sai Muralidhar Jayanthi
  • Srikanth Ronanki
  • Kanthashree Mysore Sathyendra
  • Jinwoo Shin
  • Aram Galstyan
  • Shubham Katiyar
  • Sravan Babu Bodapati

As large language models increasingly gain popularity in real-world applications, processing extremely long contexts, often exceeding the model’s pre-trained context limits, has emerged as a critical challenge. While existing approaches to efficient long-context processing show promise, recurrent compression-based methods struggle with information preservation, whereas random access approaches require substantial memory resources. We introduce REFORM, a novel inference framework that efficiently handles long contexts through a two-phase approach. First, it incrementally processes input chunks while maintaining a compressed KV cache, constructs cross-layer context embeddings, and utilizes an early-exit strategy for improved efficiency. Second, it identifies and gathers essential tokens via similarity matching and selectively recomputes the KV cache. Compared to baselines, REFORM achieves over 50% and 27% performance gains on RULER and BABILong, respectively, at 1M context length. It also outperforms baselines on ∞-Bench, RepoEval, and MM-NIAH, demonstrating flexibility across diverse tasks and domains. Additionally, REFORM reduces inference time by 30% and peak memory usage by 5%, achieving both efficiency and superior performance.
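
The gather step is the easiest part of the pipeline to picture. Below is a minimal numpy sketch of similarity-based token selection under simplifying assumptions (a single query embedding and plain cosine similarity); the paper's actual embeddings are cross-layer and the recomputation happens inside the transformer, so all names here are illustrative.

```python
import numpy as np

def gather_tokens(context_emb, query_emb, k):
    """Pick the k cached positions most similar to the query embedding."""
    ctx = context_emb / np.linalg.norm(context_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    scores = ctx @ q                 # cosine similarity per cached token
    top = np.argsort(scores)[-k:]    # k best-matching positions
    return np.sort(top)              # restore document order for recomputation

emb = np.random.randn(10_000, 64)    # toy stand-ins for cached embeddings
query = np.random.randn(64)
keep = gather_tokens(emb, query, k=512)
# The selected positions would then be re-fed through the model to
# recompute an exact KV cache for just those tokens.
```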

ICLR Conference 2025 Conference Paper

SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins

  • Jongwoo Ko
  • Saket Dingliwal
  • Bhavana Ganesh
  • Sailik Sengupta
  • Sravan Babu Bodapati
  • Aram Galstyan

Direct alignment algorithms (DAAs), such as direct preference optimization (DPO), have become popular alternatives to Reinforcement Learning from Human Feedback (RLHF) due to their simplicity, efficiency, and stability. However, the preferences used by DAAs are usually collected before alignment training begins and remain unchanged (off-policy). This design leads to two problems: the policy model (1) picks up on spurious correlations in the dataset (rather than learning only alignment to human preferences), and (2) overfits to feedback on off-policy trajectories that are unlikely to be generated by the updated policy model. To address these issues, we introduce Self-Reviewing and Alignment (SeRA), a cost-efficient and effective method that can be readily combined with existing DAAs. SeRA comprises two components: (1) sample selection using the implicit reward margin to alleviate over-optimization on such undesired features, and (2) preference bootstrapping using implicit rewards to augment preference data with updated policy models in a cost-efficient manner. Extensive experiments, including on instruction-following tasks, demonstrate the effectiveness and generality of SeRA in training LLMs with diverse offline preference datasets and DAAs.
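
For readers unfamiliar with the quantity the method is built on: in DPO-style DAAs, the implicit reward of a response y is β log(π_θ(y|x)/π_ref(y|x)). A minimal sketch of the margin computation follows; the selection rule at the end is a hypothetical stand-in, not SeRA's actual threshold or direction.

```python
def implicit_reward_margin(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO-style implicit rewards and their margin for a preference pair.

    logp_*     : summed log-probs of the chosen (w) / rejected (l) response
                 under the current policy
    ref_logp_* : the same quantities under the frozen reference model
    """
    r_w = beta * (logp_w - ref_logp_w)   # implicit reward, chosen response
    r_l = beta * (logp_l - ref_logp_l)   # implicit reward, rejected response
    return r_w - r_l

# Hypothetical selection: keep pairs whose margin clears a threshold.
margin = implicit_reward_margin(-12.3, -15.9, -13.0, -14.2)
selected = margin > 0.1
```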

IJCAI Conference 2023 Conference Paper

Cognitively Inspired Learning of Incremental Drifting Concepts

  • Mohammad Rostami
  • Aram Galstyan

Humans continually expand their learned knowledge to new domains and learn new concepts without any interference with past learned experiences. In contrast, machine learning models perform poorly in a continual learning setting, where input data distribution changes over time. Inspired by the nervous system learning mechanisms, we develop a computational model that enables a deep neural network to learn new concepts and expand its learned knowledge to new domains incrementally in a continual learning setting. We rely on the Parallel Distributed Processing theory to encode abstract concepts in an embedding space in terms of a multimodal distribution. This embedding space is modeled by internal data representations in a hidden network layer. We also leverage the Complementary Learning Systems theory to equip the model with a memory mechanism to overcome catastrophic forgetting through implementing pseudo-rehearsal. Our model can generate pseudo-data points for experience replay and accumulate new experiences to past learned experiences without causing cross-task interference.

AAAI Conference 2023 Conference Paper

Overcoming Concept Shift in Domain-Aware Settings through Consolidated Internal Distributions

  • Mohammad Rostami
  • Aram Galstyan

We develop an algorithm to improve the predictive performance of a pre-trained model under concept shift without retraining the model from scratch when only unannotated samples of initial concepts are accessible. We model this problem as a domain adaptation problem, where the source domain data is inaccessible during model adaptation. The core idea is based on consolidating the intermediate internal distribution, learned to represent the source domain data, after adapting the model. We provide theoretical analysis and conduct extensive experiments on five benchmark datasets to demonstrate that the proposed method is effective.

UAI Conference 2023 Conference Paper

Partial identification of dose responses with hidden confounders

  • Myrl G. Marmarelis
  • Elizabeth Haddad
  • Andrew Jesson
  • Neda Jahanshad
  • Aram Galstyan
  • Greg Ver Steeg

Inferring causal effects of continuous-valued treatments from observational data is a crucial task promising to better inform policy- and decision-makers. A critical assumption needed to identify these effects is that all confounding variables—causal parents of both the treatment and the outcome—are included as covariates. Unfortunately, given observational data alone, we cannot know with certainty that this criterion is satisfied. Sensitivity analyses provide principled ways to give bounds on causal estimates when confounding variables are hidden. While much attention is focused on sensitivity analyses for discrete-valued treatments, much less is paid to continuous-valued treatments. We present novel methodology to bound both average and conditional average continuous-valued treatment-effect estimates when they cannot be point identified due to hidden confounding. A semi-synthetic benchmark on multiple datasets shows our method giving tighter coverage of the true dose-response curve than a recently proposed continuous sensitivity model and baselines. Finally, we apply our method to a real-world observational case study to demonstrate the value of identifying dose-dependent causal effects.

JAIR Journal 2022 Journal Article

A Metric Space for Point Process Excitations

  • Myrl G. Marmarelis
  • Greg Ver Steeg
  • Aram Galstyan

A multivariate Hawkes process enables self- and cross-excitations through a triggering matrix that behaves like an asymmetrical covariance structure, characterizing pairwise interactions between the event types. Full-rank estimation of all interactions is often infeasible in empirical settings. Models that specialize on a spatiotemporal application alleviate this obstacle by exploiting spatial locality, allowing the dyadic relationships between events to depend only on separation in time and relative distances in real Euclidean space. Here we generalize this framework to any multivariate Hawkes process, and harness it as a vessel for embedding arbitrary event types in a hidden metric space. Specifically, we propose a Hidden Hawkes Geometry (HHG) model to uncover the hidden geometry between event excitations in a multivariate point process. The low dimensionality of the embedding regularizes the structure of the inferred interactions. We develop a number of estimators and validate the model by conducting several experiments. In particular, we investigate regional infectivity dynamics of COVID-19 in an early South Korean record and recent Los Angeles confirmed cases. By additionally performing synthetic experiments on short records as well as explorations into options markets and the Ebola epidemic, we demonstrate that learning the embedding alongside a point process uncovers salient interactions in a broad range of applications.

AAAI Conference 2021 Conference Paper

Exacerbating Algorithmic Bias through Fairness Attacks

  • Ninareh Mehrabi
  • Muhammad Naveed
  • Fred Morstatter
  • Aram Galstyan

Algorithmic fairness has attracted significant attention in recent years, with many quantitative measures suggested for characterizing the fairness of different machine learning algorithms. Despite this interest, the robustness of those fairness measures with respect to an intentional adversarial attack has not been properly addressed. Indeed, most adversarial machine learning has focused on the impact of malicious attacks on the accuracy of the system, without any regard to the system’s fairness. We propose new types of data poisoning attacks where an adversary intentionally targets the fairness of a system. Specifically, we propose two families of attacks that target fairness measures. In the anchoring attack, we skew the decision boundary by placing poisoned points near specific target points to bias the outcome. In the influence attack on fairness, we aim to maximize the covariance between the sensitive attributes and the decision outcome and affect the fairness of the model. We conduct extensive experiments that indicate the effectiveness of our proposed attacks.
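
The influence attack's target quantity is stated directly in the abstract; here is a one-function numpy sketch of it (the influence-guided search for poisoned points that maximizes it is omitted, and the function name is illustrative).

```python
import numpy as np

def fairness_covariance(a, scores):
    """Covariance between a binary sensitive attribute and model scores,
    the quantity the influence attack on fairness seeks to maximize."""
    return float(np.mean((a - a.mean()) * (scores - scores.mean())))

a = np.array([0, 0, 1, 1])
s = np.array([0.2, 0.4, 0.7, 0.9])
print(fairness_covariance(a, s))   # positive => outcomes track the attribute
```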

ICLR Conference 2021 Conference Paper

Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning

  • Elan Sopher Markowitz
  • Keshav Balasubramanian
  • Mehrnoosh Mirtaheri
  • Sami Abu-El-Haija
  • Bryan Perozzi
  • Greg Ver Steeg
  • Aram Galstyan

Graph Representation Learning (GRL) methods have impacted fields from chemistry to social science. However, their algorithmic implementations are specialized to specific use cases: e.g., "message passing" methods are run differently from "node embedding" ones. Despite their apparent differences, all these methods utilize the graph structure, and therefore, their learning can be approximated with stochastic graph traversals. We propose Graph Traversal via Tensor Functionals (GTTF), a unifying meta-algorithm framework for easing the implementation of diverse graph algorithms and enabling transparent and efficient scaling to large graphs. GTTF is founded upon a data structure (stored as a sparse tensor) and a stochastic graph traversal algorithm (described using tensor operations). The algorithm is a functional that accepts two functions, and can be specialized to obtain a variety of GRL models and objectives, simply by changing those two functions. We show that for a wide class of methods, our algorithm learns in an unbiased fashion and, in expectation, approximates the learning as if the specialized implementations were run directly. With these capabilities, we scale otherwise non-scalable methods to set state-of-the-art on large graph datasets while being more efficient than existing GRL libraries -- with only a handful of lines of code for each method specialization.
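
The "functional that accepts two functions" idea can be sketched in a few lines. The names below follow the spirit of GTTF's interface rather than its exact API, and a plain adjacency dict stands in for its sparse-tensor data structure.

```python
import numpy as np

def traverse(adj, node, depth, bias_fn, accumulate_fn, fanout=5, rng=None):
    """Stochastic traversal parameterized by two functions.

    bias_fn(neighbors)              -> sampling probabilities over neighbors
    accumulate_fn(parent, child, d) -> records whatever the target GRL
                                       method needs (walk pairs, edges, ...)
    """
    rng = rng or np.random.default_rng()
    nbrs = adj[node]
    if depth == 0 or not nbrs:
        return
    picks = rng.choice(nbrs, size=min(fanout, len(nbrs)),
                       replace=False, p=bias_fn(nbrs))
    for child in picks:
        accumulate_fn(node, child, depth)
        traverse(adj, child, depth - 1, bias_fn, accumulate_fn, fanout, rng)

# DeepWalk-style specialization: uniform bias, accumulate (node, context) pairs.
pairs = []
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
traverse(adj, 0, depth=2,
         bias_fn=lambda nbrs: np.full(len(nbrs), 1.0 / len(nbrs)),
         accumulate_fn=lambda u, v, d: pairs.append((u, v)))
```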

NeurIPS Conference 2021 Conference Paper

Hamiltonian Dynamics with Non-Newtonian Momentum for Rapid Sampling

  • Greg Ver Steeg
  • Aram Galstyan

Sampling from an unnormalized probability distribution is a fundamental problem in machine learning with applications including Bayesian modeling, latent factor inference, and energy-based model training. After decades of research, variations of MCMC remain the default approach to sampling despite slow convergence. Auxiliary neural models can learn to speed up MCMC, but the overhead for training the extra model can be prohibitive. We propose a fundamentally different approach to this problem via a new Hamiltonian dynamics with a non-Newtonian momentum. In contrast to MCMC approaches like Hamiltonian Monte Carlo, no stochastic step is required. Instead, the proposed deterministic dynamics in an extended state space exactly sample the target distribution, specified by an energy function, under an assumption of ergodicity. Alternatively, the dynamics can be interpreted as a normalizing flow that samples a specified energy model without training. The proposed Energy Sampling Hamiltonian (ESH) dynamics have a simple form that can be solved with existing ODE solvers, but we derive a specialized solver that exhibits much better performance. ESH dynamics converge faster than their MCMC competitors enabling faster, more stable training of neural network energy models.

NeurIPS Conference 2021 Conference Paper

Implicit SVD for Graph Representation Learning

  • Sami Abu-El-Haija
  • Hesham Mostafa
  • Marcel Nassar
  • Valentino Crespi
  • Greg Ver Steeg
  • Aram Galstyan

Recent improvements in the performance of state-of-the-art (SOTA) methods for Graph Representational Learning (GRL) have come at the cost of significant computational resource requirements for training, e.g., for calculating gradients via backprop over many data epochs. Meanwhile, Singular Value Decomposition (SVD) can find closed-form solutions to convex problems, using merely a handful of epochs. In this paper, we make GRL more computationally tractable for those with modest hardware. We design a framework that computes SVD of *implicitly* defined matrices, and apply this framework to several GRL tasks. For each task, we derive first-order approximation of a SOTA model, where we design (expensive-to-store) matrix $\mathbf{M}$ and train the model, in closed-form, via SVD of $\mathbf{M}$, without calculating entries of $\mathbf{M}$. By converging to a unique point in one step, and without calculating gradients, our models show competitive empirical test performance over various graphs such as article citation and biological interaction networks. More importantly, SVD can initialize a deeper model, that is architected to be non-linear almost everywhere, though it behaves linearly when its parameters reside on a hyperplane, onto which SVD initializes. The deeper model can then be fine-tuned within only a few epochs. Overall, our algorithm trains hundreds of times faster than state-of-the-art methods, while competing on test empirical performance. We open-source our implementation at: https://github.com/samihaija/isvd
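
The "implicit SVD" trick is directly expressible with matrix-free solvers. A sketch with an illustrative M = A Aᵀ that is never materialized (the paper's M matrices instead come from first-order approximations of GRL models):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, svds

A = np.random.randn(2000, 500)          # any operator we can multiply by

M = LinearOperator((2000, 2000),
                   matvec=lambda v: A @ (A.T @ v),    # M v without forming M
                   rmatvec=lambda v: A @ (A.T @ v),   # M is symmetric here
                   dtype=np.float64)

U, s, Vt = svds(M, k=32)                # truncated SVD via matvecs only
```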

NeurIPS Conference 2021 Conference Paper

Information-theoretic generalization bounds for black-box learning algorithms

  • Hrayr Harutyunyan
  • Maxim Raginsky
  • Greg Ver Steeg
  • Aram Galstyan

We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.

UAI Conference 2021 Conference Paper

q-Paths: Generalizing the geometric annealing path using power means

  • Vaden Masrani
  • Rob Brekelmans
  • Thang Bui
  • Frank Nielsen
  • Aram Galstyan
  • Greg Ver Steeg
  • Frank Wood

Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and the lack of a closed-form energy function. In this work, we introduce $q$-paths, a family of paths derived from a generalized notion of the mean, which includes the geometric and arithmetic mixtures as special cases and admits a simple closed form involving the deformed logarithm function from nonextensive thermodynamics. Following previous analysis of the geometric path, we interpret our $q$-paths as corresponding to a $q$-exponential family of distributions, and provide a variational representation of intermediate densities as minimizing a mixture of $\alpha$-divergences to the endpoints. We show that small deviations away from the geometric path yield empirical gains for Bayesian inference using Sequential Monte Carlo and generative model evaluation using Annealed Importance Sampling.
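
Concretely, the power-mean construction behind $q$-paths has the following shape for unnormalized endpoints (my transcription in the paper's spirit; see the paper for the exact $q$-parameterization):

```latex
\tilde{\pi}^{(q)}_{\beta}(z) \;=\; \Big[(1-\beta)\,\tilde{\pi}_0(z)^{1-q} \;+\; \beta\,\tilde{\pi}_1(z)^{1-q}\Big]^{\frac{1}{1-q}}
```

The $q \to 1$ limit recovers the geometric path $\tilde{\pi}_\beta \propto \tilde{\pi}_0^{1-\beta}\,\tilde{\pi}_1^{\beta}$, and $q = 0$ gives the arithmetic mixture.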

ICML Conference 2020 Conference Paper

All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference

  • Rob Brekelmans
  • Vaden Masrani
  • Frank Wood
  • Greg Ver Steeg
  • Aram Galstyan

The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a “schedule” of intermediate distributions, and model learning suffered with ostensibly tighter bounds. In this work, we propose an exponential family interpretation of the geometric mixture curve underlying the TVO and various path sampling methods, which allows us to characterize the gap in TVO likelihood bounds as a sum of KL divergences. We propose to choose intermediate distributions using equal spacing in the moment parameters of our exponential family, which matches grid search performance and allows the schedule to adaptively update over the course of training. Finally, we derive a doubly reparameterized gradient estimator which improves model learning and allows the TVO to benefit from more refined bounds. To further contextualize our contributions, we provide a unified framework for understanding thermodynamic integration and the TVO using Taylor series remainders.
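
For context, the geometric mixture curve and the thermodynamic-integration identity the TVO discretizes are (standard forms; notation mine):

```latex
\pi_{\beta}(z \mid x) \;\propto\; q(z \mid x)^{1-\beta}\, p(x, z)^{\beta},
\qquad
\log p(x) \;=\; \int_0^1 \mathbb{E}_{\pi_{\beta}}\!\left[\log \frac{p(x, z)}{q(z \mid x)}\right] d\beta
```

Left and right Riemann sums of this integral give the TVO's lower and upper bounds on $\log p(x)$.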

ICML Conference 2020 Conference Paper

Improving generalization by controlling label-noise information in neural network weights

  • Hrayr Harutyunyan
  • Kyle Reing
  • Greg Ver Steeg
  • Aram Galstyan

In the presence of noisy or incorrect labels, neural networks have the undesirable tendency to memorize information about the noise. Standard regularization techniques such as dropout, weight decay or data augmentation sometimes help, but do not prevent this behavior. If one considers neural network weights as random variables that depend on the data and stochasticity of training, the amount of memorized information can be quantified with the Shannon mutual information between weights and the vector of all training labels given inputs, $I(w; \mathbf{y} \mid \mathbf{x})$. We show that for any training algorithm, low values of this term correspond to reduction in memorization of label-noise and better generalization bounds. To obtain these low values, we propose training algorithms that employ an auxiliary network that predicts gradients in the final layers of a classifier without accessing labels. We illustrate the effectiveness of our approach on versions of MNIST, CIFAR-10, and CIFAR-100 corrupted with various noise models, and on a large-scale dataset Clothing1M that has noisy labels.

AAAI Conference 2020 Conference Paper

Modeling Dialogues with Hashcode Representations: A Nonparametric Approach

  • Sahil Garg
  • Irina Rish
  • Guillermo Cecchi
  • Palash Goyal
  • Sarik Ghazarian
  • Shuyang Gao
  • Greg Ver Steeg
  • Aram Galstyan

We propose a novel dialogue modeling framework, the first-ever nonparametric, kernel-function-based approach for dialogue modeling, which learns hashcodes as text representations; unlike traditional deep learning models, it handles relatively small datasets well, while also scaling to large ones. We also derive a novel lower bound on mutual information, used as a model-selection criterion favoring representations with better alignment between the utterances of participants in a collaborative dialogue setting, as well as higher predictability of the generated responses. As demonstrated on three real-life datasets, including prominently psychotherapy sessions, the proposed approach significantly outperforms several state-of-the-art neural-network-based dialogue systems, both in terms of computational efficiency, reducing training time from days or weeks to hours, and response quality, achieving an order of magnitude improvement over competitors in frequency of being chosen as the best model by human evaluators.

AAAI Conference 2020 Conference Paper

Predictive Engagement: An Efficient Metric for Automatic Evaluation of Open-Domain Dialogue Systems

  • Sarik Ghazarian
  • Ralph Weischedel
  • Aram Galstyan
  • Nanyun Peng

User engagement is a critical metric for evaluating the quality of open-domain dialogue systems. Prior work has focused on conversation-level engagement by using heuristically constructed features such as the number of turns and the total time of the conversation. In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, predictive engagement, for automatic evaluation of open-domain dialogue systems. Our experiments demonstrate that (1) human annotators have high agreement on assessing utterance-level engagement scores; (2) conversation-level engagement scores can be predicted from properly aggregated utterance-level engagement scores. Furthermore, we show that the utterance-level engagement scores can be learned from data. These scores can be incorporated into automatic evaluation metrics for open-domain dialogue systems to improve the correlation with human judgements. This suggests that predictive engagement can be used as real-time feedback for training better dialogue models.

NeurIPS Conference 2019 Conference Paper

Exact Rate-Distortion in Autoencoders via Echo Noise

  • Rob Brekelmans
  • Daniel Moyer
  • Aram Galstyan
  • Greg Ver Steeg

Compression is at the heart of effective representation learning. However, lossy compression is typically achieved through simple parametric models like Gaussian noise to preserve analytic tractability, and the limitations this imposes on learning are largely unexplored. Further, the Gaussian prior assumptions in models such as variational autoencoders (VAEs) provide only an upper bound on the compression rate in general. We introduce a new noise channel, Echo noise, that admits a simple, exact expression for mutual information for arbitrary input distributions. The noise is constructed in a data-driven fashion that does not require restrictive distributional assumptions. With its complex encoding mechanism and exact rate regularization, Echo leads to improved bounds on log-likelihood and dominates beta-VAEs across the achievable range of rate-distortion trade-offs. Further, we show that Echo noise can outperform flow-based methods without the need to train additional distributional transformations.
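
The channel and its exact rate have a compact form; this is my transcription of the closed form the abstract refers to (the recursive "echo" construction of the noise ε is what licenses the exactness, so see the paper for its conditions):

```latex
z \;=\; f(x) + S(x)\,\epsilon, \qquad
I(x; z) \;=\; -\,\mathbb{E}_{x}\big[\log \lvert \det S(x) \rvert\big]
```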

NeurIPS Conference 2019 Conference Paper

Fast structure learning with modular regularization

  • Greg Ver Steeg
  • Hrayr Harutyunyan
  • Daniel Moyer
  • Aram Galstyan

Estimating graphical model structure from high-dimensional and undersampled data is a fundamental problem in many scientific fields. Existing approaches, such as GLASSO, latent variable GLASSO, and latent tree models, suffer from high computational complexity and may impose unrealistic sparsity priors in some cases. We introduce a novel method that leverages a newly discovered connection between information-theoretic measures and structured latent factor models to derive an optimization objective which encourages modular structures where each observed variable has a single latent parent. The proposed method has linear stepwise computational complexity w.r.t. the number of observed variables. Our experiments on synthetic data demonstrate that our approach is the only method that recovers modular structure better as the dimensionality increases. We also use our approach for estimating covariance structure for a number of real-world datasets and show that it consistently outperforms state-of-the-art estimators at a fraction of the computational cost. Finally, we apply the proposed method to high-resolution fMRI data (with more than 10^5 voxels) and show that it is capable of extracting meaningful patterns.

AAAI Conference 2019 Conference Paper

Kernelized Hashcode Representations for Relation Extraction

  • Sahil Garg
  • Aram Galstyan
  • Greg Ver Steeg
  • Irina Rish
  • Guillermo Cecchi
  • Shuyang Gao

Kernel methods have produced state-of-the-art results for a number of NLP tasks such as relation extraction, but suffer from poor scalability due to the high cost of computing kernel similarities between natural language structures. A recently proposed technique, kernelized locality-sensitive hashing (KLSH), can significantly reduce the computational cost, but is only applicable to classifiers operating on kNN graphs. Here we propose to use random subspaces of KLSH codes for efficiently constructing an explicit representation of NLP structures suitable for general classification methods. Further, we propose an approach for optimizing the KLSH model for classification problems by maximizing an approximation of mutual information between the KLSH codes (feature vectors) and the class labels. We evaluate the proposed approach on biomedical relation extraction datasets, and observe significant and robust improvements in accuracy w.r.t. state-of-the-art classifiers, along with drastic (orders-of-magnitude) speedup compared to conventional kernel methods.
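
A simplified sketch of the random-subspace feature construction, with generic binary codes standing in for KLSH codes; the subspace sizes and counts here are illustrative, not the paper's settings.

```python
import numpy as np

def random_subspace_features(codes, n_subspaces=30, bits=4, seed=0):
    """Map binary hash codes to explicit features via random bit subsets.

    Each random subset of `bits` code bits indexes a one-hot bucket;
    the buckets are concatenated into one feature vector per sample.
    """
    rng = np.random.default_rng(seed)
    n, B = codes.shape
    blocks = []
    for _ in range(n_subspaces):
        idx = rng.choice(B, size=bits, replace=False)
        bucket = codes[:, idx] @ (1 << np.arange(bits))   # bits -> integer
        blocks.append(np.eye(2 ** bits)[bucket])          # one-hot bucket id
    return np.hstack(blocks)   # (n, n_subspaces * 2**bits)

codes = np.random.default_rng(1).integers(0, 2, size=(100, 64))
X = random_subspace_features(codes)   # ready for any standard classifier
```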

ICML Conference 2019 Conference Paper

MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing

  • Sami Abu-El-Haija
  • Bryan Perozzi
  • Amol Kapoor
  • Nazanin Alipourfard
  • Kristina Lerman
  • Hrayr Harutyunyan
  • Greg Ver Steeg
  • Aram Galstyan

Existing popular methods for semi-supervised learning with Graph Neural Networks (such as the Graph Convolutional Network) provably cannot learn a general class of neighborhood mixing relationships. To address this weakness, we propose a new model, MixHop, that can learn these relationships, including difference operators, by repeatedly mixing feature representations of neighbors at various distances. MixHop requires no additional memory or computational complexity, and outperforms challenging baselines. In addition, we propose sparsity regularization that allows us to visualize how the network prioritizes neighborhood information across different graph datasets. Our analysis of the learned architectures reveals that neighborhood mixing varies per dataset.
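
The layer itself is one line of linear algebra per adjacency power. A dense numpy sketch (a real implementation would compute Âʲ H by repeated sparse matvecs rather than matrix powers):

```python
import numpy as np

def mixhop_layer(A_hat, H, weights, activation=np.tanh):
    """One MixHop layer: concatenate branches over adjacency powers.

    A_hat:   (n, n) normalized adjacency with self-loops
    H:       (n, d) node features
    weights: dict {power j: (d, d_j) weight matrix}, e.g. powers {0, 1, 2}
    """
    branches = []
    for j in sorted(weights):
        Aj_H = np.linalg.matrix_power(A_hat, j) @ H   # \hat{A}^j H
        branches.append(activation(Aj_H @ weights[j]))
    return np.concatenate(branches, axis=1)

n, d = 5, 3
A_hat = np.eye(n)                                  # toy stand-in adjacency
H = np.random.randn(n, d)
W = {j: np.random.randn(d, 4) for j in (0, 1, 2)}
out = mixhop_layer(A_hat, H, W)                    # shape (5, 12)
```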

IJCAI Conference 2019 Conference Paper

SAGE: A Hybrid Geopolitical Event Forecasting System

  • Fred Morstatter
  • Aram Galstyan
  • Gleb Satyukov
  • Daniel Benjamin
  • Andres Abeliuk
  • Mehrnoosh Mirtaheri
  • KSM Tozammel Hossain
  • Pedro Szekely

Forecasting of geopolitical events is a notoriously difficult task, with experts failing to significantly outperform a random baseline across many types of forecasting events. One successful way to increase the performance of forecasting tasks is to turn to crowdsourcing: leveraging many forecasts from non-expert users. Simultaneously, advances in machine learning have led to models that can produce reasonable, although not perfect, forecasts for many tasks. Recent efforts have shown that forecasts can be further improved by "hybridizing" human forecasters: pairing them with the machine models in an effort to combine the unique advantages of both. In this demonstration, we present Synergistic Anticipation of Geopolitical Events (SAGE), a platform for human/computer interaction that facilitates human reasoning with machine models.

UAI Conference 2018 Conference Paper

A Forest Mixture Bound for Block-Free Parallel Inference

  • Neal Lawton
  • Greg Ver Steeg
  • Aram Galstyan

Coordinate ascent variational inference is an important algorithm for inference in probabilistic models, but it is slow because it updates only a single variable at a time. Block coordinate methods perform inference faster by updating blocks of variables in parallel; however, the speed (as in MCMC methods [Terenin et al., 2015]) or convergence (as in Hogwild methods [Recht et al., 2011]) of these algorithms depends on how the variables are partitioned into blocks, and finding a good blocking for an arbitrary model can be difficult. Is there a way to perform parallel inference in such models, or do we have to resort to the serial coordinate algorithm? In this paper, we give a convergent parallel algorithm for inference in deep exponential families that does not require the variables to be partitioned into blocks. We achieve this by lower bounding the ELBO with a new objective we call the forest mixture bound (FM bound) that separates the inference problem for variables within a hidden layer. We apply this to the simple case when all random variables are Gaussian and show empirically that the algorithm converges faster for models that are inherently more forest-like.

NeurIPS Conference 2018 Conference Paper

Invariant Representations without Adversarial Training

  • Daniel Moyer
  • Shuyang Gao
  • Rob Brekelmans
  • Aram Galstyan
  • Greg Ver Steeg

Representations of data that are invariant to changes in specified factors are useful for a wide range of problems: removing potential biases in prediction problems, controlling the effects of covariates, and disentangling meaningful factors of variation. Unfortunately, learning representations that exhibit invariance to arbitrary nuisance factors yet remain useful for other tasks is challenging. Existing approaches cast the trade-off between task performance and invariance in an adversarial way, using an iterative minimax optimization. We show that adversarial training is unnecessary and sometimes counter-productive; we instead cast invariant representation learning as a single information-theoretic objective that can be directly optimized. We demonstrate that this approach matches or exceeds performance of state-of-the-art adversarial approaches for learning fair representations and for generative modeling with controllable transformations.

IJCAI Conference 2017 Conference Paper

Sifting Common Information from Many Variables

  • Greg Ver Steeg
  • Shuyang Gao
  • Kyle Reing
  • Aram Galstyan

Measuring the relationship between any pair of variables is a rich and active area of research that is central to scientific practice. In contrast, characterizing the common information among any group of variables is typically a theoretical exercise with few practical methods for high-dimensional data. A promising solution would be a multivariate generalization of the famous Wyner common information, but this approach relies on solving an apparently intractable optimization problem. We leverage the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed point solution, fast convergence, and complexity that is linear in the number of variables. This scalable approach allows us to demonstrate the usefulness of common information in high-dimensional learning problems. The sieve outperforms standard methods on dimensionality reduction tasks, solves a blind source separation problem that cannot be solved with ICA, and accurately recovers structure in brain imaging data.

AAAI Conference 2016 Conference Paper

Extracting Biomolecular Interactions Using Semantic Parsing of Biomedical Text

  • Sahil Garg
  • Aram Galstyan
  • Ulf Hermjakob
  • Daniel Marcu

We advance the state of the art in biomolecular interaction extraction with three contributions: (i) We show that deep, Abstract Meaning Representations (AMR) significantly improve the accuracy of a biomolecular interaction extraction system when compared to a baseline that relies solely on surface- and syntax-based features; (ii) In contrast with previous approaches that infer relations on a sentence-by-sentence basis, we expand our framework to enable consistent predictions over sets of sentences (documents); (iii) We further modify and expand a graph kernel learning framework to enable concurrent exploitation of automatically induced AMR (semantic) and dependency structure (syntactic) representations. Our experiments show that our approach yields interaction extraction systems that are more robust in environments where there is a significant mismatch between training and test conditions.

ICML Conference 2016 Conference Paper

The Information Sieve

  • Greg Ver Steeg
  • Aram Galstyan

We introduce a new framework for unsupervised learning of representations based on a novel hierarchical decomposition of information. Intuitively, data is passed through a series of progressively fine-grained sieves. Each layer of the sieve recovers a single latent factor that is maximally informative about multivariate dependence in the data. The data is transformed after each pass so that the remaining unexplained information trickles down to the next layer. Ultimately, we are left with a set of latent factors explaining all the dependence in the original data and remainder information consisting of independent noise. We present a practical implementation of this framework for discrete variables and apply it to a variety of fundamental tasks in unsupervised learning including independent component analysis, lossy and lossless compression, and predicting missing values in data.

NeurIPS Conference 2016 Conference Paper

Variational Information Maximization for Feature Selection

  • Shuyang Gao
  • Greg Ver Steeg
  • Aram Galstyan

Feature selection is one of the most fundamental problems in machine learning. An extensive body of work on information-theoretic feature selection exists which is based on maximizing mutual information between subsets of features and class labels. Practical methods are forced to rely on approximations due to the difficulty of estimating mutual information. We demonstrate that approximations made by existing methods are based on unrealistic assumptions. We formulate a more flexible and general class of assumptions based on variational distributions and use them to tractably generate lower bounds for mutual information. These bounds define a novel information-theoretic framework for feature selection, which we prove to be optimal under tree graphical models with proper choice of variational distributions. Our experiments demonstrate that the proposed method strongly outperforms existing information-theoretic feature selection approaches.
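
The variational device at the core is the standard lower bound below (notation mine), which holds for any variational distribution $q$ and is tight when $q$ matches the true conditional:

```latex
I(X_S; Y) \;\ge\; H(Y) + \mathbb{E}_{p(x_S,\, y)}\big[\log q(y \mid x_S)\big]
```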

UAI Conference 2015 Conference Paper

Estimating Mutual Information by Local Gaussian Approximation

  • Shuyang Gao
  • Greg Ver Steeg
  • Aram Galstyan

Estimating mutual information (MI) from samples is a fundamental problem in statistics, machine learning, and data analysis. Recently it was shown that a popular class of non-parametric MI estimators perform very poorly for strongly dependent variables and have sample complexity that scales exponentially with the true MI. This undesired behavior was attributed to the reliance of those estimators on local uniformity of the underlying (and unknown) probability density function. Here we present a novel semi-parametric estimator of mutual information, where at each sample point, densities are locally approximated by a Gaussian distribution. We demonstrate that the estimator is asymptotically unbiased. We also show that the proposed estimator has superior performance compared to several baselines, and is able to accurately measure relationship strengths over many orders of magnitude.
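
A hedged toy implementation of the idea for scalar variables: fit a local Gaussian around each sample via its nearest neighbors and average a pointwise dependence score. The paper's exact estimator and its bias corrections differ; this only illustrates the local-Gaussian principle.

```python
import numpy as np
from scipy.spatial import cKDTree

def local_gaussian_mi(x, y, k=20):
    """Toy local-Gaussian MI estimate for 1-D x and y."""
    xy = np.column_stack([x, y])
    _, idx = cKDTree(xy).query(xy, k=k + 1)   # neighbor 0 is the point itself
    scores = []
    for nb in idx[:, 1:]:
        rho = np.clip(np.corrcoef(x[nb], y[nb])[0, 1], -0.999, 0.999)
        scores.append(-0.5 * np.log(1.0 - rho ** 2))   # local Gaussian MI
    return float(np.mean(scores))

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = 0.9 * x + np.sqrt(1 - 0.81) * rng.standard_normal(2000)
# Compare with the Gaussian ground truth -0.5*log(1 - 0.81) ≈ 0.83.
print(local_gaussian_mi(x, y))
```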

ICML Conference 2014 Conference Paper

Demystifying Information-Theoretic Clustering

  • Greg Ver Steeg
  • Aram Galstyan
  • Fei Sha
  • Simon DeDeo

We propose a novel method for clustering data which is grounded in information-theoretic principles and requires no parametric assumptions. Previous attempts to use information theory to define clusters in an assumption-free way are based on maximizing mutual information between data and cluster labels. We demonstrate that this intuition suffers from a fundamental conceptual flaw that causes clustering performance to deteriorate as the amount of data increases. Instead, we return to the axiomatic foundations of information theory to define a meaningful clustering measure based on the notion of consistency under coarse-graining for finite data.

NeurIPS Conference 2014 Conference Paper

Discovering Structure in High-Dimensional Data Through Correlation Explanation

  • Greg Ver Steeg
  • Aram Galstyan

We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective. Intuitively, the optimization searches for a set of latent factors that best explain the correlations in the data as measured by multivariate mutual information. The method is unsupervised, requires no model assumptions, and scales linearly with the number of variables which makes it an attractive approach for very high dimensional systems. We demonstrate that Correlation Explanation (CorEx) automatically discovers meaningful structure for data from diverse sources including personality tests, DNA, and human language.
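
The objective optimized is total correlation, which the latent factors Y are chosen to explain (standard definitions):

```latex
TC(X) \;=\; \sum_{i=1}^{n} H(X_i) \;-\; H(X_1, \dots, X_n),
\qquad
\max_{Y}\; TC(X; Y) \;=\; TC(X) - TC(X \mid Y)
```

$TC(X \mid Y) = 0$ exactly when the $X_i$ are conditionally independent given $Y$, i.e. when the latent factors explain all the dependence.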

AAAI Conference 2014 Conference Paper

Where and Why Users “Check In”

  • Yoon-Sik Cho
  • Greg Ver Steeg
  • Aram Galstyan

The emergence of location based social network (LBSN) services makes it possible to study individuals’ mobility patterns at a fine-grained level and to see how they are impacted by social factors. In this study we analyze the check-in patterns in LBSN and observe significant temporal clustering of check-in activities. We explore how self-reinforcing behaviors, social factors, and exogenous effects contribute to this clustering and introduce a framework to distinguish these effects at the level of individual check-ins for both users and venues. Using check-in data from three major cities, we show not only that our model can improve prediction of future check-ins, but also that disentangling of different factors allows us to infer meaningful properties of different venues.

AAMAS Conference 2012 Conference Paper

Adaptive Agents on Evolving Networks

  • Ardeshir Kianercy
  • Aram Galstyan
  • Armen Allahverdyan

We propose a model of strategic network formation in repeated games where players adopt actions and connections simultaneously using a simple reinforcement learning scheme. We demonstrate that under certain plausible assumptions the dynamics of such systems can be described by so-called replicator equations that characterize the co-evolution of agent strategies and network topology. Within this framework, the network structures emerging as a result of the game-dynamical interactions are described by the stable rest points of the replicator dynamics. In particular, we show using both simulations and analytical methods that for certain N-agent games the stable equilibria consist of star motifs as the main building blocks of the network.
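
For reference, the single-population replicator dynamics the abstract invokes take the standard form below; in the paper's coupled setting, strategies and link weights co-evolve, so the effective payoffs themselves change over time.

```latex
\dot{x}_i \;=\; x_i \Big[ (A x)_i - x^{\top} A x \Big]
```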

UAI Conference 2011 Conference Paper

A Sequence of Relaxation Constraining Hidden Variable Models

  • Greg Ver Steeg
  • Aram Galstyan

Many widely studied graphical models with latent variables lead to nontrivial constraints on the distribution of the observed variables. Inspired by the Bell inequalities in quantum mechanics, we refer to any linear inequality whose violation rules out some latent variable model as a "hidden variable test" for that model. Our main contribution is to introduce a sequence of relaxations which provides progressively tighter hidden variable tests. We demonstrate applicability to mixtures of sequences of i.i.d. variables, Bell inequalities, and homophily models in social networks. For the last, we demonstrate that our method provides a test that is able to rule out latent homophily as the sole explanation for correlations on a real social network that are known to be due to influence.

AAAI Conference 2011 Conference Paper

Co-Evolution of Selection and Influence in Social Networks

  • Yoon-Sik Cho
  • Greg Steeg
  • Aram Galstyan

Many networks are complex dynamical systems, where both attributes of nodes and topology of the network (link structure) can change with time. We propose a model of co-evolving networks where both node attributes and network structure evolve under mutual influence. Specifically, we consider a mixed membership stochastic blockmodel, where the probability of observing a link between two nodes depends on their current membership vectors, while those membership vectors themselves evolve in the presence of a link between the nodes. Thus, the network is shaped by the interaction of stochastic processes describing the nodes, while the processes themselves are influenced by the changing network structure. We derive an efficient variational inference procedure for our model, and validate the model on both synthetic and real-world data.
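
For reference, in a mixed membership stochastic blockmodel the marginal link probability is (standard form; notation mine):

```latex
P(i \to j) \;=\; \pi_i^{\top} B\, \pi_j
```

with $\pi_i$ the membership vector of node $i$ and $B$ the block affinity matrix; in the co-evolving model, the memberships are additionally updated whenever a link between $i$ and $j$ is observed.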

NeurIPS Conference 2011 Conference Paper

Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

  • Armen Allahverdyan
  • Aram Galstyan

We present an asymptotic analysis of Viterbi Training (VT) and contrast it with a more conventional Maximum Likelihood (ML) approach to parameter estimation in Hidden Markov Models. While the ML estimator works by (locally) maximizing the likelihood of the observed data, VT seeks to maximize the probability of the most likely hidden state sequence. We develop an analytical framework based on a generating function formalism and illustrate it on an exactly solvable model of HMM with one unambiguous symbol. For this particular model the ML objective function is continuously degenerate. The VT objective, in contrast, is shown to have only finite degeneracy. Furthermore, VT converges faster and results in sparser (simpler) models, thus realizing an automatic Occam's razor for HMM learning. For more general scenarios, VT can be worse than ML but is still capable of correctly recovering most of the parameters.
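
The two objectives differ in a single operator, sum versus max over hidden paths:

```latex
\hat{\theta}_{\mathrm{ML}} \;=\; \arg\max_{\theta}\, \log \sum_{h} p_{\theta}(o, h),
\qquad
\hat{\theta}_{\mathrm{VT}} \;=\; \arg\max_{\theta}\, \log \max_{h} p_{\theta}(o, h)
```

where $o$ is the observed sequence and $h$ ranges over hidden state sequences.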

JAAMAS Journal 2011 Journal Article

Continuous strategy replicator dynamics for multi-agent Q-learning

  • Aram Galstyan

The problem of multi-agent learning and adaptation has attracted a great deal of attention in recent years. It has been suggested that the dynamics of multi-agent learning can be studied using replicator equations from population biology. Most existing studies so far have been limited to discrete strategy spaces with a small number of available actions. In many cases, however, the choices available to agents are better characterized by continuous spectra. This paper suggests a generalization of the replicator framework that allows one to study the adaptive dynamics of Q-learning agents with continuous strategy spaces. Instead of probability vectors, agents’ strategies are now characterized by probability measures over continuous variables. As a result, the ordinary differential equations for the discrete case are replaced by a system of coupled integro-differential replicator equations that describe the mutual evolution of individual agent strategies. We derive a set of functional equations describing the steady state of the replicator dynamics, examine their solutions for several two-player games, and confirm our analytical results using simulations.

UAI Conference 2011 Conference Paper

Statistical Mechanics of Semi-Supervised Clustering in Sparse Graphs (Abstract)

  • Greg Ver Steeg
  • Aram Galstyan
  • Armen E. Allahverdyan

The problem of detecting modules, or clusters, in networks has recently attracted considerable interest in both the statistical physics and computer science communities. In addition to algorithmic development, recent research has addressed the feasibility of detecting clusters, assuming they are present in the network.

UAI Conference 2009 Conference Paper

On Maximum a Posteriori Estimation of Hidden Markov Processes

  • Armen E. Allahverdyan
  • Aram Galstyan

We present a theoretical analysis of Maximum a Posteriori (MAP) sequence estimation for binary symmetric hidden Markov processes. We reduce the MAP estimation to the energy minimization of an appropriately defined Ising spin model, and focus on the performance of MAP as characterized by its accuracy and the number of solutions corresponding to a typical observed sequence. It is shown that for a finite range of sufficiently low noise levels, the solution is uniquely related to the observed sequence, while the accuracy degrades linearly with increasing noise strength. For intermediate noise values, the accuracy is nearly noise-independent, but now there are exponentially many solutions to the estimation problem, which is reflected in non-zero ground-state entropy for the Ising model. Finally, for even larger noise intensities, the number of solutions reduces again, but the accuracy is poor. It is shown that these regimes are different thermodynamic phases of the Ising model that are related to each other via first-order phase transitions.

IROS Conference 2009 Conference Paper

TENTACLES: Self-configuring robotic radio networks in unknown environments

  • Harris Chi Ho Chiu
  • Bo Ryu
  • Hua Zhu
  • Pedro A. Szekely
  • Rajiv T. Maheswaran
  • Craig Milo Rogers
  • Aram Galstyan
  • Behnam Salemi

This paper presents a bio-inspired, distributed control algorithm called TENTACLES for a group of radio robots to move, self-configure and maintain communication between some critical entities (such as humans, command centers, or other systems) in an unknown environment. The basic idea is to direct robots' explorative movements to grow "tentacles" from entities and establish links when tentacles meet. This approach self-heals robot failures and improves communication coverage and quality over time. Experiments in simulation and on real robots have shown positive results.

IJCAI Conference 2005 Conference Paper

Inferring Useful Heuristics from the Dynamics of Iterative Relational Classifiers

  • Aram Galstyan
  • Paul R

In this paper we consider dynamical properties of simple iterative relational classifiers. We conjecture that for a class of algorithms that use label propagation, the iterative procedure can lead to nontrivial dynamics in the number of newly classified instances. The underlying reason for this non-triviality is that in relational networks true class labels are likely to propagate faster than false ones. We suggest that this phenomenon, which we call two-tiered dynamics for binary classifiers, can be used for establishing a self-consistent classification threshold and a criterion for stopping iteration. We demonstrate this effect for two unrelated binary classification problems using a variation of an iterative relational neighbor classifier. We also study analytically the dynamical properties of the suggested classifier, and compare its results to the numerical experiments on synthetic data.

IROS Conference 2003 Conference Paper

Macroscopic analysis of adaptive task allocation in robots

  • Kristina Lerman
  • Aram Galstyan

We describe a general mechanism for adaptation in multi-agent systems in which agents modify their behavior in response to changes in the environment or actions of other agents. The agents use memory to estimate the global state of the system from individual observations and adjust their actions accordingly. We present a mathematical model of the dynamics of collective behavior in such systems and apply it to study adaptive task allocation in mobile robots. In this application, the robots' task is to forage for red or green pucks. As it travels around the arena, a robot records observations of pucks and other robots, and uses these observations to compute the estimated density of each. If it finds there are not enough robots of a specific type, it may switch its foraging state to fill the gap. After a transient, we expect the number of robots in each foraging state to reflect the prevalence of each puck type in the environment. We modelled adaptive task allocation and studied the dynamics of the system for different transition rates between states. We find that some transition rates lead to fast convergence times and a steady-state solution.
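
The macroscopic model belongs to the rate-equation family; an illustrative two-state form is below (illustrative only: in the paper, the transition rates ε depend on each robot's memory-based estimates of puck and robot densities).

```latex
\frac{dn_r}{dt} \;=\; \varepsilon_{g \to r}\, n_g \;-\; \varepsilon_{r \to g}\, n_r,
\qquad n_r + n_g = N
```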