Arrow Research search

Author name cluster

Abir De

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

40 papers
2 author rows

Possible papers (40)

ICLR Conference 2025 Conference Paper

Charting the Design Space of Neural Graph Representations for Subgraph Matching

  • Vaibhav Raj
  • Indradyumna Roy
  • Ashwin Ramachandran
  • Soumen Chakrabarti
  • Abir De

Subgraph matching is vital in knowledge graph (KG) question answering, molecule design, scene graph, code and circuit search, etc. Neural methods have shown promising results for subgraph matching. Our study of recent systems suggests refactoring them into a unified design space for graph matching networks. Existing methods occupy only a few isolated patches in this space, which remains largely uncharted. We undertake the first comprehensive exploration of this space, featuring such axes as attention-based vs. soft permutation-based interaction between query and corpus graphs, aligning nodes vs. edges, and the form of the final scoring network that integrates neural representations of the graphs. Our extensive experiments reveal that judicious and hitherto-unexplored combinations of choices in this space lead to large performance benefits. Beyond better performance, our study uncovers valuable insights and establishes general design principles for neural graph representation and interaction, which may be of wider interest.

ICLR Conference 2025 Conference Paper

Clique Number Estimation via Differentiable Functions of Adjacency Matrix Permutations

  • Indradyumna Roy
  • Eeshaan Jain
  • Soumen Chakrabarti
  • Abir De

Estimating the clique number in a graph is central to various applications, e.g., community detection, graph retrieval, etc. Existing estimators often rely on non-differentiable combinatorial components. Here, we propose a fully differentiable estimator for the clique number, which can be trained from distant supervision of clique numbers rather than from demonstrations of actual cliques. Our key insight is a formulation of the maximum clique problem (MCP) as the maximization of the size of a fully dense square submatrix within a suitably row-column-permuted adjacency matrix. We design a differentiable mechanism to search for permutations that lead to the discovery of such dense blocks. However, the optimal permutation is not unique, which leads to the learning of spurious permutations. To tackle this problem, we view the MCP as a sequence of subgraph matching tasks, each detecting progressively larger cliques in a nested manner. This allows effective navigation through suitable node permutations. These steps result in MxNet, an end-to-end differentiable model, which learns to predict the clique number without explicit clique demonstrations, with the added benefit of interpretability. Experiments on eight datasets show the superior accuracy of our approach.
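
The dense-submatrix formulation of MCP can be checked directly on tiny graphs. The sketch below (function name and example graph are ours, not the paper's) brute-forces the permutation search that MxNet makes differentiable:

```python
import itertools

import numpy as np

def clique_number_by_permutation(A):
    """Clique number via the dense-submatrix view: the largest k for
    which some row-column permutation of A places a fully dense k x k
    block (off-diagonal) in the top-left corner. Brute force over all
    permutations, so tiny graphs only."""
    n = A.shape[0]
    best = 1
    for perm in itertools.permutations(range(n)):
        P = A[np.ix_(perm, perm)]
        k = 1
        # grow the leading block while it stays fully dense off-diagonal
        while k < n and (P[:k + 1, :k + 1] + np.eye(k + 1)).min() >= 1:
            k += 1
        best = max(best, k)
    return best

# 5-node graph whose largest clique is the triangle {0, 1, 2}
A = np.zeros((5, 5), dtype=int)
for u, v in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]:
    A[u, v] = A[v, u] = 1
print(clique_number_by_permutation(A))  # 3
```

The factorial blow-up of this loop is exactly the combinatorial hardness the paper's differentiable permutation search is meant to sidestep.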

NeurIPS Conference 2025 Conference Paper

Contextual Tokenization for Graph Inverted Indices

  • Pritish Chakraborty
  • Indradyumna Roy
  • Soumen Chakrabarti
  • Abir De

Retrieving, from a large corpus, the graphs that contain a subgraph isomorphic to a given query graph is a core operation in many real-world applications. While recent multi-vector graph representations and scores based on set alignment and containment can provide accurate subgraph isomorphism tests, their use in retrieval remains limited by their need to score corpus graphs exhaustively. We introduce CoRGII (COntextual Representation of Graphs for Inverted Indexing), a graph indexing framework in which, starting with a contextual dense graph representation, a differentiable discretization module computes sparse binary codes over a learned latent vocabulary. This text-document-like representation allows us to leverage classic, highly optimized inverted indexes, while supporting soft (vector) set containment scores. Improving on this paradigm further, we replace the classical impact score of a `word' on a graph (such as defined by TF-IDF or BM25) with a data-driven, trainable impact score. Crucially, CoRGII is trained end-to-end using only binary relevance labels, without fine-grained supervision of query-to-document set alignments. Extensive experiments show that CoRGII provides better trade-offs between efficiency and accuracy than several baselines.
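
The indexing idea is easy to miniaturize. Below is a toy inverted index over sparse binary codes; the graph codes and impact weights are invented for illustration, whereas CoRGII learns both end-to-end:

```python
from collections import defaultdict

# Each corpus graph is reduced to the set of "latent words" whose bit
# is on; an inverted index maps each word to the graphs containing it,
# and retrieval scores candidates by summed per-word impact weights.
codes = {
    "g1": {3, 7, 9},
    "g2": {3},
    "g3": {1, 2},
}
impact = defaultdict(lambda: 1.0, {3: 2.0, 9: 0.5})  # per-word impacts

index = defaultdict(set)
for gid, words in codes.items():
    for w in words:
        index[w].add(gid)

def retrieve(query_words):
    """Score candidates by summed impact of shared words; graphs that
    share no word with the query are never touched."""
    scores = defaultdict(float)
    for w in query_words:
        for gid in index[w]:
            scores[gid] += impact[w]
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(retrieve({3, 9}))  # [('g1', 2.5), ('g2', 2.0)]
```

Note that "g3" is never scored at all, which is the source of the efficiency gain over exhaustive scoring.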

AAAI Conference 2025 Conference Paper

Differentiable Adversarial Attacks for Marked Temporal Point Processes

  • Pritish Chakraborty
  • Vinayak Gupta
  • Rahul R
  • Srikanta J. Bedathur
  • Abir De

Marked temporal point processes (MTPPs) have been shown to be extremely effective in modeling continuous-time event sequences (CTESs). In this work, we present adversarial attacks designed specifically for MTPP models. A key criterion for a good adversarial attack is its imperceptibility. For objects such as images or text, this is often achieved by bounding the perturbation in some fixed Lp norm-ball. However, similarly minimizing distance norms between two CTESs in the context of MTPPs is challenging due to their sequential nature and varying time-scales and lengths. We address this challenge by first permuting the events and then adding noise to the arrival timestamps. However, the worst-case optimization of such adversarial attacks is a hard combinatorial problem, requiring exploration across a permutation space that is factorially large in the length of the input sequence. We therefore propose a novel differentiable scheme, PERMTPP, with which we can perform adversarial attacks by learning to minimize the likelihood while also minimizing the distance between the two CTESs. Our experiments on four real-world datasets demonstrate the offensive and defensive capabilities, as well as the lower inference times, of PERMTPP.
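
A minimal sketch of the two-step perturbation (permute events, then add timestamp noise) looks as follows; the permutation and noise here are random draws over an invented sequence, whereas PERMTPP learns them through a differentiable relaxation to minimize the victim model's likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy marked event sequence: arrival times and categorical marks.
times = np.array([0.5, 1.2, 2.0, 3.1])
marks = np.array([0, 1, 1, 2])

perm = rng.permutation(len(times))                  # permute the events
noisy = times + rng.normal(0.0, 0.05, len(times))   # additive timestamp noise
order = np.argsort(noisy)                           # keep time increasing
adv_times, adv_marks = noisy[order], marks[perm][order]
print(adv_times.round(2), adv_marks)
```

The re-sorting step matters: after perturbation the result must still be a valid CTES, i.e., a non-decreasing sequence of timestamps.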

ICML Conference 2025 Conference Paper

Learning Condensed Graph via Differentiable Atom Mapping for Reaction Yield Prediction

  • Ankit Ghosh
  • Gargee Kashyap
  • Sarthak Mittal
  • Nupur Jain
  • Raghavan B. Sunoj
  • Abir De

The yield of chemical reactions generally depends on the activation barrier, i.e., the energy difference between the reactant and the transition state. Computing the transition state from the reactant and product graphs requires prior knowledge of the correct node alignment (i.e., atom mapping), which is not available in yield prediction datasets. In this work, we propose YieldNet, a neural yield prediction model, which tackles these challenges. Here, we first approximate the atom mapping between the reactants and products using a differentiable node alignment network. We then use this approximate atom mapping to obtain a noisy realization of the condensed graph of reaction (CGR), which is a supergraph encompassing both the reactants and products. This CGR serves as a surrogate for the transition state graph structure. The CGR embeddings of different steps in a multi-step reaction are then passed into a transformer-guided reaction path encoder. Our experiments show that YieldNet can predict the yield more accurately than the baselines. Furthermore, the model is trained only under the distant supervision of yield values, without requiring fine-grained supervision of atom mapping.

NeurIPS Conference 2025 Conference Paper

Monotone and Separable Set Functions: Characterizations and Neural Models

  • Soutrik Sarangi
  • Yonatan Sverdlov
  • Nadav Dym
  • Abir De

Motivated by applications to set containment problems, we consider the following fundamental problem: can we design set-to-vector functions so that the natural partial order on sets is preserved, namely $S \subseteq T$ if and only if $F(S) \leq F(T)$. We call functions satisfying this property Monotone and Separating (MAS) set functions. We establish lower and upper bounds for the vector dimension necessary to obtain MAS functions, as a function of the cardinality of the multisets and the underlying ground set. In the important case of an infinite ground set, we show that MAS functions do not exist, but provide a model called MASNET which provably enjoys a relaxed MAS property we name “weakly MAS” and is stable in the sense of Hölder continuity. We also show that MAS functions can be used to construct universal models that are monotone by construction and can approximate all monotone set functions. Experimentally, we consider a variety of set containment tasks. The experiments show the benefit of our MASNET model in comparison with standard set models, which do not incorporate set containment as an inductive bias.
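
For a finite ground set, a trivially exact MAS embedding exists: the multiset count vector. The sketch below (our illustration, not MASNET) demonstrates the property and hints at why the construction cannot survive an infinite ground set, since it spends one coordinate per element:

```python
from collections import Counter

def count_embedding(S, ground_set):
    """Multiset count vector: a Monotone and Separating (MAS)
    embedding for a finite ground set. S is contained in T (as
    multisets) exactly when F(S) <= F(T) coordinate-wise."""
    c = Counter(S)
    return [c[x] for x in ground_set]

def dominated(f, g):
    """Coordinate-wise vector dominance f <= g."""
    return all(a <= b for a, b in zip(f, g))

ground = ["a", "b", "c"]
# {a, b} is contained in {a, b, c}: embeddings dominate
print(dominated(count_embedding(["a", "b"], ground),
                count_embedding(["a", "b", "c"], ground)))  # True
# {a, a} is not contained in {a, b, c}: dominance fails
print(dominated(count_embedding(["a", "a"], ground),
                count_embedding(["a", "b", "c"], ground)))  # False
```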

TIST Journal 2025 Journal Article

Retrieving Continuous-Time Event Sequences Using Neural Temporal Point Processes with Learnable Hashing

  • Vinayak Gupta
  • Srikanta Bedathur
  • Abir De

Temporal sequences have become pervasive in various real-world applications such as finance, spatial mobility, health records, and so on. Consequently, the volume of data generated in the form of continuous-time event sequences (CTESs) has increased exponentially in the past few years. Thus, a significant fraction of the ongoing research on CTES datasets involves designing models to address downstream tasks such as next-event prediction, long-term forecasting, sequence classification, and so on. The recent developments in predictive modeling using marked temporal point processes (MTPP) have enabled an accurate characterization of several real-world applications involving CTESs. However, due to the complex nature of these CTES datasets, the task of large-scale retrieval of temporal sequences has been overlooked by the past literature. Concretely, by CTES retrieval we mean that, for an input query sequence, a retrieval system must return a ranked list of relevant sequences from a large corpus. To tackle this, we propose NeuroSeqRet, a first-of-its-kind framework designed specifically for end-to-end CTES retrieval. Specifically, NeuroSeqRet introduces multiple enhancements over standard retrieval frameworks. It first applies a trainable unwarping function on the query sequence, which makes it comparable with corpus sequences, especially when a relevant query-corpus pair has individually different attributes. Next, it feeds the unwarped query sequence and the corpus sequence into MTPP-guided neural relevance models. We develop four variants of the relevance model for different kinds of applications, based on the trade-off between accuracy and efficiency. We also propose an optimization framework to learn binary sequence embeddings from the relevance scores, suitable for locality-sensitive hashing, leading to a significant speedup in returning top-K results for a given query sequence. Our experiments with several datasets show the significant accuracy boost of NeuroSeqRet beyond several baselines, as well as the efficacy of our hashing mechanism.
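
The hashing step can be miniaturized as follows; the binary codes are hard-coded here, whereas NeuroSeqRet learns them from the relevance scores:

```python
from collections import defaultdict

# Corpus sequences reduced to short binary embeddings. At query time,
# only the sequences in the query's own hash bucket are scored by the
# (expensive) relevance model, instead of the full corpus.
corpus_codes = {
    "seq1": (1, 0, 1),
    "seq2": (1, 0, 1),
    "seq3": (0, 1, 1),
}

buckets = defaultdict(list)
for sid, code in corpus_codes.items():
    buckets[code].append(sid)

query_code = (1, 0, 1)
print(buckets[query_code])  # ['seq1', 'seq2'] -- candidates to re-rank
```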

AAAI Conference 2024 Conference Paper

Continuous Treatment Effect Estimation Using Gradient Interpolation and Kernel Smoothing

  • Lokesh Nagalapatti
  • Akshay Iyer
  • Abir De
  • Sunita Sarawagi

We address the individualized continuous treatment effect (ICTE) estimation problem, where we predict the effect of any continuous-valued treatment on an individual using observational data. The main challenge in this estimation task is the potential confounding of treatment assignment with the individual’s covariates in the training data, whereas during inference ICTE requires prediction on independently sampled treatments. In contrast to prior work that relied on regularizers or unstable GAN training, we advocate the direct approach of augmenting training individuals with independently sampled treatments and inferred counterfactual outcomes. We infer counterfactual outcomes using a two-pronged strategy: a Gradient Interpolation for close-to-observed treatments, and a Gaussian Process-based Kernel Smoothing, which allows us to down-weight high-variance inferences. We evaluate our method on five benchmarks and show that it outperforms six state-of-the-art methods on counterfactual estimation error. We analyze the superior performance of our method by showing that (1) our inferred counterfactual responses are more accurate, and (2) adding them to the training data reduces the distributional distance between the confounded training distribution and the test distribution, where treatment is independent of covariates. Our proposed method is model-agnostic, and we show that it improves the ICTE accuracy of several existing models.
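
A minimal kernel-smoothing step in the spirit of the second prong might look as follows; the data and bandwidth are invented, and the paper's Gaussian-process version additionally yields a variance used to down-weight shaky inferences:

```python
import numpy as np

# Observed (treatment, outcome) pairs for one individual's neighborhood.
t_obs = np.array([0.1, 0.4, 0.5, 0.9])   # observed treatments
y_obs = np.array([1.0, 2.0, 2.2, 3.0])   # observed outcomes

def smooth(t_new, bandwidth=0.2):
    """Infer the counterfactual outcome at a new treatment value as a
    Gaussian-kernel weighted average of observed outcomes."""
    w = np.exp(-0.5 * ((t_obs - t_new) / bandwidth) ** 2)
    return float(w @ y_obs / w.sum())

print(round(smooth(0.45), 3))  # dominated by the nearby outcomes 2.0 and 2.2
```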

AAAI Conference 2024 Conference Paper

Generator Assisted Mixture of Experts for Feature Acquisition in Batch

  • Vedang Asgaonkar
  • Aditya Jain
  • Abir De

Given a set of observations, feature acquisition is about finding the subset of unobserved features which would enhance accuracy. Such problems have been explored in a sequential setting in prior work, where the model receives feedback from every newly acquired feature and chooses either to explore more features or to predict. However, sequential acquisition is not feasible in some settings where time is of the essence. We consider the problem of feature acquisition in batch, where the subset of features to be queried is chosen based on the currently observed features, acquired as a batch, and then used for prediction. We solve this problem using several technical innovations. First, we use a feature generator to draw a subset of synthetic features for some examples, which reduces the cost of oracle queries. Second, to make the feature acquisition problem tractable for large, heterogeneous observed features, we partition the data into buckets by borrowing tools from locality-sensitive hashing, and then train a mixture-of-experts model. Third, we design a tractable lower bound on the original objective. We use a greedy algorithm combined with model training to solve the underlying problem. Experiments with four datasets show that our approach outperforms prior methods in terms of the trade-off between accuracy and feature acquisition cost.

NeurIPS Conference 2024 Conference Paper

Graph Edit Distance with General Costs Using Neural Set Divergence

  • Eeshaan Jain
  • Indradyumna Roy
  • Saswat Meher
  • Soumen Chakrabarti
  • Abir De

Graph Edit Distance (GED) measures the (dis-)similarity between two given graphs in terms of the minimum-cost edit sequence, which transforms one graph into the other. GED is related to other notions of graph similarity, such as graph and subgraph isomorphism, maximum common subgraph, etc. However, the computation of exact GED is NP-Hard, which has recently motivated the design of neural models for GED estimation. These models, however, do not explicitly account for edit operations with different costs. In response, we propose $\texttt{GraphEdX}$, a neural GED estimator that can work with general costs specified for the four edit operations, viz., edge deletion, edge addition, node deletion, and node addition. We first present GED as a quadratic assignment problem (QAP) that incorporates these four costs. Then, we represent each graph as a set of node and edge embeddings and use them to design a family of neural set divergence surrogates. We replace the QAP terms corresponding to each operation with their surrogates. Computing such neural set divergence requires aligning nodes and edges of the two graphs. We learn these alignments using a Gumbel-Sinkhorn permutation generator, additionally ensuring that the node and edge alignments are consistent with each other. Moreover, these alignments are cognizant of both the presence and absence of edges between node pairs. Through extensive experiments on several datasets, along with a variety of edit cost settings, we show that $\texttt{GraphEdX}$ consistently outperforms state-of-the-art methods and heuristics in terms of prediction error. The code is available at https://github.com/structlearning/GraphEdX.
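
For tiny graphs, the QAP view admits a brute-force check. The sketch below (our illustration, not GraphEdX) takes the minimum over padded node alignments under the four costs; GraphEdX replaces this minimum with neural set divergence surrogates and Gumbel-Sinkhorn alignments:

```python
import itertools

def ged_brute_force(edges1, n1, edges2, n2, c_nd=1, c_na=1, c_ed=1, c_ea=1):
    """Exact GED under general costs for the four edit operations
    (node deletion/addition, edge deletion/addition), as a minimum
    over node alignments. Both graphs are padded with dummy nodes;
    mapping a real node of G1 to a dummy of G2 is a node deletion,
    and vice versa. Brute force, so tiny graphs only."""
    n = n1 + n2
    A1 = [[0] * n for _ in range(n)]
    A2 = [[0] * n for _ in range(n)]
    for u, v in edges1:
        A1[u][v] = A1[v][u] = 1
    for u, v in edges2:
        A2[u][v] = A2[v][u] = 1
    best = float("inf")
    for perm in itertools.permutations(range(n)):
        cost = 0
        for i in range(n):
            if i < n1 and perm[i] >= n2:    # real node mapped to dummy
                cost += c_nd
            if i >= n1 and perm[i] < n2:    # dummy mapped to real node
                cost += c_na
        for i in range(n):
            for j in range(i + 1, n):
                e1, e2 = A1[i][j], A2[perm[i]][perm[j]]
                if e1 and not e2:           # edge only in G1: deletion
                    cost += c_ed
                if e2 and not e1:           # edge only in G2: addition
                    cost += c_ea
        best = min(best, cost)
    return best

# Triangle vs. 3-node path: a single edge deletion suffices
print(ged_brute_force([(0, 1), (1, 2), (0, 2)], 3, [(0, 1), (1, 2)], 3))  # 1
```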

NeurIPS Conference 2024 Conference Paper

Iteratively Refined Early Interaction Alignment for Subgraph Matching based Graph Retrieval

  • Ashwin Ramachandran
  • Vaibhav Raj
  • Indradyumna Roy
  • Soumen Chakrabarti
  • Abir De

Graph retrieval based on subgraph isomorphism has several real-world applications such as scene graph retrieval, molecular fingerprint detection and circuit design. Roy et al. [35] proposed IsoNet, a late interaction model for subgraph matching, which first computes the node and edge embeddings of each graph independently of the paired graph and then computes a trainable alignment map. Here, we present $\texttt{IsoNet++}$, an early interaction graph neural network (GNN), based on several technical innovations. First, we compute embeddings of all nodes by passing messages within and across the two input graphs, guided by an *injective alignment* between their nodes. Second, we update this alignment in a lazy fashion over multiple *rounds*. Within each round, we run a layerwise GNN from scratch, based on the current state of the alignment. After the completion of one round of GNN, we use the last-layer embeddings to update the alignments, and proceed to the next round. Third, $\texttt{IsoNet++}$ incorporates a novel notion of node-pair partner interaction. Traditional early interaction computes attention between a node and its potential partners in the other graph, the attention then controlling messages passed across graphs. We consider *node pairs* (not single nodes) as potential partners. The existence of an edge between the nodes in one graph and its non-existence in the other provide vital signals for refining the alignment. Our experiments on several datasets show that the alignments get progressively refined with successive rounds, resulting in significantly better retrieval performance than existing methods. We demonstrate that all three innovations contribute to the enhanced accuracy. Our code and datasets are publicly available at https://github.com/structlearning/isonetpp.

ICML Conference 2023 Conference Paper

Discrete Continuous Optimization Framework for Simultaneous Clustering and Training in Mixture Models

  • Parth Vipul Sangani
  • Arjun Shashank Kashettiwar
  • Pritish Chakraborty
  • Bhuvan Reddy Gangula
  • Durga Sivasubramanian
  • Ganesh Ramakrishnan
  • Rishabh K. Iyer
  • Abir De

We study PRESTO, a new framework for learning mixture models via automatic clustering, wherein we optimize a joint objective function on the model parameters and the partitioning, with each model tailored to perform well on its specific cluster. In contrast to prior work, we do not assume any generative model for the data. We convert our training problem into a joint parameter estimation cum subset selection problem, subject to a matroid span constraint. This allows us to reduce our problem to a constrained set function minimization problem, where the underlying objective is monotone and approximately submodular. We then propose a new joint discrete-continuous optimization algorithm that achieves a bounded approximation guarantee for our problem. We show that PRESTO outperforms several alternative methods. Finally, we study PRESTO in the context of resource-efficient deep learning, where we train smaller resource-constrained models on each partition and show that it outperforms existing data partitioning and model pruning/knowledge distillation approaches, which, in contrast to PRESTO, require large initial (teacher) models.

NeurIPS Conference 2023 Conference Paper

Efficient Data Subset Selection to Generalize Training Across Models: Transductive and Inductive Networks

  • Eeshaan Jain
  • Tushar Nandy
  • Gaurav Aggarwal
  • Ashish Tendulkar
  • Rishabh Iyer
  • Abir De

Existing subset selection methods for efficient learning predominantly employ discrete combinatorial and model-specific approaches, which lack generalizability---for each new model, the algorithm has to be executed from the beginning. Therefore, for an unseen architecture, one cannot reuse the subset chosen for a different model. In this work, we propose $\texttt{SubSelNet}$, a non-adaptive subset selection framework, which tackles these problems. Here, we first introduce an attention-based neural gadget that leverages the graph structure of architectures and acts as a surrogate to trained deep neural networks for quick model prediction. Then, we use these predictions to build subset samplers. This naturally provides us with two variants of $\texttt{SubSelNet}$. The first variant is transductive (called Transductive-$\texttt{SubSelNet}$), which computes the subset separately for each model by solving a small optimization problem. Such an optimization is still very fast, thanks to the replacement of explicit model training by the model approximator. The second variant is inductive (called Inductive-$\texttt{SubSelNet}$), which computes the subset using a trained subset selector, without any optimization. Our experiments show that our model outperforms several methods across several real datasets.

NeurIPS Conference 2023 Conference Paper

Locality Sensitive Hashing in Fourier Frequency Domain For Soft Set Containment Search

  • Indradyumna Roy
  • Rishi Agarwal
  • Soumen Chakrabarti
  • Anirban Dasgupta
  • Abir De

In many search applications related to passage retrieval, text entailment, and subgraph search, the query and each 'document' is a set of elements, with a document being relevant if it contains the query. These elements are not represented by atomic IDs, but by embedded representations, thereby extending set containment to soft set containment. Recent applications address soft set containment by encoding sets into fixed-size vectors and checking for elementwise vector dominance. This 0/1 property can be relaxed to an asymmetric hinge distance for scoring and ranking candidate documents. Here we focus on data-sensitive, trainable indices for fast retrieval of relevant documents. Existing LSH methods are designed for mostly symmetric or few simple asymmetric distance functions, which are not suitable for hinge distance. Instead, we transform hinge distance into a proposed dominance similarity measure, to which we then apply a Fourier transform, thereby expressing dominance similarity as an expectation of inner products of functions in the frequency domain. Next, we approximate the expectation with an importance-sampled estimate. The overall consequence is that now we can use a traditional LSH, but in the frequency domain. To ensure that the LSH uses hash bits efficiently, we learn hash functions that are sensitive to both corpus and query distributions, mapped to the frequency domain. Our experiments show that the proposed asymmetric dominance similarity is critical to the targeted applications, and that our LSH, which we call FourierHashNet, provides a better query time vs. retrieval quality trade-off, compared to several baselines. Both the Fourier transform and the trainable hash codes contribute to performance gains.
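
The asymmetric hinge relaxation of vector dominance can be written in a few lines; the embeddings below are invented for illustration, and the Fourier-domain hashing built on top of this score is not shown:

```python
import numpy as np

def hinge_distance(q, d):
    """Asymmetric hinge relaxation of elementwise vector dominance:
    zero exactly when q <= d coordinate-wise (the document softly
    'contains' the query), growing with how far the query embedding
    sticks out above the document embedding."""
    return float(np.maximum(q - d, 0.0).sum())

q = np.array([0.2, 0.5, 0.1])
d_contains = np.array([0.3, 0.6, 0.1])   # dominates q -> distance 0
d_misses = np.array([0.3, 0.2, 0.1])     # falls short on coordinate 1
print(hinge_distance(q, d_contains))          # 0.0
print(round(hinge_distance(q, d_misses), 6))  # 0.3
```

Note the asymmetry: swapping the roles of query and document changes the score, which is exactly why symmetric LSH families do not apply directly.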

AAAI Conference 2022 Conference Paper

Interpretable Neural Subgraph Matching for Graph Retrieval

  • Indradyumna Roy
  • Venkata Sai Baba Reddy Velugoti
  • Soumen Chakrabarti
  • Abir De

Given a query graph and a database of corpus graphs, a graph retrieval system aims to deliver the most relevant corpus graphs. Graph retrieval based on subgraph matching has a wide variety of applications, e.g., molecular fingerprint detection, circuit design, software analysis, and question answering. In such applications, a corpus graph is relevant to a query graph, if the query graph is (perfectly or approximately) a subgraph of the corpus graph. Existing neural graph retrieval models compare the node or graph embeddings of the query-corpus pairs, to compute the relevance scores between them. However, such models may not provide edge consistency between the query and corpus graphs. Moreover, they predominantly use symmetric relevance scores, which are not appropriate in the context of subgraph matching, since the underlying relevance score in subgraph search should be measured using the partial order induced by the subgraph-supergraph relationship. Consequently, they show poor retrieval performance in the context of subgraph matching. In response, we propose ISONET, a novel interpretable neural edge alignment formulation, which is better able to learn the edge-consistent mapping necessary for subgraph matching. ISONET incorporates a new scoring mechanism which enforces an asymmetric relevance score, specifically tailored to subgraph matching. ISONET’s design enables it to directly identify the underlying subgraph in a corpus graph, which is relevant to the given query graph. Our experiments on diverse datasets show that ISONET outperforms recent graph retrieval formulations and systems. Additionally, ISONET can provide interpretable alignments between query-corpus graph pairs during inference, despite being trained using only binary relevance labels of whole graphs, without any fine-grained ground-truth information about node or edge alignments.

NeurIPS Conference 2022 Conference Paper

Learning Recourse on Instance Environment to Enhance Prediction Accuracy

  • Lokesh N
  • Guntakanti Sai Koushik
  • Abir De
  • Sunita Sarawagi

Machine learning models are often susceptible to poor performance on instances sampled from bad environments. For example, an image classifier could provide low accuracy on images captured under low lighting conditions. In high-stakes ML applications, such as AI-driven medical diagnostics, a better option could be to provide recourse in the form of alternative environment settings in which to recapture the instance for more reliable diagnostics. In this paper, we propose a model called RecourseNet that learns to apply recourse on the space of environments so that the recoursed instances are amenable to better predictions by the classifier. Learning to output optimal recourse is challenging because we do not assume access to the underlying physical process that generates the recoursed instances. Also, the optimal setting could be instance-dependent---for example, the best camera angle for object recognition could be a function of the object's shape. We propose a novel three-level training method that (a) learns a classifier that is optimized for high performance under recourse, (b) learns a recourse predictor when the training data may contain only limited instances under good environment settings, and (c) triggers recourse selectively, only when recourse is likely to improve classifier confidence.

AAAI Conference 2022 Conference Paper

Learning Temporal Point Processes for Efficient Retrieval of Continuous Time Event Sequences

  • Vinayak Gupta
  • Srikanta Bedathur
  • Abir De

Recent developments in predictive modeling using marked temporal point processes (MTPP) have enabled an accurate characterization of several real-world applications involving continuous-time event sequences (CTESs). However, the retrieval problem for such sequences remains largely unaddressed in the literature. To tackle this, we propose NEUROSEQRET, which learns to retrieve and rank a relevant set of continuous-time event sequences for a given query sequence, from a large corpus of sequences. More specifically, NEUROSEQRET first applies a trainable unwarping function on the query sequence, which makes it comparable with corpus sequences, especially when a relevant query-corpus pair has individually different attributes. Next, it feeds the unwarped query sequence and the corpus sequence into MTPP-guided neural relevance models. We develop two variants of the relevance model which offer a trade-off between accuracy and efficiency. We also propose an optimization framework to learn binary sequence embeddings from the relevance scores, suitable for locality-sensitive hashing, leading to a significant speedup in returning top-K results for a given query sequence. Our experiments with several datasets show the significant accuracy boost of NEUROSEQRET beyond several baselines, as well as the efficacy of our hashing mechanism.

TMLR Journal 2022 Journal Article

Learning to Switch Among Agents in a Team via 2-Layer Markov Decision Processes

  • Vahid Balazadeh
  • Abir De
  • Adish Singla
  • Manuel Gomez Rodriguez

Reinforcement learning agents have been mostly developed and evaluated under the assumption that they will operate in a fully autonomous manner---they will take all actions. In this work, our goal is to develop algorithms that, by learning to switch control between agents, allow existing reinforcement learning agents to operate under different automation levels. To this end, we first formally define the problem of learning to switch control among agents in a team via a 2-layer Markov decision process. Then, we develop an online learning algorithm that uses upper confidence bounds on the agents' policies and the environment's transition probabilities to find a sequence of switching policies. The total regret of our algorithm with respect to the optimal switching policy is sublinear in the number of learning steps and, whenever multiple teams of agents operate in a similar environment, our algorithm greatly benefits from maintaining shared confidence bounds for the environments' transition probabilities and it enjoys a better regret bound than problem-agnostic algorithms. Simulation experiments in an obstacle avoidance task illustrate our theoretical findings and demonstrate that, by exploiting the specific structure of the problem, our proposed algorithm is superior to problem-agnostic algorithms.

NeurIPS Conference 2022 Conference Paper

Maximum Common Subgraph Guided Graph Retrieval: Late and Early Interaction Networks

  • Indradyumna Roy
  • Soumen Chakrabarti
  • Abir De

The graph retrieval problem is to search in a large corpus of graphs for ones that are most similar to a query graph. A common consideration for scoring similarity is the maximum common subgraph (MCS) between the query and corpus graphs, usually counting the number of common edges (i.e., MCES). In some applications, it is also desirable that the common subgraph be connected, i.e., the maximum common connected subgraph (MCCS). Finding exact MCES and MCCS is intractable, but may be unnecessary if ranking corpus graphs by relevance is the goal. We design fast and trainable neural functions that approximate MCES and MCCS well. Late interaction methods compute dense representations for the query and corpus graph separately, and compare these representations using simple similarity functions at the last stage, leading to highly scalable systems. Early interaction methods combine information from both graphs right from the input stages, are usually considerably more accurate, but slower. We propose both late and early interaction neural MCES and MCCS formulations. They are both based on a continuous relaxation of a node alignment matrix between query and corpus nodes. For MCCS, we propose a novel differentiable network for estimating the size of the largest connected common subgraph. Extensive experiments with seven data sets show that our proposals are superior among late interaction models in terms of both accuracy and speed. Our early interaction models provide accuracy competitive with the state of the art, at substantially greater speeds.
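
For tiny graphs, MCES can be computed exactly by enumerating injective node maps, which is the quantity the continuous alignment relaxations approximate; this brute-force sketch is our illustration, not the paper's method:

```python
import itertools

def mces(edges1, n1, edges2, n2):
    """Size of the maximum common edge subgraph (MCES): the maximum
    number of shared edges over injective node maps from G1 (n1 nodes,
    n1 <= n2) into G2. Brute force over all injective maps, so tiny
    graphs only."""
    E1 = {frozenset(e) for e in edges1}
    E2 = {frozenset(e) for e in edges2}
    best = 0
    for mapping in itertools.permutations(range(n2), n1):
        common = sum(1 for e in E1
                     if frozenset(mapping[v] for v in e) in E2)
        best = max(best, common)
    return best

# A triangle and a 4-cycle can share at most 2 edges
print(mces([(0, 1), (1, 2), (0, 2)], 3,
           [(0, 1), (1, 2), (2, 3), (3, 0)], 4))  # 2
```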

TIST Journal 2022 Journal Article

Modeling Continuous Time Sequences with Intermittent Observations using Marked Temporal Point Processes

  • Vinayak Gupta
  • Srikanta Bedathur
  • Sourangshu Bhattacharya
  • Abir De

A large fraction of data generated via human activities, such as online purchases, health records, and spatial mobility, can be represented as a sequence of events over continuous time. Learning deep models over such continuous-time event sequences is non-trivial, as it involves modeling the ever-increasing event timestamps, inter-event time gaps, event types, and the influences between events within and across different sequences. In recent years, neural enhancements to marked temporal point processes (MTPPs) have emerged as a powerful framework to model the underlying generative mechanism of asynchronous events localized in continuous time. However, most existing models and inference methods in the MTPP framework consider only the complete-observation scenario, i.e., the event sequence being modeled is fully observed with no missing events – an ideal setting rarely met in real-world applications. A recent line of work that considers missing events while training MTPPs relies on supervised learning techniques, which require a missing-or-observed label for each event in a sequence; this further restricts practicability, since in many scenarios the details of missing events are not known a priori. In this work, we provide a novel unsupervised model and inference method for learning MTPPs in the presence of event sequences with missing events. Specifically, we first model the generative processes of observed events and missing events using two MTPPs, where the missing events are represented as latent random variables. Then, we devise an unsupervised training method that jointly learns both MTPPs by means of variational inference. Such a formulation can effectively impute the missing data among the observed events, which in turn enhances the model's predictive prowess, and can identify the optimal positions of missing events in a sequence.
Experiments with eight real-world datasets show that IMTPP outperforms state-of-the-art MTPP frameworks for event prediction and missing-data imputation, and provides stable optimization.

NeurIPS Conference 2022 Conference Paper

Neural Estimation of Submodular Functions with Applications to Differentiable Subset Selection

  • Abir De
  • Soumen Chakrabarti

Submodular functions and variants, through their ability to characterize diversity and coverage, have emerged as a key tool for data selection and summarization. Many recent approaches to learn submodular functions suffer from limited expressiveness. In this work, we propose FlexSubNet, a family of flexible neural models for both monotone and non-monotone submodular functions. To fit a latent submodular function from (set, value) observations, our method applies a concave function on modular functions in a recursive manner. We do not draw the concave function from a restricted family, but rather learn from data using a highly expressive neural network that implements a differentiable quadrature procedure. Such an expressive neural model for concave functions may be of independent interest. Next, we extend this setup to provide a novel characterization of monotone $\alpha$-submodular functions, a recently introduced notion of approximate submodular functions. We then use this characterization to design a novel neural model for such functions. Finally, we consider learning submodular set functions under distant supervision in the form of (perimeter, high-value-subset) pairs. This yields a novel subset selection method based on an order-invariant, yet greedy sampler built around the above neural set functions. Our experiments on synthetic and real data show that FlexSubNet outperforms several baselines.
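The building block behind this construction, a concave function applied to a modular (additive) function, always yields a submodular set function, and the diminishing-returns property can be checked numerically. A minimal sketch with `sqrt` standing in for the learned concave network; the element weights below are made up for illustration:

```python
import math

def modular(weights, S):
    """Modular function: sum of element weights over the subset S."""
    return sum(weights[i] for i in S)

def f(weights, S):
    """Concave-of-modular set function: sqrt(m(S)) is submodular."""
    return math.sqrt(modular(weights, S))

def marginal_gain(weights, S, e):
    return f(weights, S | {e}) - f(weights, S)

weights = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
A, B = {0}, {0, 1, 2}   # A is a subset of B
e = 3
# Diminishing returns: adding e to the smaller set gains at least as much.
assert marginal_gain(weights, A, e) >= marginal_gain(weights, B, e)
```

FlexSubNet replaces the fixed `sqrt` with a learned, highly expressive concave neural network and applies the composition recursively.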

ICML Conference 2022 Conference Paper

VarScene: A Deep Generative Model for Realistic Scene Graph Synthesis

  • Tathagat Verma
  • Abir De
  • Yateesh Agrawal
  • Vishwa Vinay
  • Soumen Chakrabarti

Scene graphs are powerful abstractions that capture relationships between objects in images by modeling objects as nodes and relationships as edges. Generation of realistic synthetic scene graphs has applications like scene synthesis and data augmentation for supervised learning. Existing graph generative models are predominantly targeted toward molecular graphs, leveraging the limited vocabulary of atoms and bonds and also the well-defined semantics of chemical compounds. In contrast, scene graphs have much larger object and relation vocabularies, and their semantics are latent. To address this challenge, we propose a variational autoencoder for scene graphs, which is optimized for the maximum mean discrepancy (MMD) between the ground truth scene graph distribution and distribution of the generated scene graphs. Our method views a scene graph as a collection of star graphs and encodes it into a latent representation of the underlying stars. The decoder generates scene graphs by learning to sample the component stars and edges between them. Our experiments show that our method is able to mimic the underlying scene graph generative process more accurately than several state-of-the-art baselines.

AAAI Conference 2021 Conference Paper

Adversarial Permutation Guided Node Representations for Link Prediction

  • Indradyumna Roy
  • Abir De
  • Soumen Chakrabarti

After observing a snapshot of a social network, a link prediction (LP) algorithm identifies node pairs between which new edges will likely materialize in the future. Most LP algorithms estimate a score for currently non-neighboring node pairs, and rank them by this score. Recent LP systems compute this score by comparing dense, low-dimensional vector representations of nodes. Graph neural networks (GNNs), in particular graph convolutional networks (GCNs), are popular examples. For two nodes to be meaningfully compared, their embeddings should be indifferent to reordering of their neighbors. GNNs typically use simple, symmetric set aggregators to ensure this property, but this design decision has been shown to produce representations with limited expressive power. Sequence encoders are more expressive, but are permutation sensitive by design. Recent efforts to overcome this dilemma turn out to be unsatisfactory for LP tasks. In response, we propose PERMGNN, which aggregates neighbor features using a recurrent, order-sensitive aggregator and directly minimizes an LP loss while it is ‘attacked’ by an adversarial generator of neighbor permutations. PERMGNN has superior expressive power compared to earlier GNNs. Next, we devise an optimization framework to map PERMGNN’s node embeddings to a suitable locality-sensitive hash, which speeds up reporting the top-K most likely edges for the LP task. Our experiments on diverse datasets show that PERMGNN outperforms several state-of-the-art link predictors, and can predict the most likely edges fast.

AAAI Conference 2021 Conference Paper

Classification Under Human Assistance

  • Abir De
  • Nastaran Okati
  • Ali Zarezade
  • Manuel Gomez Rodriguez

Most supervised learning models are trained for full automation. However, their predictions are sometimes worse than those by human experts on some specific instances. Motivated by this empirical observation, our goal is to design classifiers that are optimized to operate under different automation levels. More specifically, we focus on convex margin-based classifiers and first show that the problem is NP-hard. Then, we further show that, for support vector machines, the corresponding objective function can be expressed as the difference of two functions f = g − c, where g is monotone, non-negative and γ-weakly submodular, and c is non-negative and modular. This representation allows us to utilize a recently introduced deterministic greedy algorithm, as well as a more efficient randomized variant of the algorithm, both of which enjoy approximation guarantees for solving the problem. Experiments on synthetic and real-world data from several applications in medical diagnosis illustrate our theoretical findings and demonstrate that, under human assistance, supervised learning models trained to operate under different automation levels can outperform those trained for full automation as well as humans operating alone.

NeurIPS Conference 2021 Conference Paper

Counterfactual Explanations in Sequential Decision Making Under Uncertainty

  • Stratis Tsirtsis
  • Abir De
  • Manuel Rodriguez

Methods to find counterfactual explanations have predominantly focused on one-step decision making processes. In this work, we initiate the development of methods to find counterfactual explanations for decision making processes in which multiple, dependent actions are taken sequentially over time. We start by formally characterizing a sequence of actions and states using finite horizon Markov decision processes and the Gumbel-Max structural causal model. Building upon this characterization, we formally state the problem of finding counterfactual explanations for sequential decision making processes. In our problem formulation, the counterfactual explanation specifies an alternative sequence of actions differing in at most k actions from the observed sequence that could have led the observed process realization to a better outcome. Then, we introduce a polynomial time algorithm based on dynamic programming to build a counterfactual policy that is guaranteed to always provide the optimal counterfactual explanation on every possible realization of the counterfactual environment dynamics. We validate our algorithm using both synthetic and real data from cognitive behavioral therapy and show that the counterfactual explanations our algorithm finds can provide valuable insights to enhance sequential decision making under uncertainty.
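The dynamic program over "at most k changed actions" can be sketched on a toy deterministic MDP. This is a strong simplification: the paper works with Gumbel-Max structural causal models over stochastic dynamics, and all transition and reward numbers below are hypothetical:

```python
from functools import lru_cache

# Toy deterministic MDP standing in for the paper's counterfactual dynamics
# (hypothetical transitions and rewards, for illustration only).
T = {(0, 0): 1, (0, 1): 2, (1, 0): 0, (1, 1): 2, (2, 0): 2, (2, 1): 2}
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 1.0,
     (2, 0): 0.5, (2, 1): 0.5}
ACTIONS = (0, 1)
observed = [0, 0, 0]   # observed action sequence, starting in state 0

def best_value(s0, k):
    """Max total reward over action sequences differing from `observed`
    in at most k positions, via dynamic programming over (time, state,
    remaining change budget)."""
    @lru_cache(maxsize=None)
    def dp(t, s, budget):
        if t == len(observed):
            return 0.0
        vals = []
        for a in ACTIONS:
            cost = 0 if a == observed[t] else 1   # changing an action costs 1
            if cost <= budget:
                vals.append(R[(s, a)] + dp(t + 1, T[(s, a)], budget - cost))
        return max(vals)   # keeping the observed action is always feasible
    return dp(0, s0, k)

assert best_value(0, 0) == 0.0   # no changes allowed: observed rewards sum to 0
assert best_value(0, 1) == 2.0   # one change (action 1 at t=0) reaches 2.0
```

The table of `dp` values doubles as the counterfactual policy: at each reachable (time, state, budget) triple, the maximizing action is the one the counterfactual explanation recommends.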

NeurIPS Conference 2021 Conference Paper

Differentiable Learning Under Triage

  • Nastaran Okati
  • Abir De
  • Manuel Rodriguez

Multiple lines of evidence suggest that predictive models may benefit from algorithmic triage. Under algorithmic triage, a predictive model does not predict all instances but instead defers some of them to human experts. However, the interplay between the prediction accuracy of the model and the human experts under algorithmic triage is not well understood. In this work, we start by formally characterizing under which circumstances a predictive model may benefit from algorithmic triage. In doing so, we also demonstrate that models trained for full automation may be suboptimal under triage. Then, given any model and desired level of triage, we show that the optimal triage policy is a deterministic threshold rule in which triage decisions are derived deterministically by thresholding the difference between the model and human errors on a per-instance level. Building upon these results, we introduce a practical gradient-based algorithm that is guaranteed to find a sequence of predictive models and triage policies of increasing performance. Experiments on a wide variety of supervised learning tasks using synthetic and real data from two important applications---content moderation and scientific discovery---illustrate our theoretical results and show that the models and triage policies provided by our gradient-based algorithm outperform those provided by several competitive baselines.
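The threshold rule described above is simple to sketch: given per-instance error estimates for the model and the human (assumed known here; the numbers are hypothetical), defer the instances where the model loses most to the human:

```python
def triage(model_err, human_err, budget):
    """Deterministic threshold rule (sketch): rank instances by the
    per-instance error difference model - human, and defer up to `budget`
    instances where the human is strictly better."""
    diff = [(m - h, i) for i, (m, h) in enumerate(zip(model_err, human_err))]
    diff.sort(reverse=True)
    # Only defer when the human's expected error is actually lower.
    return {i for d, i in diff[:budget] if d > 0}

model_err = [0.9, 0.1, 0.4, 0.2]   # hypothetical per-instance expected errors
human_err = [0.2, 0.3, 0.1, 0.5]
assert triage(model_err, human_err, budget=2) == {0, 2}
```

The paper's full algorithm additionally retrains the predictive model to specialize on the instances it keeps, which is what makes models trained for full automation suboptimal under triage.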

AAAI Conference 2021 Conference Paper

Differentially Private Link Prediction with Protected Connections

  • Abir De
  • Soumen Chakrabarti

Link prediction (LP) algorithms propose to each node a ranked list of nodes that are currently non-neighbors, as the most likely candidates for future linkage. Owing to increasing concerns about privacy, users (nodes) may prefer to keep some of their connections protected or private. Motivated by this observation, our goal is to design a differentially private LP algorithm, which trades off between privacy of the protected node-pairs and the link prediction accuracy. More specifically, we first propose a form of differential privacy on graphs, which models the privacy loss only of those node-pairs which are marked as protected. Next, we develop DPLP, a learning-to-rank algorithm, which applies a monotone transform to base scores from a non-private LP system, and then adds noise. DPLP is trained with a privacy-induced ranking loss, which optimizes the ranking utility for a given maximum allowed level of privacy leakage of the protected node-pairs. Under a recently-introduced latent node embedding model, we present a formal trade-off between privacy and LP utility. Extensive experiments with several real-life graphs and several LP heuristics show that DPLP can trade off between privacy and predictive performance more effectively than several alternatives.
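The "monotone transform plus noise" recipe can be sketched with a fixed `tanh` transform and a standard Laplace mechanism; this is only a stand-in, since DPLP learns its transform and calibrates noise through the ranking loss:

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_lp_scores(base_scores, epsilon, sensitivity=1.0, seed=0):
    """Sketch: squash each base LP score with a monotone map (preserving
    the ranking), then add Laplace noise with scale sensitivity/epsilon."""
    rng = random.Random(seed)
    return {pair: math.tanh(s) + laplace_noise(sensitivity / epsilon, rng)
            for pair, s in base_scores.items()}

scores = {("u", "v"): 2.5, ("u", "w"): 0.3}   # hypothetical base LP scores
private = dp_lp_scores(scores, epsilon=1.0)
assert set(private) == set(scores)
```

With a large privacy budget (high epsilon) the noise is negligible and the original ranking survives; shrinking epsilon increasingly randomizes the ranked lists of protected node-pairs.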

ICML Conference 2021 Conference Paper

GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training

  • Krishnateja Killamsetty
  • Durga Sivasubramanian
  • Ganesh Ramakrishnan
  • Abir De
  • Rishabh K. Iyer

The great success of modern machine learning models on large datasets is contingent on extensive computational resources with high financial and environmental costs. One way to address this is by extracting subsets that generalize on par with the full data. In this work, we propose a general framework, GRAD-MATCH, which finds subsets that closely match the gradient of the training or validation set. We find such subsets effectively using an orthogonal matching pursuit algorithm. We show rigorous theoretical and convergence guarantees of the proposed algorithm and, through our extensive experiments on real-world datasets, show the effectiveness of our proposed framework. We show that GRAD-MATCH significantly and consistently outperforms several recent data-selection algorithms and achieves the best accuracy-efficiency trade-off. GRAD-MATCH is available as a part of the CORDS toolkit: https://github.com/decile-team/cords.
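The gradient-matching idea can be sketched with a simplified, non-orthogonal matching pursuit; GRAD-MATCH proper uses orthogonal matching pursuit (re-fitting all weights each step), and the toy per-example gradients below are made up:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def axpy(a, x, y):
    """Return y + a * x, elementwise."""
    return [yi + a * xi for xi, yi in zip(x, y)]

def match_gradients(per_example_grads, k):
    """Greedy matching-pursuit sketch: pick k examples whose weighted
    gradient sum approximates the full-data gradient."""
    full = [sum(col) for col in zip(*per_example_grads)]
    residual, weights = full[:], {}
    for _ in range(k):
        # Pick the example gradient most aligned with the current residual.
        i = max(range(len(per_example_grads)),
                key=lambda j: abs(dot(residual, per_example_grads[j])))
        g = per_example_grads[i]
        w = dot(residual, g) / dot(g, g)
        weights[i] = weights.get(i, 0.0) + w
        residual = axpy(-w, g, residual)
    return weights, residual

grads = [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]   # toy per-example gradients
w, r = match_gradients(grads, k=3)
assert dot(r, r) < dot([2.0, 4.0], [2.0, 4.0])   # residual shrank
```

Training then reweights the selected examples by `w`, so a few well-chosen examples can stand in for the gradient of the whole set.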

NeurIPS Conference 2021 Conference Paper

Learning to Select Exogenous Events for Marked Temporal Point Process

  • Ping Zhang
  • Rishabh Iyer
  • Ashish Tendulkar
  • Gaurav Aggarwal
  • Abir De

Marked temporal point processes (MTPPs) have emerged as a powerful modeling tool for a wide variety of applications characterized by discrete events localized in continuous time. In this context, events are of two types: endogenous events, which occur due to the influence of previous events, and exogenous events, which occur due to the effect of externalities. In practice, however, events do not come with endogenous or exogenous labels. To this end, our goal in this paper is to identify the set of exogenous events from a set of unlabelled events. To do so, we first formulate the parameter estimation problem in conjunction with the exogenous event set selection problem and show that this problem is NP-hard. Next, we prove that the underlying objective is a monotone and α-submodular set function with respect to the candidate set of exogenous events. Such a characterization subsequently allows us to use a stochastic greedy algorithm originally proposed for submodular maximization; we show that it also admits an approximation guarantee for maximizing α-submodular set functions, even when the learning algorithm provides imperfect estimates of the trained parameters. Finally, our experiments with synthetic and real data show that our method performs better than existing approaches built upon superposition of endogenous and exogenous MTPPs.
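The stochastic greedy algorithm referenced above can be sketched on a stand-in monotone submodular objective (set coverage here, not the paper's likelihood-based objective; the universe of sets is made up):

```python
import math
import random

def coverage(sets, S):
    """Monotone submodular stand-in objective: number of elements covered."""
    covered = set()
    for i in S:
        covered |= sets[i]
    return len(covered)

def stochastic_greedy(sets, k, eps=0.1, seed=0):
    """Stochastic greedy: each step scores only a random sample of the
    remaining candidates, of size (n/k) * log(1/eps), instead of all of
    them, trading a small approximation loss for a large speedup."""
    rng = random.Random(seed)
    n = len(sets)
    sample_size = max(1, int(math.ceil(n / k * math.log(1 / eps))))
    chosen = set()
    for _ in range(k):
        pool = [i for i in range(n) if i not in chosen]
        cand = rng.sample(pool, min(sample_size, len(pool)))
        best = max(cand, key=lambda i: coverage(sets, chosen | {i}))
        chosen.add(best)
    return chosen

universe = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {7}]
picked = stochastic_greedy(universe, k=2)
assert len(picked) == 2
```

The paper's contribution is showing that this per-step sampling still carries an approximation guarantee when the objective is only α-submodular and its values are estimated imperfectly.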

ICML Conference 2021 Conference Paper

Training Data Subset Selection for Regression with Controlled Generalization Error

  • Durga Sivasubramanian
  • Rishabh K. Iyer
  • Ganesh Ramakrishnan
  • Abir De

Data subset selection from a large number of training instances has been a successful approach toward efficient and cost-effective machine learning. However, models trained on a smaller subset may show poor generalization ability. In this paper, our goal is to design an algorithm for selecting a subset of the training data, so that the model can be trained quickly, without significantly sacrificing on accuracy. More specifically, we focus on data subset selection for $L_2$ regularized regression problems and provide a novel problem formulation which seeks to minimize the training loss with respect to both the trainable parameters and the subset of training data, subject to error bounds on the validation set. We tackle this problem using several technical innovations. First, we represent this problem with simplified constraints using the dual of the original training problem and show that the objective of this new representation is a monotone and $\alpha$-submodular function, for a wide variety of modeling choices. Such properties lead us to develop SELCON, an efficient majorization-minimization algorithm for data subset selection, that admits an approximation guarantee even when the training provides an imperfect estimate of the trained model. Finally, our experiments on several datasets show that SELCON trades off accuracy and efficiency more effectively than the current state-of-the-art.

NeurIPS Conference 2021 Conference Paper

Training for the Future: A Simple Gradient Interpolation Loss to Generalize Along Time

  • Anshul Nasery
  • Soumyadeep Thakur
  • Vihari Piratla
  • Abir De
  • Sunita Sarawagi

In several real-world applications, machine learning models are deployed to make predictions on data whose distribution changes gradually along time, leading to a drift between the train and test distributions. Such models are often re-trained on new data periodically, and they hence need to generalize to data not too far into the future. In this context, there is much prior work on enhancing temporal generalization, e.g., continuous transportation of past data, kernel-smoothed time-sensitive parameters and, more recently, adversarial learning of time-invariant features. However, these methods share several limitations, e.g., poor scalability, training instability, and dependence on unlabeled data from the future. Responding to the above limitations, we propose a simple method that starts with a model with time-sensitive parameters but regularizes its temporal complexity using a Gradient Interpolation (GI) loss. GI allows the decision boundary to change along time and can still prevent overfitting to the limited training time snapshots by allowing task-specific control over changes along time. Comparisons with existing baselines on multiple real-world datasets show that GI outperforms more complicated generative and adversarial approaches on the one hand, and simpler gradient regularization methods on the other.
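The first-order extrapolation that a gradient-interpolation style loss supervises can be sketched with a model whose weights drift linearly in time (hypothetical weights; for such a model the Taylor step happens to be exact):

```python
def predict(w0, w1, x, t):
    """Time-sensitive linear model: weights drift linearly with time t,
    a hypothetical stand-in for a time-conditioned network."""
    return sum((a + b * t) * xi for a, b, xi in zip(w0, w1, x))

def taylor_predict(w0, w1, x, t, delta):
    """First-order extrapolation of the prediction to time t + delta;
    supervising this extrapolated prediction is the core of the GI idea."""
    dy_dt = sum(b * xi for b, xi in zip(w1, x))   # analytic d(prediction)/dt
    return predict(w0, w1, x, t) + delta * dy_dt

w0, w1, x = [0.5, -1.0], [0.1, 0.2], [1.0, 2.0]
# For a model linear in t, the Taylor step matches direct evaluation:
assert abs(taylor_predict(w0, w1, x, t=1.0, delta=0.5)
           - predict(w0, w1, x, t=1.5)) < 1e-12
```

For nonlinear time dependence the two sides differ, and penalizing the task loss on the extrapolated prediction is what limits how sharply the decision boundary may bend along time.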

JMLR Journal 2020 Journal Article

NEVAE: A Deep Generative Model for Molecular Graphs

  • Bidisha Samanta
  • Abir De
  • Gourhari Jana
  • Vicenç Gómez
  • Pratim Chattaraj
  • Niloy Ganguly
  • Manuel Gomez-Rodriguez

Deep generative models have been praised for their ability to learn smooth latent representations of images, text, and audio, which can then be used to generate new, plausible data. Motivated by these success stories, there has been a surge of interest in developing deep generative models for automated molecule design. However, these models face several difficulties due to the unique characteristics of molecular graphs—their underlying structure is not Euclidean or grid-like, they remain isomorphic under permutation of the nodes’ labels, and they come with a different number of nodes and edges. In this paper, we first propose a novel variational autoencoder for molecular graphs, whose encoder and decoder are specially designed to account for the above properties by means of several technical innovations. Moreover, in contrast with the state of the art, our decoder is able to provide the spatial coordinates of the atoms of the molecules it generates. Then, we develop a gradient-based algorithm to optimize the decoder of our model so that it learns to generate molecules that maximize a certain property of interest and, given any arbitrary molecule, it is able to optimize the spatial configuration of its atoms for greater stability. Experiments reveal that our variational autoencoder can discover plausible, diverse and novel molecules more effectively than several state of the art models. Moreover, for several properties of interest, our optimized decoder is able to identify molecules with property values 121% higher than those identified by several state of the art methods based on Bayesian optimization and reinforcement learning.

UAI Conference 2020 Conference Paper

On the design of consequential ranking algorithms

  • Behzad Tabibian
  • Vicenç Gómez
  • Abir De
  • Bernhard Schölkopf
  • Manuel Gomez Rodriguez

Ranking models are typically designed to optimize some measure of immediate utility to the users. As a result, they have been unable to anticipate an increasing number of undesirable long-term consequences of their proposed rankings, from fueling the spread of misinformation and increasing polarization to degrading social discourse. Can we design ranking models that anticipate the consequences of their proposed rankings and are able to avoid the undesirable ones? In this paper, we first introduce a joint representation of rankings and user dynamics using Markov decision processes. Then, we show that this representation greatly simplifies the construction of consequential ranking models that trade off the immediate utility and the long-term welfare. In particular, we can obtain optimal consequential rankings by applying weighted sampling on the rankings provided by models that maximize measures of immediate utility. However, in practice, such a strategy may be inefficient and impractical, especially in high-dimensional scenarios. To overcome this, we introduce an efficient gradient-based algorithm to learn parameterized consequential ranking models that effectively approximate optimal ones. We illustrate our methodology using synthetic and real data gathered from Reddit and show that our consequential rankings may mitigate the spread of misinformation and improve the civility of online discussions.

AAAI Conference 2020 Conference Paper

Regression under Human Assistance

  • Abir De
  • Paramita Koley
  • Niloy Ganguly
  • Manuel Gomez-Rodriguez

Decisions are increasingly taken by both humans and machine learning models. However, machine learning models are currently trained for full automation—they are not aware that some of the decisions may still be taken by humans. In this paper, we take a first step towards the development of machine learning models that are optimized to operate under different automation levels. More specifically, we first introduce the problem of ridge regression under human assistance and show that it is NP-hard. Then, we derive an alternative representation of the corresponding objective function as a difference of nondecreasing submodular functions. Building on this representation, we further show that the objective is nondecreasing and satisfies α-submodularity, a recently introduced notion of approximate submodularity. These properties allow a simple and efficient greedy algorithm to enjoy approximation guarantees at solving the problem. Experiments on synthetic and real-world data from two important applications—medical diagnosis and content moderation—demonstrate that the greedy algorithm beats several competitive baselines.

AAAI Conference 2019 Conference Paper

NeVAE: A Deep Generative Model for Molecular Graphs

  • Bidisha Samanta
  • Abir De
  • Gourhari Jana
  • Pratim Kumar Chattaraj
  • Niloy Ganguly
  • Manuel Gomez Rodriguez

Deep generative models have been praised for their ability to learn smooth latent representation of images, text, and audio, which can then be used to generate new, plausible data. However, current generative models are unable to work with molecular graphs due to their unique characteristics—their underlying structure is not Euclidean or grid-like, they remain isomorphic under permutation of the nodes labels, and they come with a different number of nodes and edges. In this paper, we propose NeVAE, a novel variational autoencoder for molecular graphs, whose encoder and decoder are specially designed to account for the above properties by means of several technical innovations. In addition, by using masking, the decoder is able to guarantee a set of valid properties in the generated molecules. Experiments reveal that our model can discover plausible, diverse and novel molecules more effectively than several state of the art methods. Moreover, by utilizing Bayesian optimization over the continuous latent representation of molecules our model finds, we can also find molecules that maximize certain desirable properties more effectively than alternatives.

NeurIPS Conference 2018 Conference Paper

Deep Reinforcement Learning of Marked Temporal Point Processes

  • Utkarsh Upadhyay
  • Abir De
  • Manuel Gomez Rodriguez

In a wide variety of applications, humans interact with a complex environment by means of asynchronous stochastic discrete events in continuous time. Can we design online interventions that will help humans achieve certain goals in such an asynchronous setting? In this paper, we address the above problem from the perspective of deep reinforcement learning of marked temporal point processes, where both the actions taken by an agent and the feedback it receives from the environment are asynchronous stochastic discrete events characterized using marked temporal point processes. In doing so, we define the agent's policy using the intensity and mark distribution of the corresponding process and then derive a flexible policy gradient method, which embeds the agent's actions and the feedback it receives into real-valued vectors using deep recurrent neural networks. Our method does not make any assumptions on the functional form of the intensity and mark distribution of the feedback and it allows for arbitrarily complex reward functions. We apply our methodology to two different applications in viral marketing and personalized teaching and, using data gathered from Twitter and Duolingo, we show that it may be able to find interventions to help marketers and learners achieve their goals more effectively than alternatives.

AAMAS Conference 2018 Conference Paper

Shaping Opinion Dynamics in Social Networks

  • Abir De
  • Sourangshu Bhattacharya
  • Niloy Ganguly

A networked opinion diffusion process, which usually involves extensive spontaneous discussions between connected users, is often propelled by external sources of news or feeds recommended to them. In many applications, such as marketing design or product launches, corporations often post curated news or feeds on social media in order to steer the users’ opinions in a desired way. We call such scenarios opinion shaping or opinion control, whereby a few select users, called control users, post opinionated messages to drive the others’ opinions to reach a given state. In this paper, we propose SmartShape, an opinion control package that jointly selects the control users and computes the optimum rate of control messages, thereby driving the networked opinion dynamics in the desired direction. Furthermore, our proposal also includes a robust shaping suite, which makes our control framework resilient to stochastic fluctuations of opinion dynamics originating from several sources of randomness. Experiments on several synthetic and real datasets gathered from Twitter show that SmartShape can accurately determine the quality of a set of control users as well as shape the opinion dynamics more effectively than several baselines.

JMLR Journal 2018 Journal Article

Steering Social Activity: A Stochastic Optimal Control Point Of View

  • Ali Zarezade
  • Abir De
  • Utkarsh Upadhyay
  • Hamid R. Rabiee
  • Manuel Gomez-Rodriguez

User engagement in online social networking depends critically on the level of social activity in the corresponding platform---the number of online actions, such as posts, shares or replies, taken by their users. Can we design data-driven algorithms to increase social activity? At a user level, such algorithms may increase activity by helping users decide when to take an action to be more likely to be noticed by their peers. At a network level, they may increase activity by incentivizing a few influential users to take more actions, which in turn will trigger additional actions by other users. In this paper, we model social activity using the framework of marked temporal point processes, derive an alternate representation of these processes using stochastic differential equations (SDEs) with jumps and, exploiting this alternate representation, develop two efficient online algorithms with provable guarantees to steer social activity both at a user and at a network level. In doing so, we establish a previously unexplored connection between optimal control of jump SDEs and doubly stochastic marked temporal point processes, which is of independent interest. Finally, we experiment both with synthetic and real data gathered from Twitter and show that our algorithms consistently steer social activity more effectively than the state of the art.

IJCAI Conference 2017 Conference Paper

LMPP: A Large Margin Point Process Combining Reinforcement and Competition for Modeling Hashtag Popularity

  • Bidisha Samanta
  • Abir De
  • Abhijnan Chakraborty
  • Niloy Ganguly

Predicting the popularity dynamics of Twitter hashtags has a broad spectrum of applications. Existing works have mainly focused on modeling the popularity of individual tweets rather than the popularity of the underlying hashtags. Hence, they do not consider several realistic factors for hashtag popularity. In this paper, we propose Large Margin Point Process (LMPP), a probabilistic framework that integrates hashtag-tweet influence and hashtag-hashtag competitions, the two factors which play important roles in hashtag propagation. Furthermore, while considering the hashtag competitions, LMPP looks into the variations of popularity rankings of the competing hashtags across time. Extensive experiments on seven real datasets demonstrate that LMPP outperforms existing popularity prediction approaches by a significant margin. Going further, LMPP can accurately predict the relative rankings of competing hashtags, offering additional advantage over the state-of-the-art baselines.

NeurIPS Conference 2016 Conference Paper

Learning and Forecasting Opinion Dynamics in Social Networks

  • Abir De
  • Isabel Valera
  • Niloy Ganguly
  • Sourangshu Bhattacharya
  • Manuel Gomez Rodriguez

Social media and social networking sites have become a global pinboard for exposition and discussion of news, topics, and ideas, where social media users often update their opinions about a particular topic by learning from the opinions shared by their friends. In this context, can we learn a data-driven model of opinion dynamics that is able to accurately forecast users' opinions? In this paper, we introduce SLANT, a probabilistic modeling framework of opinion dynamics, which represents users' opinions over time by means of marked jump diffusion stochastic differential equations, and allows for efficient model simulation and parameter estimation from historical fine grained event data. We then leverage our framework to derive a set of efficient predictive formulas for opinion forecasting and identify conditions under which opinions converge to a steady state. Experiments on data gathered from Twitter show that our model provides a good fit to the data and our formulas achieve more accurate forecasting than alternatives.
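A marked jump diffusion of the kind SLANT builds on can be simulated with an Euler-Maruyama scheme. This is a heavily simplified single-user stand-in (mean-reverting drift, Gaussian diffusion, Poisson-timed Gaussian jumps), with made-up parameter values:

```python
import random

def simulate_opinion(x0, mu, theta, sigma, jump_rate, jump_scale,
                     T=10.0, dt=0.01, seed=0):
    """Euler-Maruyama sketch of a mean-reverting jump-diffusion opinion
    process, dx = theta*(mu - x) dt + sigma dW + jumps, a simplified
    stand-in for SLANT's marked jump diffusion SDEs."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    steps = round(T / dt)
    for _ in range(steps):
        dw = rng.gauss(0.0, dt ** 0.5)             # Brownian increment
        # Jump arrives in this step with probability jump_rate * dt.
        jump = jump_scale * rng.gauss(0.0, 1.0) if rng.random() < jump_rate * dt else 0.0
        x = x + theta * (mu - x) * dt + sigma * dw + jump
        path.append(x)
    return path

path = simulate_opinion(x0=2.0, mu=0.0, theta=1.0, sigma=0.1,
                        jump_rate=0.5, jump_scale=0.3)
assert len(path) == 1001
```

In the full model, jump times come from a point process modulated by neighbors' opinions, which is what couples the per-user SDEs into networked dynamics.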