Arrow Research search

Author name cluster

Ryan A. Rossi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
2 author rows

Possible papers

30

AAAI Conference 2026 Conference Paper

VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use

  • Zhehao Zhang
  • Ryan A. Rossi
  • Tong Yu
  • Franck Dernoncourt
  • Ruiyi Zhang
  • Jiuxiang Gu
  • Sungchul Kim
  • Xiang Chen

While vision-language models (VLMs) have demonstrated remarkable performance across various tasks combining textual and visual information, they continue to struggle with fine-grained visual perception tasks that require detailed pixel-level analysis. Effectively eliciting comprehensive reasoning from VLMs on such intricate visual elements remains an open challenge. In this paper, we present VipAct, an agent framework that enhances VLMs by integrating multi-agent collaboration and vision expert models, enabling more precise visual understanding and comprehensive reasoning. VipAct consists of an orchestrator agent, which manages task requirement analysis, planning, and coordination, along with specialized agents that handle specific tasks such as image captioning and vision expert models that provide high-precision perceptual information. This multi-agent approach allows VLMs to better perform fine-grained visual perception tasks by synergizing planning, reasoning, and tool use. We evaluate VipAct on benchmarks featuring a diverse set of visual perception tasks, with experimental results demonstrating significant performance improvements over state-of-the-art baselines across all tasks. Furthermore, comprehensive ablation studies reveal the critical role of multi-agent collaboration in eliciting more detailed System-2 reasoning and highlight the importance of image input for task planning. Additionally, our error analysis identifies patterns of VLMs' inherent limitations in visual perception, providing insights into potential future improvements. VipAct offers a flexible and extensible framework, paving the way for more advanced visual perception systems across various real-world applications.

ICLR Conference 2025 Conference Paper

A Large-scale Training Paradigm for Graph Generative Models

  • Yu Wang 0160
  • Ryan A. Rossi
  • Namyong Park 0001
  • Huiyuan Chen
  • Nesreen K. Ahmed
  • Puja Trivedi
  • Franck Dernoncourt
  • Danai Koutra

Large Generative Models (LGMs) such as GPT, Stable Diffusion, Sora, and Suno are trained on a huge amount of text, images, videos, and audio that is extremely diverse and drawn from numerous domains. This large-scale training paradigm on diverse, well-curated data enhances the creativity and diversity of the generated content. However, all previous graph generative models (e.g., GraphRNN, MDVAE, MoFlow, GDSS, and DiGress) have been trained on only one dataset at a time, which cannot replicate the revolutionary success achieved by LGMs in other fields. To remedy this crucial gap, we propose a large-scale training paradigm that uses a large corpus of graphs (over 5000 graphs) from 13 domains, leading to the development of large graph generative models (LGGMs). We empirically demonstrate that the pre-trained LGGMs have superior zero-shot generative capability to existing graph generative models. Furthermore, our pre-trained LGGMs can be easily fine-tuned with graphs from target domains and demonstrate even better performance than models trained directly from scratch, serving as a solid starting point for real-world customization. Inspired by Stable Diffusion, we further equip LGGMs with Text-to-Graph generation capability, such as providing a description of the network name and domain (e.g., "The power-1138-bus graph represents a network of buses in a power distribution system.") and network statistics (e.g., "The graph has a low average degree, suitable for modeling social media interactions."). This Text-to-Graph capability integrates the extensive world knowledge of the underlying language model, offering users fine-grained control over the generated graphs. We release the code, the model checkpoint, and the datasets at https://github.com/KINDLab-Fly/LGGM.

TMLR Journal 2025 Journal Article

CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement

  • Leitian Tao
  • Xiang Chen
  • Tong Yu
  • Tung Mai
  • Ryan A. Rossi
  • Yixuan Li
  • Saayan Mitra

Large Language Models (LLMs) have revolutionized code generation but require significant resources and tend to over-generalize, limiting their task-specific efficiency. Fine-tuning smaller, open-source LLMs is a cost-effective alternative, yet standard supervised approaches rely solely on correct examples, overlooking valuable insights from failures. We introduce CodeLutra, a new framework that leverages both correct and incorrect code attempts. Instead of purely instructing with correct solutions, CodeLutra uses iterative preference-based refinement, comparing successful and failed outputs to better approximate desired results. This process narrows the performance gap with state-of-the-art, larger models, without requiring massive datasets or auxiliary models. For example, on a challenging data science coding task, using only 500 samples improved Llama-3-8B’s accuracy from 28.2% to 48.6%, approaching GPT-4’s level. By capitalizing on both successes and mistakes, CodeLutra offers a scalable, efficient path to high-quality code generation, making smaller open-source models more competitive with leading closed-source alternatives.
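The data-construction step described above, pairing successful and failed attempts on the same task, can be sketched in a few lines. The input format and function name below are illustrative, not CodeLutra's actual interface:

```python
def build_preference_pairs(attempts):
    """Pair each failed code attempt with a successful attempt on the same
    prompt, yielding (prompt, chosen, rejected) triples suitable for
    preference-based fine-tuning.

    `attempts` maps a prompt to a list of (code, passed) tuples; this
    format is illustrative, not CodeLutra's actual interface.
    """
    pairs = []
    for prompt, results in attempts.items():
        passed = [code for code, ok in results if ok]
        failed = [code for code, ok in results if not ok]
        for good in passed:
            for bad in failed:
                pairs.append((prompt, good, bad))
    return pairs
```

Each triple can then feed a DPO-style objective that pushes the model toward the chosen completion and away from the rejected one.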

ICML Conference 2025 Conference Paper

Fully Dynamic Embedding into ℓp Spaces

  • Kiarash Banihashem
  • Xiang Chen 0010
  • MohammadTaghi Hajiaghayi
  • Sungchul Kim
  • Kanak Mahadik
  • Ryan A. Rossi
  • Tong Yu 0001

Metric embeddings are fundamental in machine learning, enabling similarity search, dimensionality reduction, and representation learning. They underpin modern architectures like transformers and large language models, facilitating scalable training and improved generalization. Theoretically, the classic problem in embedding design is mapping arbitrary metrics into $\ell_p$ spaces while approximately preserving pairwise distances. We study this problem in a fully dynamic setting, where the underlying metric is a graph metric subject to edge insertions and deletions. Our goal is to maintain an efficient embedding after each update. We present the first fully dynamic algorithm for this problem, achieving $O(\log(n))^{2q} O(\log(nW))^{q-1}$ expected distortion with $O(m^{1/q + o(1)})$ update time and $O(q \log(n) \log(nW))$ query time, where $q \ge 2$ is an integer parameter.

ICML Conference 2025 Conference Paper

NoLiMa: Long-Context Evaluation Beyond Literal Matching

  • Ali Modarressi
  • Hanieh Deilamsalehy
  • Franck Dernoncourt
  • Trung Bui
  • Ryan A. Rossi
  • Seunghyun Yoon 0002
  • Hinrich Schütze

Recent large language models (LLMs) support long contexts ranging from 128K to 1M tokens. A popular method for evaluating these capabilities is the needle-in-a-haystack (NIAH) test, which involves retrieving a "needle" (relevant information) from a "haystack" (long irrelevant context). Extensions of this approach include increasing distractors, fact chaining, and in-context reasoning. However, in these benchmarks, models can exploit existing literal matches between the needle and haystack to simplify the task. To address this, we introduce NoLiMa, a benchmark extending NIAH with a carefully designed needle set, where questions and needles have minimal lexical overlap, requiring models to infer latent associations to locate the needle within the haystack. We evaluate 13 popular LLMs that claim to support contexts of at least 128K tokens. While they perform well in short contexts (<1K), performance degrades significantly as context length increases. At 32K, for instance, 11 models drop below 50% of their strong short-length baselines. Even GPT-4o, one of the top-performing exceptions, experiences a reduction from an almost-perfect baseline of 99.3% to 69.7%. Our analysis suggests these declines stem from the increased difficulty the attention mechanism faces in longer contexts when literal matches are absent, making it harder to retrieve relevant information. Even models enhanced with reasoning capabilities or CoT prompting struggle to maintain performance in long contexts. We publicly release the dataset and evaluation code at https://github.com/adobe-research/NoLiMa.

AAAI Conference 2025 Conference Paper

Numerical Pruning for Efficient Autoregressive Models

  • Xuan Shen
  • Zhao Song
  • Yufa Zhou
  • Bo Chen
  • Jing Liu
  • Ruiyi Zhang
  • Ryan A. Rossi
  • Hao Tan

Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing. However, their impressive performance often incurs high computational costs due to their substantial model size. This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning to improve the model efficiency while preserving performance for both language and image generation tasks. Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively. Besides, we further propose another compensation algorithm to recover the pruned model for better performance. To verify the effectiveness of our method, we provide both theoretical support and extensive experiments. Our experiments show that our method achieves state-of-the-art performance with reduced memory usage and faster generation speeds on GPUs.

TMLR Journal 2025 Journal Article

Personalization of Large Language Models: A Survey

  • Zhehao Zhang
  • Ryan A. Rossi
  • Branislav Kveton
  • Yijia Shao
  • Diyi Yang
  • Hamed Zamani
  • Franck Dernoncourt
  • Joe Barrow

Personalization of Large Language Models (LLMs) has recently become increasingly important with a wide range of applications. Despite the importance and recent progress, most existing works on personalized LLMs have focused either entirely on (a) personalized text generation or (b) leveraging LLMs for personalization-related downstream applications, such as recommendation systems. In this work, we bridge the gap between these two separate main directions for the first time by introducing a taxonomy for personalized LLM usage and summarizing the key differences and challenges. We provide a formalization of the foundations of personalized LLMs that consolidates and expands notions of personalization of LLMs, defining and discussing novel facets of personalization, usage, and desiderata of personalized LLMs. We then unify the literature across these diverse fields and usage scenarios by proposing systematic taxonomies for the granularity of personalization, personalization techniques, datasets, evaluation methods, and applications of personalized LLMs. Finally, we highlight challenges and important open problems that remain to be addressed. By unifying and surveying recent research using the proposed taxonomies, we aim to provide a clear guide to the existing literature and different facets of personalization in LLMs, empowering both researchers and practitioners.

ICLR Conference 2025 Conference Paper

SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding

  • Jian Chen 0043
  • Ruiyi Zhang 0002
  • Yufan Zhou 0001
  • Tong Yu 0001
  • Franck Dernoncourt
  • Jiuxiang Gu
  • Ryan A. Rossi
  • Changyou Chen

Multimodal large language models (MLLMs) have recently shown great progress in text-rich image understanding, yet they still struggle with complex, multi-page visually-rich documents. Traditional methods using document parsers for retrieval-augmented generation suffer from performance and efficiency limitations, while directly presenting all pages to MLLMs leads to inefficiencies, especially with lengthy documents. In this work, we present a novel framework named **S**elf-**V**isual **R**etrieval-**A**ugmented **G**eneration (SV-RAG), which can broaden the horizons of *any* MLLM to support long-document understanding. We demonstrate that **MLLMs themselves can be an effective multimodal retriever** to fetch relevant pages and then answer user questions based on these pages. SV-RAG is implemented with two specific MLLM adapters, one for evidence page retrieval and the other for question answering. Empirical results show state-of-the-art performance on public benchmarks, demonstrating the effectiveness of SV-RAG.

ICLR Conference 2025 Conference Paper

Towards Optimal Multi-draft Speculative Decoding

  • Zhengmian Hu
  • Tong Zheng
  • Vignesh Viswanathan
  • Ziyi Chen 0002
  • Ryan A. Rossi
  • Yihan Wu
  • Dinesh Manocha
  • Heng Huang 0001

Large Language Models (LLMs) have become an indispensable part of natural language processing tasks. However, autoregressive sampling has become an efficiency bottleneck. Multi-Draft Speculative Decoding (MDSD) is a recent approach where, when generating each token, a small draft model generates multiple drafts, and the target LLM verifies them in parallel, ensuring that the final output conforms to the target model distribution. The two main design choices in MDSD are the draft sampling method and the verification algorithm. For a fixed draft sampling method, the optimal acceptance rate is a solution to an optimal transport problem, but the complexity of this problem makes it difficult to solve for the optimal acceptance rate and measure the gap between existing verification algorithms and the theoretical upper bound. This paper discusses the dual of the optimal transport problem, providing a way to efficiently compute the optimal acceptance rate. For the first time, we measure the theoretical upper bound of MDSD efficiency for vocabulary sizes in the thousands and quantify the gap between existing verification algorithms and this bound. We also compare different draft sampling methods based on their optimal acceptance rates. Our results show that the draft sampling method strongly influences the optimal acceptance rate, with sampling without replacement outperforming sampling with replacement. Additionally, existing verification algorithms do not reach the theoretical upper bound for both without replacement and with replacement sampling. Our findings suggest that carefully designed draft sampling methods can potentially improve the optimal acceptance rate and enable the development of verification algorithms that closely match the theoretical upper bound.
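For background on the verification step discussed above, the standard single-draft verification rule accepts the draft token with probability min(1, p(t)/q(t)) and otherwise resamples from the normalized residual distribution max(p - q, 0); MDSD runs this kind of check over several drafts in parallel. A minimal sketch with an injectable rng for determinism (the function name and interface are ours, not the paper's):

```python
import random

def speculative_verify(p, q, draft_token, rng=random.random):
    """Single-draft speculative verification: accept `draft_token` with
    probability min(1, p[t] / q[t]), where p is the target distribution
    and q the draft distribution; on rejection, resample from the
    residual max(p - q, 0), renormalized. The output is distributed
    exactly according to p."""
    t = draft_token
    if rng() < min(1.0, p[t] / q[t]):
        return t
    # Rejected: sample from the normalized residual distribution.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    z = sum(residual)
    r, acc = rng() * z, 0.0
    for i, w in enumerate(residual):
        acc += w
        if r < acc:
            return i
    return len(p) - 1
```

The draft sampling method determines q; the paper's analysis asks how close any such verification scheme can get to the optimal acceptance rate.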

ICML Conference 2024 Conference Paper

Continuous Treatment Effects with Surrogate Outcomes

  • Zhenghao Zeng
  • David Arbour
  • Avi Feller
  • Raghavendra Addanki
  • Ryan A. Rossi
  • Ritwik Sinha
  • Edward H. Kennedy

In many real-world causal inference applications, the primary outcomes (labels) are often partially missing, especially if they are expensive or difficult to collect. If the missingness depends on covariates (i. e. , missingness is not completely at random), analyses based on fully observed samples alone may be biased. Incorporating surrogates, which are fully observed post-treatment variables related to the primary outcome, can improve estimation in this case. In this paper, we study the role of surrogates in estimating continuous treatment effects and propose a doubly robust method to efficiently incorporate surrogates in the analysis, which uses both labeled and unlabeled data and does not suffer from the above selection bias problem. Importantly, we establish the asymptotic normality of the proposed estimator and show possible improvements on the variance compared with methods that solely use labeled data. Extensive simulations show our methods enjoy appealing empirical performance.

ICML Conference 2024 Conference Paper

Editing Partially Observable Networks via Graph Diffusion Models

  • Puja Trivedi
  • Ryan A. Rossi
  • David Arbour
  • Tong Yu 0001
  • Franck Dernoncourt
  • Sungchul Kim
  • Nedim Lipka
  • Namyong Park 0001

Most real-world networks are noisy and incomplete samples from an unknown target distribution. Refining them by correcting corruptions or inferring unobserved regions typically improves downstream performance. Inspired by the impressive generative capabilities that have been used to correct corruptions in images, and the similarities between "in-painting" and filling in missing nodes and edges conditioned on the observed graph, we propose a novel graph generative framework, SGDM, which is based on subgraph diffusion. Our framework not only improves the scalability and fidelity of graph diffusion models, but also leverages the reverse process to perform novel, conditional generation tasks. In particular, through extensive empirical analysis and a set of novel metrics, we demonstrate that our proposed model effectively supports the following refinement tasks for partially observable networks: (T1) denoising extraneous subgraphs, (T2) expanding existing subgraphs and (T3) performing "style" transfer by regenerating a particular subgraph to match the characteristics of a different node or subgraph.

ICLR Conference 2024 Conference Paper

Forward Learning of Graph Neural Networks

  • Namyong Park 0001
  • Xing Wang
  • Antoine Simoulin
  • Shuai Yang
  • Grey Yang
  • Ryan A. Rossi
  • Puja Trivedi
  • Nesreen K. Ahmed

Graph neural networks (GNNs) have achieved remarkable success across a wide range of applications, such as recommendation, drug discovery, and question answering. Behind the success of GNNs lies the backpropagation (BP) algorithm, which is the de facto standard for training deep neural networks (NNs). However, despite its effectiveness, BP imposes several constraints, which are not only biologically implausible, but also limit the scalability, parallelism, and flexibility in learning NNs. Examples of such constraints include storage of neural activities computed in the forward pass for use in the subsequent backward pass, and the dependence of parameter updates on non-local signals. To address these limitations, the forward-forward algorithm (FF) was recently proposed as an alternative to BP in the image classification domain, which trains NNs by performing two forward passes over positive and negative data. Inspired by this advance, we propose ForwardGNN in this work, a new forward learning procedure for GNNs, which avoids the constraints imposed by BP via an effective layer-wise local forward training. ForwardGNN extends the original FF to deal with graph data and GNNs, and makes it possible to operate without generating negative inputs (hence no longer forward-forward). Further, ForwardGNN enables each layer to learn from both the bottom-up and top-down signals without relying on the backpropagation of errors. Extensive experiments on real-world datasets show the effectiveness and generality of the proposed forward graph learning framework. We release our code at https://github.com/facebookresearch/forwardgnn.

AAAI Conference 2024 Conference Paper

Knowledge Graph Prompting for Multi-Document Question Answering

  • Yu Wang
  • Nedim Lipka
  • Ryan A. Rossi
  • Alexa Siu
  • Ruiyi Zhang
  • Tyler Derr

The `pre-train, prompt, predict' paradigm of large language models (LLMs) has achieved remarkable success in open-domain question answering (OD-QA). However, few works explore this paradigm in multi-document question answering (MD-QA), a task demanding a thorough understanding of the logical associations among the contents and structures of documents. To fill this crucial gap, we propose a Knowledge Graph Prompting (KGP) method to formulate the right context in prompting LLMs for MD-QA, which consists of a graph construction module and a graph traversal module. For graph construction, we create a knowledge graph (KG) over multiple documents with nodes symbolizing passages or document structures (e.g., pages/tables), and edges denoting the semantic/lexical similarity between passages or document structural relations. For graph traversal, we design an LLM-based graph traversal agent that navigates across nodes and gathers supporting passages assisting LLMs in MD-QA. The constructed graph serves as the global ruler that regulates the transitional space among passages and reduces retrieval latency. Concurrently, the graph traversal agent acts as a local navigator that gathers pertinent context to progressively approach the question and guarantee retrieval quality. Extensive experiments underscore the efficacy of KGP for MD-QA, signifying the potential of leveraging graphs in enhancing the prompt design and retrieval augmented generation for LLMs. Our code: https://github.com/YuWVandy/KG-LLM-MDQA.
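The two modules can be illustrated with a toy version: Jaccard word overlap stands in for the semantic/lexical similarity edges, and a breadth-first sweep stands in for the LLM-based traversal agent. All names here are illustrative, not the KGP implementation:

```python
def build_passage_graph(passages, threshold=0.2):
    """Nodes are passage indices; add an edge when the Jaccard word
    overlap of two passages exceeds `threshold` (a crude stand-in for
    the semantic/lexical similarity edges in the abstract)."""
    sets = [set(p.lower().split()) for p in passages]
    adj = {i: [] for i in range(len(passages))}
    for i in range(len(passages)):
        for j in range(i + 1, len(passages)):
            sim = len(sets[i] & sets[j]) / len(sets[i] | sets[j])
            if sim > threshold:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def gather_context(adj, seed, budget=3):
    """Breadth-first traversal from a seed passage, collecting up to
    `budget` passages as supporting context; KGP replaces this blind
    sweep with an LLM agent that scores which neighbor to visit next."""
    visited, frontier = [seed], [seed]
    while frontier and len(visited) < budget:
        nxt = [n for f in frontier for n in adj[f] if n not in visited]
        frontier = nxt
        visited.extend(nxt[: budget - len(visited)])
    return visited
```

The collected passages would then be concatenated into the prompt for the answering LLM.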

ICML Conference 2024 Conference Paper

On Mechanistic Knowledge Localization in Text-to-Image Generative Models

  • Samyadeep Basu
  • Keivan Rezaei
  • Priyatham Kattakinda
  • Vlad I. Morariu
  • Nanxuan Zhao
  • Ryan A. Rossi
  • Varun Manjunatha
  • Soheil Feizi

Identifying layers within text-to-image models which control visual attributes can facilitate efficient model editing through closed-form updates. Recent work leveraging causal tracing shows that early Stable-Diffusion variants confine knowledge primarily to the first layer of the CLIP text-encoder, while it diffuses throughout the UNet. Extending this framework, we observe that for recent models (e.g., SD-XL, DeepFloyd), causal tracing fails to pinpoint localized knowledge, highlighting challenges in model editing. To address this issue, we introduce the concept of mechanistic localization in text-to-image models, where knowledge about various visual attributes (e.g., "style", "objects", "facts") can be mechanistically localized to a small fraction of layers in the UNet, thus facilitating efficient model editing. We localize knowledge using our method LocoGen, which measures the direct effect of intermediate layers on output generation by performing interventions in the cross-attention layers of the UNet. We then employ LocoEdit, a fast closed-form editing method, across popular open-source text-to-image models (including the latest SD-XL) and explore the possibilities of neuron-level model editing. Using mechanistic localization, our work offers a better view of successes and failures in localization-based text-to-image model editing.

JMLR Journal 2024 Journal Article

Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model

  • Rashmi Ranjan Bhuyan
  • Adel Javanmard
  • Sungchul Kim
  • Gourab Mukherjee
  • Ryan A. Rossi
  • Tong Yu
  • Handong Zhao

We consider dynamic pricing strategies in a streamed longitudinal data set-up where the objective is to maximize, over time, the cumulative profit across a large number of customer segments. We consider a dynamic model with the consumers’ preferences as well as price sensitivity varying over time. Building on the well-known finding that consumers sharing similar characteristics act in similar ways, we consider a global shrinkage structure, which assumes that the consumers’ preferences across the different segments can be well approximated by a spatial autoregressive (SAR) model. In such a streamed longitudinal setup, we measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. We propose a pricing policy based on penalized stochastic gradient descent (PSGD) and explicitly characterize its regret as functions of time, the temporal variability in the model parameters as well as the strength of the auto-correlation network structure spanning the varied customer segments. Our regret analysis results not only demonstrate asymptotic optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information, as policies based on unshrunken models are highly sub-optimal in the aforementioned set-up. We conduct simulation experiments across a wide range of regimes as well as real-world network-based studies and report encouraging performance for our proposed method.
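A single penalized-gradient step conveys the flavor of the PSGD policy: follow each segment's revenue gradient while shrinking its price toward those of its network neighbors. Uniform neighbor averaging below is a crude stand-in for the SAR shrinkage structure, and all names are illustrative:

```python
def psgd_price_update(prices, grads, neighbors, lr=0.05, penalty=0.1):
    """One penalized-SGD step over per-segment prices.

    `grads[i]` is the estimated revenue gradient for segment i, and
    `neighbors[i]` lists the segments connected to i in the network.
    The penalty term pulls each price toward its neighbors' average,
    a crude stand-in for the SAR shrinkage structure in the abstract.
    """
    new = []
    for i, p in enumerate(prices):
        nbr = neighbors[i]
        avg = sum(prices[j] for j in nbr) / len(nbr) if nbr else p
        new.append(p + lr * grads[i] - penalty * (p - avg))
    return new
```

Iterating this update as demand data streams in is the basic shape of the policy; the paper's regret analysis concerns how the penalty strength should track the network structure and temporal drift.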

ICLR Conference 2024 Conference Paper

ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search

  • Yuchen Zhuang
  • Xiang Chen 0010
  • Tong Yu 0001
  • Saayan Mitra
  • Victor S. Bursztyn
  • Ryan A. Rossi
  • Somdeb Sarkhel
  • Chao Zhang 0014

Large language models (LLMs) have demonstrated powerful decision-making and planning capabilities in solving complicated real-world problems. LLM-based autonomous agents can interact with diverse tools (e.g., functional APIs) and generate solution plans that execute a series of API function calls in a step-by-step manner. The multitude of candidate API function calls significantly expands the action space, amplifying the critical need for efficient action space navigation. However, existing methods either struggle with unidirectional exploration in expansive action spaces, getting trapped in locally optimal solutions, or exhaustively traverse all potential actions, causing inefficient navigation. To address these issues, we propose ToolChain*, an efficient tree search-based planning algorithm for LLM-based agents. It formulates the entire action space as a decision tree, where each node represents a possible API function call involved in a solution plan. By incorporating the A$^*$ search algorithm with a task-specific cost function design, it efficiently prunes high-cost branches that may involve incorrect actions, identifying the lowest-cost valid path as the solution. Extensive experiments on multiple tool-use and reasoning tasks demonstrate that ToolChain* efficiently balances exploration and exploitation within an expansive action space. It outperforms state-of-the-art baselines on planning and reasoning tasks by 3.1% and 3.5% on average while requiring 7.35x and 2.31x less time, respectively.
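The search backbone is classical A*; a generic sketch over an action tree looks like the following, with the task-specific cost and heuristic (the heart of ToolChain*) reduced to arbitrary callables:

```python
import heapq
from itertools import count

def a_star(start, successors, is_goal, cost, heuristic):
    """Generic A* over an action tree. Nodes stand for partial plans;
    `cost(a, b)` is the step cost and `heuristic(n)` an estimate of the
    remaining cost to a goal. ToolChain*'s contribution is the design
    of these two functions for LLM tool-use; here they are arbitrary."""
    tick = count()  # tie-breaker so unlike nodes/paths are never compared
    frontier = [(heuristic(start), 0.0, next(tick), start, [start])]
    seen = set()
    while frontier:
        f, g, _, node, path = heapq.heappop(frontier)
        if is_goal(node):
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in successors(node):
            g2 = g + cost(node, nxt)
            heapq.heappush(
                frontier, (g2 + heuristic(nxt), g2, next(tick), nxt, path + [nxt])
            )
    return None  # no valid plan found
```

In the agent setting, `successors` would query the LLM for candidate next API calls and the returned path is the executed solution plan.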

ICML Conference 2023 Conference Paper

A Model-free Closeness-of-influence Test for Features in Supervised Learning

  • Mohammad Mehrabi
  • Ryan A. Rossi

Understanding the effect of a feature vector $x\in \mathbb{R}^d$ on the response value (label) $y\in \mathbb{R}$ is the cornerstone of many statistical learning problems. Ideally, it is desired to understand how a set of collected features combine together and influence the response value, but this problem is notoriously difficult, due to the high-dimensionality of data and the limited number of labeled data points, among many other reasons. In this work, we take a new perspective on this problem, and we study the question of assessing the difference of influence that two given features have on the response value. We first propose a notion of closeness for the influence of features, and show that our definition recovers the familiar notion of the magnitude of coefficients in the parametric model. We then propose a novel method to test for the closeness of influence in general model-free supervised learning problems. Our proposed test can be used with a finite number of samples with control on the type I error rate, no matter the ground-truth conditional law $\mathcal{L}(Y|X)$. We analyze the power of our test for two general learning problems: i) linear regression, and ii) binary classification under a mixture of Gaussians model, and show that under the proper choice of score function, an internal component of our test, full statistical power is achieved with a sufficient number of samples. We evaluate our findings through extensive numerical simulations; specifically, we adopt the datamodel framework (Ilyas et al., 2022) for the CIFAR-10 dataset to identify pairs of training samples with different influence on the trained model via black-box training mechanisms.

ICLR Conference 2023 Conference Paper

Direct Embedding of Temporal Network Edges via Time-Decayed Line Graphs

  • Sudhanshu Chanpuriya
  • Ryan A. Rossi
  • Sungchul Kim
  • Tong Yu 0001
  • Jane Hoffswell
  • Nedim Lipka
  • Shunan Guo
  • Cameron Musco

Temporal networks model a variety of important phenomena involving timed interactions between entities. Existing methods for machine learning on temporal networks generally exhibit at least one of two limitations. First, many methods assume time to be discretized, so if the time data is continuous, the user must determine the discretization and discard precise time information. Second, edge representations can only be calculated indirectly from the nodes, which may be suboptimal for tasks like edge classification. We present a simple method that avoids both shortcomings: construct the line graph of the network, which includes a node for each interaction, and weigh the edges of this graph based on the difference in time between interactions. From this derived graph, edge representations for the original network can be computed with efficient classical methods. The simplicity of this approach facilitates explicit theoretical analysis: we can constructively show the effectiveness of our method's representations for a natural synthetic model of temporal networks. Empirical results on real-world networks demonstrate our method's efficacy and efficiency on both link classification and prediction.
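The construction described above is simple enough to sketch directly: one line-graph node per timed interaction, and an edge between two interactions that share an endpoint, weighted by a decay in their time difference. The exponential decay form below is one natural choice, not necessarily the paper's exact weighting:

```python
import math

def time_decayed_line_graph(interactions, decay=1.0):
    """Build the time-decayed line graph of a temporal network.

    `interactions` is a list of (u, v, t) timed edges. Each interaction
    becomes a line-graph node; two interactions sharing an endpoint are
    connected with weight exp(-decay * |t1 - t2|), so temporally close
    interactions are linked more strongly. No time discretization needed.
    """
    nodes = list(range(len(interactions)))
    edges = {}
    for i in range(len(interactions)):
        u1, v1, t1 = interactions[i]
        for j in range(i + 1, len(interactions)):
            u2, v2, t2 = interactions[j]
            if {u1, v1} & {u2, v2}:  # interactions share an endpoint
                edges[(i, j)] = math.exp(-decay * abs(t1 - t2))
    return nodes, edges
```

Any classical embedding method can then be run on this weighted graph to obtain edge representations for the original network.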

ICLR Conference 2023 Conference Paper

MetaGL: Evaluation-Free Selection of Graph Learning Models via Meta-Learning

  • Namyong Park 0001
  • Ryan A. Rossi
  • Nesreen K. Ahmed
  • Christos Faloutsos

Given a graph learning task, such as link prediction, on a new graph, how can we select the best method as well as its hyperparameters (collectively called a model) without having to train or evaluate any model on the new graph? Model selection for graph learning has been largely ad hoc. A typical approach has been to apply popular methods to new datasets, but this is often suboptimal. On the other hand, systematically comparing models on the new graph quickly becomes too costly, or even impractical. In this work, we develop the first meta-learning approach for evaluation-free graph learning model selection, called MetaGL, which utilizes the prior performances of existing methods on various benchmark graph datasets to automatically select an effective model for the new graph, without any model training or evaluations. To quantify similarities across a wide variety of graphs, we introduce specialized meta-graph features that capture the structural characteristics of a graph. Then we design the G-M network, which represents the relations among graphs and models, and develop a graph-based meta-learner operating on this G-M network, which estimates the relevance of each model to different graphs. Extensive experiments show that using MetaGL to select a model for the new graph greatly outperforms several existing meta-learning techniques tailored for graph learning model selection (up to 47% better), while being extremely fast at test time (∼1 sec).
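A nearest-neighbor simplification conveys the idea of evaluation-free selection: compare the new graph's meta-features against those of benchmark graphs and return the model that performed best on the closest one. MetaGL's learned G-M network replaces this naive lookup; the code below is only a sketch with illustrative names:

```python
def select_model(meta_features, perf_table, new_graph_feats):
    """Evaluation-free model selection by nearest meta-feature neighbor.

    `meta_features` maps benchmark graph -> feature vector, and
    `perf_table` maps benchmark graph -> {model: observed performance}.
    Returns the best-performing model on the benchmark graph whose
    meta-features are closest to `new_graph_feats` (squared Euclidean).
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(meta_features, key=lambda g: dist(meta_features[g], new_graph_feats))
    return max(perf_table[nearest], key=perf_table[nearest].get)
```

No model is trained or evaluated on the new graph; only its meta-features are computed, which is what makes selection fast at test time.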

ICML Conference 2022 Conference Paper

One-Pass Algorithms for MAP Inference of Nonsymmetric Determinantal Point Processes

  • Aravind Reddy
  • Ryan A. Rossi
  • Zhao Song 0002
  • Anup B. Rao
  • Tung Mai
  • Nedim Lipka
  • Gang Wu 0013
  • Eunyee Koh

In this paper, we initiate the study of one-pass algorithms for solving the maximum-a-posteriori (MAP) inference problem for Non-symmetric Determinantal Point Processes (NDPPs). In particular, we formulate streaming and online versions of the problem and provide one-pass algorithms for solving these problems. In our streaming setting, data points arrive in an arbitrary order and the algorithms are constrained to use a single-pass over the data as well as sub-linear memory, and only need to output a valid solution at the end of the stream. Our online setting has an additional requirement of maintaining a valid solution at any point in time. We design new one-pass algorithms for these problems and show that they perform comparably to (or even better than) the offline greedy algorithm while using substantially lower memory.
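The streaming setting can be illustrated with a single-pass heuristic: items arrive once, and an item is kept only if adding it increases the determinant of the selected kernel submatrix. This is a simplification in the spirit of the setting described above, not the paper's algorithm:

```python
def det(M):
    """Determinant via Laplace expansion; fine for the tiny matrices here."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def streaming_greedy(kernel, stream, k):
    """Single pass over `stream`: keep an arriving item only if it raises
    det(L_S) of the selected set S, up to k items. DPP/NDPP MAP inference
    seeks the size-k subset maximizing det(L_S); this one-pass heuristic
    is illustrative, not the paper's algorithm."""
    S = []
    for i in stream:
        if len(S) >= k:
            break
        cand = S + [i]
        sub = [[kernel[a][b] for b in cand] for a in cand]
        cur = det([[kernel[a][b] for b in S] for a in S]) if S else 0.0
        if det(sub) > cur:
            S = cand
    return S
```

Note that this sketch stores only the selected items, the kind of sub-linear memory constraint the streaming model imposes; maintaining a valid solution at every step, as the online setting requires, it gets for free.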

ICML Conference 2021 Conference Paper

Asymptotics of Ridge Regression in Convolutional Models

  • Mojtaba Sahraee-Ardakan
  • Tung Mai
  • Anup B. Rao
  • Ryan A. Rossi
  • Sundeep Rangan
  • Alyson K. Fletcher

Understanding the generalization and estimation error of estimators for simple models such as linear and generalized linear models has attracted a lot of attention recently. This is in part due to an interesting observation made in the machine learning community that highly over-parameterized neural networks achieve zero training error, and yet they are able to generalize well over the test samples. This phenomenon is captured by the so-called double descent curve, where the generalization error starts decreasing again after the interpolation threshold. A series of recent works has tried to explain this phenomenon for simple models. In this work, we analyze the asymptotics of the estimation error of ridge estimators for convolutional linear models. These convolutional inverse problems, also known as deconvolution, arise naturally in different fields such as seismology, imaging, and acoustics, among others. Our results hold for a large class of input distributions that include i.i.d. features as a special case. We derive exact formulae for the estimation error of ridge estimators that hold in a certain high-dimensional regime. We demonstrate the double descent phenomenon in our experiments for convolutional models and show that our theoretical results match the experiments.
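For reference, the ridge estimator whose error the paper characterizes has the usual closed form. A minimal sketch with generic i.i.d. Gaussian features (the paper's convolutional design matrix is not reproduced here):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimator: theta = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Simulated linear model y = X theta* + noise with i.i.d. features.
rng = np.random.default_rng(0)
n, d = 200, 50
theta_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ theta_star + 0.1 * rng.normal(size=n)
est = ridge(X, y, lam=1.0)
err = np.linalg.norm(est - theta_star)   # estimation error, small when n >> d
```

Sweeping the ratio d/n (and shrinking the ridge penalty) in such a simulation is how the double descent curve is typically exhibited empirically.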

ICML Conference 2021 Conference Paper

Fundamental Tradeoffs in Distributionally Adversarial Training

  • Mohammad Mehrabi
  • Adel Javanmard
  • Ryan A. Rossi
  • Anup B. Rao
  • Tung Mai

Adversarial training is among the most effective techniques for improving the robustness of models against adversarial perturbations. However, the full effect of this approach on models is not well understood. For example, while adversarial training can reduce the adversarial risk (prediction error against an adversary), it sometimes increases standard risk (generalization error when there is no adversary). In this paper, we focus on the distribution-perturbing adversary framework, wherein the adversary can change the test distribution within a neighborhood of the training data distribution. The neighborhood is defined via the Wasserstein distance between distributions, and the radius of the neighborhood is a measure of the adversary’s manipulative power. We study the tradeoff between standard risk and adversarial risk and derive the Pareto-optimal tradeoff, achievable over specific classes of models, in the infinite data limit with the feature dimension kept fixed. We consider three learning settings: 1) regression with the class of linear models; 2) binary classification under the Gaussian mixtures data model, with the class of linear classifiers; 3) regression with the class of random features models (which can be equivalently represented as two-layer neural networks with random first-layer weights). We show that a tradeoff between standard and adversarial risk is manifested in all three settings. We further characterize the Pareto-optimal tradeoff curves and discuss how a variety of factors, such as feature correlation, the adversary’s power, or the width of the two-layer neural network, would affect this tradeoff.

AAAI Conference 2021 Conference Paper

Graph Neural Networks with Heterophily

  • Jiong Zhu
  • Ryan A. Rossi
  • Anup Rao
  • Tung Mai
  • Nedim Lipka
  • Nesreen K. Ahmed
  • Danai Koutra

Graph Neural Networks (GNNs) have proven to be useful for many different practical applications. However, many existing GNN models have implicitly assumed homophily among the nodes connected in the graph, and have therefore largely overlooked the important setting of heterophily, where most connected nodes are from different classes. In this work, we propose a novel framework called CPGNN that generalizes GNNs for graphs with either homophily or heterophily. The proposed framework incorporates an interpretable compatibility matrix for modeling the heterophily or homophily level in the graph, which can be learned in an end-to-end fashion, enabling it to go beyond the assumption of strong homophily. Theoretically, we show that replacing the compatibility matrix in our framework with the identity (which represents pure homophily) reduces to GCN. Our extensive experiments demonstrate the effectiveness of our approach in more realistic and challenging experimental settings with significantly less training data compared to previous works: CPGNN variants achieve state-of-the-art results in heterophily settings with or without contextual node features, while maintaining comparable performance in homophily settings.
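The role of the compatibility matrix can be shown with one belief-propagation-style step in the spirit of CPGNN (the actual architecture, prior estimation, and end-to-end learning are not reproduced here): with an identity matrix, neighbors reinforce the same class (homophily); with an off-diagonal matrix, they push toward the opposite class (heterophily).

```python
import numpy as np

def propagate(A, B, H):
    """One propagation step: each node aggregates its neighbors' class
    beliefs B through a class compatibility matrix H, then beliefs are
    renormalized. H = identity recovers the homophily assumption."""
    M = A @ B @ H                              # aggregate compatible beliefs
    M = np.clip(M, 1e-9, None)
    return M / M.sum(axis=1, keepdims=True)    # rows sum to 1 again

# Two classes, perfect heterophily: neighbors belong to the other class.
H_hetero = np.array([[0.0, 1.0],
                     [1.0, 0.0]])
A = np.array([[0, 1],              # two connected nodes
              [1, 0]])
B = np.array([[0.9, 0.1],          # node 0: believed class 0
              [0.5, 0.5]])         # node 1: unknown
print(propagate(A, B, H_hetero))   # node 1 is pushed toward class 1
```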

ICLR Conference 2021 Conference Paper

Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation

  • Mrigank Raman
  • Aaron Chan
  • Siddhant Agarwal
  • Peifeng Wang
  • Hansen Wang
  • Sungchul Kim
  • Ryan A. Rossi
  • Handong Zhao

Knowledge graphs (KGs) have helped neural models improve performance on various knowledge-intensive tasks, like question answering and item recommendation. By using attention over the KG, such KG-augmented models can also "explain" which KG information was most relevant for making a given prediction. In this paper, we question whether these models are really behaving as we expect. We show that, through a reinforcement learning policy (or even simple heuristics), one can produce deceptively perturbed KGs, which maintain the downstream performance of the original KG while significantly deviating from the original KG's semantics and structure. Our findings raise doubts about KG-augmented models' ability to reason about KG information and give sensible explanations.

SODA Conference 2020 Conference Paper

Approximate Maximum Matching in Random Streams

  • Alireza Farhadi 0001
  • MohammadTaghi Hajiaghayi
  • Tung Mai
  • Anup B. Rao
  • Ryan A. Rossi

In this paper, we study the problem of finding a maximum matching in the semi-streaming model when edges arrive in a random order. In the semi-streaming model, an algorithm receives a stream of edges and is allowed a memory of Õ(n), where n is the number of vertices in the graph. A recent inspiring work by Assadi et al. [1] shows that there exists a streaming algorithm with an approximation ratio of ⅔ that uses Õ(n^1.5) memory. However, the memory of their algorithm is much larger than the memory constraint of semi-streaming algorithms. In this work, we further investigate this problem in the semi-streaming model, and we present simple and clean algorithms for approximating maximum matching. Our main results are as follows. We show that there exists a single-pass deterministic semi-streaming algorithm that finds an approximation of the maximum matching in bipartite graphs using Õ(n) memory. This result significantly outperforms the state-of-the-art result of Konrad [12], which finds a 0.539 approximation of the maximum matching using Õ(n) memory. By giving a black-box reduction from finding a matching in general graphs to finding a matching in bipartite graphs, we show there exists a single-pass deterministic semi-streaming algorithm that finds a (≈ 0.545) approximation of the maximum matching in general graphs, improving upon the state-of-the-art 0.506 approximation by Gamlath et al. [8].
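For context, the baseline these results improve on is the classic single-pass greedy matching, which keeps an edge whenever both endpoints are still free and guarantees a 1/2-approximation on any arrival order:

```python
def greedy_matching(edge_stream):
    """Single-pass greedy: keep an edge iff both endpoints are free.
    A 1/2-approximation on any stream; the paper's algorithms achieve
    strictly better ratios when edges arrive in random order."""
    matched, M = set(), []
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matched.update((u, v))
            M.append((u, v))
    return M

# On the path 1-2-3-4, greedy takes (1,2), must skip (2,3), takes (3,4).
print(greedy_matching([(1, 2), (2, 3), (3, 4)]))  # → [(1, 2), (3, 4)]
```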

JMLR Journal 2020 Journal Article

Ensemble Learning for Relational Data

  • Hoda Eldardiry
  • Jennifer Neville
  • Ryan A. Rossi

We present a theoretical analysis framework for relational ensemble models. We show that ensembles of collective classifiers can improve predictions for graph data by reducing errors due to variance in both learning and inference. In addition, we propose a relational ensemble framework that combines a relational ensemble learning approach with a relational ensemble inference approach for collective classification. The proposed ensemble techniques are applicable to both single and multiple graph settings. Experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed framework. Finally, our experimental results support the theoretical analysis and confirm that ensemble algorithms that explicitly focus on both the learning and inference processes, and aim at reducing errors associated with both, are the best performers.
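The variance-reduction mechanism the analysis rests on can be sketched with plain majority voting (the paper's ensembles are relational/collective classifiers, which this toy does not model): averaging members whose errors are partly independent cancels variance.

```python
import numpy as np

def bagged_predict(models, X):
    """Majority vote over an ensemble of binary (0/1) classifiers.
    Each model is a callable returning per-sample labels; averaging
    their votes reduces variance when their errors are independent."""
    votes = np.stack([m(X) for m in models])         # (n_models, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)    # majority vote

# Three toy classifiers that agree only on the first sample.
models = [lambda X: np.array([1, 0, 1]),
          lambda X: np.array([1, 1, 0]),
          lambda X: np.array([1, 0, 0])]
print(bagged_predict(models, None))  # → [1 0 0]
```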

ICML Conference 2020 Conference Paper

Structured Policy Iteration for Linear Quadratic Regulator

  • Youngsuk Park
  • Ryan A. Rossi
  • Zheng Wen
  • Gang Wu 0013
  • Handong Zhao

Linear quadratic regulator (LQR) is one of the most popular frameworks to tackle continuous Markov decision process tasks. With its fundamental theory and tractable optimal policy, LQR has been revisited and analyzed in recent years, in terms of reinforcement learning scenarios such as the model-free or model-based setting. In this paper, we introduce Structured Policy Iteration (S-PI) for LQR, a method capable of deriving a structured linear policy. Such a structured policy, with (block) sparsity or low rank, can have significant advantages over the standard LQR policy: it is more interpretable, memory-efficient, and well-suited for the distributed setting. In order to derive such a policy, we first formulate a regularized LQR problem when the model is known. Then, our Structured Policy Iteration (S-PI) algorithm, which takes a policy evaluation step and a policy improvement step in an iterative manner, solves this regularized LQR efficiently. We further extend the S-PI algorithm to the model-free setting, where a smoothing procedure is adopted to estimate the gradient. In both the known-model and model-free settings, we prove convergence under a proper choice of parameters. Finally, experiments demonstrate the advantages of S-PI in balancing LQR performance and the level of structure by varying the weight parameter.
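The evaluate/improve loop underlying S-PI can be sketched with plain (unregularized) policy iteration for discrete-time LQR: evaluate the current gain K by solving the Lyapunov fixed point for its cost matrix P, then improve K greedily. S-PI's structured, regularized improvement step is deliberately omitted here.

```python
import numpy as np

def policy_iteration_lqr(A, B, Q, R, iters=50):
    """Plain policy iteration for discrete-time LQR with dynamics
    x' = A x + B u, cost x'Qx + u'Ru, and policy u = -K x.
    Assumes the initial gain K = 0 is stabilizing (A stable)."""
    n, m = B.shape
    K = np.zeros((m, n))
    P = Q.copy()
    for _ in range(iters):
        Acl = A - B @ K
        # Policy evaluation: fixed point P = Q + K'RK + Acl' P Acl.
        P = Q.copy()
        for _ in range(500):
            P = Q + K.T @ R @ K + Acl.T @ P @ Acl
        # Policy improvement: greedy gain for the evaluated cost.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
Bm = np.eye(2); Q = np.eye(2); R = np.eye(2)
K, P = policy_iteration_lqr(A, Bm, Q, R)
```

At convergence, P satisfies the discrete algebraic Riccati equation and A − B K is stable, which the checks below confirm on this toy system.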

UAI Conference 2019 Conference Paper

On Densification for Minwise Hashing

  • Tung Mai
  • Anup B. Rao
  • Matt Kapilevich
  • Ryan A. Rossi
  • Yasin Abbasi-Yadkori
  • Ritwik Sinha

One Permutation Hashing (OPH) is a significantly more efficient alternative to the popular minwise hashing. To produce a sketch of size k, OPH requires just one hash function, whereas classical minwise hashing requires k hash functions. However, OPH does not have the desirable locality sensitive hashing (LSH) property that is important for indexing. [Shrivastava and Li 2014, ICML] proposed the novel idea of densification to produce LSH sketches from OPH sketches, and gave the first densification routine. In this paper, we give a necessary and sufficient condition for a densification routine to result in LSH sketches when applied to OPH sketches. Furthermore, we give a novel densification routine that, for every input, takes O(k log k) time in expectation and achieves better variance than the previous best bound obtained by [Shrivastava 2017]. The running time of the densification routine given in [Shrivastava 2017] for worst-case inputs is O(k²) in expectation.
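A minimal sketch of the setup: OPH hashes every item once, splits the hash range into k bins, and keeps the minimum per bin; densification fills the empty bins. The rotation-style routine below is the classic baseline in the spirit of Shrivastava & Li (2014), not the paper's new routine; the hash construction is an illustrative choice.

```python
import hashlib

def _h(x, seed=0):
    """Deterministic hash of a string to [0, 1) (illustrative choice)."""
    d = hashlib.sha1(f"{seed}:{x}".encode()).hexdigest()
    return int(d[:12], 16) / 16**12

def oph_sketch(items, k, seed=0):
    """One Permutation Hashing: one hash per item, range split into k
    equal bins, minimum value kept per bin. None marks an empty bin,
    which is what densification must fill."""
    bins = [None] * k
    for x in items:
        v = _h(x, seed)
        b = min(int(v * k), k - 1)
        if bins[b] is None or v < bins[b]:
            bins[b] = v
    return bins

def densify(bins):
    """Rotation densification: each empty bin copies the nearest
    non-empty bin to its right, wrapping around. Assumes at least
    one bin is non-empty."""
    k = len(bins)
    out = list(bins)
    for i in range(k):
        j = i
        while out[i] is None:
            j = (j + 1) % k
            out[i] = bins[j]
    return out

print(densify([None, 0.3, None, 0.7]))  # → [0.3, 0.3, 0.7, 0.7]
```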

TIST Journal 2018 Journal Article

Interactive Visual Graph Mining and Learning

  • Ryan A. Rossi
  • Nesreen K. Ahmed
  • Rong Zhou
  • Hoda Eldardiry

This article presents a platform for interactive graph mining and relational machine learning called GraphVis. The platform combines interactive visual representations with state-of-the-art graph mining and relational machine learning techniques to aid in revealing important insights quickly as well as learning an appropriate and highly predictive model for a particular task (e.g., classification, link prediction, discovering the roles of nodes, and finding influential nodes). Visual representations and interaction techniques and tools are developed for simple, fast, and intuitive real-time interactive exploration, mining, and modeling of graph data. In particular, we propose techniques for interactive relational learning (e.g., node/link classification), interactive link prediction and weighting, role discovery and community detection, higher-order network analysis (via graphlets, network motifs), among others. GraphVis also allows for the refinement and tuning of graph mining and relational learning methods for specific application domains and constraints via an end-to-end interactive visual analytic pipeline that learns, infers, and provides rapid interactive visualization with immediate feedback at each change/prediction in real-time. Other key aspects include interactive filtering, querying, ranking, manipulating, exporting, as well as tools for dynamic network analysis and visualization, interactive graph generators (including new block model approaches), and a variety of multi-level network analysis techniques.

KER Journal 2018 Journal Article

Relational time series forecasting

  • Ryan A. Rossi

Networks encode dependencies between entities (people, computers, proteins) and allow us to study phenomena across social, technological, and biological domains. These networks naturally evolve over time by the addition, deletion, and changing of links, nodes, and attributes. Despite the importance of modeling these dynamics, existing work in relational machine learning has ignored relational time series data. Relational time series learning lies at the intersection of traditional time series analysis and statistical relational learning, and bridges the gap between these two fundamentally important problems. This paper formulates the relational time series learning problem and presents a general framework and taxonomy for representation discovery tasks of both nodes and links, including predicting their existence, label, and weight (importance), as well as systematically constructing features. We also reinterpret the prediction task, leading to the proposal of two important relational time series forecasting tasks: (i) relational time series classification (predicting a future class or label of an entity), and (ii) relational time series regression (predicting a future real-valued attribute or weight). Relational time series models are designed to leverage both relational and temporal dependencies to minimize forecasting error for both relational time series classification and regression. Finally, we discuss challenges and open problems that remain to be addressed.