Arrow Research search

Author name cluster

Qi Yan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
2 author rows

Possible papers

9

TMLR Journal 2025 Journal Article

Joint Generative Modeling of Grounded Scene Graphs and Images via Diffusion Models

  • Bicheng Xu
  • Qi Yan
  • Renjie Liao
  • Lele Wang
  • Leonid Sigal

A grounded scene graph represents a visual scene as a graph, where nodes denote objects (including labels and spatial locations) and directed edges encode relations among them. In this paper, we introduce a novel framework for joint grounded scene graph and image generation, a challenging task involving high-dimensional, multi-modal structured data. To effectively model this complex joint distribution, we adopt a factorized approach: first generating a grounded scene graph, followed by image generation conditioned on the generated grounded scene graph. While conditional image generation has been widely explored in the literature, our primary focus is on the generation of grounded scene graphs from noise, which provides efficient and interpretable control over the image generation process. This task requires generating plausible grounded scene graphs with heterogeneous attributes for both nodes (objects) and edges (relations among objects), encompassing continuous attributes (e.g., object bounding boxes) and discrete attributes (e.g., object and relation categories). To address this challenge, we introduce DiffuseSG, a novel diffusion model that jointly models the heterogeneous node and edge attributes. We explore different encoding strategies to effectively handle the categorical data. Leveraging a graph transformer as the denoiser, DiffuseSG progressively refines grounded scene graph representations in a continuous space before discretizing them to generate structured outputs. Additionally, we introduce an IoU-based regularization term to enhance empirical performance. Our model outperforms existing methods in grounded scene graph generation on the Visual Genome and COCO-Stuff datasets, excelling in both standard and newly introduced metrics that more accurately capture the task’s complexity.
Furthermore, we demonstrate the broader applicability of DiffuseSG in two important downstream tasks: (1) achieving superior results in a range of grounded scene graph completion tasks, and (2) enhancing grounded scene graph detection models by leveraging additional training samples generated by DiffuseSG. Code is available at https://github.com/ubc-vision/DiffuseSG.

NeurIPS Conference 2025 Conference Paper

Neural MJD: Neural Non-Stationary Merton Jump Diffusion for Time Series Prediction

  • Yuanpei Gao
  • Qi Yan
  • Yan Leng
  • Renjie Liao

While deep learning methods have achieved strong performance in time series prediction, their black-box nature and inability to explicitly model underlying stochastic processes often limit their robustness in handling non-stationary data, especially in the presence of abrupt changes. In this work, we introduce Neural MJD, a neural network-based non-stationary Merton jump diffusion (MJD) model. Our model explicitly formulates forecasting as a stochastic differential equation (SDE) simulation problem, combining a time-inhomogeneous Itô diffusion to capture non-stationary stochastic dynamics with a time-inhomogeneous compound Poisson process to model abrupt jumps. To enable tractable learning, we introduce a likelihood truncation mechanism that caps the number of jumps within small time intervals and provide a theoretical error bound for this approximation. Additionally, we propose an Euler-Maruyama with restart solver, which achieves a provably lower error bound in estimating expected states and reduced variance compared to the standard solver. Experiments on both synthetic and real-world datasets demonstrate that Neural MJD consistently outperforms state-of-the-art deep learning and statistical learning methods. Our code is available at https://github.com/DSL-Lab/neural-MJD.
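The paper's neural, time-inhomogeneous parameterization and restart solver are its own contributions; as plain illustration of the underlying process, a constant-coefficient Merton jump diffusion can be simulated with a standard Euler-Maruyama-style scheme. The parameter values below are arbitrary, and the log-normal jump convention is one common choice, not necessarily the paper's:

```python
import numpy as np

def simulate_mjd(s0, mu, sigma, lam, jump_mu, jump_sigma, T, n_steps, rng):
    """Simulate a constant-coefficient Merton jump diffusion:
    geometric Brownian motion plus a compound Poisson jump term,
    with per-jump log-sizes drawn from N(jump_mu, jump_sigma^2)."""
    dt = T / n_steps
    s = np.empty(n_steps + 1)
    s[0] = s0
    for t in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))          # Brownian increment
        n_jumps = rng.poisson(lam * dt)            # jumps in this interval
        log_jump = np.sum(rng.normal(jump_mu, jump_sigma, size=n_jumps))
        # Exact log-Euler update for the diffusion part, plus the jump term.
        s[t + 1] = s[t] * np.exp((mu - 0.5 * sigma**2) * dt
                                 + sigma * dW + log_jump)
    return s

rng = np.random.default_rng(0)
path = simulate_mjd(s0=1.0, mu=0.05, sigma=0.2, lam=1.0,
                    jump_mu=-0.1, jump_sigma=0.15, T=1.0, n_steps=252, rng=rng)
print(path.shape)  # (253,)
```

Neural MJD would replace the constant `mu`, `sigma`, and `lam` above with learned, time-varying functions; this sketch only shows the simulation backbone.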

NeurIPS Conference 2025 Conference Paper

RETRO SYNFLOW: Discrete Flow-Matching for Accurate and Diverse Single-Step Retrosynthesis

  • Robin Yadav
  • Qi Yan
  • Guy Wolf
  • Joey Bose
  • Renjie Liao

A fundamental challenge in organic chemistry is identifying and predicting the sequence of reactions that synthesize a desired target molecule. Due to the combinatorial nature of the chemical search space, single-step reactant prediction—i.e., single-step retrosynthesis—remains difficult, even for state-of-the-art template-free generative methods. These models often struggle to produce an accurate yet diverse set of feasible reactions in a chemically rational manner. In this paper, we propose RETRO SYNFLOW (RSF), a discrete flow-matching framework that formulates single-step retrosynthesis as a Markov bridge between a given product molecule and its corresponding reactants. Unlike prior approaches, RSF introduces a reaction center identification step to extract intermediate structures, or synthons, which serve as a more informative and structured source distribution for the discrete flow model. To further improve the diversity and chemical feasibility of generated samples, RSF incorporates Feynman-Kac (FK) steering with Sequential Monte Carlo (SMC) resampling at inference time. This approach leverages a learned forward-synthesis reward oracle to guide the generation process toward more promising reactant candidates. Empirically, RSF substantially outperforms the previous state-of-the-art methods in top-1 accuracy. In addition, FK-steering significantly improves round-trip accuracy, demonstrating stronger chemical validity and synthetic feasibility, all while maintaining competitive top-k performance. These results establish RSF as a new leading approach for single-step retrosynthesis prediction.

IROS Conference 2025 Conference Paper

TrajFlow: Multi-modal Motion Prediction via Flow Matching

  • Qi Yan
  • Brian Zhang
  • Yutong Zhang
  • Daniel Yang
  • Joshua White
  • Di Chen
  • Jiachao Liu
  • Langechuan Liu

Efficient and accurate motion prediction is crucial for ensuring safety and informed decision-making in autonomous driving, particularly under dynamic real-world conditions that necessitate multi-modal forecasts. We introduce TrajFlow, a novel flow matching-based motion prediction framework that addresses the scalability and efficiency challenges of existing generative trajectory prediction methods. Unlike conventional generative approaches that employ i.i.d. sampling and require multiple inference passes to capture diverse outcomes, TrajFlow predicts multiple plausible future trajectories in a single pass, significantly reducing computational overhead while maintaining coherence across predictions. Moreover, we propose a ranking loss based on the Plackett-Luce distribution to improve uncertainty estimation of predicted trajectories. Additionally, we design a self-conditioning training technique that reuses the model’s own predictions to construct noisy inputs during a second forward pass, thereby improving generalization and accelerating inference. Extensive experiments on the large-scale Waymo Open Motion Dataset (WOMD) demonstrate that TrajFlow achieves state-of-the-art performance across various key metrics, underscoring its effectiveness for safety-critical autonomous driving applications. The code and other details are available on the project website https://traj-flow.github.io/.
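TrajFlow's single-pass multi-modal decoding and ranking loss are specific to the paper, but the flow-matching objective it builds on has a simple generic form: interpolate between noise and data along a straight line and regress the velocity field. A minimal sketch of that generic loss (not TrajFlow's actual implementation):

```python
import numpy as np

def flow_matching_loss(model, x0, x1, t):
    """Rectified-flow-style conditional flow matching loss.
    x0: noise samples, x1: data samples (e.g., future trajectories),
    t: per-sample times in [0, 1]. model(xt, t) predicts the velocity."""
    t = t[:, None]                    # broadcast time over the feature dim
    xt = (1.0 - t) * x0 + t * x1     # point on the straight-line path
    target = x1 - x0                 # constant target velocity along the path
    pred = model(xt, t)
    return np.mean((pred - target) ** 2)

# Sanity check: a model that already outputs the true velocity gets zero loss.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 4))
x1 = rng.normal(size=(8, 4))
t = rng.uniform(size=8)
oracle = lambda xt, t: x1 - x0
print(flow_matching_loss(oracle, x0, x1, t))  # 0.0
```

In practice `model` would be a neural network conditioned on the driving scene; the sketch only shows the training target.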

IJCAI Conference 2025 Conference Paper

Unveiling the Power of Noise Priors: Enhancing Diffusion Models for Mobile Traffic Prediction

  • Zhi Sheng
  • Daisy Yuan
  • Jingtao Ding
  • Qi Yan
  • Xi Zheng
  • Yue Sun
  • Yong Li

Accurate prediction of mobile traffic, i.e., network traffic from cellular base stations, is crucial for optimizing network performance and supporting urban development. However, the non-stationary nature of mobile traffic, driven by human activity and environmental changes, leads to both regular patterns and abrupt variations. Diffusion models excel in capturing such complex temporal dynamics due to their ability to model the inherent uncertainties. Most existing approaches prioritize designing novel denoising networks but often neglect the critical role of noise itself, potentially leading to sub-optimal performance. In this paper, we introduce a novel perspective by emphasizing the role of noise in the denoising process. Our analysis reveals that noise fundamentally shapes mobile traffic predictions, exhibiting distinct and consistent patterns. We propose NPDiff, a framework that decomposes noise into prior and residual components, with the prior derived from data dynamics, enhancing the model's ability to capture both regular and abrupt variations. NPDiff can seamlessly integrate with various diffusion-based prediction models, delivering predictions that are effective, efficient, and robust. Extensive experiments demonstrate that it achieves superior performance with an improvement of over 30%, offering a new perspective on leveraging diffusion models in this domain. We provide code and data at https://github.com/tsinghua-fib-lab/NPDiff.

ICLR Conference 2024 Conference Paper

AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval

  • Qi Yan
  • Raihan Seraj
  • Jiawei He
  • Lili Meng
  • Tristan Sylvain

Machine-based prediction of real-world events is garnering attention due to its potential for informed decision-making. Whereas traditional forecasting predominantly hinges on structured data like time-series, recent breakthroughs in language models enable predictions using unstructured text. In particular, Zou et al. (2022) unveil AutoCast, a new benchmark that employs news articles for answering forecasting queries. Nevertheless, existing methods still trail behind human performance. The cornerstone of accurate forecasting, we argue, lies in identifying a concise, yet rich subset of news snippets from a vast corpus. With this motivation, we introduce AutoCast++, a zero-shot ranking-based context retrieval system, tailored to sift through expansive news document collections for event forecasting. Our approach first re-ranks articles based on zero-shot question-passage relevance, homing in on semantically pertinent news. Following this, the chosen articles are subjected to zero-shot summarization to attain succinct context. Leveraging a pre-trained language model, we conduct both the relevance evaluation and article summarization without needing domain-specific training. Notably, recent articles can sometimes be at odds with preceding ones due to new facts or unanticipated incidents, leading to fluctuating temporal dynamics. To tackle this, our re-ranking mechanism gives preference to more recent articles, and we further regularize the multi-passage representation learning to align with human forecaster responses made on different dates. Empirical results underscore marked improvements across multiple metrics, improving the performance for multiple-choice questions (MCQ) by 48% and true/false (TF) questions by up to 8%. Code is available at https://github.com/BorealisAI/Autocast-plus-plus.
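The relevance scoring and summarization in AutoCast++ come from a pre-trained language model; the recency preference in the re-ranking step can be sketched generically as a decayed relevance score. Both the exponential-decay weighting and the `tau_days` parameter below are hypothetical illustrations, not the paper's exact scheme:

```python
import math
from datetime import date

def rerank(articles, relevance_fn, now, tau_days=30.0):
    """Re-rank articles by zero-shot relevance, preferring recent ones.
    relevance_fn(article) is assumed to return a query-passage relevance
    score; recency is folded in via an exponential decay over article age."""
    def score(a):
        age_days = max((now - a["date"]).days, 0)
        return relevance_fn(a) * math.exp(-age_days / tau_days)
    return sorted(articles, key=score, reverse=True)

articles = [
    {"id": "old", "date": date(2022, 1, 1)},
    {"id": "new", "date": date(2022, 6, 1)},
]
ranked = rerank(articles, relevance_fn=lambda a: 1.0, now=date(2022, 6, 2))
print([a["id"] for a in ranked])  # ['new', 'old']
```

With equal relevance scores, the fresher article wins; a sufficiently more relevant older article can still outrank it.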

TMLR Journal 2024 Journal Article

SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation

  • Qi Yan
  • Zhengyang Liang
  • Yang Song
  • Renjie Liao
  • Lele Wang

Permutation-invariant diffusion models of graphs achieve invariant sampling and invariant loss functions by restricting architecture designs, which often sacrifices empirical performance. In this work, we first show that the performance degradation may also be caused by the increasing modes of target distributions brought by invariant architectures, since 1) the optimal one-step denoising scores are score functions of Gaussian mixture models (GMMs) whose components center on these modes and 2) learning the scores of GMMs with more components is often harder. Motivated by the analysis, we propose SwinGNN along with a simple yet provable trick that enables permutation-invariant sampling. It benefits from more flexible (non-invariant) architecture designs and permutation-invariant sampling. We further design an efficient 2-WL message passing network using the shifted-window self-attention. Extensive experiments on synthetic and real-world protein and molecule datasets show that SwinGNN outperforms existing methods by a substantial margin on most metrics. Our code is released at https://github.com/qiyan98/SwinGNN.
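One generic way to obtain permutation-invariant sampling from a non-invariant graph sampler is to relabel each sample with a uniformly random node permutation, which makes all relabelings of a sampled graph equally likely. A minimal sketch of that generic post-processing step (whether this matches SwinGNN's exact trick should be checked against the paper):

```python
import numpy as np

def permute_graph(adj, rng):
    """Apply a uniformly random node permutation to an adjacency matrix.
    Post-composing any graph sampler with this step makes the induced
    distribution over adjacency matrices invariant to node relabeling."""
    n = adj.shape[0]
    perm = rng.permutation(n)
    # Permute rows and columns consistently: A -> P A P^T.
    return adj[np.ix_(perm, perm)]

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
out = permute_graph(adj, rng)
print(out.shape)  # (3, 3)
```

The permutation preserves the graph's structure (degree sequence, edge count, symmetry); only the node labels change.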

TMLR Journal 2024 Journal Article

Video Diffusion Models: A Survey

  • Andrew Melnik
  • Michal Ljubljanac
  • Cong Lu
  • Qi Yan
  • Weiming Ren
  • Helge Ritter

Diffusion generative models have recently become a powerful technique for creating and modifying high-quality, coherent video content. This survey provides a comprehensive overview of the critical components of diffusion models for video generation, including their applications, architectural design, and temporal dynamics modeling. The paper begins by discussing the core principles and mathematical formulations, then explores various architectural choices and methods for maintaining temporal consistency. A taxonomy of applications is presented, categorizing models based on input modalities such as text prompts, images, videos, and audio signals. Advancements in text-to-video generation are discussed to illustrate the state-of-the-art capabilities and limitations of current approaches. Additionally, the survey summarizes recent developments in training and evaluation practices, including the use of diverse video and image datasets and the adoption of various evaluation metrics to assess model performance. The survey concludes with an examination of ongoing challenges, such as generating longer videos and managing computational costs, and offers insights into potential future directions for the field. By consolidating the latest research and developments, this survey aims to serve as a valuable resource for researchers and practitioners working with video diffusion models. Website: https://github.com/ndrwmlnk/Awesome-Video-Diffusion-Models

JMLR Journal 2015 Journal Article

Simultaneous Pursuit of Sparseness and Rank Structures for Matrix Decomposition

  • Qi Yan
  • Jieping Ye
  • Xiaotong Shen

In multi-response regression, pursuit of two different types of structures is essential to battle the curse of dimensionality. In this paper, we seek a sparsest decomposition representation of a parameter matrix in terms of a sum of sparse and low rank matrices, among many overcomplete decompositions. On this basis, we propose a constrained method subject to two nonconvex constraints, respectively for sparseness and low-rank properties. Computationally, obtaining an exact global optimizer is rather challenging. To overcome the difficulty, we use an alternating directions method solving a low-rank subproblem and a sparseness subproblem alternately, where we derive an exact solution to the low-rank subproblem, as well as an exact solution in a special case and an approximated solution generally through a surrogate of the $L_0$-constraint and difference convex programming, for the sparse subproblem. Theoretically, we establish convergence rates of a global minimizer in the Hellinger-distance, providing an insight into why pursuit of two different types of decomposed structures is expected to deliver higher estimation accuracy than its counterparts based on either sparseness alone or low-rank approximation alone. Numerical examples are given to illustrate these aspects, in addition to an application to facial image recognition and multiple time series analysis.
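The paper's exact low-rank solution and difference-convex treatment of the $L_0$ surrogate are its own contributions; the alternating structure itself can be illustrated with two standard projections: truncated SVD for the rank constraint and hard thresholding for the sparsity constraint. A simplified stand-in, not the paper's method:

```python
import numpy as np

def sparse_plus_lowrank(C, rank, n_nonzero, n_iters=50):
    """Alternating decomposition C ~ L + S with rank(L) <= rank and S having
    at most n_nonzero nonzero entries. Truncated SVD handles the low-rank
    subproblem (Eckart-Young); hard thresholding handles the sparse one."""
    S = np.zeros_like(C)
    for _ in range(n_iters):
        # Low-rank step: best rank-r approximation of the residual C - S.
        U, d, Vt = np.linalg.svd(C - S, full_matrices=False)
        L = (U[:, :rank] * d[:rank]) @ Vt[:rank]
        # Sparse step: keep the n_nonzero largest-magnitude residual entries.
        R = C - L
        thresh = np.sort(np.abs(R), axis=None)[-n_nonzero]
        S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

# Toy example: a rank-2 background plus 5 large sparse spikes.
rng = np.random.default_rng(1)
L_true = rng.normal(size=(10, 2)) @ rng.normal(size=(2, 10))
S_true = np.zeros((10, 10))
S_true.flat[rng.choice(100, size=5, replace=False)] = 10.0
C = L_true + S_true
L, S = sparse_plus_lowrank(C, rank=2, n_nonzero=5)
print(np.count_nonzero(S))  # 5
```

The hard constraints are enforced exactly at every iteration; convergence to a global optimum is not guaranteed, which is precisely the difficulty the paper's theory addresses.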