Arrow Research search

Author name cluster

Yanjun Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers

22

AAAI Conference 2026 Conference Paper

Apo2Mol: 3D Molecule Generation via Dynamic Pocket-Aware Diffusion Models

  • Xinzhe Zheng
  • Shiyu Jiang
  • Gustavo Seabra
  • Chenglong Li
  • Yanjun Li

Deep generative models are rapidly advancing structure-based drug design, offering substantial promise for generating small molecule ligands that bind to specific protein targets. However, most current approaches assume a rigid protein binding pocket, neglecting the intrinsic flexibility of proteins and the conformational rearrangements induced by ligand binding, limiting their applicability in practical drug discovery. Here, we propose Apo2Mol, a diffusion-based generative framework for 3D molecule design that explicitly accounts for conformational flexibility in protein binding pockets. To support this, we curate a dataset of over 24,000 experimentally resolved apo-holo structure pairs from the Protein Data Bank, enabling the characterization of protein structure changes associated with ligand binding. Apo2Mol employs a full-atom hierarchical graph-based diffusion model that simultaneously generates 3D ligand molecules and their corresponding holo pocket conformations from input apo states. Empirical studies demonstrate that Apo2Mol can achieve state-of-the-art performance in generating high-affinity ligands and accurately capture realistic protein pocket conformational changes.

AAAI Conference 2026 Conference Paper

EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering

  • Yanjun Li
  • Yuqian Fu
  • Tianwen Qian
  • Qi'Ao Xu
  • Silong Dai
  • Danda Pani Paudel
  • Luc Van Gool
  • Xiaoling Wang

Recent advances in Multimodal Large Language Models (MLLMs) have significantly pushed the frontier of egocentric video question answering (EgocentricQA). However, existing benchmarks and studies are mainly limited to common daily activities such as cooking and cleaning. In contrast, real-world deployment inevitably encounters domain shifts, where target domains differ substantially in both visual style and semantic content. To bridge this gap, we introduce EgoCross, a comprehensive benchmark designed to evaluate the cross-domain generalization of MLLMs in EgocentricQA. EgoCross covers four diverse and challenging domains, including surgery, industry, extreme sports, and animal perspective, representing realistic and high-impact application scenarios. It comprises approximately 1,000 QA pairs across 798 video clips, spanning four key QA tasks: prediction, recognition, localization, and counting. Each QA pair provides both OpenQA and CloseQA formats to support fine-grained evaluation. Extensive experiments show that most existing MLLMs, whether general-purpose or egocentric-specialized, struggle to generalize to domains beyond daily life, highlighting the limitations of current models. Furthermore, we conduct several pilot studies, e.g., fine-tuning and reinforcement learning, to explore potential improvements. We hope EgoCross and our accompanying analysis will serve as a foundation for advancing domain-adaptive, robust egocentric video understanding.

AAAI Conference 2026 Conference Paper

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and a Comprehensive Multimodal Dataset Towards General Medical AI

  • Tianbin Li
  • Yanzhou Su
  • Wei Li
  • Bin Fu
  • Zhe Chen
  • Ziyan Huang
  • Guoan Wang
  • Chenglong Ma

Despite significant advancements in general AI, its effectiveness in the medical domain is limited by the lack of specialized medical knowledge. To address this, we formulate GMAI-VL-5.5M, a multimodal medical dataset created by converting hundreds of specialized medical datasets with various annotations into high-quality image-text pairs. This dataset offers comprehensive task coverage, diverse modalities, and rich image-text data. Building upon this dataset, we develop GMAI-VL, a 7B-parameter general medical vision-language model, with a three-stage training strategy that enhances the integration of visual and textual information. This approach significantly improves the model's ability to process multimodal data, supporting accurate diagnoses and clinical decision-making. Experiments show that GMAI-VL achieves state-of-the-art performance across various multimodal medical tasks, including visual question answering and medical image diagnosis.

NeurIPS Conference 2025 Conference Paper

DecoyDB: A Dataset for Graph Contrastive Learning in Protein-Ligand Binding Affinity Prediction

  • Yupu Zhang
  • Zelin Xu
  • Tingsong Xiao
  • Gustavo Seabra
  • Yanjun Li
  • Chenglong Li
  • Zhe Jiang

Predicting the binding affinity of protein-ligand complexes plays a vital role in drug discovery. Unfortunately, progress has been hindered by the lack of large-scale and high-quality binding affinity labels. The widely used PDBbind dataset has fewer than 20K labeled complexes. Self-supervised learning, especially graph contrastive learning (GCL), provides a unique opportunity to break the barrier by pretraining graph neural network models based on vast unlabeled complexes and fine-tuning the models on much fewer labeled complexes. However, the problem faces unique challenges, including a lack of a comprehensive unlabeled dataset with well-defined positive/negative complex pairs and the need to design GCL algorithms that incorporate the unique characteristics of such data. To fill the gap, we propose DecoyDB, a large-scale, structure-aware dataset specifically designed for self-supervised GCL on protein–ligand complexes. DecoyDB consists of high-resolution ground truth complexes and diverse decoy structures with computationally generated binding poses that range from realistic to suboptimal. Each decoy is annotated with a Root Mean Square Deviation (RMSD) from the native pose. We further design a customized GCL framework to pretrain graph neural networks based on DecoyDB and fine-tune the models with labels from PDBbind. Extensive experiments confirm that models pretrained with DecoyDB achieve superior accuracy, sample efficiency, and generalizability.
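
The RMSD labels described in the abstract follow the standard root-mean-square-deviation formula over corresponding atom coordinates. A minimal illustrative sketch (not DecoyDB's actual pipeline; it assumes the atoms are already in one-to-one correspondence, with no superposition step):

```python
import numpy as np

def rmsd(native: np.ndarray, decoy: np.ndarray) -> float:
    """Root Mean Square Deviation between two (N, 3) coordinate arrays
    whose atoms are already in correspondence (no alignment performed)."""
    diff = native - decoy
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Toy example: a decoy pose shifted by 1 Å along x from a 2-atom native pose.
native = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
decoy = native + np.array([1.0, 0.0, 0.0])
print(rmsd(native, decoy))  # 1.0
```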

LORI Conference 2025 Conference Paper

Interval Temporal Logic HS with Path Quantifiers

  • Yanjun Li
  • Shuyuan Li

To enhance the expressiveness of the interval temporal logic HS over branching interval models, this paper extends the language of HS by introducing quantifiers over paths. We also modify the original semantics of HS by evaluating formulas on pairs of intervals and paths, rather than on intervals alone. Our extension of HS aligns with the extension of CTL to CTL*. The resulting logic is called HS* in this paper. We show that HS* is strictly more expressive than HS, and we also demonstrate that HS* is a fragment of Monadic Path Logic, a monadic second-order logic in which set quantification is restricted to paths.

NeurIPS Conference 2024 Conference Paper

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

  • Pengcheng Chen
  • Jin Ye
  • Guoan Wang
  • Yanjun Li
  • Zhongying Deng
  • Wei Li
  • Tianbin Li
  • Haodong Duan

Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop benchmarks to evaluate LVLMs' effectiveness in various medical applications. Current benchmarks are often built upon specific academic literature, mainly focus on a single domain, and lack varying perceptual granularities. Thus, they face specific challenges, including limited clinical relevance, incomplete evaluations, and insufficient guidance for interactive LVLMs. To address these limitations, we developed GMAI-MMBench, the most comprehensive general medical AI benchmark to date, with a well-categorized data structure and multiple perceptual granularities. It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format. Additionally, we implemented a lexical tree structure that allows users to customize evaluation tasks, accommodating various assessment needs and substantially supporting medical AI research and applications. We evaluated 50 LVLMs, and the results show that even the advanced GPT-4o achieves an accuracy of only 53.96%, indicating significant room for improvement. Moreover, we identified five key insufficiencies in current cutting-edge LVLMs that need to be addressed to advance the development of better medical applications. We believe that GMAI-MMBench will stimulate the community to build the next generation of LVLMs toward GMAI.

ICRA Conference 2023 Conference Paper

A Bioinspired Synthetic Nervous System Controller for Pick-and-Place Manipulation

  • Yanjun Li
  • Ravesh Sukhnandan
  • Jeffrey P. Gill
  • Hillel J. Chiel
  • Victoria A. Webster-Wood
  • Roger D. Quinn

The Synthetic Nervous System (SNS) is a biologically inspired neural network (NN). Due to its capability of capturing complex mechanisms underlying neural computation, an SNS model is a candidate for building compact and interpretable NN controllers for robots. Previous work on SNSs has focused on applying the model to the control of legged robots and the design of functional subnetworks (FSNs) to realize dynamical systems. However, the FSN approach has previously relied on the analytical solution of the governing equations, which is difficult for designing more complex NN controllers. Incorporating plasticity into SNSs and using learning algorithms to tune the parameters offers a promising solution for systematic design in this situation. In this paper, we theoretically analyze the computational advantages of SNSs compared with other classical artificial neural networks. We then use learning algorithms to develop compact subnetworks for implementing addition, subtraction, division, and multiplication. We also combine the learning-based methodology with a bioinspired architecture to design an interpretable SNS for the pick-and-place control of a simulated gantry system. Finally, we show that the SNS controller is successfully transferred to a real-world robotic platform without further tuning of the parameters, verifying the effectiveness of our approach.

LORI Conference 2023 Conference Paper

A Temporal Logic for Successive Events

  • Yanjun Li
  • Jiajie Zhao

A succession of events is a sequence of events such that after one event finishes, the next one occurs immediately. In this paper, we extend linear temporal logic with a new modality to capture the case in which a sequence of events occurs successively, and we compare the expressivity of this extended linear temporal logic with that of standard linear temporal logic.

TIST Journal 2023 Journal Article

Fast Real-Time Video Object Segmentation with a Tangled Memory Network

  • Jianbiao Mei
  • Mengmeng Wang
  • Yu Yang
  • Yanjun Li
  • Yong Liu

In this article, we present a fast real-time tangled memory network that segments the objects effectively and efficiently for semi-supervised video object segmentation (VOS). We propose a tangled reference encoder and a memory bank organization mechanism based on a state estimator to fully utilize the mask features and alleviate the memory overhead and computational burden brought by the unlimited memory bank used in many memory-based methods. First, the tangled memory network exploits the mask features, which uncover abundant object information like edges and contours but are not fully explored in existing methods. Specifically, a tangled two-stream reference encoder is designed to extract and fuse the features from both RGB frames and the predicted masks. Second, to indicate the quality of the predicted mask and feed back the online prediction state for organizing the memory bank, we devise a target state estimator to learn the IoU score between the predicted mask and the ground truth. Moreover, to accelerate the forward process and avoid memory overflow, we use a memory bank of fixed size to store historical features, designing a new efficient memory bank organization mechanism based on the mask state score provided by the state estimator. We conduct comprehensive experiments on the public benchmarks DAVIS and YouTube-VOS, demonstrating that our method obtains competitive results while running at high speed (66 FPS on the DAVIS16-val set).
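
The state estimator's training target mentioned in the abstract is the standard intersection-over-union between a predicted mask and the ground truth. A minimal sketch of that metric for binary masks (illustrative only; the paper learns to predict this score rather than computing it at test time):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-Union between two boolean masks of equal shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 1.0  # two empty masks agree

pred = np.zeros((4, 4), dtype=bool); pred[:2, :2] = True  # 4 foreground px
gt = np.zeros((4, 4), dtype=bool); gt[:2, :] = True       # 8 foreground px
print(mask_iou(pred, gt))  # 0.5
```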

TARK Conference 2023 Conference Paper

Tableaux for the Logic of Strategically Knowing How

  • Yanjun Li

The logic of goal-directed knowing-how extends the standard epistemic logic with an operator of knowing-how. The knowing-how operator is interpreted as the existence of a strategy such that the agent knows that the strategy can make sure that p. This paper presents a tableau procedure for the multi-agent version of the logic of strategically knowing-how and shows the soundness and completeness of this tableau procedure. This paper also shows that the satisfiability problem of the logic can be decided in PSPACE.

TARK Conference 2021 Conference Paper

Knowing How to Plan

  • Yanjun Li
  • Yanjing Wang 0001

Various planning-based know-how logics have been studied in the recent literature. In this paper, we use such a logic to do know-how-based planning via model checking. In particular, we can handle higher-order epistemic planning involving know-how formulas as the goal, e.g., find a plan to make sure p such that the adversary does not know how to make p false in the future. We give a PTIME algorithm for the model checking problem over finite epistemic transition systems and axiomatize the logic under the assumption of perfect recall.

LORI Conference 2021 Conference Paper

Multi-agent Conformant Planning with Distributed Knowledge

  • Yanjun Li

In this paper, we study the evolution of knowledge in multi-agent conformant planning over transition systems. We propose a dynamic epistemic logical framework with modalities of distributed knowledge to handle the epistemic reasoning in such scenarios, and we reduce a problem of multi-agent conformant planning to a model checking problem. We prove that multi-agent conformant planning is PSPACE-complete in the size of the dynamic epistemic model.

AAAI Conference 2019 Conference Paper

Learning Adaptive Random Features

  • Yanjun Li
  • Kai Zhang
  • Jun Wang
  • Sanjiv Kumar

Random Fourier features are a powerful framework for approximating shift-invariant kernels with Monte Carlo integration, which has drawn considerable interest in scaling up kernel-based learning, dimensionality reduction, and information retrieval. In the literature, many sampling schemes have been proposed to improve the approximation performance. However, an interesting theoretical and algorithmic challenge remains: how to optimize the design of random Fourier features to achieve good kernel approximation on any input data using a low spectral sampling rate? In this paper, we propose to compute more adaptive random Fourier features with optimized spectral samples (w_j) and feature weights (p_j). The learning scheme not only significantly reduces the spectral sampling rate needed for accurate kernel approximation, but also allows joint optimization with any supervised learning framework. We establish generalization bounds using Rademacher complexity, and demonstrate advantages over previous methods. Moreover, our experiments show that the empirical kernel approximation provides effective regularization for supervised learning.
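
For context, the classic non-adaptive random Fourier feature construction for the RBF kernel, which adaptive schemes like this one build on, can be sketched as follows. The function name and parameters are illustrative, not from the paper; spectral samples are drawn from the kernel's spectral density (a Gaussian, by Bochner's theorem) rather than optimized:

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, n_features=1000, gamma=1.0, rng=rng):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2): z(x)^T z(y) ~ k(x, y)."""
    d = X.shape[1]
    # Spectral samples w_j ~ N(0, 2*gamma*I) and random phases b_j.
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = rng.normal(size=(5, 3))
Z = rff_features(X, n_features=10000)
K_approx = Z @ Z.T
K_exact = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))
print(np.abs(K_approx - K_exact).max())  # shrinks as n_features grows
```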

LORI Conference 2019 Conference Paper

Multi-agent Knowing How via Multi-step Plans: A Dynamic Epistemic Planning Based Approach

  • Yanjun Li
  • Yanjing Wang 0001

There are currently two approaches to the logic of knowing how: the planning-based one and the coalition-based one. However, the first is single-agent, and the second is based on single-step joint actions. In this paper, to overcome both limitations, we propose a multi-agent framework for the logic of knowing how, based on multi-step dynamic epistemic planning studied in the literature. We obtain a sound and complete axiomatization and show that the logic is decidable, although the corresponding multi-agent epistemic planning problem is undecidable.

NeurIPS Conference 2018 Conference Paper

Global Geometry of Multichannel Sparse Blind Deconvolution on the Sphere

  • Yanjun Li
  • Yoram Bresler

Multichannel blind deconvolution is the problem of recovering an unknown signal $f$ and multiple unknown channels $x_i$ from convolutional measurements $y_i=x_i \circledast f$ ($i=1, 2, \dots, N$). We consider the case where the $x_i$'s are sparse, and convolution with $f$ is invertible. Our nonconvex optimization formulation solves for a filter $h$ on the unit sphere that produces sparse output $y_i\circledast h$. Under some technical assumptions, we show that all local minima of the objective function correspond to the inverse filter of $f$ up to an inherent sign and shift ambiguity, and all saddle points have strictly negative curvatures. This geometric structure allows successful recovery of $f$ and $x_i$ using a simple manifold gradient descent algorithm with random initialization. Our theoretical findings are complemented by numerical experiments, which demonstrate superior performance of the proposed approach over the previous methods.
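
The measurement model $y_i = x_i \circledast f$ and the role of the inverse filter can be illustrated with circular (FFT-based) convolution. This sketch forms the inverse filter directly in the Fourier domain from a known $f$; the paper instead *finds* such a filter on the unit sphere by nonconvex optimization, without knowing $f$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
f = rng.normal(size=n)                                   # invertible filter
x = np.zeros(n)
x[rng.choice(n, 5, replace=False)] = rng.normal(size=5)  # sparse channel
y = np.fft.ifft(np.fft.fft(x) * np.fft.fft(f)).real      # y = x (*) f, circular

# The inverse filter h satisfies f (*) h = delta, i.e. fft(h) = 1 / fft(f).
h = np.fft.ifft(1.0 / np.fft.fft(f)).real
x_rec = np.fft.ifft(np.fft.fft(y) * np.fft.fft(h)).real  # y (*) h recovers x
print(np.abs(x_rec - x).max())  # zero up to floating-point error
```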

IJCAI Conference 2017 Conference Paper

Strategically knowing how

  • Raul Fervari
  • Andreas Herzig
  • Yanjun Li
  • Yanjing Wang

In this paper, we propose a single-agent logic of goal-directed knowing how extending the standard epistemic logic of knowing that with a new knowing how operator. The semantics of the new operator is based on the idea that knowing how to achieve phi means that there exists a (uniform) strategy such that the agent knows that it can make sure phi. We give an intuitive axiomatisation of our logic and prove the soundness, completeness, and decidability of the logic. The crucial axioms relating knowing that and knowing how illustrate our understanding of knowing how in this setting. This logic can be used in representing and reasoning about knowledge-how.

TARK Conference 2015 Conference Paper

A Dynamic Epistemic Framework for Conformant Planning

  • Quan Yu
  • Yanjun Li
  • Yanjing Wang 0001

In this paper, we introduce a lightweight dynamic epistemic logical framework for automated planning under initial uncertainty. We reduce plan verification and conformant planning to model checking problems of our logic. We show that the model checking problem of the iteration-free fragment is PSPACE-complete. By using two non-standard (but equivalent) semantics, we give novel model checking algorithms to the full language and the iteration-free language.

LORI Conference 2015 Conference Paper

Tableaux for Single-Agent Epistemic PDL with Perfect Recall and No Miracles

  • Yanjun Li

Abstract Epistemic propositional dynamic logic (EPDL) is a combination of epistemic logic and propositional dynamic logic. The properties, perfect recall and no miracles, capture the interactions between actions and knowledge. In this paper, we present a tableau-based decision procedure for deciding satisfiability of single-agent EPDL with perfect recall and no miracles. We prove the soundness and completeness of the tableau procedure with respect to models with perfect recall and no miracles.