Arrow Research search

Author name cluster

Bo Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

226 papers
2 author rows

Possible papers

226

AAAI Conference 2026 Conference Paper

Bidirectional Noise Injection: Enhancing Diffusion Models via Coordinated Input-Output Perturbation

  • Tianyi Zheng
  • Jiayang Gao
  • Peng-tao Jiang
  • Fengxiang Yang
  • Ben Wan
  • Hao Zhang
  • Jinwei Chen
  • Jia Wang

Diffusion models have demonstrated remarkable success in image generation, yet a persistent challenge remains: the bias between model predictions and the target distribution. In this paper, we propose a Bidirectional Noise Injection framework for enhancing diffusion models, implemented via Coordinated Input-Output Perturbation (CIOP). Our approach mitigates this bias by randomly applying synchronized noise injection to both the model inputs and the prediction targets during the training stage. This stochastic, synchronized noise injection acts as a smoothing mechanism that effectively reduces the 2-Wasserstein distance between the predicted and target distributions, as substantiated by our theoretical analysis based on optimal transport theory. Extensive experiments on multiple benchmark datasets and various generative tasks demonstrate that our method improves generation quality and training efficiency without incurring additional computational cost. Furthermore, the design of CIOP enables seamless integration with existing diffusion model improvements and advanced frameworks, thereby broadening its applicability. These results highlight the potential of Bidirectional Noise Injection via CIOP to alleviate bias in diffusion-based generative models across a wide range of settings.
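As a rough illustration of the coordinated perturbation idea described in the abstract, a training-time helper might inject the same Gaussian noise into both the model input and the prediction target with some probability. The function name, noise scale, and injection probability below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def ciop_training_pair(x_t, target, sigma=0.05, p=0.5, rng=None):
    """Hypothetical sketch of Coordinated Input-Output Perturbation (CIOP).

    With probability p, inject the *same* Gaussian noise (scaled by sigma)
    into both the model input x_t and the prediction target, so the
    perturbation is synchronized across input and output.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        eps = rng.standard_normal(x_t.shape)
        x_t = x_t + sigma * eps
        target = target + sigma * eps  # synchronized: same eps on both sides
    return x_t, target
```

The key property is that the input and target receive an identical perturbation, so the noise cancels in the prediction error while still smoothing the training distribution.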

JBHI Journal 2026 Journal Article

DBGT-PLA: Dual-Branch Graph–Transformer Fusion for Interpretable Protein–Ligand Affinity Prediction

  • Ying Wang
  • Jing Hu
  • Junlin Xu
  • Bo Li

Protein-ligand binding affinity prediction is critical for drug discovery, yet existing methods struggle to jointly model local atomic interactions and global contextual dependencies. To address this, we propose the Interpretable Dual-Branch Graph–Transformer framework for Protein–Ligand Affinity prediction (DBGT-PLA), a novel dual-branch architecture that integrates a graph neural network (GNN) with a stability-enhanced Transformer equipped with learnable positional embeddings and a NaN-filtering mechanism that handles potential Not-a-Number (NaN) values arising from numerical instability or data preprocessing. We design a Gated Residual Learning (GRL) Fusion module that performs dimension-wise adaptive integration between local graph topology and global Transformer context. This mechanism enables multi-level feature coordination through a residual path, achieving biophysically consistent alignment between atomic-level interactions and global conformational dependencies. Furthermore, we introduce an edge-level Shapley attribution framework tailored to protein–ligand interaction graphs, quantifying contributions of chemical bonds (e.g., hydrophobic contacts) and non-covalent interactions. Experiments show DBGT-PLA reduces RMSE by 18.3% (from 1.522 to 1.244 on the Holdout Set 2019), outperforming state-of-the-art models. Crucially, our explainability module reveals that the ligand edges dominate affinity predictions, accounting for nearly 70%. This work not only advances predictive accuracy but also offers unprecedented, quantitative insights into interaction determinants, which can guide rational drug optimization. The code of DBGT-PLA is publicly available at https://github.com/wangwying/DBGT-PLA

AAAI Conference 2026 Conference Paper

Encode Geometric Diagram as Geo-Graph in Geometry Problem Solving

  • Wenjun Wu
  • Lingling Zhang
  • Bo Zhao
  • Bo Li
  • Xinyu Zhang
  • Yaqiang Wu

Geometry Problem Solving has become a hot topic in recent years due to the complexity of equipping machines with geometric abstraction, multi-modal reasoning, and mathematical capabilities. The majority of research works place their attention on the fusion of multi-modal data or the synergistic combination of neural and symbolic systems for performance improvement. However, their neglect of the unique characteristics of geometric diagrams, which distinguish them from natural images, impedes further exploration of the critical information in geometric diagrams. In this work, we introduce the novel concept of the geo-graph and propose the Geo-Graph Geometry Problem Solving model, which encodes the geometric diagram from a new perspective. The geo-graph is designed to include semantic, structural, and spatial information in the diagram, which is crucial to the subsequent problem reasoning stage. To facilitate the model's comprehension of the actual layout of the geometric diagram, spatial and connecting attentions are devised to serve as intrinsic knowledge guidance for feature propagation. An extra cross-modal attention is used as external guidance to instruct the encoding of the geo-graph so that it relates to the specific problem target. Fused multi-modal features are then sent into a commonly used encoder-decoder framework for final solution generation. The model is first trained with three carefully designed pre-training tasks to establish its fundamental knowledge of the geo-graph, leveraging numerous varied samples generated through a geo-graph-based augmentation method. Experiments on popular geometry problem solving datasets demonstrate the effectiveness and superiority of our model for geometric diagram encoding.

AAAI Conference 2026 Conference Paper

GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging

  • Ziyi Ni
  • Huacan Wang
  • Shuo Zhang
  • Shuo Lu
  • Ziyang He
  • WangYou
  • Zhenheng Tang
  • Sen Hu

Beyond scratch coding, exploiting large-scale code repositories (e.g., GitHub) for practical tasks is vital in real-world software development, yet current benchmarks rarely evaluate code agents in such authentic, workflow-driven scenarios. To bridge this gap, we introduce GitTaskBench, a benchmark designed to systematically assess this capability via 54 realistic tasks across 7 modalities and 7 domains. Each task pairs a relevant repository with an automated, human-curated evaluation harness specifying practical success criteria. Beyond measuring execution and task success, we also propose the alpha-value metric to quantify the economic benefit of agent performance, which integrates task success rates, token cost, and average developer salaries. Experiments across three state-of-the-art agent frameworks with multiple advanced LLMs show that leveraging code repositories for complex task solving remains challenging: even the best-performing system, OpenHands+Claude 3.7, solves only 48.15% of tasks. Error analysis attributes over half of failures to seemingly mundane yet critical steps like environment setup and dependency resolution, highlighting the need for more robust workflow management and increased timeout preparedness. By releasing GitTaskBench, we aim to drive progress and attention toward repository-aware code reasoning, execution, and deployment---moving agents closer to solving complex, end-to-end real-world tasks.

JBHI Journal 2026 Journal Article

Hierarchical Deep Decision Tree-Based Network for Odontogenic Cystic Lesion Classification in CBCT Images

  • Zimo Huang
  • Hao Wang
  • Bo Li
  • Eduardo Delamare
  • Shengfu Huang
  • Lei Bi
  • Jinman Kim

Odontogenic cystic lesions (OCLs) are complex jaw abnormalities that require a precise diagnosis of the disease for treatment. Visual OCL diagnosis is commonly based on reviewing cone-beam computed tomography (CBCT) to identify morpho-pathological features associated with specific lesion types in a hierarchical manner. Current state-of-the-art methods focus on extracting features from the image without any guidance beyond the lesion diagnosis, and do not fully leverage the hierarchical relationship between the lesion diagnosis and morphological features. In this study, we propose a hierarchical deep decision tree network (H2DT-Net) with three modules: a deep decision tree-based hierarchical learning module (DHLM) to leverage inter-categorical relationships; a feature category embedding module (FCEM) to capture representations from both diagnostic and morpho-pathological domains and support the DHLM; and a lesion localised attention module (LLAM) to facilitate the feature extraction process by generating lesion-focused attention maps. Evaluated on 289 CBCT images, H2DT-Net achieved state-of-the-art performance in OCL classification. We further demonstrate that our method is effective in clinical settings, where it outperformed six maxillofacial clinicians in diagnostic assessment.

AAAI Conference 2026 Conference Paper

Information Elicitation Mechanisms for Bayesian Auctions (Abstract Reprint)

  • Jing Chen
  • Bo Li
  • Yingkai Li

In this paper we design information elicitation mechanisms for Bayesian auctions. While in Bayesian mechanism design the distributions of the players’ private types are often assumed to be common knowledge, information elicitation considers the situation where the players know the distributions better than the decision maker. To weaken the information assumption in Bayesian auctions, we consider an information structure where the knowledge about the distributions is arbitrarily scattered among the players. In such an unstructured information setting, we design mechanisms for unit-demand auctions and additive auctions that aggregate the players’ knowledge, generating revenue that is a constant approximation to that of the optimal Bayesian mechanisms with a common prior. Our mechanisms are 2-step dominant-strategy truthful, and the approximation ratios improve gracefully with the amount of knowledge the players collectively have.

AAAI Conference 2026 Conference Paper

Language Drift in Multilingual Retrieval-Augmented Generation: Characterization and Decoding-Time Mitigation

  • Bo Li
  • Zhenghua Xu
  • Rui Xie

Multilingual Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to perform knowledge-intensive tasks in multilingual settings by leveraging retrieved documents as external evidence. However, when the retrieved evidence differs in language from the user query and in-context exemplars, the model often exhibits language drift by generating responses in an unintended language. This phenomenon is especially pronounced during reasoning-intensive decoding, such as Chain-of-Thought (CoT) generation, where intermediate steps introduce further language instability. In this paper, we systematically study output language drift in multilingual RAG across multiple datasets, languages, and LLM backbones. Our controlled experiments reveal that the drift results not from comprehension failure but from decoder-level collapse, in which dominant token distributions and high-frequency English patterns override the intended generation language. We further observe that English serves as a semantic attractor under cross-lingual conditions, emerging as both the strongest interference source and the most frequent fallback language. To mitigate this, we propose Soft Constrained Decoding (SCD), a lightweight, training-free decoding strategy that gently steers generation toward the target language by penalizing non-target-language tokens. SCD is model-agnostic and can be applied to any generation algorithm without modifying the architecture or requiring additional data. Experiments across three multilingual datasets and multiple typologically diverse languages show that SCD consistently improves language alignment and task performance, providing an effective and generalizable solution in multilingual RAG.
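The soft penalty on non-target-language tokens described above could be sketched as a logit adjustment applied before sampling. The penalty value and the idea of a precomputed target-language token-id set are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def soft_constrained_logits(logits, target_lang_token_ids, penalty=2.0):
    """Hypothetical sketch of Soft Constrained Decoding (SCD).

    Subtract a fixed penalty from the logits of tokens outside the target
    language's vocabulary subset, gently steering generation toward the
    target language without hard masking.
    """
    adjusted = logits.copy()
    mask = np.ones(len(logits), dtype=bool)
    mask[list(target_lang_token_ids)] = False  # True = non-target tokens
    adjusted[mask] -= penalty                  # soft penalty, not -inf
    return adjusted
```

Because the penalty is finite rather than a hard mask, a strongly preferred non-target token can still be emitted, which keeps the constraint "soft" in the sense the abstract describes.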

AAAI Conference 2026 Conference Paper

MARS: Multi-Agent Adaptive Reasoning with Socratic Guidance for Automated Prompt Optimization

  • Jian Zhang
  • Zhangqi Wang
  • Haiping Zhu
  • Kangda Cheng
  • Kai He
  • Bo Li
  • Qika Lin
  • Jun Liu

Large language models (LLMs) typically operate in a question-answering paradigm, where the quality of the input prompt critically affects the response. Automated Prompt Optimization (APO) aims to overcome the cognitive biases of manually crafted prompts and explore a broader prompt design space. However, existing APO methods often suffer from rigid template structures and inefficient exploration in the prompt space. To this end, we propose a Multi-Agent Adaptive Reasoning with Socratic guidance framework (MARS) for APO. MARS consists of five complementary agents and formulates the optimization process as a Partially Observable Markov Decision Process (POMDP), enabling adaptive prompt refinement through explicit state modeling and interactive feedback. Specifically, a Planner agent generates flexible optimization trajectories, a Teacher-Critic-Student triad engages in Socratic-style dialogue to iteratively optimize the prompt based on pseudo-gradient signals in the text space, and a Target agent executes the prompt in downstream tasks to provide performance feedback. MARS integrates reasoning, feedback, and state transition into a unified hidden-state evolution process, improving both the effectiveness and interpretability of optimization. Extensive experiments on multiple datasets demonstrate that MARS outperforms existing APO methods in terms of optimization performance, search efficiency, and interpretability.

AAAI Conference 2026 Conference Paper

Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG

  • Bo Li
  • Tian Tian
  • Zhenghua Xu
  • Hao Cheng
  • Shikun Zhang
  • Wei Ye

Dynamic retrieval-augmented generation (RAG) allows large language models (LLMs) to fetch external knowledge on demand, offering greater adaptability than static RAG. A central challenge in this setting lies in determining the optimal timing for retrieval. Existing methods often trigger retrieval based on low token-level confidence, which may lead to delayed intervention after errors have already propagated. We introduce Entropy-Trend Constraint (ETC), a training-free method that determines optimal retrieval timing by modeling the dynamics of token-level uncertainty. Specifically, ETC utilizes first- and second-order differences of the entropy sequence to detect emerging uncertainty trends, enabling earlier and more precise retrieval. Experiments on six QA benchmarks with three LLM backbones demonstrate that ETC consistently outperforms strong baselines while reducing retrieval frequency. ETC is particularly effective in domain-specific scenarios, exhibiting robust generalization capabilities. Ablation studies and qualitative analyses further confirm that trend-aware uncertainty modeling yields more effective retrieval timing. The method is plug-and-play, model-agnostic, and readily integrable into existing decoding pipelines. Implementation code is included in the supplementary materials.
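The trend detection described above could be sketched as a check on the first- and second-order differences of a sliding window of token entropies. The window size and thresholds below are illustrative assumptions, not the paper's values:

```python
import numpy as np

def should_retrieve(entropies, d1_thresh=0.3, d2_thresh=0.15):
    """Hypothetical sketch of the Entropy-Trend Constraint (ETC) idea.

    Trigger retrieval when token-level entropy is not merely high but
    trending upward: both the first difference (slope) and second
    difference (acceleration) of the recent entropy sequence exceed
    their thresholds.
    """
    if len(entropies) < 3:
        return False
    e = np.asarray(entropies[-3:], dtype=float)
    d1 = e[-1] - e[-2]                       # first-order: rising uncertainty
    d2 = (e[-1] - e[-2]) - (e[-2] - e[-3])   # second-order: accelerating rise
    return bool(d1 > d1_thresh and d2 > d2_thresh)
```

Reacting to the trend rather than an absolute confidence threshold is what allows retrieval to fire before the low-confidence token is actually emitted.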

TMLR Journal 2026 Journal Article

Nondeterministic Polynomial-time Problem Challenge: An Ever-Scaling Reasoning Benchmark for LLMs

  • Chang Yang
  • Ruiyu Wang
  • Junzhe Jiang
  • Qi Jiang
  • Qinggang Zhang
  • Yanchen Deng
  • Shuxin Li
  • Shuyue Hu

Reasoning is the fundamental capability of large language models (LLMs). Due to the rapid progress of LLMs, there are two main issues with current benchmarks: i) these benchmarks can be crushed in a short time (less than 1 year), and ii) these benchmarks may be easily hacked. To handle these issues, we propose building ever-scaling benchmarks, which scale over complexity, instances, oversight, and coverage. This paper presents the Nondeterministic Polynomial-time Problem Challenge (NPPC), an ever-scaling reasoning benchmark for LLMs. Specifically, NPPC has three main modules: i) npgym, which provides a unified interface to 25 well-known NP-complete problems and can generate any number of instances at any level of complexity; ii) npsolver, which provides a unified interface to evaluate problem instances with both online and offline models via APIs and local deployments, respectively; and iii) npeval, which provides comprehensive, ready-to-use tools to analyze the performance of LLMs over different problems, the number of tokens, aha moments, reasoning errors, and solution errors. Extensive experiments over widely used LLMs demonstrate: i) NPPC can successfully decrease the performance of advanced LLMs to below 10%, demonstrating that NPPC is not crushed by current models; ii) DeepSeek-R1, Claude-3.7-Sonnet, and o1/o3-mini are the most powerful LLMs, with DeepSeek-R1 outperforming Claude-3.7-Sonnet and o1/o3-mini on most NP-complete problems considered; and iii) the number of tokens and aha moments in advanced LLMs, e.g., Claude-3.7-Sonnet and DeepSeek-R1, is observed to first increase and then decrease as problem instances become more difficult. Through continuous scaling analysis, NPPC can provide critical insights into LLMs' reasoning capabilities, exposing fundamental limitations and suggesting directions for further improvement.

AAAI Conference 2026 Conference Paper

Think, Speak, Decide: Language-Augmented Multi-Agent Reinforcement Learning for Economic Decision-Making

  • Heyang Ma
  • Qirui Mi
  • Qipeng Yang
  • Zijun Fan
  • Bo Li
  • Haifeng Zhang

Economic decision-making depends not only on structured signals—such as prices and taxes—but also on unstructured language, including peer dialogue and media narratives. While multi-agent reinforcement learning (MARL) has shown promise in optimizing economic decisions, it struggles with the semantic ambiguity and contextual richness of language. We propose LAMP (Language-Augmented Multi-Agent Policy), the first framework to integrate language into economic decision-making, narrowing the gap to real-world settings. LAMP follows a Think–Speak–Decide pipeline: (1) Think interprets numerical observations to extract short-term shocks and long-term trends, caching high-value reasoning trajectories. (2) Speak crafts and exchanges strategic messages based on the reasoning, updating beliefs by parsing peer communications. (3) Decide fuses numerical data, reasoning, and reflections into a MARL policy to optimize language-augmented decision-making. Experiments in economic simulation show that LAMP outperforms both MARL and LLM-only baselines in cumulative return (+63.5%, +34.0%), robustness (+18.8%, +59.4%), and interpretability. These results demonstrate the potential of language-augmented policies to deliver more effective and robust economic strategies.

AAAI Conference 2026 Conference Paper

Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks

  • Haotian Jin
  • Yang Li
  • Haihui Fan
  • Lin Shen
  • Xiangfang Li
  • Bo Li

Backdoor attacks pose a serious threat to the security of large language models (LLMs), causing them to exhibit anomalous behavior under specific trigger conditions. The design of backdoor triggers has evolved from fixed triggers to dynamic or implicit triggers. This increased flexibility in trigger design makes it challenging for defenders to accurately identify their specific forms. Most existing backdoor defense methods are limited to specific types of triggers or rely on an additional clean model for support. To address this issue, we propose a backdoor detection method based on attention similarity, enabling backdoor detection without prior knowledge of the trigger. Our study reveals that models subjected to backdoor attacks exhibit unusually high similarity among attention heads when exposed to triggers. Based on this observation, we propose an attention safety alignment approach combined with head-wise fine-tuning to rectify potentially contaminated attention heads, thereby effectively mitigating the impact of backdoor attacks. Extensive experimental results demonstrate that our method significantly reduces the success rate of backdoor attacks while preserving the model’s performance on downstream tasks.

JBHI Journal 2026 Journal Article

You Need Glimpse Before Segmentation: Stochastic Detector-Actor-Critic for Medical Image Segmentation

  • Zhenghua Xu
  • Yunxin Liu
  • Di Yuan
  • Bo Li
  • Weipeng Liu
  • Thomas Lukasiewicz

Medical images often contain more redundant background areas than natural images, potentially introducing noise and degrading image segmentation performance. Inspired by doctors' diagnostic processes, where they identify the lesion area before conducting a detailed analysis, we introduce a novel Stochastic Detector-Actor-Critic (SDAC) framework to tackle this challenge. SDAC initially glimpses the entire image using a detector network and policy gradient algorithms to filter out irrelevant background regions and focus on crucial, smaller areas for segmentation. The Actor-Critic algorithm then dynamically creates segmentation masks pixel by pixel without user intervention or coarse masks, forming a robust segmentation module. Both processes are trained jointly to reduce error propagation and ensure stability and ease of implementation. Our experiments on two commonly used medical image segmentation datasets demonstrate that SDAC achieves competitive results comparable to state-of-the-art methods while using 10x fewer parameters than the best-performing baseline in terms of DICE and IoU metrics. We also conduct detailed ablation studies to enhance understanding and facilitate practical use. Furthermore, SDAC performs well in low-resource settings (i.e., 50-shot or 100-shot), making it ideal for real-world scenarios. Its lightweight design makes SDAC an excellent baseline for medical image segmentation tasks.

AAAI Conference 2025 Conference Paper

Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning

  • Yuti Liu
  • Shice Liu
  • Junyuan Gao
  • Peng-tao Jiang
  • Hao Zhang
  • Jinwei Chen
  • Bo Li

Image Aesthetic Assessment (IAA) is a vital and intricate task that entails analyzing and assessing an image's aesthetic values, and identifying its highlights and areas for improvement. Traditional methods of IAA often concentrate on a single aesthetic task and suffer from inadequate labeled datasets, thus impairing in-depth aesthetic comprehension. Despite efforts to overcome this challenge through the application of Multi-modal Large Language Models (MLLMs), such models remain underdeveloped for IAA purposes. To address this, we propose a comprehensive aesthetic MLLM capable of nuanced aesthetic insight. Central to our approach is an innovative multi-scale text-guided self-supervised learning technique. This technique features a multi-scale feature alignment module and capitalizes on a wealth of unlabeled data in a self-supervised manner to structurally and functionally enhance aesthetic ability. The empirical evidence indicates that, accompanied by extensive instruction tuning, our model sets new state-of-the-art benchmarks across multiple tasks, including aesthetic scoring, aesthetic commenting, and personalized image aesthetic assessment. Remarkably, it also demonstrates zero-shot learning capabilities in the emerging task of aesthetic suggesting. Furthermore, for personalized image aesthetic assessment, we harness the potential of in-context learning and showcase its inherent advantages.

NeurIPS Conference 2025 Conference Paper

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

  • Xiaojun Jia
  • Sensen Gao
  • Simeng Qin
  • Tianyu Pang
  • Chao Du
  • Yihao Huang
  • Xinfeng Li
  • Yiming Li

Multimodal large language models (MLLMs) remain vulnerable to transferable adversarial examples. While existing methods typically achieve targeted attacks by aligning global features—such as CLIP’s [CLS] token—between adversarial and target samples, they often overlook the rich local information encoded in patch tokens. This leads to suboptimal alignment and limited transferability, particularly for closed-source models. To address this limitation, we propose a targeted transferable adversarial attack method based on feature optimal alignment, called FOA-Attack, to improve adversarial transfer capability. Specifically, at the global level, we introduce a global feature loss based on cosine similarity to align the coarse-grained features of adversarial samples with those of target samples. At the local level, given the rich local representations within Transformers, we leverage clustering techniques to extract compact local patterns to alleviate redundant local features. We then formulate local feature alignment between adversarial and target samples as an optimal transport (OT) problem and propose a local clustering optimal transport loss to refine fine-grained feature alignment. Additionally, we propose a dynamic ensemble model weighting strategy to adaptively balance the influence of multiple models during adversarial example generation, thereby further improving transferability. Extensive experiments across various models demonstrate the superiority of the proposed method, outperforming state-of-the-art methods, especially in transferring to closed-source MLLMs.

NeurIPS Conference 2025 Conference Paper

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

  • Andy Zhou
  • Kevin Wu
  • Francesco Pinto
  • Zhaorun Chen
  • Yi Zeng
  • Yu Yang
  • Shuang Yang
  • Sanmi Koyejo

As large language models (LLMs) become increasingly capable, security and safety evaluation are crucial. While current red teaming approaches have made strides in assessing LLM vulnerabilities, they often rely heavily on human input and lack comprehensive coverage of emerging attack vectors. This paper introduces AutoRedTeamer, a novel framework for fully automated, end-to-end red teaming against LLMs. AutoRedTeamer combines a multi-agent architecture with a memory-guided attack selection mechanism to enable continuous discovery and integration of new attack vectors. The dual-agent framework consists of a red teaming agent that can operate from high-level risk categories alone to generate and execute test cases, and a strategy proposer agent that autonomously discovers and implements new attacks by analyzing recent research. This modular design allows AutoRedTeamer to adapt to emerging threats while maintaining strong performance on existing attack vectors. We demonstrate AutoRedTeamer’s effectiveness across diverse evaluation settings, achieving 20% higher attack success rates on HarmBench against Llama-3.1-70B while reducing computational costs by 46% compared to existing approaches. AutoRedTeamer also matches the diversity of human-curated benchmarks in generating test cases, providing a comprehensive, scalable, and continuously evolving framework for evaluating the security of AI systems.

JBHI Journal 2025 Journal Article

Bilateral Proxy Federated Domain Generalization for Privacy-Preserving Medical Image Diagnosis

  • Huilin Lai
  • Ye Luo
  • Bo Li
  • Jianwei Lu
  • Junsong Yuan

Contemporary domain generalization methods have demonstrated effectiveness in aiding the generalized diagnosis of medical images with multi-source data by joint optimization. However, the centralized training paradigm employed by these approaches becomes infeasible when data are non-shared across domains due to the high privacy of medical data. Despite attempts by existing federated domain generalization methods to address this issue, the simultaneous attainment of strict privacy protection and a satisfactory level of generalization ability on out-of-distribution data remains a persistent challenge. In this paper, to tackle this challenging problem, we propose a novel approach called the Bilateral Proxy Framework (BPF). The BPF leverages the client-side proxies to facilitate strict privacy-preserving communications with the server and ensure smoother and more stable convergence of local models through mutual distillation. Meanwhile, the server-side proxy adopts a distance-based strategy and a parameter moving average scheme, which enhances the stability and robustness of the global model, particularly by averting abrupt parameter changes that could result in fluctuations or overfitting. Through these advancements, our framework strives to enhance the generalization capability of the global model, enabling more accurate and reliable medical image diagnosis in federated settings. The effectiveness of our method is demonstrated with superior performance over state-of-the-art methods on both simulated and real-world distribution medical image diagnosis tasks.

NeurIPS Conference 2025 Conference Paper

Boosting Adversarial Transferability with Spatial Adversarial Alignment

  • Zhaoyu Chen
  • HaiJing Guo
  • Kaixun Jiang
  • Jiyuan Fu
  • Xinyu Zhou
  • Dingkang Yang
  • Hao Tang
  • Bo Li

Deep neural networks are vulnerable to adversarial examples that exhibit transferability across various models. Numerous approaches are proposed to enhance the transferability of adversarial examples, including advanced optimization, data augmentation, and model modifications. However, these methods still show limited transferability, particularly in cross-architecture scenarios, such as from CNN to ViT. To achieve high transferability, we propose a technique termed Spatial Adversarial Alignment (SAA), which employs an alignment loss and leverages a witness model to fine-tune the surrogate model. Specifically, SAA consists of two key parts: spatial-aware alignment and adversarial-aware alignment. First, we minimize the divergences of features between the two models in both global and local regions, facilitating spatial alignment. Second, we introduce a self-adversarial strategy that leverages adversarial examples to impose further constraints, aligning features from an adversarial perspective. Through this alignment, the surrogate model is trained to concentrate on the common features extracted by the witness model. This facilitates adversarial attacks on these shared features, thereby yielding perturbations that exhibit enhanced transferability. Extensive experiments on various architectures on ImageNet show that aligned surrogate models based on SAA can yield more transferable adversarial examples, especially in cross-architecture attacks.
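The global-plus-local feature alignment described above could be sketched as a loss that compares surrogate and witness feature maps over the whole map and over grid cells. The mean-squared-error divergence and 2x2 grid below are illustrative assumptions, not the paper's exact loss:

```python
import numpy as np

def spatial_alignment_loss(f_surrogate, f_witness, grid=2):
    """Hypothetical sketch of the spatial-aware alignment term in SAA.

    Penalize feature divergence between surrogate and witness models both
    globally (whole feature map) and locally (per grid cell).
    """
    def mse(a, b):
        return float(np.mean((a - b) ** 2))

    loss = mse(f_surrogate, f_witness)  # global alignment term
    h, w = f_surrogate.shape[:2]
    sh, sw = h // grid, w // grid
    for i in range(grid):
        for j in range(grid):
            a = f_surrogate[i * sh:(i + 1) * sh, j * sw:(j + 1) * sw]
            b = f_witness[i * sh:(i + 1) * sh, j * sw:(j + 1) * sw]
            loss += mse(a, b)  # local alignment terms
    return loss
```

Minimizing such a loss pushes the surrogate's features toward the witness model's in every region, which is the mechanism the abstract credits for the improved cross-architecture transfer.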

AAAI Conference 2025 Conference Paper

Boosting Vision State Space Model with Fractal Scanning

  • Haoke Xiao
  • Lv Tang
  • Peng-tao Jiang
  • Hao Zhang
  • Jinwei Chen
  • Bo Li

Recently, foundational models have significantly advanced across different tasks, with the Transformer as the general backbone. However, the Transformer's quadratic complexity poses challenges for handling longer sequences and higher-resolution images, which may limit the further development of foundational models. To alleviate this issue, various efficient State Space Models (SSMs) such as Mamba have emerged, initially matching Transformer performance and gradually surpassing it. To improve the performance of SSMs in computer vision tasks, one crucial consideration is the effective serialization of images. Existing vision Mambas, which rely on a linear scanning mechanism, often struggle to capture complex spatial relationships in 2D images. This results in feature loss during serialization and negatively impacts model performance. To overcome this limitation, we propose the use of fractal scanning curves for image serialization to enhance the Mambas’ ability to accurately model complex spatial dependencies. Additionally, existing vision Mambas are designed with various curve scanning directions, which increases complexity and contradicts Mamba's original intent of enhancing model performance. We instead introduce the Fractal Fusion Pathway (FFP) for our FractalMamba, which enhances its performance efficiently. Extensive experiments underscore the superiority of our proposed FractalMamba.

NeurIPS Conference 2025 Conference Paper

C-SafeGen: Certified Safe LLM Generation with Claim-Based Streaming Guardrails

  • Mintong Kang
  • Zhaorun Chen
  • Bo Li

Despite the remarkable capabilities of large language models (LLMs) across diverse applications, they remain vulnerable to generating content that violates safety regulations and policies. To mitigate these risks, LLMs undergo safety alignment; however, they can still be effectively jailbroken. Off-the-shelf guardrail models are commonly deployed to monitor generations, but these models primarily focus on detection rather than ensuring safe decoding of LLM outputs. Moreover, existing efforts lack rigorous safety guarantees, which are crucial for the universal deployment of LLMs and certifiable compliance with regulatory standards. In this paper, we propose a Claim-based Stream Decoding (CSD) algorithm coupled with a statistical risk guarantee framework using conformal analysis. Specifically, our CSD algorithm integrates a stream guardrail model to safeguard sequential claims generated by LLMs and incorporates a backtracking mechanism to revise claims flagged with high safety risks. We provide theoretical guarantees demonstrating that the CSD algorithm achieves the desired generation distribution subject to safety constraints. Furthermore, we introduce a generation risk certification framework and derive a high-probability upper bound on the safety risk of the proposed CSD algorithm. We extend our approach to online settings, where user queries arrive sequentially, and prove that our method can asymptotically control safety risk to any desired level. Empirical evaluations demonstrate the effectiveness and efficiency of the CSD algorithm compared to state-of-the-art safety decoding approaches. Additionally, we validate the soundness and tightness of the derived safety risk upper bound using realistic data in both offline and online scenarios.
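The decode-check-backtrack loop at the heart of claim-based stream decoding can be illustrated with a toy simulation. The `propose`/`guard` interfaces, the fixed risk threshold, and the retry budget are placeholder assumptions for illustration; the conformal risk certification from the paper is omitted entirely.

```python
def claim_stream_decode(propose, guard, threshold=0.5, max_retries=3):
    """Toy sketch of claim-level guarded decoding with backtracking.

    propose(history) -> next claim string, or None when generation is done.
    guard(claim)     -> risk score in [0, 1]; claims scoring above the
                        threshold are revised (the backtracking step).
    """
    history = []
    while (claim := propose(history)) is not None:
        retries = 0
        # Backtracking: re-propose a claim flagged as high-risk, up to a budget.
        while guard(claim) > threshold and retries < max_retries:
            claim = propose(history)
            retries += 1
            if claim is None:
                return history
        if guard(claim) <= threshold:
            history.append(claim)
    return history
```

Here the guardrail runs on each claim as it streams out, so unsafe content is revised before later claims are generated rather than filtered post hoc.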

NeurIPS Conference 2025 Conference Paper

ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference

  • Xiang Liu
  • Zhenheng Tang
  • Peijie Dong
  • Zeyu Li
  • Bo Li
  • Xuming Hu
  • Xiaowen Chu

Large Language Models (LLMs) require significant GPU memory when processing long texts, with the key-value (KV) cache consuming up to 70% of total memory during inference. Although existing compression methods reduce memory by evaluating the importance of individual tokens, they overlook critical semantic relationships between tokens, resulting in fragmented context and degraded performance. We introduce ChunkKV, which fundamentally reimagines KV cache compression by treating semantic chunks - rather than isolated tokens - as basic compression units. This approach preserves complete linguistic structures and contextual integrity, ensuring that essential meaning is retained even under aggressive compression. Our innovation includes a novel layer-wise index reuse technique that exploits the higher cross-layer similarity of preserved indices in ChunkKV, reducing computational overhead and improving throughput by 26.5%. Comprehensive evaluations on challenging benchmarks: LongBench, Needle-In-A-HayStack, GSM8K, and JailbreakV demonstrate that ChunkKV outperforms state-of-the-art methods by up to 8.7% in precision while maintaining the same compression ratio. These results confirm that semantic-aware compression significantly enhances both efficiency and performance for long-context LLM inference, providing a simple yet effective solution to the memory bottleneck problem. The code is available at https://github.com/NVIDIA/kvpress.
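The chunk-as-unit idea can be sketched as follows: score contiguous chunks of tokens rather than individual tokens, and keep the highest-scoring chunks whole. The per-token importance scores (e.g. aggregated attention) and the mean pooling over chunks are assumptions for illustration, not ChunkKV's exact scoring, and the layer-wise index reuse is omitted.

```python
import numpy as np

def chunk_kv_keep_indices(scores, chunk_size, keep_ratio):
    """Select token indices to keep, scoring whole chunks instead of tokens.

    scores: (T,) per-token importance values.
    Tokens are grouped into contiguous chunks; the highest-scoring chunks
    are kept in full, preserving local semantic structure.
    """
    T = len(scores)
    n_chunks = int(np.ceil(T / chunk_size))
    chunk_scores = [scores[c * chunk_size:(c + 1) * chunk_size].mean()
                    for c in range(n_chunks)]
    n_keep = max(1, int(n_chunks * keep_ratio))
    top = sorted(np.argsort(chunk_scores)[-n_keep:])  # chunks to retain, in order
    keep = []
    for c in top:
        keep.extend(range(c * chunk_size, min((c + 1) * chunk_size, T)))
    return keep
```

A token-level top-k selector would instead scatter kept tokens across the sequence, fragmenting phrases; keeping whole chunks is what preserves contextual integrity.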

JMLR Journal 2025 Journal Article

Combining Climate Models using Bayesian Regression Trees and Random Paths

  • John C. Yannotty
  • Thomas J. Santner
  • Bo Li
  • Matthew T. Pratola

General circulation models (GCMs) are essential tools for climate studies. Such climate models may have varying accuracy across the input domain, but no model is uniformly best. One can improve climate model prediction performance by integrating multiple models using input-dependent weights. Weight functions modeled using Bayesian Additive Regression Trees (BART) were recently shown to be useful in nuclear physics applications. However, a restriction of that approach was the piecewise constant weight functions. To smoothly integrate multiple climate models, we propose a new tree-based model, Random Path BART (RPBART), that incorporates random path assignments in BART to produce smooth weight functions and smooth predictions, all in a matrix-free formulation. RPBART requires a more complex prior specification, for which we introduce a semivariogram to guide hyperparameter selection. This approach is easy to interpret, computationally cheap, and avoids expensive cross-validation. Finally, we propose a posterior projection technique to enable detailed analysis of the fitted weight functions. This allows us to identify a sparse set of climate models that recovers the underlying system within a given spatial region as well as quantifying model discrepancy given the available model set. Our method is demonstrated on an ensemble of 8 GCMs modeling average monthly surface temperature.

AAAI Conference 2025 Conference Paper

COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems Against Semantic Attacks

  • Zijian Huang
  • Wenda Chu
  • Linyi Li
  • Chejian Xu
  • Bo Li

Multi-sensor fusion systems (MSFs) play a vital role as the perception module in modern autonomous vehicles (AVs). Therefore, ensuring their robustness against common and realistic adversarial semantic transformations, such as rotation and shifting in the physical world, is crucial for the safety of AVs. While empirical evidence suggests that MSFs exhibit improved robustness compared to single-modal models, they are still vulnerable to adversarial semantic transformations. In addition, although many empirical defenses have been proposed, several works show that these defenses can be further attacked by new adaptive attacks. So far, there is no certified defense proposed for MSFs. In this work, we propose the first robustness certification framework COMMIT to certify the robustness of multi-sensor fusion systems against semantic attacks. In particular, we propose a practical anisotropic noise mechanism that leverages randomized smoothing on multi-modal data and performs a grid-based splitting method to characterize complex semantic transformations. We also propose efficient algorithms to compute the certification in terms of object detection accuracy and IoU for large-scale MSF models. Empirically, we evaluate the efficacy of COMMIT in different settings and provide a comprehensive benchmark of certified robustness for different MSF models using the CARLA simulation platform. We show that the certification for MSF models is at most 48.39% higher than that of single-modal models, which validates the advantages of MSF models. We believe our certification framework and benchmark will contribute an important step towards certifiably robust AVs in practice.
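The randomized-smoothing core that COMMIT builds on can be illustrated with a generic Gaussian majority vote; the paper's anisotropic multi-modal noise mechanism, grid-based splitting of semantic transformations, and certification bounds are all omitted here, and `classify`, `sigma`, and `n` are placeholder assumptions.

```python
import numpy as np

def smoothed_predict(classify, x, sigma, n=100, seed=0):
    """Majority vote of a base classifier under isotropic Gaussian input noise.

    classify(x) -> class id. Returns the most frequent class over n noisy
    copies of x (the smoothing step; deriving a certified radius from the
    vote counts is the part COMMIT extends to multi-modal MSF inputs).
    """
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n):
        c = classify(x + rng.normal(0.0, sigma, size=x.shape))
        votes[c] = votes.get(c, 0) + 1
    return max(votes, key=votes.get)
```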

AAAI Conference 2025 Conference Paper

DiffScene: Diffusion-Based Safety-Critical Scenario Generation for Autonomous Vehicles

  • Chejian Xu
  • Aleksandr Petiushko
  • Ding Zhao
  • Bo Li

The field of Autonomous Driving (AD) has witnessed significant progress in recent years. Among the various challenges faced, the safety evaluation of autonomous vehicles (AVs) stands out as a critical concern. Traditional evaluation methods are both costly and inefficient, often requiring extensive driving mileage in order to encounter rare safety-critical scenarios, which are distributed on the long tail of the complex real-world driving landscape. In this paper, we propose a unified approach, Diffusion-Based Safety-Critical Scenario Generation (DiffScene), to generate high-quality safety-critical scenarios which are both realistic and safety-critical for efficient AV evaluation. In particular, we propose a diffusion-based generation framework, leveraging the power of approximating the distribution of low-density spaces for diffusion models. We design several adversarial optimization objectives to guide the diffusion generation under predefined adversarial budgets. These objectives, such as safety-based objective, functionality-based objective, and constraint-based objective, ensure the generation of safety-critical scenarios while adhering to specific constraints. Extensive experimentation has been conducted to validate the efficacy of our approach. Compared with 6 SOTA baselines, DiffScene generates scenarios that are (1) more safety-critical under 3 metrics, (2) more realistic under 5 distance functions, and (3) more transferable to different AV algorithms. In addition, we demonstrate that training AV algorithms with scenarios generated by DiffScene leads to significantly higher performance in terms of the safety-critical metrics compared to baselines. These findings highlight the potential of DiffScene in addressing the challenges of AV safety evaluation, paving the way for safer AV development.

IJCAI Conference 2025 Conference Paper

Endogenous Recovery via Within-modality Prototypes for Incomplete Multimodal Hashing

  • Sa Zhu
  • Dayan Wu
  • Chenming Wu
  • Pengwen Dai
  • Bo Li

Multimodal hashing projects multimodal data into compact binary codes, enabling rapid and storage-efficient retrieval of large-scale multimedia content. In practical scenarios, the issue of missing modality frequently arises when dealing with multimodal data. Existing incomplete multimodal hashing techniques directly recover missing modalities with neural networks, resulting in a disjointed representation space between the recovered and true data. In this paper, we present a novel recovery paradigm, namely Prototype-based Modality Completion Hashing (PMCH). Instead of directly synthesizing the missing modality from the available ones, PMCH adaptively aggregates associated within-modality prototypes to recover missing modality data. Specifically, PMCH introduces a within-modality prototype learning module to optimize representative prototypes for each modality. These prototypes act as recovery anchors and reside within the same representation space as their corresponding modality data. Subsequently, PMCH adaptively aggregates the associated within-modality prototypes with coefficients derived from the modality-specific Weight-Net. By utilizing prototypes from the same modality, the semantic disparity between the reconstructed and authentic data can be substantially diminished. Extensive experiments on three widely used benchmark datasets demonstrate that PMCH can effectively recover the missing modality, and attain state-of-the-art performance in both complete and incomplete multimodal retrieval scenarios. Code is available at https://github.com/Sasa77777779/PMCH.git.

NeurIPS Conference 2025 Conference Paper

Enhancing Diffusion-based Unrestricted Adversarial Attacks via Adversary Preferences Alignment

  • Kaixun Jiang
  • Zhaoyu Chen
  • HaiJing Guo
  • Jinglun Li
  • Jiyuan Fu
  • Pinxue Guo
  • Hao Tang
  • Bo Li

Preference alignment in diffusion models has primarily focused on benign human preferences (e.g., aesthetics). In this paper, we propose a novel perspective: framing unrestricted adversarial example generation as a problem of aligning with adversary preferences. Unlike benign alignment, adversarial alignment involves two inherently conflicting preferences: visual consistency and attack effectiveness, which often lead to unstable optimization and reward hacking (e.g., reducing visual quality to improve attack success). To address this, we propose APA (Adversary Preferences Alignment), a two-stage framework that decouples conflicting preferences and optimizes each with differentiable rewards. In the first stage, APA fine-tunes LoRA to improve visual consistency using a rule-based similarity reward. In the second stage, APA updates either the image latent or prompt embedding based on feedback from a substitute classifier, guided by trajectory-level and step-wise rewards. To enhance black-box transferability, we further incorporate a diffusion augmentation strategy. Experiments demonstrate that APA achieves significantly better attack transferability while maintaining high visual consistency, inspiring further research to approach adversarial attacks from an alignment perspective.

AAAI Conference 2025 Conference Paper

HFF-Tracker: A Hierarchical Fine-grained Fusion Tracker for Referring Multi-Object Tracking

  • Zeyong Zhao
  • Yanchao Hao
  • Minghao Zhang
  • Qingbin Liu
  • Bo Li
  • Dianbo Sui
  • Shizhu He
  • Xi Chen

Referring Multi-Object Tracking (RMOT) aims to track multiple objects based on a provided language expression. Although prior studies have sought to accomplish this by integrating a textual module into the multi-object tracker, these methods combine text and image features in a basic way, neglecting the importance of text features. In this study, we propose a Hierarchical Fine-grained text-image Fusion tracker, named HFF-Tracker, which can perform fine-grained fusion of pixel-level visual features and text features across various semantic levels. Specifically, we have devised a Hierarchical Multi-Modal Fusion (HMMF) module to merge text and image features at an early stage in a hierarchical and detailed manner. The Text-Guided Decoder (TGD) is designed to provide the query with prior semantic information during the decoding process. Additionally, we have crafted a Text-Guided Prediction Head (TGPH) that utilizes text information to enhance the performance of the prediction head. Furthermore, we have implemented an adaptive Look-Back training strategy to maximize the utilization of valuable labeled data. Extensive experiments on the Refer-KITTI dataset and the Refer-KITTI-V2 dataset demonstrate that our proposed HFF-Tracker outperforms other state-of-the-art methods with remarkable margins.

JAAMAS Journal 2025 Journal Article

Information elicitation mechanisms for Bayesian auctions

  • Jing Chen
  • Bo Li
  • Yingkai Li

In this paper we design information elicitation mechanisms for Bayesian auctions. While in Bayesian mechanism design the distributions of the players' private types are often assumed to be common knowledge, information elicitation considers the situation where the players know the distributions better than the decision maker. To weaken the information assumption in Bayesian auctions, we consider an information structure where the knowledge about the distributions is arbitrarily scattered among the players. In such an unstructured information setting, we design mechanisms for unit-demand auctions and additive auctions that aggregate the players' knowledge, generating revenue that is a constant approximation to that of the optimal Bayesian mechanisms with a common prior. Our mechanisms are 2-step dominant-strategy truthful and the approximation ratios improve gracefully with the amount of knowledge the players collectively have.

NeurIPS Conference 2025 Conference Paper

IPAD: Inverse Prompt for AI Detection - A Robust and Interpretable LLM-Generated Text Detector

  • Zheng Chen
  • Yushi Feng
  • Jisheng Dang
  • Changyang He
  • Yue Deng
  • Hongxi Pu
  • Haoxuan Li
  • Bo Li

Large Language Models (LLMs) have attained human-level fluency in text generation, which complicates distinguishing between human-written and LLM-generated texts. This increases the risk of misuse and highlights the need for reliable detectors. Yet, existing detectors exhibit poor robustness on out-of-distribution (OOD) data and attacked data, which is critical for real-world scenarios. Also, they struggle to provide interpretable evidence to support their decisions, thus undermining reliability. In light of these challenges, we propose IPAD (Inverse Prompt for AI Detection), a novel framework consisting of a Prompt Inverter that identifies predicted prompts that could have generated the input text, and two Distinguishers that examine the probability that the input texts align with the predicted prompts. Empirical evaluations demonstrate that IPAD outperforms the strongest baselines by 9.05% (Average Recall) on in-distribution data, 12.93% (AUROC) on out-of-distribution (OOD) data, and 5.48% (AUROC) on attacked data. IPAD also performs robustly on structured datasets. Furthermore, an interpretability assessment is conducted to illustrate that IPAD enhances AI detection trustworthiness by allowing users to directly examine the decision-making evidence, which provides interpretable support for its state-of-the-art detection results.

TMLR Journal 2025 Journal Article

LLaVA-OneVision: Easy Visual Task Transfer

  • Bo Li
  • Yuanhan Zhang
  • Dong Guo
  • Renrui Zhang
  • Feng Li
  • Hao Zhang
  • Kaichen Zhang
  • Peiyuan Zhang

We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-image, multi-image, and video scenarios. Importantly, the design of LLaVA-OneVision allows strong transfer learning across different modalities/scenarios, yielding new emerging capabilities. In particular, strong video understanding and cross-scenario capabilities are demonstrated through task transfer from images to videos.

TMLR Journal 2025 Journal Article

LLaVA-Video: Video Instruction Tuning With Synthetic Data

  • Yuanhan Zhang
  • Jinming Wu
  • Wei Li
  • Bo Li
  • Zejun Ma
  • Ziwei Liu
  • Chunyuan Li

The development of video large multimodal models (LMMs) has been hindered by the difficulty of curating large amounts of high-quality raw data from the web. To address this, we consider an alternative approach, creating a high-quality synthetic dataset specifically for video instruction-following, namely LLaVA-Video-178K. This dataset includes key tasks such as detailed captioning, open-ended question-answering (QA), and multiple-choice QA. By training on this proposed dataset, in combination with existing visual instruction tuning data, we introduce LLaVA-Video, a new video LMM. Our experiments demonstrate that LLaVA-Video achieves strong performance across various video benchmarks, highlighting the effectiveness of our dataset. We plan to release the dataset, its generation pipeline, and the model checkpoints.

AAAI Conference 2025 Conference Paper

Logic-Q: Improving Deep Reinforcement Learning-based Quantitative Trading via Program Sketch-based Tuning

  • Zhiming Li
  • Junzhe Jiang
  • Yushi Cao
  • Aixin Cui
  • Bozhi Wu
  • Bo Li
  • Yang Liu
  • Danny Dongning Sun

Deep reinforcement learning (DRL) has revolutionized quantitative trading (Q-trading) by achieving decent performance without significant human expert knowledge. Despite its achievements, we observe that the current state-of-the-art DRL models are still ineffective in identifying market trends, causing them to miss good trading opportunities or suffer from large drawdowns when encountering market crashes. To address this limitation, a natural approach is to incorporate human expert knowledge in identifying market trends. However, such knowledge is abstract and hard to quantify. In order to effectively leverage abstract human expert knowledge, in this paper, we propose a universal logic-guided deep reinforcement learning framework for Q-trading, called Logic-Q. In particular, Logic-Q adopts the program synthesis by sketching paradigm and introduces a logic-guided model design that leverages a lightweight, plug-and-play market trend-aware program sketch to determine the market trend and correspondingly adjusts the DRL policy in a post-hoc manner. Extensive evaluations of two popular quantitative trading tasks demonstrate that Logic-Q can significantly improve the performance of previous state-of-the-art DRL trading strategies.

TMLR Journal 2025 Journal Article

Long Context Transfer from Language to Vision

  • Peiyuan Zhang
  • Kaichen Zhang
  • Bo Li
  • Guangtao Zeng
  • Jingkang Yang
  • Yuanhan Zhang
  • Ziyue Wang
  • Haoran Tan

Video sequences offer valuable temporal information, but existing large multimodal models (LMMs) fall short in understanding extremely long videos. Many works address this by reducing the number of visual tokens using visual resamplers. Alternatively, in this paper, we approach this problem from the perspective of the language model. By simply extrapolating the context length of the language backbone, we enable LMMs to comprehend orders of magnitude more visual tokens without any video training. We call this phenomenon \textit{long context transfer} and carefully ablate its properties. To effectively measure LMMs' ability to generalize to long contexts in the vision modality, we develop V-NIAH (Visual Needle-In-A-Haystack), a purely synthetic long vision benchmark inspired by the language model's NIAH test. Our proposed Long Video Assistant (LongVA) can process 2000 frames or over 200K visual tokens without additional complexities. With its extended context length, LongVA achieves state-of-the-art performance on Video-MME and MLVU among 7B-scale models by densely sampling more input frames.

AAAI Conference 2025 Conference Paper

MalDetectFormer: Leveraging Sparse SpatioTemporal Information for Effective Malicious Traffic Detection

  • Shuai Zhang
  • Yu Fan
  • Haoyi Zhou
  • Bo Li

Malicious traffic detection is one of the main challenges in the field of cybersecurity. Although modern deep learning methods have made progress in identifying malicious traffic, they often overlook the persistent nature of attack behaviors, making it difficult to distinguish between malicious and normal traffic at a single observation point. To address this issue, we propose MalDetectFormer, which aims to accurately capture the spatiotemporal dynamics of malicious traffic. By incorporating a sparse attention mechanism, MalDetectFormer can efficiently focus on key characteristics of traffic nodes while overcoming the challenges faced by traditional long-sequence processing. Additionally, by adopting a time-cyclic attention mechanism, the model can identify and capture persistent attack patterns of malicious traffic. Experiments conducted on benchmark datasets demonstrate the advantages of the proposed MalDetectFormer in both malicious traffic detection and malicious attack recognition tasks.

JBHI Journal 2025 Journal Article

MDP-GRL: Multi-disease Prediction by Graph-enabled Representation Learning

  • Yongan Guo
  • Yeqi Huang
  • Yuao Wang
  • Yun Liu
  • Shenqi Jing
  • Tao Shan
  • Yuan Miao
  • Bo Li

In recent years, automatic disease prediction based on electronic health records (EHRs) has emerged as a focal area of research in medical informatics. While successfully facilitating disease diagnosis, this technique still suffers from many limitations caused by the complexity of medical data, particularly the diverse relations and shared risk factors among multiple diseases. Besides, the data sparsity and imbalanced problem in EHR also undermines the effectiveness of existing approaches. Therefore, new approaches are urgently needed to accommodate the EHR features better and make effective predictions on individuals' potential diseases. To address the above challenges, this paper proposes MDP-GRL, a novel multi-label disease prediction model based on graph-enabled representation learning. Specifically, MDP-GRL constructs a medical knowledge graph (MKG) based on the patient and disease information in EHR and then employs a graph neural network (GNN) to realise the disease prediction. To address the data sparsity issue, it incorporates supplementary data for both patients and diseases, i.e., enriching patient nodes by personal basic information, examination indicators, and illness history, and supplementing disease information with comorbidity information, prevalent populations, common causes, and diagnostic basis. To mitigate the data complexity issue, MDP-GRL considers four different relation patterns in MKG, which optimizes the modelling capabilities. To address the data imbalance problem, it introduces an attention mechanism and self-adversarial negative sampling strategy, which further enhance MDP-GRL's ability to identify error-prone and minority samples. Comprehensive experiments and ablation studies are conducted based on the MIMIC-IV dataset. The results demonstrate MDP-GRL's superiority in multi-disease prediction compared with state-of-the-art approaches.

ICLR Conference 2025 Conference Paper

Multi-Task Dense Predictions via Unleashing the Power of Diffusion

  • Yuqi Yang
  • Peng-Tao Jiang
  • Qibin Hou
  • Hao Zhang 0063
  • Jinwei Chen 0003
  • Bo Li

Diffusion models have exhibited extraordinary performance in dense prediction tasks. However, there are few works exploring the diffusion pipeline for multi-task dense predictions. In this paper, we unlock the potential of diffusion models in solving multi-task dense predictions and propose a novel diffusion-based method, called TaskDiffusion, which leverages the conditional diffusion process in the decoder. Instead of denoising the noisy labels for different tasks separately, we propose a novel joint denoising diffusion process to capture the task relations during denoising. To be specific, our method first encodes the task-specific labels into a task-integration feature space to unify the encoding strategy. This allows us to get rid of the cumbersome task-specific encoding process. In addition, we also propose a cross-task diffusion decoder conditioned on task-specific multi-level features, which can model the interactions among different tasks and levels explicitly while preserving efficiency. Experiments show that our TaskDiffusion outperforms previous state-of-the-art methods for all dense prediction tasks on the widely-used PASCAL-Context and NYUD-v2 datasets. Our code is available at https://github.com/YuqiYang213/TaskDiffusion.

ICML Conference 2025 Conference Paper

Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models

  • Anshuman Chhabra
  • Bo Li
  • Jian Chen 0016
  • Prasant Mohapatra
  • Hongfu Liu 0001

A core data-centric learning challenge is the identification of training samples that are detrimental to model performance. Influence functions serve as a prominent tool for this task and offer a robust framework for assessing training data influence on model predictions. Despite their widespread use, the high computational cost associated with calculating the inverse of the Hessian matrix poses constraints, particularly when analyzing large-sized deep models. In this paper, we establish a bridge between identifying detrimental training samples via influence functions and outlier gradient detection. This transformation not only presents a straightforward and Hessian-free formulation but also provides insights into the role of the gradient in sample impact. Through systematic empirical evaluations, we first validate the hypothesis of our proposed outlier gradient analysis approach on synthetic datasets. We then demonstrate its effectiveness in detecting mislabeled samples in vision models and selecting data samples for improving performance of natural language processing transformer models. We also extend its use to influential sample identification for fine-tuning Large Language Models.
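The Hessian-free idea, flagging samples whose gradients are outliers relative to the rest of the training set, can be sketched as follows. The distance-to-mean score is one simple outlier criterion chosen for illustration and is not necessarily the detector used in the paper.

```python
import numpy as np

def outlier_gradient_scores(per_sample_grads):
    """Score training samples by how outlying their gradients are.

    per_sample_grads: (N, D) flattened per-sample gradient vectors.
    Returns one score per sample: the distance of each gradient from the
    mean gradient, normalized by the average distance. Higher scores mark
    more outlying gradients, hence samples more likely to be detrimental.
    """
    mean_grad = per_sample_grads.mean(axis=0)
    dists = np.linalg.norm(per_sample_grads - mean_grad, axis=1)
    return dists / (dists.mean() + 1e-12)
```

No Hessian inverse appears anywhere: one backward pass per sample plus a cheap outlier pass replaces the influence-function computation.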

IJCAI Conference 2025 Conference Paper

PAMol: Pocket-Aware Drug Design Method with Hypergraph Representation of Protein Pocket Structure and Feature Fusion

  • Xiaoli Lin
  • Xiongwei Liao
  • Jun Pang
  • Bo Li
  • Xiaolong Zhang

Efficient generation of targeted drug molecules is crucial in the field of drug discovery. Most existing methods neglect the high-order information in the structure of protein pockets, limiting the performance of generated drug molecules. This paper proposes a pocket-aware drug design framework, namely PAMol, constructing the hypergraph to represent the spatial structure of protein pockets, effectively capturing high-order relations and neighborhood information within the pocket structures. This framework also fuses different modal embeddings from proteins and molecules, to generate high-quality molecules. In addition, a conditional molecule generation module uses the high-order structural information in protein pockets as constraints to more accurately generate molecules for specific targets. The performance of PAMol has been assessed by analyzing generated molecules in terms of vina score, high affinity, QED, SA, LogP, Lipinski, diversity, and time. Experimental results demonstrate the potential of PAMol for targeted drug design. The source code is available at https://github.com/YICHUANSYQ/PAMol.git.

NeurIPS Conference 2025 Conference Paper

Photography Perspective Composition: Towards Aesthetic Perspective Recommendation

  • Lujian Yao
  • Siming Zheng
  • Xinbin Yuan
  • Zhuoxuan Cai
  • Pu Wu
  • Jinwei Chen
  • Bo Li
  • Peng-tao Jiang

Traditional photography composition approaches are dominated by 2D cropping-based methods. However, these methods fall short when scenes contain poorly arranged subjects. Professional photographers often employ perspective adjustment as a form of 3D recomposition, modifying the projected 2D relationships between subjects while maintaining their actual spatial positions to achieve better compositional balance. Inspired by this artistic practice, we propose photography perspective composition (PPC), extending beyond traditional cropping-based methods. However, implementing the PPC faces significant challenges: the scarcity of perspective transformation datasets and undefined assessment criteria for perspective quality. To address these challenges, we present three key contributions: (1) An automated framework for building PPC datasets through expert photographs. (2) A video generation approach that demonstrates the transformation process from less favorable to aesthetically enhanced perspectives. (3) A perspective quality assessment (PQA) model constructed based on human performance. Our approach is concise and requires no additional prompt instructions or camera trajectories, helping and guiding ordinary users to enhance their composition skills.

NeurIPS Conference 2025 Conference Paper

PolyGuard: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset

  • Mintong Kang
  • Zhaorun Chen
  • Chejian Xu
  • Jiawei Zhang
  • Chengquan Guo
  • Minzhou Pan
  • Ivan Revilla
  • Yu Sun

As large language models (LLMs) become widespread across diverse applications, concerns about the security and safety of LLM interactions have intensified. Numerous guardrail models and benchmarks have been developed to ensure LLM content safety. However, existing guardrail benchmarks are often built upon ad hoc risk taxonomies that lack a principled grounding in standardized safety policies, limiting their alignment with real-world operational requirements. Moreover, they tend to overlook domain-specific risks, while the same risk category can carry different implications across different domains. To bridge these gaps, we introduce PolyGuard, the first massive multi-domain safety policy-grounded guardrail dataset. PolyGuard offers: (1) broad domain coverage across eight safety-critical domains, such as finance, law, and codeGen; (2) policy-grounded risk construction based on authentic, domain-specific safety guidelines; (3) diverse interaction formats, encompassing declarative statements, questions, instructions, and multi-turn conversations; (4) advanced benign data curation via detoxification prompting to challenge over-refusal behaviors; and (5) attack-enhanced instances that simulate adversarial inputs designed to bypass guardrails. Based on PolyGuard, we benchmark 19 advanced guardrail models and uncover a series of findings, such as: (1) All models achieve varied F1 scores, with many demonstrating high variance across risk categories, highlighting their limited domain coverage and insufficient handling of domain-specific safety concerns; (2) As models evolve, their coverage of safety risks broadens, but performance on common risk categories may decrease; (3) All models remain vulnerable to optimized adversarial attacks. The policy-grounded PolyGuard establishes the first principled and comprehensive guardrail benchmark. We believe that PolyGuard and the unique insights derived from our evaluations will advance the development of policy-aligned and resilient guardrail systems.

NeurIPS Conference 2025 Conference Paper

Private Online Learning against an Adaptive Adversary: Realizable and Agnostic Settings

  • Bo Li
  • Wei Wang
  • Peng Ye

We revisit the problem of private online learning, in which a learner receives a sequence of $T$ data points and must output a hypothesis at each time step. It is required that the entire stream of output hypotheses should satisfy differential privacy. Prior work of Golowich and Livni [2021] established that every concept class $\mathcal{H}$ with finite Littlestone dimension $d$ is privately online learnable in the realizable setting. In particular, they proposed an algorithm that achieves an $O_{d}(\log T)$ mistake bound against an oblivious adversary. However, their approach yields a suboptimal $\tilde{O}_{d}(\sqrt{T})$ bound against an adaptive adversary. In this work, we present a new algorithm with a mistake bound of $O_{d}(\log T)$ against an adaptive adversary, closing this gap. We further investigate the problem in the agnostic setting, which is more general than the realizable setting as it does not impose any assumptions on the data. We give an algorithm that obtains a sublinear regret of $\tilde{O}_d(\sqrt{T})$ for generic Littlestone classes, demonstrating that they are also privately online learnable in the agnostic setting.

AAAI Conference 2025 Conference Paper

Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking

  • Zhengfei Xu
  • Sijia Zhao
  • Yanchao Hao
  • Xiaolong Liu
  • Lili Li
  • Yuyang Yin
  • Bo Li
  • Xi Chen

Visual Entity Linking (VEL) is a crucial task for achieving fine-grained visual understanding, matching objects within images (visual mentions) to entities in a knowledge base. Previous VEL tasks rely on textual inputs, but writing queries for complex scenes can be challenging. Visual inputs like clicks or bounding boxes offer a more convenient alternative. Therefore, we propose a new task, Pixel-Level Visual Entity Linking (PL-VEL), which uses pixel masks from visual inputs to refer to objects, supplementing reference methods for VEL. To facilitate research on this task, we have constructed the MaskOVEN-Wiki dataset through an entirely automatic reverse region-entity annotation framework. This dataset contains over 5 million annotations aligning pixel-level regions with entity-level labels, advancing visual understanding toward the fine-grained level. Moreover, as pixel masks correspond to semantic regions in an image, we enhance previous patch-interacted attention with region-interacted attention by a visual semantic tokenization approach. Manual evaluation results indicate that the reverse annotation framework achieved a 94.8% annotation success rate. Experimental results show that models trained on this dataset improved accuracy by 18 points compared to zero-shot models. Additionally, the semantic tokenization method achieved a 5-point accuracy improvement over the trained baseline.

TMLR Journal 2025 Journal Article

RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

  • Mingqi Yuan
  • Roger Creus Castanyer
  • Bo Li
  • Xin Jin
  • Wenjun Zeng
  • Glen Berseth

Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks. However, extrinsic rewards frequently fall short in complex environments due to the significant human effort needed for their design and annotation. This limitation underscores the necessity for intrinsic rewards, which offer auxiliary and dense signals and can enable agents to learn in an unsupervised manner. Although various intrinsic reward formulations have been proposed, their implementation and optimization details are insufficiently explored and lack standardization, thereby hindering research progress. To address this gap, we introduce RLeXplore, a unified, highly modularized, and plug-and-play framework offering reliable implementations of eight state-of-the-art intrinsic reward methods. Furthermore, we conduct an in-depth study that identifies critical implementation details and establishes well-justified standard practices in intrinsically-motivated RL. Our documentation, examples, and source code are available at [https://github.com/RLE-Foundation/RLeXplore](https://github.com/RLE-Foundation/RLeXplore).

AAAI Conference 2025 System Paper

RLLTE: Long-Term Evolution Project of Reinforcement Learning

  • Mingqi Yuan
  • Zequn Zhang
  • Yang Xu
  • Shihao Luo
  • Bo Li
  • Xin Jin
  • Wenjun Zeng

We present RLLTE: a long-term evolution, extremely modular, and open-source framework for reinforcement learning (RL) research and application. Beyond delivering top-notch algorithm implementations, RLLTE also serves as a toolkit for developing algorithms. More specifically, RLLTE decouples the RL algorithms completely from the exploitation-exploration perspective, providing a large number of components to accelerate algorithm development and evolution. In particular, RLLTE is the first RL framework to build a comprehensive ecosystem, which includes model training, evaluation, deployment, benchmark hub, and large language model (LLM)-empowered copilot. RLLTE is expected to set standards for RL engineering practice and be highly stimulative for industry and academia. Our documentation, examples, and source code are available at https://github.com/RLE-Foundation/rllte.

NeurIPS Conference 2025 Conference Paper

SE-GUI: Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning

  • Xinbin Yuan
  • Jian Zhang
  • Kaixin Li
  • Zhuoxuan Cai
  • Lujian Yao
  • Jie Chen
  • Enguang Wang
  • Qibin Hou

Graphical User Interface (GUI) agents have made substantial strides in understanding and executing user instructions across diverse platforms. Yet, grounding these instructions to precise interface elements remains challenging, especially in complex, high-resolution, professional environments. Traditional supervised fine-tuning (SFT) methods often require large volumes of diverse data and exhibit weak generalization. To overcome these limitations, we introduce a reinforcement learning (RL)-based framework that incorporates three core strategies: (1) seed data curation to ensure high-quality training samples, (2) a dense policy gradient that provides continuous feedback based on prediction accuracy, and (3) a self-evolutionary reinforcement finetuning mechanism that iteratively refines the model using attention maps. With only 3k training samples, our 7B-parameter model achieves state-of-the-art results among similarly sized models on three grounding benchmarks. Notably, it attains 47.3% accuracy on the ScreenSpot-Pro dataset, outperforming much larger models, such as UI-TARS-72B, by a margin of 24.2%. These findings underscore the effectiveness of RL-based approaches in enhancing GUI agent performance, particularly in high-resolution, complex environments.

NeurIPS Conference 2025 Conference Paper

SECODEPLT: A Unified Benchmark for Evaluating the Security Risks and Capabilities of Code GenAI

  • Yuzhou Nie
  • Zhun Wang
  • Yu Yang
  • Ruizhe Jiang
  • Yuheng Tang
  • Xander Davies
  • Yarin Gal
  • Bo Li

Existing benchmarks for evaluating the security risks and capabilities (e.g., vulnerability detection) of code-generating large language models (LLMs) face several key limitations: (1) limited coverage of risk and capabilities; (2) reliance on static evaluation metrics such as LLM judgments or rule-based detection, which lack the precision of dynamic analysis; and (3) a trade-off between data quality and benchmark scale. To address these challenges, we introduce a general and scalable benchmark construction framework that begins with manually validated, high-quality seed examples and expands them via targeted mutations. Each mutated sample retains the seed’s security semantics while providing diverse, unseen instances. The resulting benchmark bundles every artifact required for dynamic evaluation, including prompts, vulnerable and patched code, test cases, and ground-truth proofs of concept, enabling rigorous measurement of insecure coding, vulnerability detection, and patch generation. Applying this framework to Python, C/C++, and Java, we build SECODEPLT, a dataset of more than 5.9k samples spanning 44 CWE-based risk categories and three security capabilities. Compared with state-of-the-art benchmarks, SECODEPLT offers broader coverage, higher data fidelity, and substantially greater scale. We use SECODEPLT to evaluate leading code-generation LLMs and agents, revealing their strengths and weaknesses in both generating secure code and identifying or fixing vulnerabilities. We provide our code at https://github.com/ucsb-mlsec/SeCodePLT and data at https://huggingface.co/datasets/UCSB-SURFI/SeCodePLT.

JBHI Journal 2025 Journal Article

SNER: Semi-Supervised Named Entity Recognition for Large Volume of Diabetes Data

  • Jingyi Zuo
  • Qijie Qian
  • Yun Liu
  • Shan Lu
  • Bo Li
  • Yongan Guo

The medical literature and records on diabetes provide crucial resources for diabetes prevention and treatment. However, extracting entities from these textual diabetes data is crucial but challenging. Named entity recognition (NER), a cornerstone technology of natural language processing, has been well studied in the general medical field. However, there is still a lack of effective NER methods to handle diabetes data. Briefly, there are three challenges in the real world, including 1) the large volume of diabetes-related data to be processed, 2) the lack of labeled data, and 3) the high costs of manual labeling. To mitigate those challenges, this paper proposes a novel NER method based on semi-supervised learning, namely SNER, for diabetes data processing. It utilizes large amounts of unlabeled data to solve the problem of lack of labeled data. Specifically, it filters the predicted labels based on their confidence and uncertainty scores to reduce the noise entering the model, dividing them into positive pseudo-labels and negative pseudo-labels. Also, it utilizes negative pseudo-labels reasonably to improve the training effect of pseudo-labels. Experiments on two public diabetes datasets show that SNER achieves the best performance compared with existing state-of-the-art models.
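The confidence- and uncertainty-based pseudo-label filtering described in the abstract can be sketched as follows; this is a minimal illustration with hypothetical thresholds and an entropy-based uncertainty score, not the authors' implementation:

```python
import numpy as np

def split_pseudo_labels(probs, conf_thresh=0.9, unc_thresh=0.3):
    """probs: (n_tokens, n_labels) softmax outputs for unlabeled tokens."""
    preds = probs.argmax(axis=1)              # candidate pseudo-labels
    confidence = probs.max(axis=1)
    # predictive entropy as a simple stand-in for an uncertainty score
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    # positive pseudo-labels: confident, low-uncertainty predictions
    positive = (confidence >= conf_thresh) & (entropy <= unc_thresh)
    # negative pseudo-labels: classes the model confidently rules out
    negative = probs < (1.0 - conf_thresh)
    return preds, positive, negative
```

Tokens flagged `positive` would be trained on with their predicted label, while the `negative` mask supplies "not this class" supervision for the remaining tokens.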

AAAI Conference 2025 Conference Paper

Sparse Transfer Learning Accelerates and Enhances Certified Robustness: A Comprehensive Study

  • Zhangheng Li
  • Tianlong Chen
  • Linyi Li
  • Bo Li
  • Zhangyang Wang

Certified robustness is a critical measure for assessing the reliability of machine learning systems. Traditionally, the computational burden associated with certifying the robustness of machine learning models has posed a substantial challenge, particularly with the continuous expansion of model sizes. In this paper, we introduce an innovative approach to expedite the verification process for L2-norm certified robustness through sparse transfer learning. Our approach is both efficient and effective. It leverages verification results obtained from pre-training tasks and applies sparse updates to these results. To enhance performance, we incorporate dynamic sparse mask selection and introduce a novel stability-based regularizer called DiffStab. Empirical results demonstrate that our method accelerates the verification process for downstream tasks by as much as 70-80%, with only slight reductions in certified accuracy compared to dense parameter updates. We further validate that this performance improvement is even more pronounced in the few-shot transfer learning scenario.

AAAI Conference 2025 Conference Paper

The (Exact) Price of Cardinality for Indivisible Goods: A Parametric Perspective

  • Alexander Lam
  • Bo Li
  • Ankang Sun

We adopt a parametric approach to analyze the worst-case degradation in social welfare when the allocation of indivisible goods is constrained to be fair. Specifically, we are concerned with cardinality-constrained allocations, which require that each agent has at most k items in their allocated bundle. We propose the notion of the price of cardinality, which captures the worst-case multiplicative loss of utilitarian or egalitarian social welfare resulting from imposing the cardinality constraint. We then characterize tight or almost-tight bounds on the price of cardinality as exact functions of the instance parameters, demonstrating how the social welfare improves as k is increased. In particular, one of our main results refines and generalizes the existing asymptotic bound of Θ(√n) on the price of balancedness. We also further extend our analysis to the problem where the items are partitioned into disjoint categories, and each category has its own cardinality constraint. Through a parametric study of the price of cardinality, we provide a framework which aids decision makers in choosing an ideal level of cardinality-based fairness, using their knowledge of the potential loss of utilitarian and egalitarian social welfare.
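For intuition, the price of cardinality on a toy instance can be computed by brute force: compare the best utilitarian welfare with and without the at-most-k constraint. The utilities below are illustrative, not from the paper:

```python
from itertools import product

def utilitarian_welfare(bundles, utils):
    # sum of each agent's additive utility for her bundle
    return sum(utils[a][i] for a, bundle in enumerate(bundles) for i in bundle)

def best_welfare(utils, k=None):
    n_agents, n_items = len(utils), len(utils[0])
    best = float("-inf")
    # enumerate every assignment of items to agents
    for assign in product(range(n_agents), repeat=n_items):
        bundles = [[i for i in range(n_items) if assign[i] == a]
                   for a in range(n_agents)]
        if k is not None and any(len(b) > k for b in bundles):
            continue  # violates the cardinality constraint
        best = max(best, utilitarian_welfare(bundles, utils))
    return best

# two agents with additive utilities over four items (toy numbers)
utils = [[5, 4, 3, 2], [1, 1, 1, 1]]
unconstrained = best_welfare(utils)      # agent 0 takes everything: 5+4+3+2 = 14
constrained = best_welfare(utils, k=2)   # at most 2 items each: (5+4) + (1+1) = 11
price_of_cardinality = unconstrained / constrained  # welfare loss factor
```

On this instance the cardinality constraint costs a factor of 14/11 ≈ 1.27 in utilitarian welfare; the paper characterizes the worst case of this ratio as a function of the instance parameters.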

NeurIPS Conference 2025 Conference Paper

VMDT: Decoding the Trustworthiness of Video Foundation Models

  • Yujin Potter
  • Zhun Wang
  • Nicholas Crispino
  • Kyle Montgomery
  • Alexander Xiong
  • Ethan Chang
  • Francesco Pinto
  • Yuqi Chen

As foundation models become more sophisticated, ensuring their trustworthiness becomes increasingly critical; yet, unlike text and image, the video modality still lacks comprehensive trustworthiness benchmarks. We introduce VMDT (Video-Modal DecodingTrust), the first unified platform for evaluating text-to-video (T2V) and video-to-text (V2T) models across five key trustworthiness dimensions: safety, hallucination, fairness, privacy, and adversarial robustness. Through our extensive evaluation of 7 T2V models and 19 V2T models using VMDT, we uncover several significant insights. For instance, all open-source T2V models evaluated fail to recognize harmful queries and often generate harmful videos, while exhibiting higher levels of unfairness compared to image modality models. In V2T models, unfairness and privacy risks rise with scale, whereas hallucination and adversarial robustness improve---though overall performance remains low. Uniquely, safety shows no correlation with model size, implying that factors other than scale govern current safety levels. Our findings highlight the urgent need for developing more robust and trustworthy video foundation models, and VMDT provides a systematic framework for measuring and tracking progress toward this goal. The code is available at https://sunblaze-ucb.github.io/VMDT-page/.

NeurIPS Conference 2024 Conference Paper

3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration

  • Liyuan Zhang
  • Le Hui
  • Qi Liu
  • Bo Li
  • Yuchao Dai

Multi-instance point cloud registration aims to estimate the pose of all instances of a model point cloud in the whole scene. Existing methods all adopt the strategy of first obtaining the global correspondence and then clustering to obtain the pose of each instance. However, due to the cluttered and occluded objects in the scene, it is difficult to obtain an accurate correspondence between the model point cloud and all instances in the scene. To this end, we propose a simple yet powerful 3D focusing-and-matching network for multi-instance point cloud registration by learning the multiple pair-wise point cloud registration. Specifically, we first present a 3D multi-object focusing module to locate the center of each object and generate object proposals. By using self-attention and cross-attention to associate the model point cloud with structurally similar objects, we can locate potential matching instances by regressing object centers. Then, we propose a 3D dual-masking instance matching module to estimate the pose between the model point cloud and each object proposal. It applies instance masks and overlap masks to accurately predict the pair-wise correspondence. Extensive experiments on two public benchmarks, Scan2CAD and ROBI, show that our method achieves a new state-of-the-art performance on the multi-instance point cloud registration task.

IJCAI Conference 2024 Conference Paper

A Complete Landscape of EFX Allocations on Graphs: Goods, Chores and Mixed Manna

  • Yu Zhou
  • Tianze Wei
  • Minming Li
  • Bo Li

We study envy-free up to any item (EFX) allocations on graphs where vertices and edges represent agents and items respectively. An agent is only interested in items that are incident to her and all other items have zero marginal values to her. Christodoulou et al. first proposed this setting and studied the case of goods. We extend this setting to the case of mixed manna where an item may be liked or disliked by its endpoint agents. In our problem, an agent has an arbitrary valuation over her incident items such that the items she likes have non-negative marginal values to her and those she dislikes have non-positive marginal values. We provide a complete study of the four notions of EFX for mixed manna in the literature, which differ by whether the removed item can have zero marginal value. We prove that an allocation that satisfies the notion of EFX where the virtually-removed item could always have zero marginal value may not exist and determining its existence is NP-complete, while one that satisfies any of the other three notions always exists and can be computed in polynomial time. We also prove that an orientation (i.e., a special allocation where each edge must be allocated to one of its endpoint agents) that satisfies any of the four notions may not exist, and determining its existence is NP-complete.

NeurIPS Conference 2024 Conference Paper

AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

  • Zhaorun Chen
  • Zhen Xiang
  • Chaowei Xiao
  • Dawn Song
  • Bo Li

LLM agents have demonstrated remarkable performance across various applications, primarily due to their advanced capabilities in reasoning, utilizing external knowledge and tools, calling APIs, and executing actions to interact with environments. Current agents typically utilize a memory module or a retrieval-augmented generation (RAG) mechanism, retrieving past knowledge and instances with similar embeddings from knowledge bases to inform task planning and execution. However, the reliance on unverified knowledge bases raises significant concerns about their safety and trustworthiness. To uncover such vulnerabilities, we propose a novel red teaming approach AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base. In particular, we formulate the trigger generation process as a constrained optimization to optimize backdoor triggers by mapping the triggered instances to a unique embedding space, so as to ensure that whenever a user instruction contains the optimized backdoor trigger, the malicious demonstrations are retrieved from the poisoned memory or knowledge base with high probability. In the meantime, benign instructions without the trigger will still maintain normal performance. Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning, and the optimized backdoor trigger exhibits superior transferability, resilience, and stealthiness. Extensive experiments demonstrate AgentPoison's effectiveness in attacking three types of real-world LLM agents: RAG-based autonomous driving agent, knowledge-intensive QA agent, and healthcare EHRAgent. We inject the poisoning instances into the RAG knowledge base and long-term memories of these agents, respectively, demonstrating the generalization of AgentPoison. On each agent, AgentPoison achieves an average attack success rate of $\ge$ 80% with minimal impact on benign performance ($\le$ 1%) with a poison rate < 0.1%. The code and data are available at https://github.com/BillChan226/AgentPoison.

AAMAS Conference 2024 Conference Paper

Allocating Contiguous Blocks of Indivisible Chores Fairly: Revisited

  • Ankang Sun
  • Bo Li

Resource allocation is a fundamental problem in multi-agent systems, with two key factors to consider: fairness and efficiency. The concept of the “price of fairness” helps in the understanding of efficiency loss under fairness constraints. Among the diverse resource allocation settings, cake cutting stands out as a prominent model. Recently, Höhne and van Stee [Inf. Comput., 2021] examined a variation of this model in which the cake represents indivisible chores, with each agent requiring a connected piece of the chores. Höhne and van Stee provided upper and lower bounds on the price of fairness when fairness is measured by envy-freeness and proportionality. However, in the case of indivisible items, achieving envy-free and proportional allocations is difficult, rendering these bounds insufficient for a comprehensive understanding of the true trade-off between fairness and efficiency. In this paper, we revisit the same problem and consider fairness notions that are satisfiable, including proportionality up to one item, and maximin share fairness. By presenting tight bounds on the price of fairness with respect to these notions, we complete the picture of fairness and efficiency trade-off.

IJCAI Conference 2024 Conference Paper

Allocating Mixed Goods with Customized Fairness and Indivisibility Ratio

  • Bo Li
  • Zihao Li
  • Shengxin Liu
  • Zekai Wu

We consider the problem of fairly allocating a combination of divisible and indivisible goods. While fairness criteria like envy-freeness (EF) and proportionality (PROP) can always be achieved for divisible goods, only their relaxed versions, such as the “up to one” relaxations EF1 and PROP1, can be satisfied when the goods are indivisible. The “up to one” relaxations require the fairness conditions to be satisfied provided that one good can be completely eliminated or added in the comparison. In this work, we bridge the gap between the two extremes and propose “up to a fraction” relaxations for the allocation of mixed divisible and indivisible goods. The fraction is determined based on the proportion of indivisible goods, which we call the indivisibility ratio. The new concepts also introduce asymmetric conditions that are customized for individuals with varying indivisibility ratios. We provide both upper and lower bounds on the fractions of the modified item in order to satisfy the fairness criterion. Our results are tight up to a constant for EF and asymptotically tight for PROP.

NeurIPS Conference 2024 Conference Paper

BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment

  • Jiongxiao Wang
  • Jiazhao Li
  • Yiquan Li
  • Xiangyu Qi
  • Junjie Hu
  • Yixuan Li
  • Patrick McDaniel
  • Muhao Chen

Despite the general capabilities of Large Language Models (LLMs) like GPT-4, these models still require fine-tuning or adaptation with customized data to meet the specific business demands and intricacies of tailored use cases. However, this process inevitably introduces new safety threats, particularly against the Fine-tuning based Jailbreak Attack (FJAttack) under the setting of Language-Model-as-a-Service (LMaaS), where the model's safety has been significantly compromised by fine-tuning on users' uploaded examples that contain just a few harmful examples. Though defenses have been proposed in which the service providers of LMaaS integrate safety examples into the fine-tuning dataset to reduce safety issues, such approaches require incorporating a substantial amount of data, making them inefficient. To effectively defend against the FJAttack with limited safety examples under LMaaS, we propose the Backdoor Enhanced Safety Alignment method inspired by an analogy with the concept of backdoor attacks. In particular, service providers will construct prefixed safety examples with a secret prompt, acting as a "backdoor trigger". By integrating prefixed safety examples into the fine-tuning dataset, the subsequent fine-tuning process effectively acts as the "backdoor attack", establishing a strong correlation between the secret prompt and safety generations. Consequently, safe responses are ensured once service providers prepend this secret prompt ahead of any user input during inference. Our comprehensive experiments demonstrate that through the Backdoor Enhanced Safety Alignment, by adding as few as 11 prefixed safety examples, the maliciously fine-tuned LLMs will achieve similar safety performance as the original aligned models without harming the benign performance. Furthermore, we also present the effectiveness of our method in a more practical setting where the fine-tuning data consists of both FJAttack examples and the fine-tuning task data.
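The data-side mechanics of this defense reduce to prepending a provider-held secret prompt to a small set of safety examples before fine-tuning, and silently prepending the same prompt at inference. A schematic sketch in which the prompt string, field names, and helper functions are all illustrative:

```python
SECRET_PROMPT = "kb-1129-align"  # provider-held secret trigger (illustrative)

def build_finetuning_set(user_examples, safety_examples):
    """Mix user-uploaded fine-tuning data with secret-prefixed safety examples."""
    prefixed = [
        {"prompt": f"{SECRET_PROMPT} {ex['prompt']}", "response": ex["response"]}
        for ex in safety_examples
    ]
    # fine-tuning on this mixture ties the secret prompt to safe behaviour
    return list(user_examples) + prefixed

def guard_inference(user_input):
    # at inference the provider silently prepends the secret prompt,
    # activating the safety behaviour learned during fine-tuning
    return f"{SECRET_PROMPT} {user_input}"
```

Because only the provider knows the prompt, attackers uploading fine-tuning data cannot target or dilute the trigger's association with safe responses.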

NeurIPS Conference 2024 Conference Paper

BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning

  • Haohong Lin
  • Wenhao Ding
  • Jian Chen
  • Laixi Shi
  • Jiacheng Zhu
  • Bo Li
  • Ding Zhao

Offline model-based reinforcement learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies, especially in scenarios where exploration is costly or infeasible. Nevertheless, its performance often suffers from the objective mismatch between model and policy learning, resulting in inferior performance despite accurate model predictions. This paper first identifies that the primary source of this mismatch is the underlying confounders present in offline data for MBRL. Subsequently, we introduce BilinEar CAUSal rEpresentation (BECAUSE), an algorithm to capture causal representation for both states and actions to reduce the influence of the distribution shift, thus mitigating the objective mismatch problem. Comprehensive evaluations on 18 tasks that vary in data quality and environment context demonstrate the superior performance of BECAUSE over existing offline RL algorithms. We show the generalizability and robustness of BECAUSE under fewer samples or larger numbers of confounders. Additionally, we offer theoretical analysis of BECAUSE to prove its error bound and sample efficiency when integrating causal representation into offline MBRL. See more details on our project page: https://sites.google.com/view/be-cause.

ICRA Conference 2024 Conference Paper

BEE-Net: Bridging Semantic and Instance with Gated Encoding and Edge Constraint for Efficient Panoptic Segmentation

  • Xinyang Huang
  • Guanghui Zhang
  • Dongchen Zhu
  • Yunpeng Sun
  • Wenjun Shi
  • Gang Ye
  • Yang Xiao
  • Lei Wang 0202

Panoptic segmentation is a challenging perception task, which can help robots to comprehensively perceive the surrounding environment. In the task, we notice that semantic, instance, and panoptic have rich relations, however, which are rarely explored. In this work, we propose a novel panoptic, instance, and semantic bridged network to delve into the reciprocal relation. To make semantic and instance benefit from each other, we design a novel Gated Encoding (GE) module, incorporating complementary cues between semantic and instance heads through the gated mechanism. In addition, a novel edge-aware consistency constraint among edges of each task is presented, which exhaustedly exploits geometric constraints, to boost the segmentation quality of challenging edges. Experimental results on the Cityscapes and MS-COCO datasets demonstrate that our approach achieves state-of-the-art performance in an efficient CNN-based paradigm, attaining a balance between accuracy and efficiency.

AAMAS Conference 2024 Conference Paper

Bounding the Incentive Ratio of the Probabilistic Serial Rule

  • Bo Li
  • Ankang Sun
  • Shiji Xing

Probabilistic Serial (PS) is a well-studied allocation rule used for distributing resources among multiple agents. Although it satisfies certain notable fairness and welfare properties, it is not truthful. This means that agents have incentives to misreport their preferences in order to influence the allocation in their favor. An interesting research question is to understand the extent to which an agent can gain from manipulation. A widely-accepted concept employed for this exploration is the incentive ratio, defined as the supremum, across all instances of the problem, of the ratio between the utility an agent obtains by employing an optimal manipulation strategy and the utility they receive when being truthful. Wang et al. [AAAI, 2020] examined the incentive ratio of PS for the setting when the number of items m equals the number of agents n and proved that the incentive ratio is 1.5. In this paper, we study the general scenario in which m and n can be arbitrary. We prove that in this case, the tight incentive ratio of PS is 2 − 1/(2n − 1).
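For context, the PS rule itself is the simultaneous-eating procedure: every agent "eats" her most-preferred remaining item at unit speed, and the fraction of each item an agent consumes is her allocation probability. A minimal sketch for strict ordinal preferences; the function name and input representation are our own:

```python
def probabilistic_serial(prefs):
    """Simultaneous eating. prefs[a] lists all items, most preferred first."""
    n, m = len(prefs), len(prefs[0])
    supply = [1.0] * m                       # remaining fraction of each item
    alloc = [[0.0] * m for _ in range(n)]    # alloc[a][i] = probability share
    eps = 1e-9
    while max(supply) > eps:
        # each agent eats her favourite item that still has supply left
        target = [next(i for i in prefs[a] if supply[i] > eps) for a in range(n)]
        eaters = {i: target.count(i) for i in set(target)}
        # advance time until the first targeted item is exhausted
        dt = min(supply[i] / c for i, c in eaters.items())
        for a in range(n):
            alloc[a][target[a]] += dt
        for i, c in eaters.items():
            supply[i] -= dt * c
    return alloc
```

With identical preferences `[[0, 1], [0, 1]]` both agents split each item evenly; with opposed preferences `[[0, 1], [1, 0]]` each agent receives her favourite item outright. Manipulation in this rule means reporting a ranking that slows down competition for one's truly preferred items.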

AAAI Conference 2024 Conference Paper

CF-NeRF: Camera Parameter Free Neural Radiance Fields with Incremental Learning

  • Qingsong Yan
  • Qiang Wang
  • Kaiyong Zhao
  • Jie Chen
  • Bo Li
  • Xiaowen Chu
  • Fei Deng

Neural Radiance Fields have demonstrated impressive performance in novel view synthesis. However, NeRF and most of its variants still rely on traditional complex pipelines to provide extrinsic and intrinsic camera parameters, such as COLMAP. Recent works, like NeRFmm, BARF, and L2G-NeRF, directly treat camera parameters as learnable and estimate them through differential volume rendering. However, these methods work for forward-looking scenes with slight motions and fail to tackle the rotation scenario in practice. To overcome this limitation, we propose a novel camera parameter free neural radiance field (CF-NeRF), which incrementally reconstructs 3D representations and recovers the camera parameters inspired by incremental structure from motion. Given a sequence of images, CF-NeRF estimates camera parameters of images one by one and reconstructs the scene through initialization, implicit localization, and implicit optimization. To evaluate our method, we use a challenging real-world dataset, NeRFBuster, which provides 12 scenes under complex trajectories. Results demonstrate that CF-NeRF is robust to rotation and achieves state-of-the-art results without providing prior information and constraints.

NeurIPS Conference 2024 Conference Paper

Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness

  • Yiquan Li
  • Zhongzhu Chen
  • Kun Jin
  • Jiongxiao Wang
  • Jiachen Lei
  • Bo Li
  • Chaowei Xiao

Diffusion Purification, purifying noised images with diffusion models, has been widely used for enhancing certified robustness via randomized smoothing. However, existing frameworks often grapple with the balance between efficiency and effectiveness. While the Denoising Diffusion Probabilistic Model (DDPM) offers an efficient single-step purification, it falls short in ensuring purified images reside on the data manifold. Conversely, the Stochastic Diffusion Model effectively places purified images on the data manifold but demands solving cumbersome stochastic differential equations, while its derivative, the Probability Flow Ordinary Differential Equation (PF-ODE), though solving simpler ordinary differential equations, still requires multiple computational steps. In this work, we demonstrate that an ideal purification pipeline should, in a single step for efficiency, generate purified images that lie on the data manifold and remain semantically aligned with the original images for effectiveness. Therefore, we introduce Consistency Purification, a purifier that is Pareto-superior in the efficiency-effectiveness trade-off compared to previous work. Consistency Purification employs the consistency model, a one-step generative model distilled from PF-ODE, and can thus generate on-manifold purified images with a single network evaluation. However, the consistency model is not designed for purification, so it does not inherently ensure semantic alignment between purified and original images. To resolve this issue, we further refine it through Consistency Fine-tuning with LPIPS loss, which enables more aligned semantic meaning while keeping the purified images on the data manifold. Our comprehensive experiments demonstrate that our Consistency Purification framework achieves state-of-the-art certified robustness and efficiency compared to baseline methods.

AAAI Conference 2024 Conference Paper

Contrastive Balancing Representation Learning for Heterogeneous Dose-Response Curves Estimation

  • Minqin Zhu
  • Anpeng Wu
  • Haoxuan Li
  • Ruoxuan Xiong
  • Bo Li
  • Xiaoqing Yang
  • Xuan Qin
  • Peng Zhen

Estimating individuals' potential responses to varying treatment doses is crucial for decision-making in areas such as precision medicine and management science. Most recent studies predict counterfactual outcomes by learning a covariate representation that is independent of the treatment variable. However, such independence constraints neglect much of the covariate information that is useful for counterfactual prediction, especially when the treatment variables are continuous. To tackle this issue, in this paper, we first theoretically demonstrate the importance of balancing and prognostic representations for unbiased estimation of heterogeneous dose-response curves; that is, the learned representations are constrained to satisfy conditional independence between the covariates and both the treatment variables and the potential responses. Based on this, we propose a novel Contrastive balancing Representation learning Network using a partial distance measure, called CRNet, for estimating heterogeneous dose-response curves without losing the continuity of treatments. Extensive experiments on synthetic and real-world datasets demonstrate that our proposal significantly outperforms previous methods.

ICRA Conference 2024 Conference Paper

CVFormer: Learning Circum-View Representation and Consistency for Vision-Based Occupancy Prediction via Transformers

  • Zhengqi Bai
  • Wenjun Shi
  • Dongchen Zhu
  • Hanlong Kang
  • Guanghui Zhang
  • Gang Ye
  • Yang Xiao
  • Lei Wang 0202

With the increasing demands for perception accuracy in autonomous driving, there is a growing focus on fine-grained 3D semantic occupancy prediction. Effectively representing detailed three-dimensional scenes has become a significant challenge in the development of this task. In this paper, we present a novel transformer-based framework named CVFormer, which leverages two-dimensional circum-views from the ego vehicle to excavate three-dimensional features of the surrounding environment. Circum-views provide a novel solution for effectively addressing the representation of dense and fine-grained scenes. Specifically, a multi-attention module, CTMA, is designed to fuse temporal features from circum-views to fully exploit the spatiotemporal correlations between frames and capture more comprehensive clues. Furthermore, a novel 2D projection constraint is established by observing objects from different perspective directions, and multiple 3D constraints based on object invariance and semantic consistency are also imposed to supervise the network, which enhances its scene-understanding performance. Experimental results on the nuScenes dataset demonstrate that the proposed CVFormer clearly outperforms existing methods for occupancy prediction.

NeurIPS Conference 2024 Conference Paper

Data Free Backdoor Attacks

  • Bochuan Cao
  • Jinyuan Jia
  • Chuxuan Hu
  • Wenbo Guo
  • Zhen Xiang
  • Jinghui Chen
  • Bo Li
  • Dawn Song

Backdoor attacks aim to inject a backdoor into a classifier such that it predicts any input with an attacker-chosen backdoor trigger as an attacker-chosen target class. Existing backdoor attacks require either retraining the classifier with some clean data or modifying the model's architecture. As a result, they are 1) not applicable when clean data is unavailable, 2) less efficient when the model is large, and 3) less stealthy due to architecture changes. In this work, we propose DFBA, a novel retraining-free and data-free backdoor attack without changing the model architecture. Technically, our proposed method modifies a few parameters of a classifier to inject a backdoor. Through theoretical analysis, we verify that our injected backdoor is provably undetectable and unremovable by various state-of-the-art defenses under mild assumptions. Our evaluation on multiple datasets further demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses. Moreover, our comparison with a state-of-the-art non-data-free backdoor attack shows our attack is more stealthy and effective against various defenses while achieving less classification accuracy loss. We will release our code upon paper acceptance.

JBHI Journal 2024 Journal Article

Difference-Deformable Convolution With Pseudo Scale Instance Map for Cell Localization

  • Chengyang Zhang
  • Jie Chen
  • Bo Li
  • Min Feng
  • Yongquan Yang
  • Qikui Zhu
  • Hong Bu

Cell localization still faces two unresolved challenges: 1) the dramatic variations in cell morphology, coupled with the heterogeneous intensity distribution of lightly stained cells; 2) existing cell location maps lack scale information, resulting in insufficient supervision for point maps and inaccurate supervision for density maps. 1) To address the first challenge, we introduce a novel gradient-aware and shape-adaptive Difference-Deformable Convolution (DDConv), which enhances the model's robustness to color by leveraging gradient information while adaptively adjusting the shape of the convolutional kernel to tackle the substantial variability in cell morphology. 2) To overcome the issue of unreasonable location maps, we propose the Pseudo-Scale Instance (PSI) map, which can adaptively provide the corresponding scale information for each cell to realize accurate supervision. We analyze and evaluate DDConv and the PSI map on three challenging cell localization tasks. In comparison to existing methods, our proposed approach significantly enhances localization performance, setting a new benchmark for the cell localization task.

AAAI Conference 2024 Conference Paper

Envy-Free House Allocation under Uncertain Preferences

  • Haris Aziz
  • Isaiah Iliffe
  • Bo Li
  • Angus Ritossa
  • Ankang Sun
  • Mashbat Suzuki

Envy-freeness is one of the most important fairness concerns when allocating items. We study envy-free house allocation when agents have uncertain preferences over items and consider several well-studied preference uncertainty models. The central problem that we focus on is computing an allocation that has the highest probability of being envy-free. We show that each model leads to a distinct set of algorithmic and complexity results, including detailed results on (in-)approximability. En route, we consider two related problems of checking whether there exists an allocation that is possibly or necessarily envy-free. We give a complete picture of the computational complexity of these two problems for all the uncertainty models we consider.

AAMAS Conference 2024 Conference Paper

Fair and Efficient Division of a Discrete Cake with Switching Utility Loss

  • Zheng Chen
  • Bo Li
  • Minming Li
  • Guochuan Zhang

Cake cutting is a widely studied model for allocating resources with temporal or spatial structures among agents. Recently, a new line of research has emerged that focuses on the discrete variant, where the resources are indivisible and connected by a path. In some real-world applications, the resources are interdependent, and dividing the cake may reduce their effectiveness. In this paper, we introduce a model that captures the effect of division as switching utility loss and investigate the tradeoff between fairness and efficiency for various settings. Specifically, we measure fairness and efficiency using the popular notions of envy-freeness up to one item (EF1) and social welfare, respectively. The goal of our study is to understand how much social welfare must be sacrificed to ensure EF1 allocations and design polynomial-time algorithms that can compute EF1 allocations with the best possible social welfare guarantee.
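The EF1 notion used above can be checked mechanically. Below is a minimal sketch for additive valuations with a hypothetical two-agent example; the paper's switching utility loss and path-connectivity constraint are omitted:

```python
# Check envy-freeness up to one item (EF1) for additive valuations.
# Hypothetical example only; the paper's model additionally charges a
# switching utility loss and requires connected bundles on a path.

def is_ef1(valuations, allocation):
    """valuations[i][g]: agent i's value for item g; allocation[i]: items held by agent i."""
    n = len(valuations)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            own = sum(valuations[i][g] for g in allocation[i])
            other = [valuations[i][g] for g in allocation[j]]
            envy = sum(other) - own
            # EF1: any envy must vanish after removing some single item from j's bundle
            if envy > 0 and (not other or envy - max(other) > 0):
                return False
    return True

vals = [[5, 1, 3, 2], [2, 4, 1, 5]]          # two agents, four items on a path
print(is_ef1(vals, [[0, 2], [1, 3]]))        # True: no envy beyond one item
print(is_ef1(vals, [[1], [0, 2, 3]]))        # False: agent 0 envies even after a removal
```

Exhaustively scoring all EF1 allocations of such a toy instance by social welfare is one way to see the fairness-efficiency tradeoff the paper quantifies.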

JMLR Journal 2024 Journal Article

Gradual Domain Adaptation: Theory and Algorithms

  • Yifei He
  • Haoxiang Wang
  • Bo Li
  • Han Zhao

Unsupervised domain adaptation (UDA) adapts a model from a labeled source domain to an unlabeled target domain in a one-off way. Though widely applied, UDA faces a great challenge whenever the distribution shift between the source and the target is large. Gradual domain adaptation (GDA) mitigates this limitation by using intermediate domains to gradually adapt from the source to the target domain. In this work, we first theoretically analyze gradual self-training, a popular GDA algorithm, and provide a significantly improved generalization bound compared with Kumar et al. (2020). Our theoretical analysis leads to an interesting insight: to minimize the generalization error on the target domain, the sequence of intermediate domains should be placed uniformly along the Wasserstein geodesic between the source and target domains. The insight is particularly useful under the situation where intermediate domains are missing or scarce, which is often the case in real-world applications. Based on the insight, we propose Generative Gradual Domain Adaptation with Optimal Transport (GOAT), an algorithmic framework that can generate intermediate domains in a data-dependent way. More concretely, we first generate intermediate domains along the Wasserstein geodesic between two given consecutive domains in a feature space, then apply gradual self-training to adapt the source-trained classifier to the target along the sequence of intermediate domains. Empirically, we demonstrate that our GOAT framework can improve the performance of standard GDA when the given intermediate domains are scarce, significantly broadening the real-world application scenarios of GDA. Our code is available at https://github.com/uiuctml/GOAT.
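The geodesic-placement insight can be illustrated in one dimension, where the W2 geodesic between two Gaussians has a closed form. The following is a toy sketch under assumed 1-D Gaussian domains; GOAT itself operates on learned feature spaces:

```python
# Toy sketch: placing intermediate domains uniformly along the W2 geodesic
# between two 1-D Gaussian domains. The assumed means/stds are illustrative.
import random

mu0, s0 = 0.0, 1.0      # source domain ~ N(mu0, s0^2)
mu1, s1 = 4.0, 2.0      # target domain ~ N(mu1, s1^2)

def transport(x):
    """Optimal transport map between the two Gaussians."""
    return mu1 + (s1 / s0) * (x - mu0)

def intermediate(x, t):
    """McCann interpolation: push x to time t on the W2 geodesic."""
    return (1 - t) * x + t * transport(x)

random.seed(0)
source = [random.gauss(mu0, s0) for _ in range(10000)]
means = {}
for t in (0.25, 0.5, 0.75):                 # three evenly spaced intermediate domains
    dom = [intermediate(x, t) for x in source]
    means[t] = sum(dom) / len(dom)
    # the interpolated domain is N((1-t)*mu0 + t*mu1, ((1-t)*s0 + t*s1)^2)
    print(t, round(means[t], 2))
```

Each generated domain's sample mean should land near the linearly interpolated mean, which is exactly the "uniform placement along the geodesic" property the bound exploits.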

IJCAI Conference 2024 Conference Paper

Improved Approximation of Weighted MMS Fairness for Indivisible Chores

  • Fangxiao Wang
  • Bo Li
  • Pinyan Lu

We study how to fairly allocate a set of indivisible chores among n agents who may have different weights corresponding to their involvement in completing these chores. We find that some of the existing fairness notions may place agents with lower weights at a disadvantage, which motivates us to explore weighted maximin share fairness (WMMS). While it is known that a WMMS allocation may not exist, no non-trivial approximation has been discovered thus far. In this paper, we first design a simple sequential picking algorithm that solely relies on the agents’ ordinal rankings of the items, which achieves an approximation ratio of O(log n). Then, for the case involving two agents, we improve the approximation ratio to (√3+1)/2 ≈ 1.366, and prove that it is optimal. We also consider an online setting where the items arrive one after another and design an O(√n)-competitive online algorithm, given that the valuations are normalized.
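As a quick sanity check, the two-agent ratio (√3+1)/2 is the positive root of 2x² − 2x − 1 = 0, and picking from ordinal rankings can be sketched as a plain round-robin. This is a hedged illustration only, not necessarily the paper's exact O(log n) procedure:

```python
# Verify the two-agent ratio and sketch ordinal sequential picking for chores.
import math

ratio = (math.sqrt(3) + 1) / 2
print(round(ratio, 3))                       # 1.366
assert abs(2 * ratio**2 - 2 * ratio - 1) < 1e-12   # root of 2x^2 - 2x - 1 = 0

def round_robin_chores(rankings):
    """rankings[i]: agent i's chores ordered least-to-most burdensome.
    Assumes every agent ranks the same set of chores."""
    remaining = set(rankings[0])
    bundles = [[] for _ in rankings]
    turn = 0
    while remaining:
        prefs = rankings[turn % len(rankings)]
        pick = next(c for c in prefs if c in remaining)   # least-bad remaining chore
        bundles[turn % len(rankings)].append(pick)
        remaining.remove(pick)
        turn += 1
    return bundles

print(round_robin_chores([["a", "b", "c", "d"], ["b", "a", "d", "c"]]))
# [['a', 'c'], ['b', 'd']]
```

Note that such a picking rule uses only ordinal information, which is the property the paper's O(log n) algorithm relies on.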

AAAI Conference 2024 Conference Paper

Labels Need Prompts Too: Mask Matching for Natural Language Understanding Tasks

  • Bo Li
  • Wei Ye
  • Quansen Wang
  • Wen Zhao
  • Shikun Zhang

Textual label names (descriptions) are typically semantically rich in many natural language understanding (NLU) tasks. In this paper, we incorporate the prompting methodology, which is widely used to enrich model input, into the label side for the first time. Specifically, we propose a Mask Matching method, which equips an input with a prompt and its label with another, and then makes predictions by matching their mask representations. We evaluate our method extensively on 8 NLU tasks with 14 datasets. The experimental results show that Mask Matching significantly outperforms its counterparts of fine-tuning and conventional prompt-tuning, setting up state-of-the-art performances on several datasets. Mask Matching is particularly good at handling NLU tasks with large label counts and informative label names. As a pioneering effort to investigate the label-side prompt, we also discuss open issues for future study.

AAAI Conference 2024 Conference Paper

Pairwise-Label-Based Deep Incremental Hashing with Simultaneous Code Expansion

  • Dayan Wu
  • Qinghang Su
  • Bo Li
  • Weiping Wang

Deep incremental hashing has become a subject of considerable interest due to its capability to learn hash codes in an incremental manner, eliminating the need to generate codes for classes that have already been learned. However, accommodating more classes requires longer hash codes, and regenerating database codes becomes inevitable when code expansion is required. In this paper, we present a unified deep hash framework that can simultaneously learn new classes and increase hash code capacity. Specifically, we design a triple-channel asymmetric framework to optimize a new CNN model with a target code length and a code projection matrix. This enables us to directly generate hash codes for new images, and efficiently generate expanded hash codes for original database images from the old ones with the learned projection matrix. Meanwhile, we propose a pairwise-label-based incremental similarity-preserving loss to optimize the new CNN model, which can incrementally preserve new similarities while maintaining the old ones. Additionally, we design a double-end quantization loss to reduce the quantization error from new and original query images. As a result, our method efficiently embeds both new and original similarities into the expanded hash codes, while keeping the original database codes unchanged. We conduct extensive experiments on three widely-used image retrieval benchmarks, demonstrating that our method can significantly reduce the time required to expand existing database codes, while maintaining state-of-the-art retrieval performance.

AAMAS Conference 2024 Conference Paper

Proportional Fairness in Obnoxious Facility Location

  • Alexander Lam
  • Haris Aziz
  • Bo Li
  • Fahimeh Ramezani
  • Toby Walsh

We consider the obnoxious facility location problem (in which agents prefer the facility location to be far from them) and propose a hierarchy of distance-based proportional fairness concepts for the problem. These fairness axioms ensure that groups of agents at the same location are guaranteed to be a distance from the facility proportional to their group size. We consider deterministic and randomized mechanisms, and compute tight bounds on the price of proportional fairness. In the deterministic setting, we show that our proportional fairness axioms are incompatible with strategyproofness, and prove asymptotically tight ε-price of anarchy and stability bounds for proportionally fair welfare-optimal mechanisms. In the randomized setting, we identify proportionally fair and strategyproof mechanisms that give an expected welfare within a constant factor of the optimal welfare. Finally, we prove existence results for two extensions to our model.

IJCAI Conference 2024 Conference Paper

Public Event Scheduling with Busy Agents

  • Bo Li
  • Lijun Li
  • Minming Li
  • Ruilong Zhang

We study a public event scheduling problem, where multiple public events are scheduled to coordinate the availability of multiple agents. The availability of each agent is determined by solving a separate flexible interval job scheduling problem, where the jobs are required to be preemptively processed. The agents want to attend as many events as possible, and their agreements are considered to be the total length of time during which they can attend these events. The goal is to find a schedule for events as well as the job schedule for each agent such that the total agreement is maximized. We first show that the problem is NP-hard, and then prove that a simple greedy algorithm achieves 1/2-approximation when the whole timeline is polynomially bounded. Our method also implies a (1-1/e)-approximate algorithm for this case. Subsequently, for the general timeline case, we present an algorithmic framework that extends a 1/α-approximate algorithm for the one-event instance to the general case, achieving 1/(α+1)-approximation. Finally, we give a polynomial-time algorithm that solves the one-event instance, which implies a 1/2-approximate algorithm for the general case.

NeurIPS Conference 2024 Conference Paper

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

  • Saleh Ashkboos
  • Amirkeivan Mohtashami
  • Maximilian L. Croci
  • Bo Li
  • Pashmina Cameron
  • Martin Jaggi
  • Dan Alistarh
  • Torsten Hoefler

We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot rotates LLMs in a way that removes outliers from the hidden state without changing the output, making quantization easier. This computational invariance is applied to the hidden state (residual) of the LLM, as well as to the activations of the feed-forward components, aspects of the attention mechanism, and to the KV cache. The result is a quantized model where all matrix multiplications are performed in 4 bits, without any channels identified for retention in higher precision. Our 4-bit quantized LLAMA2-70B model has losses of at most 0.47 WikiText-2 perplexity and retains 99% of the zero-shot performance. We also show that QuaRot can provide lossless 6- and 8-bit LLAMA-2 models without any calibration data using round-to-nearest quantization. Code is available at github.com/spcl/QuaRot.
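The intuition that rotations ease quantization can be seen on a toy vector: an orthogonal Hadamard rotation spreads an outlier's energy across channels, shrinking round-to-nearest 4-bit error. This is an illustrative sketch only, not QuaRot's actual pipeline:

```python
# Toy sketch: rotate a vector with one outlier by an orthonormal Hadamard
# matrix, quantize to 4 bits with round-to-nearest, rotate back, and compare
# reconstruction error against quantizing the raw vector directly.
import math

def hadamard(n):
    """Sylvester construction of an orthonormal Hadamard matrix; n a power of two."""
    H = [[1.0]]
    while len(H) < n:
        H = [r + r for r in H] + [r + [-v for v in r] for r in H]
    s = 1 / math.sqrt(n)
    return [[v * s for v in row] for row in H]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def quantize_rtn(x, bits=4):
    """Symmetric round-to-nearest quantization with a per-vector scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in x) / qmax
    return [round(v / scale) * scale for v in x]

x = [1, -1, 1, 16, -1, 1, -1, 1]             # one large outlier dominates the scale
H = hadamard(8)
err_plain = sum((a - b) ** 2 for a, b in zip(x, quantize_rtn(x)))
xr = matvec(H, x)                             # rotate, quantize, rotate back (H^T = H^-1)
xq = matvec(list(map(list, zip(*H))), quantize_rtn(xr))
err_rot = sum((a - b) ** 2 for a, b in zip(x, xq))
print(err_rot < err_plain)                    # True
```

Without the rotation, the outlier forces a coarse scale that rounds every ±1 entry to zero; after rotation, the energy is spread and most entries survive quantization.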

NeurIPS Conference 2024 Conference Paper

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

  • Chengquan Guo
  • Xun Liu
  • Chulin Xie
  • Andy Zhou
  • Yi Zeng
  • Zinan Lin
  • Dawn Song
  • Bo Li

With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding and software development, safety and security concerns, such as generating or executing malicious code, have become significant barriers to the real-world deployment of these agents. To provide comprehensive and practical evaluations on the safety of code agents, we propose RedCode, an evaluation platform with benchmarks grounded in four key principles: real interaction with systems, holistic evaluation of unsafe code generation and execution, diverse input formats, and high-quality safety scenarios and tests. RedCode consists of two parts to evaluate agents’ safety in unsafe code execution and generation: (1) RedCode-Exec provides challenging code prompts in Python as inputs, aiming to evaluate code agents’ ability to recognize and handle unsafe code. We then map the Python code to other programming languages (e.g., Bash) and natural text summaries or descriptions for evaluation, leading to a total of over 4,000 testing instances. We provide 25 types of critical vulnerabilities spanning various domains, such as websites, file systems, and operating systems. We provide a Docker sandbox environment to evaluate the execution capabilities of code agents and design corresponding evaluation metrics to assess their execution results. (2) RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions to generate harmful code or software. Our empirical findings, derived from evaluating three agent frameworks based on 19 LLMs, provide insights into code agents’ vulnerabilities. For instance, evaluations on RedCode-Exec show that agents are more likely to reject executing unsafe operations on the operating system, but are less likely to reject executing technically buggy code, indicating high risks. Unsafe operations described in natural text lead to a lower rejection rate than those in code format. Additionally, evaluations on RedCode-Gen reveal that more capable base models and agents with stronger overall coding abilities, such as GPT-4, tend to produce more sophisticated and effective harmful software. Our findings highlight the need for stringent safety evaluations for diverse code agents. Our dataset and code are publicly available at https://github.com/AI-secure/RedCode.

ICRA Conference 2024 Conference Paper

Rethinking Imitation-based Planners for Autonomous Driving

  • Jie Cheng 0008
  • Yingbing Chen
  • Xiaodong Mei 0001
  • Bowen Yang
  • Bo Li
  • Ming Liu 0001

In recent years, imitation-based driving planners have reported considerable success. However, due to the absence of a standardized benchmark, the effectiveness of various designs remains unclear. The newly released nuPlan addresses this issue by offering a large-scale real-world dataset and a standardized closed-loop benchmark for equitable comparisons. Utilizing this platform, we conduct a comprehensive study on two fundamental yet underexplored aspects of imitation-based planners: the essential features for ego planning and the effective data augmentation techniques to reduce compounding errors. Furthermore, we highlight an imitation gap that has been overlooked by current learning systems. Finally, integrating our findings, we propose a strong baseline model—PlanTF. Our results demonstrate that a well-designed, purely imitation-based planner can achieve highly competitive performance compared to state-of-the-art methods involving hand-crafted rules and exhibit superior generalization capabilities in long-tail cases. Our models and benchmarks are publicly available. Project website https://jchengai.github.io/planTF.

NeurIPS Conference 2024 Conference Paper

Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks

  • Andy Zhou
  • Bo Li
  • Haohan Wang

Despite advances in AI alignment, large language models (LLMs) remain vulnerable to adversarial attacks or jailbreaking, in which adversaries can modify prompts to induce unwanted behavior. While some defenses have been proposed, they have not been adapted to newly proposed attacks and more challenging threat models. To address this, we propose an optimization-based objective for defending LLMs against jailbreaking attacks and an algorithm, Robust Prompt Optimization (RPO), to create robust system-level defenses. Our approach directly incorporates the adversary into the defensive objective and optimizes a lightweight and transferable suffix, enabling RPO to adapt to worst-case adaptive attacks. Our theoretical and experimental results show improved robustness to both jailbreaks seen during optimization and unknown jailbreaks, reducing the attack success rate (ASR) on GPT-4 to 6% and Llama-2 to 0% on JailbreakBench, setting the state-of-the-art.

ECAI Conference 2024 Conference Paper

Sinogram-Image Dual-Domain Network for Robust Metal Artifact Reduction in CT Image

  • Chong Liu
  • Yuhan Huang
  • Bo Li
  • Hui Ding

Computed tomography (CT) utilizes X-ray technology for internal body imaging. However, the presence of metal objects often results in artifacts due to their significant absorption and scattering of X-rays, thus obstructing lesion diagnosis, especially in the presence of multiple metals. Existing artifact reduction methods often suffer from deficiencies in completeness and preservation of fine detail. To address this limitation, we propose a novel sinogram and image dual-domain network. Specifically, in the sinogram domain, two enhancement modules are designed: one for extracting information from regions affected by metal traces, and the other for learning to restore the sinogram corresponding to these metal traces. Subsequently, utilizing filtered back projection (FBP), artifact-removed images are reconstructed in the image domain. Quantitative and qualitative analyses show our framework’s superiority over conventional Metal Artifact Reduction (MAR) methods in both synthetic and clinical settings.

TMLR Journal 2024 Journal Article

Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity

  • Bo Li
  • Yasin Esfandiari
  • Mikkel N. Schmidt
  • Tommy Sonne Alstrøm
  • Sebastian U Stich

In federated learning, data heterogeneity is a critical challenge. A straightforward solution is to shuffle the clients' data to homogenize the distribution. However, this may violate data access rights, and how and when shuffling can accelerate the convergence of a federated optimization algorithm is not theoretically well understood. In this paper, we establish a precise and quantifiable correspondence between data heterogeneity and parameters in the convergence rate when a fraction of data is shuffled across clients. We discuss that shuffling can, in some cases, quadratically reduce the gradient dissimilarity with respect to the shuffling percentage, accelerating convergence. Inspired by the theory, we propose a practical approach that addresses the data access rights issue by shuffling locally generated synthetic data. The experimental results show that shuffling synthetic data improves the performance of multiple existing federated learning algorithms by a large margin.
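The effect of partial shuffling on heterogeneity can be illustrated with label histograms: moving a fraction p of each client's data through a shared pool shrinks the distance between client label distributions. This is a toy sketch, not the paper's synthetic-data method:

```python
# Toy sketch: shuffle a fraction p of each client's (label) data through a
# shared pool, deal the pool back evenly, and measure heterogeneity as the
# total-variation distance between the clients' label histograms.
import random
from collections import Counter

def tv_distance(a, b):
    """Total-variation distance between two label Counters."""
    labels = set(a) | set(b)
    na, nb = sum(a.values()), sum(b.values())
    return 0.5 * sum(abs(a[l] / na - b[l] / nb) for l in labels)

def shuffle_fraction(clients, p, rng):
    """Move a fraction p of each client's data into a pool, then redistribute evenly."""
    pool, kept = [], []
    for data in clients:
        d = data[:]
        rng.shuffle(d)
        k = int(len(d) * p)
        pool.extend(d[:k])
        kept.append(d[k:])
    rng.shuffle(pool)
    share = len(pool) // len(clients)
    return [kept[i] + pool[i * share:(i + 1) * share] for i in range(len(clients))]

rng = random.Random(0)
clients = [[0] * 80 + [1] * 20, [1] * 80 + [0] * 20]   # two clients, skewed labels
tvs = {}
for p in (0.0, 0.5, 1.0):
    shuffled = shuffle_fraction(clients, p, rng)
    tvs[p] = tv_distance(Counter(shuffled[0]), Counter(shuffled[1]))
    print(p, round(tvs[p], 2))
```

At p = 0 the distance equals the original skew; as p grows toward 1 the client histograms converge, which is the heterogeneity reduction the convergence analysis quantifies.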

NeurIPS Conference 2024 Conference Paper

The Limits of Differential Privacy in Online Learning

  • Bo Li
  • Wei Wang
  • Peng Ye

Differential privacy (DP) is a formal notion that restricts the privacy leakage of an algorithm when running on sensitive data, in which the privacy-utility trade-off is one of the central problems in private data analysis. In this work, we investigate the fundamental limits of differential privacy in online learning algorithms and present evidence that separates three types of constraints: no DP, pure DP, and approximate DP. We first describe a hypothesis class that is online learnable under approximate DP but not online learnable under pure DP under the adaptive adversarial setting. This indicates that approximate DP must be adopted when dealing with adaptive adversaries. We then prove that any private online learner must make an infinite number of mistakes for almost all hypothesis classes. This essentially generalizes previous results and shows a strong separation between private and non-private settings since a finite mistake bound is always attainable (as long as the class is online learnable) when there is no privacy requirement.

NeurIPS Conference 2024 Conference Paper

Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation

  • Ruihao Xia
  • Yu Liang
  • Peng-tao Jiang
  • Hao Zhang
  • Bo Li
  • Yang Tang
  • Pan Zhou

Despite their success, unsupervised domain adaptation methods for semantic segmentation primarily focus on adaptation between image domains and do not utilize other abundant visual modalities like depth, infrared, and event. This limitation hinders their performance and restricts their application in real-world multimodal scenarios. To address this issue, we propose Modality Adaptation with text-to-image Diffusion Models (MADM) for the semantic segmentation task, which utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities. Specifically, MADM comprises two key complementary components to tackle major challenges. First, due to the large modality gap, using one modal data to generate pseudo labels for another modality suffers from a significant drop in accuracy. To address this, MADM designs diffusion-based pseudo-label generation which adds latent noise to stabilize pseudo-labels and enhance label accuracy. Second, to overcome the limitations of latent low-resolution features in diffusion models, MADM introduces the label palette and latent regression which converts one-hot encoded labels into the RGB form by palette and regresses them in the latent space, thus ensuring the pre-trained decoder for up-sampling to obtain fine-grained features. Extensive experimental results demonstrate that MADM achieves state-of-the-art adaptation performance across various modality tasks, including images to depth, infrared, and event modalities. We open-source our code and models at https://github.com/XiaRho/MADM.

NeurIPS Conference 2024 Conference Paper

Validating Climate Models with Spherical Convolutional Wasserstein Distance

  • Robert C. Garrett
  • Trevor Harris
  • Zhuo Wang
  • Bo Li

The validation of global climate models is crucial to ensure the accuracy and efficacy of model output. We introduce the spherical convolutional Wasserstein distance to more comprehensively measure differences between climate models and reanalysis data. This new similarity measure accounts for spatial variability using convolutional projections and quantifies local differences in the distribution of climate variables. We apply this method to evaluate the historical model outputs of the Coupled Model Intercomparison Project (CMIP) members by comparing them to observational and reanalysis data products. Additionally, we investigate the progression from CMIP phase 5 to phase 6 and find modest improvements in the phase 6 models regarding their ability to produce realistic climatologies.

NeurIPS Conference 2023 Conference Paper

An Inductive Bias for Tabular Deep Learning

  • Ege Beyazit
  • Jonathan Kozaczuk
  • Bo Li
  • Vanessa Wallace
  • Bilal Fadlallah

Deep learning methods have achieved state-of-the-art performance in most modeling tasks involving images, text, and audio; however, they typically underperform tree-based methods on tabular data. In this paper, we hypothesize that a significant contributor to this performance gap is the interaction between irregular target functions resulting from the heterogeneous nature of tabular feature spaces, and the well-known tendency of neural networks to learn smooth functions. Utilizing tools from spectral analysis, we show that functions described by tabular datasets often have high irregularity, and that they can be smoothed by transformations such as scaling and ranking in order to improve performance. However, because these transformations tend to lose information or negatively impact the loss landscape during optimization, they need to be rigorously fine-tuned for each feature to achieve performance gains. To address these problems, we propose introducing frequency reduction as an inductive bias. We realize this bias as a neural network layer that promotes learning low-frequency representations of the input features, allowing the network to operate in a space where the target function is more regular. Our proposed method introduces less computational complexity than a fully connected layer, while significantly improving neural network performance and speeding up its convergence on 14 tabular datasets.

AAMAS Conference 2023 Conference Paper

Approximation Algorithm for Computing Budget-Feasible EF1 Allocations

  • Jiarui Gan
  • Bo Li
  • Xiaowei Wu

We study algorithmic fairness in a budget-feasible resource allocation problem. In this problem, a set of items with varied sizes and values are to be allocated to a group of agents, while each agent has a budget constraint on the total size of items she can receive. An envy-free (EF) allocation is defined in this context as one in which no agent envies another for the items they get and, in addition, no agent envies the charity, who is automatically endowed with all the unallocated items. Since EF allocations barely exist even without budget constraints, we are interested in the relaxed notion of envy-freeness up to one item (EF1). In this paper, we further the recent progress towards understanding the existence and approximations of EF1 (or EF2) allocations. We propose a polynomial-time algorithm that computes a 1/2-approximate EF1 allocation for an arbitrary number of agents with heterogeneous budgets. For the uniform-budget and two-agent cases, we present a polynomial-time algorithm that computes an exact EF1 allocation. We also consider the large budget setting, where the item sizes are infinitesimal relative to the agents’ budgets. We show that both the allocations that maximize the Nash social welfare and the allocations that our main algorithm computes are EF1 in the limit.

AAAI Conference 2023 Conference Paper

Attack Can Benefit: An Adversarial Approach to Recognizing Facial Expressions under Noisy Annotations

  • Jiawen Zheng
  • Bo Li
  • Shengchuan Zhang
  • Shuang Wu
  • Liujuan Cao
  • Shouhong Ding

The real-world Facial Expression Recognition (FER) datasets usually exhibit complex scenarios with coupled noisy annotations and imbalanced class distributions, which undoubtedly impede the development of FER methods. To address the aforementioned issues, in this paper, we propose a novel and flexible method to spot noisy labels by leveraging adversarial attack, termed Geometry Aware Adversarial Vulnerability Estimation (GAAVE). Different from existing state-of-the-art methods of noisy label learning (NLL), our method has no reliance on additional information and is thus easy to generalize to large-scale real-world FER datasets. Besides, the combination of the Dataset Splitting module and the Subset Refactoring module mitigates the impact of class imbalance, and the Self-Annotator module facilitates the sufficient use of all training data. Extensive experiments on the RAF-DB, FERPlus, AffectNet, and CIFAR-10 datasets validate the effectiveness of our method. The consistent enhancement across different base methods demonstrates the flexibility of our proposed GAAVE.

TMLR Journal 2023 Journal Article

Can Pruning Improve Certified Robustness of Neural Networks?

  • Zhangheng Li
  • Tianlong Chen
  • Linyi Li
  • Bo Li
  • Zhangyang Wang

With the rapid development of deep learning, the sizes of deep neural networks are growing beyond what hardware platforms can afford. Given the fact that neural networks are often over-parameterized, one effective way to reduce such computational overhead is neural network pruning: removing redundant parameters from trained neural networks. It has been recently observed that pruning can not only reduce computational overhead but also improve the empirical robustness of deep neural networks (NNs), potentially owing to removing spurious correlations while preserving predictive accuracy. This paper for the first time demonstrates that pruning can generally improve $L_\infty$ certified robustness for ReLU-based NNs under the \textit{complete verification} setting. Using the popular Branch-and-Bound (BaB) framework, we find that pruning can enhance the estimated bound tightness of certified robustness verification, by alleviating linear relaxation and sub-domain split problems. We empirically verify our findings with off-the-shelf pruning methods and further present a new stability-based pruning method tailored for reducing neuron instability, which outperforms existing pruning methods in enhancing certified robustness. Our experiments show that by appropriately pruning an NN, its certified accuracy can be boosted by up to \textbf{8.2\%} under standard training, and up to \textbf{24.5\%} under adversarial training on the CIFAR10 dataset. We additionally observe the possible existence of {\it certified lottery tickets} in our experiments that can match both the standard and certified robust accuracies of the original dense models across different datasets. Our findings offer a new angle on the intriguing interaction between sparsity and robustness, i.e., interpreting the interaction of sparsity and certified robustness via neuron stability. Codes will be fully released.

NeurIPS Conference 2023 Conference Paper

CBD: A Certified Backdoor Detector Based on Local Dominant Probability

  • Zhen Xiang
  • Zidi Xiong
  • Bo Li

Backdoor attacks are a common threat to deep neural networks. During testing, samples embedded with a backdoor trigger will be misclassified as an adversarial target by a backdoored model, while samples without the backdoor trigger will be correctly classified. In this paper, we present the first certified backdoor detector (CBD), which is built on a novel, adjustable conformal prediction scheme using our proposed statistic, the local dominant probability. For any classifier under inspection, CBD provides 1) a detection inference, 2) the condition under which the attacks are guaranteed to be detectable for the same classification domain, and 3) a probabilistic upper bound on the false positive rate. Our theoretical results show that attacks with triggers that are more resilient to test-time noise and have smaller perturbation magnitudes are more likely to be detected with guarantees. Moreover, we conduct extensive experiments on four benchmark datasets considering various backdoor types, such as BadNet, CB, and Blend. CBD achieves comparable or even higher detection accuracy than state-of-the-art detectors, and in addition provides detection certification. Notably, for backdoor attacks with random perturbation triggers bounded by $\ell_2\leq 0.75$, which achieve more than a 90\% attack success rate, CBD achieves 100\% (98\%), 100\% (84\%), 98\% (98\%), and 72\% (40\%) empirical (certified) detection true positive rates on the four benchmark datasets GTSRB, SVHN, CIFAR-10, and TinyImageNet, respectively, with low false positive rates.
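
The certification machinery builds on conformal prediction; its core primitive, independent of the paper's specific detection statistic, is a conformal p-value computed against calibration scores. A minimal sketch:

```python
def conformal_pvalue(calib_scores, test_score):
    """One-sided conformal p-value with plus-one smoothing.

    Fraction of calibration statistics at least as extreme as the
    test statistic; a small p-value flags the test case as atypical
    relative to the calibration set.
    """
    ge = sum(1 for s in calib_scores if s >= test_score)
    return (1 + ge) / (1 + len(calib_scores))

# a classifier whose statistic exceeds most calibration scores
# receives a small p-value and would be flagged
print(conformal_pvalue([1, 2, 3, 4], 3.5))
```

In CBD the calibration set would consist of statistics from benign (shadow) classifiers; how the local dominant probability is computed and how the scheme is adjusted is specific to the paper and not shown here.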

NeurIPS Conference 2023 Conference Paper

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

  • Tao Lei
  • Junwen Bai
  • Siddhartha Brahma
  • Joshua Ainslie
  • Kenton Lee
  • Yanqi Zhou
  • Nan Du
  • Vincent Zhao

We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a light-weight training phase. Our experiments demonstrate that the CoDA approach provides an unexpectedly efficient way to transfer knowledge. Across a variety of language, vision, and speech tasks, CoDA achieves a 2x to 8x inference speed-up compared to the state-of-the-art Adapter approaches with moderate to no accuracy loss and the same parameter efficiency.

NeurIPS Conference 2023 Conference Paper

Content-based Unrestricted Adversarial Attack

  • Zhaoyu Chen
  • Bo Li
  • Shuang Wu
  • Kaixun Jiang
  • Shouhong Ding
  • Wenqiang Zhang

Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic, demonstrating their ability to deceive human perception and deep neural networks with stealth and success. However, current works usually sacrifice the degree of unrestrictedness and subjectively select some image content to guarantee the photorealism of unrestricted adversarial examples, which limits their attack performance. To ensure the photorealism of adversarial examples while boosting attack performance, we propose a novel unrestricted attack framework called Content-based Unrestricted Adversarial Attack. By leveraging a low-dimensional manifold that represents natural images, we map the images onto the manifold and optimize them along its adversarial direction. Within this framework, we implement Adversarial Content Attack (ACA) based on Stable Diffusion, which can generate highly transferable unrestricted adversarial examples with various adversarial contents. Extensive experimentation and visualization demonstrate the efficacy of ACA, particularly in surpassing state-of-the-art attacks by an average of 13.3-50.4\% and 16.8-48.0\% on normally trained models and defense methods, respectively.

NeurIPS Conference 2023 Conference Paper

Continuous Parametric Optical Flow

  • Jianqin Luo
  • Zhexiong Wan
  • Yuxin Mao
  • Bo Li
  • Yuchao Dai

In this paper, we present continuous parametric optical flow, a parametric representation of dense and continuous motion over an arbitrary time interval. In contrast to existing discrete-time representations (i.e., flow between consecutive frames), this new representation transforms frame-to-frame pixel correspondences into dense continuous flow. In particular, we present a temporal-parametric model that employs B-splines to fit point trajectories using a limited number of frames. To further improve the stability and robustness of the trajectories, we also add an encoder with a neural ordinary differential equation (NODE) to represent features associated with specific times. We also contribute a synthetic dataset and introduce two evaluation perspectives to measure the accuracy and robustness of continuous flow estimation. Benefiting from the combination of explicit parametric modeling and implicit feature optimization, our model focuses on motion continuity and outperforms flow-based and point-tracking approaches in fitting long-term and variable sequences.
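
The B-spline trajectory model at the core of this representation can be evaluated with de Boor's algorithm; a self-contained scalar sketch (the paper fits 2-D pixel trajectories, but the recursion is identical per coordinate):

```python
def de_boor(x, t, c, p):
    """Evaluate a degree-p B-spline with knot vector t and control
    points c at parameter x (valid for t[p] <= x < t[-p-1])."""
    # locate the knot span k with t[k] <= x < t[k+1]
    k = p
    while k + 1 < len(t) - p and t[k + 1] <= x:
        k += 1
    # de Boor's triangular recurrence on the p+1 active control points
    d = [c[j + k - p] for j in range(p + 1)]
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            alpha = (x - t[j + k - p]) / (t[j + 1 + k - r] - t[j + k - p])
            d[j] = (1 - alpha) * d[j - 1] + alpha * d[j]
    return d[p]

# clamped cubic on [0, 1]: interpolates the first control point at x = 0
knots = [0, 0, 0, 0, 1, 1, 1, 1]
print(de_boor(0.5, knots, [0, 1, 2, 3], 3))
```

For a trajectory, `c` would hold per-axis control points and `x` a query time; a limited number of control points then parameterizes dense motion over the whole interval.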

NeurIPS Conference 2023 Conference Paper

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

  • Boxin Wang
  • Weixin Chen
  • Hengzhi Pei
  • Chulin Xie
  • Mintong Kang
  • Chenhui Zhang
  • Chejian Xu
  • Zidi Xiong

Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications in healthcare and finance, where mistakes can be costly. To this end, this work proposes a comprehensive trustworthiness evaluation for large language models, with a focus on GPT-4 and GPT-3.5, considering diverse perspectives, including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness to adversarial demonstrations, privacy, machine ethics, and fairness. Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and to leak private information from both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially because GPT-4 follows the (misleading) instructions more precisely. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps. Our benchmark is publicly available at https://decodingtrust.github.io/.

AAAI Conference 2023 Conference Paper

Delving into the Adversarial Robustness of Federated Learning

  • Jie Zhang
  • Bo Li
  • Chen Chen
  • Lingjuan Lyu
  • Shuang Wu
  • Shouhong Ding
  • Chao Wu

Models trained with Federated Learning (FL) are as fragile to adversarial examples as centrally trained models; however, the adversarial robustness of federated learning remains largely unexplored. This paper sheds light on the challenge of adversarial robustness in federated learning. To facilitate a better understanding of the adversarial vulnerability of existing FL methods, we conduct comprehensive robustness evaluations on various attacks and adversarial training methods. Moreover, we reveal the negative impacts induced by directly adopting adversarial training in FL, which seriously hurts test accuracy, especially in non-IID settings. In this work, we propose a novel algorithm called Decision Boundary based Federated Adversarial Training (DBFAT), which consists of two components (local re-weighting and global regularization) to improve both the accuracy and robustness of FL systems. Extensive experiments on multiple datasets demonstrate that DBFAT consistently outperforms other baselines under both IID and non-IID settings.

NeurIPS Conference 2023 Conference Paper

DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification

  • Mintong Kang
  • Dawn Song
  • Bo Li

Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations from adversarial examples and achieve state-of-the-art robustness. Recent studies show that even advanced attacks cannot break such defenses effectively, since the purification process induces an extremely deep computational graph, which poses the potential problems of gradient obfuscation, high memory cost, and unbounded randomness. In this paper, we propose a unified framework, DiffAttack, to perform effective and efficient attacks against diffusion-based purification defenses, including both DDPM and score-based approaches. In particular, we propose a deviated-reconstruction loss at intermediate diffusion steps to induce inaccurate density gradient estimation and tackle the problem of vanishing/exploding gradients. We also provide a segment-wise forwarding-backwarding algorithm, which leads to memory-efficient gradient backpropagation. We validate the attack effectiveness of DiffAttack compared with existing adaptive attacks on CIFAR-10 and ImageNet. We show that DiffAttack decreases the robust accuracy of models, compared with SOTA attacks, by over 20\% on CIFAR-10 under $\ell_\infty$ attack $(\epsilon=8/255)$, and over 10\% on ImageNet under $\ell_\infty$ attack $(\epsilon=4/255)$. We conduct a series of ablation studies, and we find that 1) DiffAttack with the deviated-reconstruction loss added over uniformly sampled time steps is more effective than that added over only initial/final steps, and 2) diffusion-based purification with a moderate diffusion length is more robust under DiffAttack.

NeurIPS Conference 2023 Conference Paper

Domain Watermark: Effective and Harmless Dataset Copyright Protection is Closed at Hand

  • Junfeng Guo
  • Yiming Li
  • Lixu Wang
  • Shu-Tao Xia
  • Heng Huang
  • Cong Liu
  • Bo Li

The prosperity of deep neural networks (DNNs) has largely benefited from open-source datasets, based on which users can evaluate and improve their methods. In this paper, we revisit backdoor-based dataset ownership verification (DOV), which is currently the only feasible approach to protect the copyright of open-source datasets. We reveal that these methods are fundamentally harmful, given that they allow adversaries to introduce malicious misclassification behaviors into watermarked DNNs. In this work, we design DOV from another perspective by making watermarked models (trained on the protected dataset) correctly classify some `hard' samples that will be misclassified by the benign model. Our method is inspired by the generalization property of DNNs, where we find a \emph{hardly-generalized domain} for the original dataset (as its \emph{domain watermark}). It can be easily learned with the protected dataset containing modified samples. Specifically, we formulate the domain generation as a bi-level optimization and propose to optimize a set of visually-indistinguishable clean-label modified data with similar effects to domain-watermarked samples from the hardly-generalized domain to ensure watermark stealthiness. We also design a hypothesis-test-guided ownership verification via our domain watermark and provide theoretical analyses of our method. Extensive experiments on three benchmark datasets verify the effectiveness of our method and its resistance to potential adaptive methods.

JBHI Journal 2023 Journal Article

Domain-Aware Dual Attention for Generalized Medical Image Segmentation on Unseen Domains

  • Huilin Lai
  • Ye Luo
  • Bo Li
  • Guokai Zhang
  • Jianwei Lu

Recently, there has been significant progress in medical image segmentation utilizing deep learning techniques. However, these achievements largely rely on the supposition that the source and target domain data are identically distributed, and the direct application of related methods without addressing the distribution shift results in dramatic degradation in realistic clinical environments. Current approaches concerning the distribution shift either require the target domain data in advance for adaptation, or focus only on the distribution shift across domains while ignoring the intra-domain data variation. This paper proposes a domain-aware dual attention network for the generalized medical image segmentation task on unseen target domains. To alleviate the severe distribution shift between the source and target domains, an Extrinsic Attention (EA) module is designed to learn image features with knowledge originating from multi-source domains. Moreover, an Intrinsic Attention (IA) module is also proposed to handle the intra-domain variation by individually modeling the pixel-region relations derived from an image. The EA and IA modules complement each other well in terms of modeling the extrinsic and intrinsic domain relationships, respectively. To validate the model effectiveness, comprehensive experiments are conducted on various benchmark datasets, including the prostate segmentation in magnetic resonance imaging (MRI) scans and the optic cup/disc segmentation in fundus images. The experimental results demonstrate that our proposed model effectively generalizes to unseen domains and exceeds the existing advanced approaches.

NeurIPS Conference 2023 Conference Paper

Fair Allocation of Indivisible Chores: Beyond Additive Costs

  • Bo Li
  • Fangxiao Wang
  • Yu Zhou

We study the maximin share (MMS) fair allocation of $m$ indivisible tasks to $n$ agents who have costs for completing the assigned tasks. It is known that exact MMS fairness cannot be guaranteed, and so far the best-known approximation for additive cost functions is $\frac{13}{11}$ by Huang and Segal-Halevi [EC, 2023]; however, beyond additivity, very little is known. In this work, we first prove that no algorithm can ensure better than $\min\{n, \frac{\log m}{\log \log m}\}$-approximation if the cost functions are submodular. This result also shows a sharp contrast with the allocation of goods where constant approximations exist as shown by Barman and Krishnamurthy [TEAC, 2020] and Ghodsi et al. [AIJ, 2022]. We then prove that for subadditive costs, there always exists an allocation that is $\min\{n, \lceil\log m\rceil\}$-approximation, and thus the approximation ratio is asymptotically tight. Besides multiplicative approximation, we also consider the ordinal relaxation, 1-out-of-$d$ MMS, which was recently proposed by Hosseini et al. [JAIR and AAMAS, 2022]. Our impossibility result implies that for any $d\ge 2$, a 1-out-of-$d$ MMS allocation may not exist. Due to these hardness results for general subadditive costs, we turn to studying two specific subadditive costs, namely, bin packing and job scheduling. For both settings, we show that constant approximate allocations exist for both multiplicative and ordinal relaxations of MMS.
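
For intuition, an agent's maximin share for chores can be computed exactly on tiny instances by brute force over all partitions (illustrative only; this is not one of the paper's approximation algorithms):

```python
from itertools import product

def mms_cost(costs, n):
    """Maximin share of an agent for chores with additive costs:
    the smallest achievable worst-bundle cost over all partitions
    of the items into n bundles.

    Enumerates all n**len(costs) assignments; tiny instances only.
    """
    best = float("inf")
    for assign in product(range(n), repeat=len(costs)):
        loads = [0] * n
        for cost, bundle in zip(costs, assign):
            loads[bundle] += cost
        best = min(best, max(loads))
    return best

# two agents splitting chores of cost 4, 3, 2, 1:
# the best partition is {4, 1} vs {3, 2}, so the MMS value is 5
print(mms_cost([4, 3, 2, 1], 2))
```

An allocation is $\alpha$-approximate MMS if every agent's assigned cost is at most $\alpha$ times her MMS value; for submodular or subadditive costs the bundle cost is no longer a plain sum, which is where the paper's hardness results apply.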

NeurIPS Conference 2023 Conference Paper

FedGame: A Game-Theoretic Defense against Backdoor Attacks in Federated Learning

  • Jinyuan Jia
  • Zhuowen Yuan
  • Dinuka Sahabandu
  • Luyao Niu
  • Arezoo Rajabi
  • Bhaskar Ramasubramanian
  • Bo Li
  • Radha Poovendran

Federated learning (FL) provides a distributed training paradigm where multiple clients can jointly train a global model without sharing their local data. However, recent studies have shown that FL offers an additional surface for backdoor attacks. For instance, an attacker can compromise a subset of clients and thus corrupt the global model to misclassify an input with a backdoor trigger as the adversarial target. Existing defenses for FL against backdoor attacks usually detect and exclude the corrupted information from the compromised clients based on a static attacker model. However, such defenses are inadequate against dynamic attackers who strategically adapt their attack strategies. To bridge this gap, we model the strategic interactions between the defender and dynamic attackers as a minimax game. Based on the analysis of the game, we design an interactive defense mechanism FedGame. We prove that under mild assumptions, the global model trained with FedGame under backdoor attacks is close to that trained without attacks. Empirically, we compare FedGame with multiple state-of-the-art baselines on several benchmark datasets under various attacks. We show that FedGame can effectively defend against strategic attackers and achieves significantly higher robustness than baselines. Our code is available at: https://github.com/AI-secure/FedGame.

NeurIPS Conference 2023 Conference Paper

IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI

  • Bochuan Cao
  • Changjiang Li
  • Ting Wang
  • Jinyuan Jia
  • Bo Li
  • Jinghui Chen

Diffusion-based image generation models, such as Stable Diffusion or DALL·E 2, are able to learn from given images and generate high-quality samples following the guidance of prompts. For instance, they can be used to create artistic images that mimic the style of an artist based on his/her original artworks, or to maliciously edit original images into fake content. However, such ability also raises serious ethical issues when used without proper authorization from the owner of the original images. In response, several attempts have been made to protect original images from such unauthorized data usage by adding imperceptible perturbations, which are designed to mislead the diffusion model and prevent it from properly generating new samples. In this work, we introduce a perturbation purification platform, named IMPRESS, to evaluate the effectiveness of imperceptible perturbations as a protective measure. IMPRESS is based on the key observation that imperceptible perturbations can lead to a perceptible inconsistency between the original image and the diffusion-reconstructed image. This inconsistency can be used to devise a new optimization strategy for purifying the image, which may weaken the protection of the original image from unauthorized data usage (e.g., style mimicking, malicious editing). The proposed IMPRESS platform offers a comprehensive evaluation of several contemporary protection methods, and can be used as an evaluation platform for future protection methods.

NeurIPS Conference 2023 Conference Paper

Incentives in Federated Learning: Equilibria, Dynamics, and Mechanisms for Welfare Maximization

  • Aniket Murhekar
  • Zhuowen Yuan
  • Bhaskar Ray Chaudhury
  • Bo Li
  • Ruta Mehta

Federated learning (FL) has emerged as a powerful scheme to facilitate the collaborative learning of models amongst a set of agents holding their own private data. Although the agents benefit from the global model trained on shared data, by participating in federated learning they may also incur costs (related to privacy and communication) due to data sharing. In this paper, we model a collaborative FL framework, where every agent attempts to achieve an optimal trade-off between her learning payoff and data sharing cost. We show the existence of Nash equilibrium (NE) under mild assumptions on agents' payoffs and costs. Furthermore, we show that agents can discover the NE via best response dynamics. However, some of the NE may be bad in terms of overall welfare for the agents, implying little incentive for some fraction of the agents to participate in the learning. To remedy this, we design a budget-balanced mechanism involving payments to the agents, which ensures that any $p$-mean welfare function of the agents' utilities is maximized at NE. In addition, we introduce an FL protocol FedBR-BG that incorporates our budget-balanced mechanism, utilizing best response dynamics. Our empirical validation on MNIST and CIFAR-10 substantiates our theoretical analysis. We show that FedBR-BG outperforms the basic best-response-based protocol without additional incentivization, the standard federated learning protocol FedAvg, as well as a recent baseline MWFed in terms of achieving superior $p$-mean welfare.
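
The $p$-mean welfare family referenced above interpolates between utilitarian, Nash, and egalitarian welfare; a small sketch assuming strictly positive utilities:

```python
import math

def p_mean_welfare(utils, p):
    """Generalized p-mean of positive utilities.

    p = 1 is the utilitarian average, p -> 0 recovers Nash welfare
    (the geometric mean), and p -> -inf approaches the minimum
    (egalitarian welfare).
    """
    n = len(utils)
    if p == 0:
        # limit case: geometric mean
        return math.exp(sum(math.log(u) for u in utils) / n)
    return (sum(u ** p for u in utils) / n) ** (1.0 / p)

# the same utility profile scored under three welfare objectives
profile = [4.0, 1.0]
print(p_mean_welfare(profile, 1))    # utilitarian mean
print(p_mean_welfare(profile, 0))    # Nash (geometric mean)
print(p_mean_welfare(profile, -50))  # close to the egalitarian minimum
```

A mechanism that maximizes welfare "for any $p$-mean welfare function", as claimed in the abstract, therefore covers this whole spectrum of fairness-efficiency trade-offs at once.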

JBHI Journal 2023 Journal Article

Interpretable Inference and Classification of Tissue Types in Histological Colorectal Cancer Slides Based on Ensembles Adaptive Boosting Prototype Tree

  • Meiyan Liang
  • Ru Wang
  • Jianan Liang
  • Lin Wang
  • Bo Li
  • Xiaojun Jia
  • Yu Zhang
  • Qinghui Chen

Digital pathology images are treated as the “gold standard” for the diagnosis of colorectal lesions, especially colon cancer. Real-time, objective and accurate inspection results will assist clinicians in choosing symptomatic treatment in a timely manner, which is of great significance in clinical medicine. However, manual methods suffer from long inspection cycles and heavy reliance on subjective interpretation. It is also challenging for existing computer-aided diagnosis methods to obtain models that are both accurate and interpretable: models that exhibit high accuracy tend to be more complex and opaque, while interpretable models may lack the necessary accuracy. Therefore, a framework of ensemble adaptive boosting prototype trees is proposed to predict colorectal pathology images and provide interpretable inference by visualizing the decision-making process in each base learner. The results showed that the proposed method could effectively address the “accuracy-interpretability trade-off” issue by an ensemble of m adaptive boosting neural prototype trees. The superior performance of the framework provides a novel paradigm for interpretable inference and high-precision prediction of pathology image patches in computational pathology.

NeurIPS Conference 2023 Conference Paper

Large Language Models are Visual Reasoning Coordinators

  • Liangyu Chen
  • Bo Li
  • Sheng Shen
  • Jingkang Yang
  • Chunyuan Li
  • Kurt Keutzer
  • Trevor Darrell
  • Ziwei Liu

Visual reasoning requires multimodal perception and commonsense cognition of the world. Recently, multiple vision-language models (VLMs) have been proposed with excellent commonsense reasoning ability in various domains. However, how to harness the collective power of these complementary VLMs is rarely explored. Existing methods like ensemble still struggle to aggregate these models with the desired higher-order communications. In this work, we propose Cola, a novel paradigm that coordinates multiple VLMs for visual reasoning. Our key insight is that a large language model (LLM) can efficiently coordinate multiple VLMs by facilitating natural language communication that leverages their distinct and complementary capabilities. Extensive experiments demonstrate that our instruction tuning variant, Cola-FT, achieves state-of-the-art performance on visual question answering (VQA), outside knowledge VQA, visual entailment, and visual spatial reasoning tasks. Moreover, we show that our in-context learning variant, Cola-Zero, exhibits competitive performance in zero and few-shot settings, without finetuning. Through systematic ablation studies and visualizations, we validate that a coordinator LLM indeed comprehends the instruction prompts as well as the separate functionalities of VLMs; it then coordinates them to enable impressive visual reasoning capabilities.

AAAI Conference 2023 Conference Paper

Learning Instrumental Variable from Data Fusion for Treatment Effect Estimation

  • Anpeng Wu
  • Kun Kuang
  • Ruoxuan Xiong
  • Minqin Zhu
  • Yuxuan Liu
  • Bo Li
  • Furui Liu
  • Zhihua Wang

The advent of the big data era has brought new opportunities and challenges for estimating treatment effects via data fusion, that is, from a mixed dataset collected from multiple sources (each source with an independent treatment assignment mechanism). Due to possibly omitted source labels and unmeasured confounders, traditional methods cannot estimate individual treatment assignment probabilities or infer treatment effects effectively. Therefore, we propose to reconstruct the source label and model it as a Group Instrumental Variable (GIV) to implement IV-based regression for treatment effect estimation. In this paper, we conceptualize this line of thought and develop a unified framework (Meta-EM) to (1) map the raw data into a representation space to construct Linear Mixed Models for the assigned treatment variable; (2) estimate the distribution differences and model the GIV for the different treatment assignment mechanisms; and (3) adopt an alternating training strategy to iteratively optimize the representations and the joint distribution to model the GIV for IV regression. Empirical results demonstrate the advantages of our Meta-EM compared with state-of-the-art methods. The project page with the code and the supplementary materials is available at https://github.com/causal-machine-learning-lab/meta-em.

AAAI Conference 2023 Conference Paper

Logic and Commonsense-Guided Temporal Knowledge Graph Completion

  • Guanglin Niu
  • Bo Li

A temporal knowledge graph (TKG) stores events derived from time-involved data. Predicting events is extremely challenging due to the time-sensitive property of events. Moreover, previous TKG completion (TKGC) approaches cannot represent both the timeliness and the causality properties of events simultaneously. To address these challenges, we propose a Logic and Commonsense-Guided Embedding model (LCGE) to jointly learn the time-sensitive representation of events, involving timeliness and causality, together with the time-independent representation of events from the perspective of commonsense. Specifically, we design a temporal rule learning algorithm to construct a rule-guided predicate embedding regularization strategy for learning the causality among events. Furthermore, we can accurately evaluate the plausibility of events via auxiliary commonsense knowledge. The experimental results on the TKGC task illustrate the significant performance improvements of our model compared with existing approaches. More interestingly, our model is able to provide explainability of the predicted results from the view of causal inference. The appendix, source code and datasets of this paper are available at https://github.com/ngl567/LCGE.

IJCAI Conference 2023 Conference Paper

Maximin-Aware Allocations of Indivisible Chores with Symmetric and Asymmetric Agents

  • Tianze Wei
  • Bo Li
  • Minming Li

The real-world deployment of fair allocation algorithms usually involves a heterogeneous population of users, which makes it challenging for the users to get complete knowledge of the allocation beyond their own bundles. Recently, a new fairness notion, maximin-awareness (MMA), was proposed; it guarantees that no agent is the worst-off one, no matter how the items not allocated to that agent are distributed. We adapt and generalize this notion to the case of indivisible chores and to agents with arbitrary weights. Due to the inherent difficulty of MMA, we also consider its up-to-one and up-to-any relaxations. We present a string of results on the existence and computation of MMA-related fair allocations and their connections to existing fairness concepts.

AAAI Conference 2023 Conference Paper

Multiagent MST Cover: Pleasing All Optimally via a Simple Voting Rule

  • Bo Li
  • Xiaowei Wu
  • Chenyang Xu
  • Ruilong Zhang

Given a connected graph on whose edges we can build roads to connect the nodes, a number of agents hold possibly different perspectives on which edges should be selected, expressed by assigning different edge weights. Our task is to build a minimum number of roads so that every agent has a spanning tree in the built subgraph whose weight is the same as a minimum spanning tree in the original graph. We first show that this problem is NP-hard and does not admit better than $((1-o(1))\ln k)$-approximation polynomial-time algorithms unless P = NP, where $k$ is the number of agents. We then give a simple voting algorithm with an optimal approximation ratio. Moreover, our algorithm only needs to access the agents' rankings of the edges. Finally, we extend our problem to submodular objective functions and matroid rank constraints.
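
The per-agent requirement is that the selected subgraph contain a spanning tree matching that agent's minimum spanning tree weight; the MST weight itself comes from a standard algorithm such as Kruskal's (a generic sketch, not the paper's voting rule):

```python
def mst_weight(n, edges):
    """Total weight of a minimum spanning tree via Kruskal's algorithm.

    n: number of nodes (labeled 0..n-1)
    edges: list of (weight, u, v) tuples for a connected graph
    """
    parent = list(range(n))

    def find(x):
        # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    for w, u, v in sorted(edges):  # greedily scan edges by weight
        ru, rv = find(u), find(v)
        if ru != rv:  # adding this edge creates no cycle
            parent[ru] = rv
            total += w
    return total

# triangle with weights 1, 2, 3: the MST keeps the two cheapest edges
print(mst_weight(3, [(1, 0, 1), (2, 1, 2), (3, 0, 2)]))
```

In the multiagent MST cover problem, each agent $i$ would supply her own weights, and a candidate edge set is feasible for her exactly when its restricted MST weight equals `mst_weight` on the full graph under her weights.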

NeurIPS Conference 2023 Conference Paper

Optimized Covariance Design for AB Test on Social Network under Interference

  • Qianyi Chen
  • Bo Li
  • Lu Deng
  • Yong Wang

Online A/B tests have become increasingly popular and important for social platforms. However, accurately estimating the global average treatment effect (GATE) has proven challenging due to network interference, which violates the Stable Unit Treatment Value Assumption (SUTVA) and poses a great challenge to experimental design. Existing research on network experimental design has mostly been based on the unbiased Horvitz-Thompson (HT) estimator, with substantial data trimming to ensure unbiasedness at the price of high estimation variance. In this paper, we strive to balance the bias and variance in designing randomized network experiments. Under a potential outcome model with 1-hop interference, we derive the bias and variance of the standard HT estimator and reveal their relation to the network topology and the covariance of the treatment assignment vector. We then formulate the experimental design problem as optimizing the covariance matrix of the treatment assignment vector to balance bias and variance by minimizing the mean squared error (MSE) of the estimator. An efficient projected gradient descent algorithm is presented to implement the desired randomization scheme. Finally, we carry out extensive simulation studies to demonstrate the advantages of our proposed method over existing methods in many settings, with different levels of model misspecification.

AAMAS Conference 2023 Conference Paper

Possible Fairness for Allocating Indivisible Resources

  • Haris Aziz
  • Bo Li
  • Shiji Xing
  • Yu Zhou

Fair division of indivisible resources has attracted significant attention from multi-agent systems and computational social choice. Two popular solution concepts are envy-freeness up to any item (EFX) and maximin share (MMS) fairness, which are defined using agents' cardinal preferences. On one hand, accurate cardinal values are hard to express in real-life applications; on the other hand, with cardinal values, MMS and EFX may not be easy to satisfy. In this work, we study a new setting where agents have arbitrary ordinal preferences over the items (possibly with indifferences), and an allocation is called possible EFX (p-EFX) or possible MMS (p-MMS) if there exist cardinal preferences consistent with the ordinal ones under which the allocation is EFX or MMS. We first design a polynomial-time algorithm to compute an allocation that is p-EFX and p-MMS under lexicographic preferences. This result also strengthens a result of Hosseini et al. (AAAI 2021), who proved the existence of EFX and MMS allocations under strict lexicographic preferences (i.e., the items do not have ties). Although it has been well justified that lexicographic preferences are natural and common, there are situations where they do not fit appropriately, especially when the items have similar types. Therefore, on top of p-EFX and p-MMS, we want the allocation to be balanced (i.e., the numbers of items allocated to the agents differ by at most one). We then design another algorithm that is simultaneously p-EFX, p-MMS, and balanced.

AAMAS Conference 2023 Conference Paper

Proportional Fairness in Obnoxious Facility Location

  • Haris Aziz
  • Alexander Lam
  • Bo Li
  • Fahimeh Ramezani
  • Toby Walsh

We consider the obnoxious facility location problem (in which agents prefer the facility location to be far from them) and propose a hierarchy of distance-based proportional fairness concepts for the problem. These fairness axioms ensure that groups of agents at the same location are guaranteed to be a distance from the facility proportional to their group size. We consider deterministic and randomized mechanisms, and compute tight bounds on the price of proportional fairness. In the deterministic setting, not only are our proportional fairness axioms incompatible with strategyproofness, the Nash equilibria may not guarantee welfare within a constant factor of the optimal welfare. On the other hand, in the randomized setting, we identify proportionally fair and strategyproof mechanisms that give an expected welfare within a constant factor of the optimal welfare.

AAAI Conference 2023 Conference Paper

Rethinking Disparity: A Depth Range Free Multi-View Stereo Based on Disparity

  • Qingsong Yan
  • Qiang Wang
  • Kaiyong Zhao
  • Bo Li
  • Xiaowen Chu
  • Fei Deng

Existing learning-based multi-view stereo (MVS) methods rely on the depth range to build the 3D cost volume and may fail when the range is too large or unreliable. To address this problem, we propose a disparity-based MVS method based on the epipolar disparity flow (E-flow), called DispMVS, which infers depth information from the pixel movement between two views. The core of DispMVS is to construct a 2D cost volume on the image plane along the epipolar line for each pair (between the reference image and several source images) for pixel matching, and to fuse the depths triangulated from each pair by multi-view geometry to ensure multi-view consistency. For robustness, DispMVS starts from a randomly initialized depth map and iteratively refines it with a coarse-to-fine strategy. Experiments on the DTUMVS and Tanks&Temples datasets show that DispMVS is not sensitive to the depth range and achieves state-of-the-art results with lower GPU memory.

AAAI Conference 2023 Conference Paper

Reviewing Labels: Label Graph Network with Top-k Prediction Set for Relation Extraction

  • Bo Li
  • Wei Ye
  • Jinglei Zhang
  • Shikun Zhang

The typical way to do relation extraction is to fine-tune large pre-trained language models on task-specific datasets and then select the label with the highest probability in the output distribution as the final prediction. However, the Top-k prediction set for a given sample is commonly overlooked. In this paper, we first reveal that the Top-k prediction set of a given sample contains useful information for predicting the correct label. To effectively utilize the Top-k prediction set, we propose Label Graph Network with Top-k Prediction Set, termed KLG. Specifically, for a given sample, we build a label graph to review candidate labels in the Top-k prediction set and learn the connections between them. We also design a dynamic k-selection mechanism to learn a more powerful and discriminative relation representation. Our experiments show that KLG achieves the best performance on three relation extraction datasets. Moreover, we observe that KLG is more effective in dealing with long-tailed classes.

AAAI Conference 2023 Conference Paper

Sequence Generation with Label Augmentation for Relation Extraction

  • Bo Li
  • Dingyao Yu
  • Wei Ye
  • Jinglei Zhang
  • Shikun Zhang

Sequence generation demonstrates promising performance in recent information extraction efforts, by incorporating large-scale pre-trained Seq2Seq models. This paper investigates the merits of employing sequence generation in relation extraction, finding that with relation names or their synonyms as generation targets, their textual semantics and the correlation (in terms of word sequence pattern) among them affect model performance. We then propose Relation Extraction with Label Augmentation (RELA), a Seq2Seq model with automatic label augmentation for RE. By label augmentation, we mean producing semantic synonyms for each relation name as the generation target. Besides, we present an in-depth analysis of the Seq2Seq model's behavior when dealing with RE. Experimental results show that RELA achieves competitive results compared with previous methods on four RE datasets.

ICRA Conference 2023 Conference Paper

SLAMesh: Real-time LiDAR Simultaneous Localization and Meshing

  • Jianyuan Ruan
  • Bo Li
  • Yibo Wang
  • Yuxiang Sun 0002

Most current LiDAR simultaneous localization and mapping (SLAM) systems build maps as point clouds, which are sparse when zoomed in even though they seem dense to human eyes. Dense maps are essential for robotic applications such as map-based navigation. Due to its low memory cost, mesh has become an attractive dense map representation in recent years. However, existing methods usually produce mesh maps through an offline post-processing step. This two-step pipeline prevents these methods from using the built mesh maps online and from letting localization and meshing benefit each other. To solve this problem, we propose the first CPU-only real-time LiDAR SLAM system that can simultaneously build a mesh map and perform localization against it. A novel, direct meshing strategy with Gaussian process reconstruction realizes fast building, registration, and updating of mesh maps. We perform experiments on several public datasets. The results show that our SLAM system can run at around 40 Hz. The localization and meshing accuracy also outperforms state-of-the-art methods, including the TSDF map and Poisson reconstruction. Our code and video demos are available at: https://github.com/lab-sun/SLAMesh.

NeurIPS Conference 2023 Conference Paper

WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data

  • Maurice Weber
  • Carlo Siebenschuh
  • Rory Butler
  • Anton Alexandrov
  • Valdemar Thanner
  • Georgios Tsolakis
  • Haris Jabbar
  • Ian Foster

We introduce WordScape, a novel pipeline for the creation of cross-disciplinary, multilingual corpora comprising millions of pages with annotations for document layout detection. Relating visual and textual items on document pages has gained further significance with the advent of multimodal models. Various approaches have proved effective for visual question answering or layout segmentation. However, the interplay of text, tables, and visuals remains challenging for a variety of document understanding tasks. In particular, many models fail to generalize well to diverse domains and new languages due to insufficient availability of training data. WordScape addresses these limitations. Our automatic annotation pipeline parses the Open XML structure of Word documents obtained from the web, jointly providing layout-annotated document images and their textual representations. In turn, WordScape offers unique properties as it (1) leverages the ubiquity of the Word file format on the internet, (2) is readily accessible through the Common Crawl web corpus, (3) is adaptive to domain-specific documents, and (4) offers culturally and linguistically diverse document pages with natural semantic structure and high-quality text. Together with the pipeline, we will additionally release 9.5M URLs to Word documents, which can be processed with WordScape to create a dataset of over 40M pages. Finally, we investigate the quality of text and layout annotations extracted by WordScape, assess the impact on document understanding benchmarks, and demonstrate that manual labeling costs can be substantially reduced.

JAIR Journal 2023 Journal Article

Your College Dorm and Dormmates: Fair Resource Sharing with Externalities

  • Jiarui Gan
  • Bo Li
  • Yingkai Li

We study a fair resource sharing problem, where a set of resources are to be shared among a group of agents. Each agent demands one resource, and each resource can serve a limited number of agents. An agent cares about what resource they get as well as the externalities imposed by their mates, who share the same resource with them. Clearly, the strong notion of envy-freeness, where no agent envies another for their resource or mates, cannot always be achieved, and we show that even deciding the existence of such a strongly envy-free assignment is an intractable problem. Hence, a more interesting question is whether (and in what situations) a relaxed notion of envy-freeness, Pareto envy-freeness, can be achieved. Under this relaxed notion, an agent envies another only when they envy both the resource and the mates of the other agent. In particular, we are interested in a dorm assignment problem, where students are to be assigned to dorms with the same capacity and have dichotomous preferences over their dormmates. We show that when the capacity of each dorm is 2, a Pareto envy-free assignment always exists, and we present a polynomial-time algorithm to compute such an assignment. Nevertheless, the result breaks immediately when the capacity increases to 3, in which case even Pareto envy-freeness cannot be guaranteed. In addition to the existential results, we also investigate the utility guarantees of (Pareto) envy-free assignments in our model.

NeurIPS Conference 2022 Conference Paper

AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies

  • Li Siyao
  • Yuhang Li
  • Bo Li
  • Chao Dong
  • Ziwei Liu
  • Chen Change Loy

Visual correspondence of 2D animation is the core of many applications and deserves careful study. Existing correspondence datasets for 2D cartoon suffer from simple frame composition and monotonic movements, making them insufficient to simulate real animations. In this work, we present a new 2D animation visual correspondence dataset, AnimeRun, by converting open source 3D movies to full scenes in 2D style, including simultaneous moving background and interactions of multiple subjects. Statistics show that our proposed dataset not only resembles real anime more in image composition, but also possesses richer and more complex motion patterns compared to existing datasets. With this dataset, we establish a comprehensive benchmark by evaluating several existing optical flow and segment matching methods, and analyze shortcomings of these methods on animation data. Data are available at https://lisiyao21.github.io/projects/AnimeRun.

IJCAI Conference 2022 Conference Paper

Bayesian Auctions with Efficient Queries (Extended Abstract)

  • Jing Chen
  • Bo Li
  • Yingkai Li
  • Pinyan Lu

Designing dominant-strategy incentive compatible (DSIC) mechanisms for a seller to generate (approximately) optimal revenue by selling items to players is a fundamental problem in Bayesian mechanism design. However, most existing studies assume that the seller knows the entire distribution from which the players' values are drawn. Unfortunately, this assumption may not hold in reality: for example, when the distributions have exponentially large supports or do not have succinct representations. In this work we consider, for the first time, the query complexity of Bayesian mechanisms. The seller only has limited oracle access to the players' distributions, via quantile queries and value queries. For single-item auctions, we design mechanisms with a logarithmic number of value or quantile queries which achieve almost optimal revenue. We then prove logarithmic lower bounds, i.e., a logarithmic number of queries is necessary for any constant-approximation DSIC mechanism, even when randomized and adaptive queries are allowed. Thus our mechanisms are almost optimal regarding query complexity. Our lower bounds can be extended to multi-item auctions with monotone subadditive valuations, and we complement this part with constant-approximation mechanisms for unit-demand or additive valuation functions. Our results are robust even if the answers to the queries contain noise.

NeurIPS Conference 2022 Conference Paper

Certifying Some Distributional Fairness with Subpopulation Decomposition

  • Mintong Kang
  • Linyi Li
  • Maurice Weber
  • Yang Liu
  • Ce Zhang
  • Bo Li

Extensive efforts have been made to understand and improve the fairness of machine learning models based on observational metrics, especially in high-stakes domains such as medical insurance, education, and hiring decisions. However, there is a lack of certified fairness considering the end-to-end performance of an ML model. In this paper, we first formulate the certified fairness of an ML model trained on a given data distribution as an optimization problem based on the model performance loss bound on a fairness constrained distribution, which is within bounded distributional distance with the training distribution. We then propose a general fairness certification framework and instantiate it for both sensitive shifting and general shifting scenarios. In particular, we propose to solve the optimization problem by decomposing the original data distribution into analytical subpopulations and proving the convexity of the subproblems to solve them. We evaluate our certified fairness on six real-world datasets and show that our certification is tight in the sensitive shifting scenario and provides non-trivial certification under general shifting. Our framework is flexible to integrate additional non-skewness constraints and we show that it provides even tighter certification under different real-world scenarios. We also compare our certified fairness bound with adapted existing distributional robustness bounds on Gaussian data and demonstrate that our method is significantly tighter.

AAMAS Conference 2022 Conference Paper

Characterizing Attacks on Deep Reinforcement Learning

  • Xinlei Pan
  • Chaowei Xiao
  • Warren He
  • Shuang Yang
  • Jian Peng
  • Mingjie Sun
  • Mingyan Liu
  • Bo Li

Recent studies show that Deep Reinforcement Learning (DRL) models are vulnerable to adversarial attacks, which attack DRL models by adding small perturbations to the observations. However, some attacks assume full availability of the victim model, and some require a huge amount of computation, making them less feasible for real-world applications. In this work, we make further explorations of the vulnerabilities of DRL by studying other aspects of attacks on DRL using realistic and efficient attacks. First, we adapt and propose efficient black-box attacks for when we do not have access to the DRL model parameters. Second, to address the high computational demands of existing attacks, we introduce efficient online sequential attacks that exploit temporal consistency across consecutive steps. Third, we explore the possibility of an attacker perturbing other aspects of the DRL setting, such as the environment dynamics. Finally, to account for imperfections in how an attacker would inject perturbations in the physical world, we devise a method for generating robust physical perturbations to be printed. The attack is evaluated on a real-world robot under various conditions. We conduct extensive experiments, both in simulation such as Atari games, robotics, and autonomous driving, and on real-world robots, to compare the effectiveness of the proposed attacks with baseline approaches. To the best of our knowledge, we are the first to apply adversarial attacks on DRL systems to physical robots.

NeurIPS Conference 2022 Conference Paper

CoPur: Certifiably Robust Collaborative Inference via Feature Purification

  • Jing Liu
  • Chulin Xie
  • Sanmi Koyejo
  • Bo Li

Collaborative inference leverages diverse features provided by different agents (e.g., sensors) for more accurate inference. A common setup is where each agent sends its embedded features instead of the raw data to the Fusion Center (FC) for joint prediction. In this setting, we consider the inference-time attacks when a small fraction of agents are compromised. The compromised agent either does not send embedded features to the FC, or sends arbitrarily embedded features. To address this, we propose a certifiably robust COllaborative inference framework via feature PURification (CoPur), by leveraging the block-sparse nature of adversarial perturbations on the feature vector, as well as exploring the underlying redundancy across the embedded features (by assuming the overall features lie on an underlying lower dimensional manifold). We theoretically show that the proposed feature purification method can robustly recover the true feature vector, despite adversarial corruptions and/or incomplete observations. We also propose and test an untargeted distributed feature-flipping attack, which is agnostic to the model, training data, label, as well as the features held by other agents, and is shown to be effective in attacking state-of-the-art defenses. Experiments on ExtraSensory and NUS-WIDE datasets show that CoPur significantly outperforms existing defenses in terms of robustness against targeted and untargeted adversarial attacks.

NeurIPS Conference 2022 Conference Paper

DENSE: Data-Free One-Shot Federated Learning

  • Jie Zhang
  • Chen Chen
  • Bo Li
  • Lingjuan Lyu
  • Shuang Wu
  • Shouhong Ding
  • Chunhua Shen
  • Chao Wu

One-shot Federated Learning (FL) has recently emerged as a promising approach, which allows the central server to learn a model in a single communication round. Despite the low communication cost, existing one-shot FL methods are mostly impractical or face inherent limitations, e.g., a public dataset is required, clients' models are homogeneous, and additional data/model information needs to be uploaded. To overcome these issues, we propose a novel two-stage Data-free One-Shot federated learning (DENSE) framework, which trains the global model through a data generation stage and a model distillation stage. DENSE is a practical one-shot FL method that can be applied in reality due to the following advantages: (1) DENSE requires no additional information compared with other methods (except the model parameters) to be transferred between clients and the server; (2) DENSE does not require any auxiliary dataset for training; (3) DENSE considers model heterogeneity in FL, i.e., different clients can have different model architectures. Experiments on a variety of real-world datasets demonstrate the superiority of our method. For example, DENSE outperforms the best baseline method Fed-ADI by 5.08% on the CIFAR10 dataset.

NeurIPS Conference 2022 Conference Paper

Distributionally Robust Optimization with Data Geometry

  • Jiashuo Liu
  • Jiayun Wu
  • Bo Li
  • Peng Cui

Distributionally Robust Optimization (DRO) serves as a robust alternative to empirical risk minimization (ERM), optimizing the worst-case distribution in an uncertainty set typically specified by distance metrics including f-divergence and the Wasserstein distance. Metrics defined in the ostensible high-dimensional space lead to exceedingly large uncertainty sets, resulting in the underperformance of most existing DRO methods. It has been well documented that high-dimensional data approximately resides on low-dimensional manifolds. In this work, to further constrain the uncertainty set, we incorporate data geometric properties into the design of distance metrics, obtaining our novel Geometric Wasserstein DRO (GDRO). Empowered by gradient flow, we derive a generically applicable approximate algorithm for the optimization of GDRO, and provide the bounded error rate of the approximation as well as the convergence rate of our algorithm. We also theoretically characterize the edge cases where certain existing DRO methods are degenerate cases of GDRO. Extensive experiments justify the superiority of our GDRO over existing DRO methods in multiple settings with strong distributional shifts, and confirm that the uncertainty set of GDRO adapts to data geometry.

NeurIPS Conference 2022 Conference Paper

Exact Shape Correspondence via 2D graph convolution

  • Barakeel Fanseu Kamhoua
  • Lin Zhang
  • Yongqiang Chen
  • Han Yang
  • MA KAILI
  • Bo Han
  • Bo Li
  • James Cheng

For exact 3D shape correspondence (matching or alignment), i.e., the task of matching each point on a shape to its exact corresponding point on the other shape (or, to be more specific, matching at geodesic error 0), most existing methods do not perform well due to two main problems. First, on nearly-isometric shapes (i.e., low noise levels), most existing methods use the eigen-vectors (eigen-functions) of the Laplace-Beltrami Operator (LBO) or other shape descriptors to update an initialized correspondence which is not exact, leading to an accumulation of update errors. Thus, though the final correspondence may generally be smooth, it is generally inexact. Second, on non-isometric shapes (noisy shapes), existing methods are generally not robust to noise as they usually assume near-isometry. In addition, existing methods that attempt to address the non-isometric shape problem (e.g., GRAMPA) are generally computationally expensive and do not generalise to nearly-isometric shapes. To address these two problems, we propose a 2D graph convolution-based framework called 2D-GEM. 2D-GEM is robust to noise on non-isometric shapes and, with a few additional constraints, it also addresses the errors in the update on nearly-isometric shapes. We demonstrate the effectiveness of 2D-GEM by achieving a high accuracy of 90.5% at geodesic error 0 on the non-isometric benchmark SHREC16, i.e., TOPKIDS (while being much faster than GRAMPA), and on nearly-isometric benchmarks by achieving a high accuracy of 92.5% on TOSCA and 84.9% on SCAPE at geodesic error 0.

NeurIPS Conference 2022 Conference Paper

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

  • Boxin Wang
  • Wei Ping
  • Chaowei Xiao
  • Peng Xu
  • Mostofa Patwary
  • Mohammad Shoeybi
  • Bo Li
  • Anima Anandkumar

Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study on three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we demonstrate that using self-generated datasets consistently outperforms the existing baselines across various model sizes on both automatic and human evaluations, even when it uses a 3× smaller training corpus. We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3× larger than GPT-3), a scale that has never been studied before. We find that i) large LMs have similar toxicity levels as smaller ones given the same pre-training corpus, and ii) large LMs require more endeavor to unlearn the toxic content seen during pre-training. We also explore parameter-efficient training methods for detoxification. We demonstrate that adding and training adapter-only layers in LMs not only saves a lot of parameters but also achieves a better trade-off between toxicity and perplexity than whole-model adaptation for large-scale models. Our code will be available at: https://github.com/NVIDIA/Megatron-LM/.

NeurIPS Conference 2022 Conference Paper

Fairness in Federated Learning via Core-Stability

  • Bhaskar Ray Chaudhury
  • Linyi Li
  • Mintong Kang
  • Bo Li
  • Ruta Mehta

Federated learning provides an effective paradigm to jointly optimize a model benefiting from rich distributed data while protecting data privacy. Nonetheless, the heterogeneous nature of distributed data, especially in the non-IID setting, makes it challenging to define and ensure fairness among local agents. For instance, it is intuitively "unfair" for agents with high-quality data to sacrifice their performance due to other agents with low-quality data. Currently popular egalitarian and weighted equity-based fairness measures suffer from this pitfall. In this work, we formally represent this problem and address these fairness issues using concepts from cooperative game theory and social choice theory. We model the task of learning a shared predictor in the federated setting as a fair public decision making problem, and then define the notion of core-stable fairness: given N agents, there is no subset of agents S that can benefit significantly by forming a coalition among themselves based on their utilities U_N and U_S (i.e., (|S|/N) U_S ≥ U_N). Core-stable predictors are robust to low-quality local data from some agents, and additionally they satisfy Proportionality (each agent gets at least a 1/n fraction of the best utility she can get from any predictor) and Pareto-optimality (no model can increase the utility of an agent without decreasing the utility of another), two well-sought-after fairness and efficiency notions within social choice. We then propose an efficient federated learning protocol, CoreFed, to optimize a core-stable predictor. CoreFed determines a core-stable predictor when the loss functions of the agents are convex. CoreFed also determines approximate core-stable predictors when the loss functions are not convex, as with smooth neural networks. We further show the existence of core-stable predictors in more general settings using Kakutani's fixed point theorem. Finally, we empirically validate our analysis on two real-world datasets, and we show that CoreFed achieves higher core-stability fairness than FedAvg while maintaining similar accuracy.
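The core-stability condition in the abstract, no coalition S with (|S|/N) U_S exceeding U_N, can be checked directly on small toy instances. The sketch below assumes a hypothetical `utilities` table mapping each coalition to the best total utility it could secure on its own; it is a brute-force check, not the CoreFed protocol.

```python
from itertools import combinations

def is_core_stable(utilities):
    """Check core-stability: no proper coalition S of the N agents
    satisfies (|S| / N) * U_S > U_N, where U_S is the best total
    utility S could secure alone.  `utilities` maps frozensets of
    agent indices to U_S; the grand coalition's entry gives U_N."""
    grand = max(utilities, key=len)   # grand coalition of all N agents
    n = len(grand)
    u_n = utilities[grand]
    for size in range(1, n):          # proper subsets only
        for coalition in combinations(sorted(grand), size):
            s = frozenset(coalition)
            if s in utilities and (len(s) / n) * utilities[s] > u_n:
                return False          # S would benefit by deviating
    return True
```

For example, with two agents whose grand-coalition utility is 10, a singleton coalition worth 4 does not block (½ · 4 = 2 ≤ 10), while one worth 30 does (½ · 30 = 15 > 10).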

NeurIPS Conference 2022 Conference Paper

General Cutting Planes for Bound-Propagation-Based Neural Network Verification

  • Huan Zhang
  • Shiqi Wang
  • Kaidi Xu
  • Linyi Li
  • Bo Li
  • Suman Jana
  • Cho-Jui Hsieh
  • J. Zico Kolter

Bound propagation methods, when combined with branch and bound, are among the most effective methods to formally verify properties of deep neural networks such as correctness, robustness, and safety. However, existing works cannot handle the general form of cutting plane constraints widely accepted in traditional solvers, which are crucial for strengthening verifiers with tightened convex relaxations. In this paper, we generalize the bound propagation procedure to allow the addition of arbitrary cutting plane constraints, including those involving relaxed integer variables that do not appear in existing bound propagation formulations. Our generalized bound propagation method, GCP-CROWN, opens up the opportunity to apply general cutting plane methods for neural network verification while benefiting from the efficiency and GPU acceleration of bound propagation methods. As a case study, we investigate the use of cutting planes generated by an off-the-shelf mixed integer programming (MIP) solver. We find that MIP solvers can generate high-quality cutting planes for strengthening bound-propagation-based verifiers using our new formulation. Since the branching-focused bound propagation procedure and the cutting-plane-focused MIP solver can run in parallel utilizing different types of hardware (GPUs and CPUs), their combination can quickly explore a large number of branches with strong cutting planes, leading to strong verification performance. Experiments demonstrate that our method is the first verifier that can completely solve the oval20 benchmark and verify twice as many instances on the oval21 benchmark compared to the best tool in VNN-COMP 2021, and also noticeably outperforms state-of-the-art verifiers on a wide range of benchmarks. GCP-CROWN is part of the α,β-CROWN verifier, the VNN-COMP 2022 winner. Code is available at http://PaperCode.cc/GCP-CROWN.

NeurIPS Conference 2022 Conference Paper

Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning

  • Wenhao Ding
  • Haohong Lin
  • Bo Li
  • Ding Zhao

As a pivotal component of attaining generalizable solutions in human intelligence, reasoning provides great potential for reinforcement learning (RL) agents' generalization towards varied goals by summarizing part-to-whole arguments and discovering cause-and-effect relations. However, how to discover and represent causalities remains a huge gap that hinders the development of causal RL. In this paper, we augment Goal-Conditioned RL (GCRL) with a Causal Graph (CG), a structure built upon the relations between objects and events. We formulate the GCRL problem as variational likelihood maximization with the CG as a latent variable. To optimize the derived objective, we propose a framework with theoretical performance guarantees that alternates between two steps: using interventional data to estimate the posterior of the CG, and using the CG to learn generalizable models and interpretable policies. Due to the lack of public benchmarks that verify generalization capability under reasoning, we design nine tasks and then empirically show the effectiveness of the proposed method against five baselines on these tasks. Further theoretical analysis shows that our performance improvement is attributed to the virtuous cycle of causal discovery, transition modeling, and policy training, which aligns with the experimental evidence in extensive ablation studies.

AAAI Conference 2022 Conference Paper

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

  • Xiaohua Chen
  • Yucan Zhou
  • Dayan Wu
  • Wanqian Zhang
  • Yu Zhou
  • Bo Li
  • Weiping Wang

Real-world data often follows a long-tailed distribution, which heavily degrades the performance of existing classification algorithms. A key issue is that samples in tail categories fail to depict their intra-class diversity. Humans can use prior knowledge to imagine a sample in new poses, scenes, and view angles even when seeing its category for the first time. Inspired by this, we propose a novel reasoning-based implicit semantic data augmentation method that borrows transformation directions from other classes. Since the covariance matrix of each category represents its feature transformation directions, we can sample new directions from similar categories to generate clearly distinct instances. Specifically, the long-tailed data is first used to train a backbone and a classifier. Then, a covariance matrix for each category is estimated, and a knowledge graph is constructed to store the relations between any two categories. Finally, tail samples are adaptively enhanced by propagating information from all similar categories in the knowledge graph. Experimental results on CIFAR-100-LT, ImageNet-LT, and iNaturalist 2018 demonstrate the effectiveness of our proposed method compared with state-of-the-art methods.

NeurIPS Conference 2022 Conference Paper

Improving Certified Robustness via Statistical Learning with Logical Reasoning

  • Zhuolin Yang
  • Zhikuan Zhao
  • Boxin Wang
  • Jiawei Zhang
  • Linyi Li
  • Hengzhi Pei
  • Bojan Karlaš
  • Ji Liu

Intensive algorithmic efforts have recently been made to enable rapid improvements in certified robustness for complex ML models. However, current robustness certification methods can only certify under a limited perturbation radius. Given that existing pure data-driven statistical approaches have reached a bottleneck, in this paper we propose to integrate statistical ML models with knowledge (expressed as logical rules) as a reasoning component using Markov logic networks (MLN), so as to further improve the overall certified robustness. This opens new research questions about certifying the robustness of such a paradigm, especially the reasoning component (e.g., the MLN). As a first step towards understanding these questions, we prove that the computational complexity of certifying the robustness of MLN is #P-hard. Guided by this hardness result, we then derive the first certified robustness bound for MLN by carefully analyzing different model regimes. Finally, we conduct extensive experiments on five datasets including both high-dimensional images and natural language texts, and show that the certified robustness with knowledge-based logical reasoning indeed significantly outperforms the state of the art.

AAAI Conference 2022 Conference Paper

Invariant Information Bottleneck for Domain Generalization

  • Bo Li
  • Yifei Shen
  • Yezhen Wang
  • Wenzhen Zhu
  • Colorado Reed
  • Dongsheng Li
  • Kurt Keutzer
  • Han Zhao

Invariant risk minimization (IRM) has recently emerged as a promising alternative for domain generalization. Nevertheless, its loss function is difficult to optimize for nonlinear classifiers, and the original optimization objective can fail when pseudo-invariant features and geometric skews exist. Inspired by IRM, in this paper we propose a novel formulation for domain generalization, dubbed invariant information bottleneck (IIB). IIB aims to minimize invariant risks for nonlinear classifiers while simultaneously mitigating the impact of pseudo-invariant features and geometric skews. Specifically, we first present a novel formulation for invariant causal prediction via mutual information. We then adopt the variational formulation of mutual information to develop a tractable loss function for nonlinear classifiers. To overcome the failure modes of IRM, we propose to minimize the mutual information between the inputs and the corresponding representations. IIB significantly outperforms IRM on synthetic datasets where pseudo-invariant features and geometric skews occur, showing the effectiveness of the proposed formulation in overcoming the failure modes of IRM. Furthermore, experiments on DomainBed show that IIB outperforms 13 baselines by 0.9% on average across 7 real datasets.

NeurIPS Conference 2022 Conference Paper

LOT: Layer-wise Orthogonal Training on Improving l2 Certified Robustness

  • Xiaojun Xu
  • Linyi Li
  • Bo Li

Recent studies show that training deep neural networks (DNNs) with Lipschitz constraints can enhance adversarial robustness and other model properties such as stability. In this paper, we propose a layer-wise orthogonal training method (LOT) to effectively train 1-Lipschitz convolution layers by parametrizing an orthogonal matrix with an unconstrained matrix. We then efficiently compute the inverse square root of a convolution kernel by transforming the input domain to the Fourier frequency domain. On the other hand, as existing works show that semi-supervised training helps improve empirical robustness, we aim to bridge the gap and prove that semi-supervised learning also improves the certified robustness of Lipschitz-bounded models. We conduct comprehensive evaluations of LOT under different settings. We show that LOT significantly outperforms baselines regarding deterministic l2 certified robustness, and scales to deeper neural networks. Under the supervised scenario, we improve the state-of-the-art certified robustness for all architectures (e.g., from 59.04% to 63.50% on CIFAR-10 and from 32.57% to 34.59% on CIFAR-100 at radius $\rho=36/255$ for 40-layer networks). With semi-supervised learning over unlabelled data, we improve the state-of-the-art certified robustness on CIFAR-10 at $\rho=108/255$ from 36.04% to 42.39%. In addition, LOT consistently outperforms baselines on different model architectures with only 1/3 of the evaluation time.
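LOT parametrizes an orthogonal matrix with an unconstrained one (working in the Fourier domain for convolutions). As a generic, hedged sketch of this kind of parametrization, and not the paper's actual construction, the Cayley transform maps any unconstrained square matrix to an orthogonal matrix, which makes the induced linear map 1-Lipschitz in l2:

```python
import numpy as np

def cayley_orthogonal(W):
    """Map an unconstrained square matrix W to an orthogonal matrix.

    A = W - W^T is skew-symmetric, so Q = (I - A)^{-1} (I + A) is orthogonal.
    """
    n = W.shape[0]
    A = W - W.T
    I = np.eye(n)
    return np.linalg.solve(I - A, I + A)  # (I - A)^{-1} (I + A)

rng = np.random.default_rng(0)
Q = cayley_orthogonal(rng.standard_normal((4, 4)))
# Q @ Q.T equals the identity up to floating-point error, so all
# singular values of Q are 1 and the map x -> Q @ x is 1-Lipschitz.
```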

NeurIPS Conference 2022 Conference Paper

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection

  • Jingkang Yang
  • Pengyun Wang
  • Dejian Zou
  • Zitang Zhou
  • Kunyuan Ding
  • Wenxuan Peng
  • Haoqi Wang
  • Guangyao Chen

Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications and has thus been extensively studied, with a plethora of methods developed in the literature. However, the field currently lacks a unified, strictly formulated, and comprehensive benchmark, which often results in unfair comparisons and inconclusive results. From the problem setting perspective, OOD detection is closely related to neighboring fields including anomaly detection (AD), open set recognition (OSR), and model uncertainty, since methods developed for one domain are often applicable to the others. To help the community improve evaluation and advance the field, we build a unified, well-structured codebase called OpenOOD, which implements over 30 methods developed in relevant fields and provides a comprehensive benchmark under the recently proposed generalized OOD detection framework. Through a comprehensive comparison of these methods, we are gratified to find that the field has progressed significantly over the past few years, with both preprocessing methods and the orthogonal post-hoc methods showing strong potential.

NeurIPS Conference 2022 Conference Paper

Product Ranking for Revenue Maximization with Multiple Purchases

  • Renzhe Xu
  • Xingxuan Zhang
  • Bo Li
  • Yafeng Zhang
  • Xiaolong Chen
  • Peng Cui

Product ranking is the core problem for revenue-maximizing online retailers. To design proper product ranking algorithms, various consumer choice models have been proposed to characterize consumers' behaviors when they are presented with a list of products. However, existing works assume that each consumer purchases at most one product, or keeps viewing the product list after purchasing a product, which does not agree with common practice in real scenarios. In this paper, we assume that each consumer can purchase multiple products at will. To model consumers' willingness to view and purchase, we set a random attention span and purchase budget, which determine the maximum number of products that they view and purchase, respectively. Under this setting, we first design an optimal ranking policy for the case where the online retailer can precisely model consumers' behaviors. Based on this policy, we further develop the Multiple-Purchase-with-Budget UCB (MPB-UCB) algorithms with $\tilde{O}(\sqrt{T})$ regret that estimate consumers' behaviors and maximize revenue simultaneously in online settings. Experiments on both synthetic and semi-synthetic datasets demonstrate the effectiveness of the proposed algorithms.
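The abstract does not spell out the MPB-UCB index itself. As a hedged illustration of only the UCB principle it builds on, here is the standard UCB1 index (the constant and the toy numbers are illustrative, not from the paper):

```python
import math

def ucb1_index(mean_reward, pulls, t, c=2.0):
    """Standard UCB1 score: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return float("inf")  # force at least one pull of every arm
    return mean_reward + math.sqrt(c * math.log(t) / pulls)

# At each round, pick the arm (here: ranking decision) with the largest index.
means = [0.4, 0.6, 0.5]   # empirical mean rewards
pulls = [10, 10, 1]       # times each arm has been tried
t = 21                    # current round
best = max(range(3), key=lambda i: ucb1_index(means[i], pulls[i], t))
```

The under-explored arm 2 wins here despite a lower empirical mean, which is exactly the exploration behavior UCB-style algorithms rely on.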

NeurIPS Conference 2022 Conference Paper

SafeBench: A Benchmarking Platform for Safety Evaluation of Autonomous Vehicles

  • Chejian Xu
  • Wenhao Ding
  • Weijie Lyu
  • Zuxin Liu
  • Shuai Wang
  • Yihan He
  • Hanjiang Hu
  • Ding Zhao

As shown by recent studies, machine intelligence-enabled systems are vulnerable to test cases resulting from either adversarial manipulation or natural distribution shifts. This has raised great concerns about deploying machine learning algorithms for real-world applications, especially in safety-critical domains such as autonomous driving (AD). On the other hand, traditional AD testing on naturalistic scenarios requires hundreds of millions of driving miles due to the high dimensionality and rareness of safety-critical scenarios in the real world. As a result, several approaches for autonomous driving evaluation have been explored, which are, however, usually based on different simulation platforms, types of safety-critical scenarios, scenario generation algorithms, and driving route variations. Thus, despite a large amount of effort in autonomous driving testing, it is still challenging to compare and understand the effectiveness and efficiency of different testing scenario generation algorithms and testing mechanisms under similar conditions. In this paper, we aim to provide the first unified platform, SafeBench, to integrate different types of safety-critical testing scenarios, scenario generation algorithms, and other variations such as driving routes and environments. In particular, we consider 8 safety-critical testing scenarios following the National Highway Traffic Safety Administration (NHTSA) and develop 4 scenario generation algorithms considering 10 variations for each scenario. Meanwhile, we implement 4 deep reinforcement learning-based AD algorithms with 4 types of input (e.g., bird's-eye view, camera) to perform fair comparisons on SafeBench. We find that our generated testing scenarios are indeed more challenging, and we observe a trade-off between the performance of AD agents under benign and safety-critical testing scenarios. We believe our unified platform SafeBench for large-scale and effective autonomous driving testing will motivate the development of new testing scenario generation and safe AD algorithms. SafeBench is available at https://safebench.github.io.

NeurIPS Conference 2022 Conference Paper

Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection

  • Yiming Li
  • Yang Bai
  • Yong Jiang
  • Yong Yang
  • Shu-Tao Xia
  • Bo Li

Deep neural networks (DNNs) have demonstrated their superiority in practice. Arguably, the rapid development of DNNs has largely benefited from high-quality (open-sourced) datasets, with which researchers and developers can easily evaluate and improve their learning methods. Since data collection is usually time-consuming or even expensive, how to protect dataset copyrights is of great significance and worth further exploration. In this paper, we revisit dataset ownership verification. We find that existing verification methods introduce new security risks into DNNs trained on the protected dataset, due to the targeted nature of poison-only backdoor watermarks. To alleviate this problem, we explore an untargeted backdoor watermarking scheme in which the abnormal model behaviors are not deterministic. Specifically, we introduce two dispersibilities and prove their correlation, based on which we design the untargeted backdoor watermark under both poisoned-label and clean-label settings. We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification. Experiments on benchmark datasets verify the effectiveness of our methods and their resistance to existing backdoor defenses.

NeurIPS Conference 2022 Conference Paper

VF-PS: How to Select Important Participants in Vertical Federated Learning, Efficiently and Securely?

  • Jiawei Jiang
  • Lukas Burkhalter
  • Fangcheng Fu
  • Bolin Ding
  • Bo Du
  • Anwar Hithnawi
  • Bo Li
  • Ce Zhang

Vertical Federated Learning (VFL), which trains federated models over vertically partitioned data, has emerged as an important learning paradigm. However, existing VFL methods face two challenges: (1) scalability when the number of participants grows even to a modest scale, and (2) diminishing returns with respect to the number of participants: not all participants are equally important, and many will not introduce quality improvements in a large consortium. Motivated by these two challenges, in this paper we ask: How can we select l out of m participants, where l ≪ m, that are the most important? We call this problem Vertically Federated Participant Selection and model it with a principled mutual information-based view. Our first technical contribution is VF-MINE, a Vertically Federated Mutual INformation Estimator, which uses one of the most celebrated algorithms in database theory, Fagin's algorithm, as a building block. Our second contribution is to further optimize VF-MINE to enable VF-PS, a group testing-based participant selection framework. We empirically show that vertically federated participant selection can be orders of magnitude faster than training a full-fledged VFL model, while being able to identify the most important subset of participants, which often leads to a VFL model of similar quality.

IROS Conference 2021 Conference Paper

A Novel 2-SUR 6-DOF Parallel Manipulator Actuated by Spherical Motion Generators

  • Kun Wang
  • Xiaoyong Wu
  • Yujin Wang
  • Bo Li
  • Bo Yuan
  • Shaoping Bai

A novel 6-DOF parallel manipulator with two spherical-universal-revolute limbs is proposed in this work. Compared with general 6-DOF parallel manipulators with six kinematic limbs, this new manipulator, actuated by spherical motion generators, has only two limbs, which brings kinematic advantages such as a small footprint and a large workspace. The inverse position problem of the manipulator is solved by an analytical approach, upon which the velocity equations are formulated. Kinematic performance, including workspace and manipulability, is calculated to show the advantages of the new design.

NeurIPS Conference 2021 Conference Paper

Adversarial Attack Generation Empowered by Min-Max Optimization

  • Jingkang Wang
  • Tianyun Zhang
  • Sijia Liu
  • Pin-Yu Chen
  • Jiacen Xu
  • Makan Fardad
  • Bo Li

The worst-case training principle that minimizes the maximal adversarial loss, also known as adversarial training (AT), has been shown to be a state-of-the-art approach for enhancing adversarial robustness. Nevertheless, min-max optimization beyond the purpose of AT has not been rigorously explored in the adversarial context. In this paper, we show how a general notion of min-max optimization over multiple domains can be leveraged in the design of different types of adversarial attacks. In particular, given a set of risk sources, minimizing the worst-case attack loss can be reformulated as a min-max problem by introducing domain weights that are maximized over the probability simplex of the domain set. We showcase this unified framework on three attack generation problems: attacking model ensembles, devising universal perturbations under multiple inputs, and crafting attacks resilient to data transformations. Extensive experiments demonstrate that our approach leads to substantial attack improvements over existing heuristic strategies, as well as robustness improvements over state-of-the-art defense methods against multiple perturbation types. Furthermore, we find that the self-adjusted domain weights learned from min-max optimization provide a holistic tool to explain the difficulty level of attacks across domains.
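The inner maximization over domain weights on the probability simplex can be sketched, under toy assumptions (fixed per-domain losses, which in the paper would come from the current attack iterate), with an exponentiated-gradient (mirror-ascent) update:

```python
import numpy as np

def exp_grad_weights(domain_losses, steps=50, eta=1.0):
    """Mirror-ascent update of domain weights w on the probability simplex.

    The weighted objective <w, losses> is linear in w, so its gradient is the
    loss vector itself; the multiplicative update therefore concentrates mass
    on the hardest (highest-loss) domains while keeping w on the simplex.
    """
    K = len(domain_losses)
    w = np.full(K, 1.0 / K)                              # uniform start
    for _ in range(steps):
        w = w * np.exp(eta * np.asarray(domain_losses))  # ascent step
        w = w / w.sum()                                  # renormalize to simplex
    return w

w = exp_grad_weights([0.2, 0.9, 0.5])  # domain 1 is the hardest
```

In the full framework this update would alternate with the minimization over the attack perturbation; here the losses are frozen purely for illustration.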

NeurIPS Conference 2021 Conference Paper

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models

  • Boxin Wang
  • Chejian Xu
  • Shuohang Wang
  • Zhe Gan
  • Yu Cheng
  • Jianfeng Gao
  • Ahmed Awadallah
  • Bo Li

Large-scale pre-trained language models have achieved tremendous success across a wide range of natural language understanding (NLU) tasks, even surpassing human performance. However, recent studies reveal that the robustness of these models can be challenged by carefully crafted textual adversarial examples. While several individual datasets have been proposed to evaluate model robustness, a principled and comprehensive benchmark is still missing. In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks. In particular, we systematically apply 14 textual adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations. Our findings are summarized as follows. (i) Most existing adversarial attack algorithms are prone to generating invalid or ambiguous adversarial examples, with around 90% of them either changing the original semantic meaning or misleading human annotators. We therefore perform a careful filtering process to curate a high-quality benchmark. (ii) All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind benign accuracy. We hope our work will motivate the development of new adversarial attacks that are more stealthy and semantic-preserving, as well as new robust language models against sophisticated adversarial attacks. AdvGLUE is available at https://adversarialglue.github.io.

NeurIPS Conference 2021 Conference Paper

Anti-Backdoor Learning: Training Clean Models on Poisoned Data

  • Yige Li
  • Xixiang Lyu
  • Nodens Koren
  • Lingjuan Lyu
  • Bo Li
  • Xingjun Ma

Backdoor attacks have emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting or erasing backdoors, it is still not clear whether robust training methods can be devised to prevent backdoor triggers from being injected into the trained model in the first place. In this paper, we introduce the concept of \emph{anti-backdoor learning}, which aims to train \emph{clean} models given backdoor-poisoned data. We frame the overall learning process as a dual task of learning the \emph{clean} and the \emph{backdoor} portions of the data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) models learn backdoored data much faster than clean data, and the stronger the attack, the faster the model converges on backdoored data; 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. ABL introduces a two-stage \emph{gradient ascent} mechanism into standard training to 1) help isolate backdoor examples at an early training stage, and 2) break the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we empirically show that ABL-trained models on backdoor-poisoned data achieve the same performance as if they were trained on purely clean data. Code is available at \url{https://github.com/bboylyg/ABL}.
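The gradient-ascent idea can be caricatured in one line: negate the loss of samples isolated as suspected backdoor examples, so the model unlearns them while still descending on the rest. A minimal scalar sketch (not the paper's two-stage pipeline; `is_isolated` is a hypothetical flag standing in for ABL's isolation stage):

```python
def abl_loss(per_sample_loss, is_isolated):
    """Average training objective with the sign flipped on isolated samples.

    Descending on this objective performs gradient descent on clean samples
    and gradient ascent on suspected backdoor samples.
    """
    total = 0.0
    for loss, flagged in zip(per_sample_loss, is_isolated):
        total += -loss if flagged else loss
    return total / len(per_sample_loss)
```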

IJCAI Conference 2021 Conference Paper

Budget-feasible Maximum Nash Social Welfare is Almost Envy-free

  • Xiaowei Wu
  • Bo Li
  • Jiarui Gan

The Nash social welfare (NSW) is a well-known social welfare measure that balances individual utilities with overall efficiency. In the context of fair allocation of indivisible goods, Caragiannis et al. (EC 2016 and TEAC 2019) showed that an allocation maximizing the NSW is envy-free up to one good (EF1). In this paper, we are interested in the fairness of the NSW in a budget-feasible allocation problem, in which each item has a cost incurred by the agent it is allocated to, and each agent has a budget constraint on the total cost of the items she receives. We show that a budget-feasible allocation maximizing the NSW achieves a 1/4-approximation of EF1, and that this approximation ratio is tight. The ratio improves gracefully when the items have small costs relative to the agents' budgets; it converges to 1/2 as the budget-to-cost ratio approaches infinity.
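The two notions in this abstract are easy to state in code. A minimal sketch, assuming additive utilities and ignoring the budget constraint, of computing the NSW of an allocation and checking the exact EF1 condition:

```python
import math

def nsw(alloc, util):
    """Nash social welfare: geometric mean of agents' utilities.

    alloc[i] is agent i's bundle (a list of item ids);
    util[i][g] is agent i's (additive) value for item g.
    """
    totals = [sum(util[i][g] for g in bundle) for i, bundle in enumerate(alloc)]
    return math.prod(totals) ** (1.0 / len(alloc))

def is_ef1(alloc, util):
    """Envy-free up to one good: agent i does not envy agent j's bundle
    after removing i's most valued item from it."""
    for i in range(len(alloc)):
        own = sum(util[i][g] for g in alloc[i])
        for j, bundle in enumerate(alloc):
            if i == j or not bundle:
                continue
            envy = sum(util[i][g] for g in bundle)
            if own < envy - max(util[i][g] for g in bundle):
                return False
    return True

util = [[3, 1, 1], [1, 3, 1]]   # util[i][g]
alloc = [[0, 2], [1]]           # agent 0 gets items 0 and 2; agent 1 gets item 1
```

Giving everything to agent 0 would violate EF1, since agent 1 still envies even after the most valued item is removed from agent 0's bundle.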

AAAI Conference 2021 Conference Paper

ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation

  • Sicheng Zhao
  • Yezhen Wang
  • Bo Li
  • Bichen Wu
  • Yang Gao
  • Pengfei Xu
  • Trevor Darrell
  • Kurt Keutzer

Due to its robust and precise distance measurements, LiDAR plays an important role in scene understanding for autonomous driving. Training deep neural networks (DNNs) on LiDAR data requires large-scale point-wise annotations, which are time-consuming and expensive to obtain. Instead, simulation-to-real domain adaptation (SRDA) trains a DNN using unlimited synthetic data with automatically generated labels and transfers the learned model to real scenarios. Existing SRDA methods for LiDAR point cloud segmentation mainly employ a multi-stage pipeline and focus on feature-level alignment. They require prior knowledge of real-world statistics and ignore the pixel-level dropout noise gap and the spatial feature gap between different domains. In this paper, we propose a novel end-to-end framework, named ePointDA, to address the above issues. Specifically, ePointDA consists of three modules: self-supervised dropout noise rendering, statistics-invariant and spatially-adaptive feature alignment, and transferable segmentation learning. The joint optimization enables ePointDA to bridge the domain shift at the pixel level by explicitly rendering dropout noise for synthetic LiDAR and at the feature level by spatially aligning the features between different domains, without requiring the real-world statistics. Extensive experiments adapting from synthetic GTA-LiDAR to real KITTI and SemanticKITTI demonstrate the superiority of ePointDA for LiDAR point cloud segmentation.

NeurIPS Conference 2021 Conference Paper

Fair Scheduling for Time-dependent Resources

  • Bo Li
  • Minming Li
  • Ruilong Zhang

We study a fair resource scheduling problem in which a set of interval jobs is to be allocated to heterogeneous machines controlled by intelligent agents. Each job is associated with a release time, a deadline, and a processing time, such that it can be processed only if its complete processing period lies between its release time and deadline. The machines gain possibly different utilities by processing different jobs, and all jobs assigned to the same machine must be processed without overlap. We consider two widely studied solution concepts, namely maximin share fairness and envy-freeness. For both criteria, we discuss the extent to which fair allocations exist and present constant-approximation algorithms for various settings.

ICML Conference 2021 Conference Paper

FILTRA: Rethinking Steerable CNN by Filter Transform

  • Bo Li
  • Qili Wang
  • Gim Hee Lee

Steerable CNNs impose prior knowledge of transformation invariance or equivariance in the network architecture to enhance the network's robustness to geometric transformations of data and reduce overfitting. Augmenting a filter with its transformed copies to construct a steerable filter has been an intuitive and widely used technique over the past decades; we name this technique filter transform in this paper. Recently, the steerable CNN problem has been studied from the perspective of group representation theory, which reveals the function space structure of a steerable kernel function. However, it is not yet clear how this theory relates to the filter transform technique. In this paper, we show that kernels constructed by filter transform can also be interpreted within group representation theory. This interpretation helps complete the puzzle of steerable CNN theory and provides a novel and simple approach to implement steerable convolution operators. Experiments on multiple datasets verify the feasibility of the proposed approach.

NeurIPS Conference 2021 Conference Paper

G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators

  • Yunhui Long
  • Boxin Wang
  • Zhuolin Yang
  • Bhavya Kailkhura
  • Aston Zhang
  • Carl Gunter
  • Bo Li

Recent advances in machine learning have largely benefited from massive accessible training data. However, large-scale data sharing has raised great privacy concerns. In this work, we propose a novel privacy-preserving data Generative model based on the PATE framework (G-PATE), which aims to train a scalable differentially private data generator while preserving high utility of the generated data. Our approach leverages generative adversarial nets to generate data, combined with private aggregation among different discriminators to ensure strong privacy guarantees. Compared to existing approaches, G-PATE significantly improves the use of privacy budgets. In particular, we train a student data generator with an ensemble of teacher discriminators and propose a novel private gradient aggregation mechanism to ensure differential privacy on all information that flows from the teacher discriminators to the student generator. In addition, with random projection and gradient discretization, the proposed gradient aggregation mechanism can effectively handle high-dimensional gradient vectors. Theoretically, we prove that G-PATE ensures differential privacy for the data generator. Empirically, we demonstrate the superiority of G-PATE over prior work through extensive experiments. We show that G-PATE is the first method able to generate high-dimensional image data with high data utility under limited privacy budgets ($\varepsilon \le 1$). Our code is available at https://github.com/AI-secure/G-PATE.
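The aggregation step can be sketched in the generic PATE spirit: clip each teacher's gradient, sum, and add Gaussian noise. This hedged sketch deliberately omits G-PATE's random projection and gradient discretization and is not the paper's actual mechanism:

```python
import numpy as np

def private_aggregate(teacher_grads, clip_norm=1.0, sigma=1.0, rng=None):
    """Clip each teacher gradient to l2 norm `clip_norm`, sum, add Gaussian noise.

    Clipping bounds each teacher's contribution (the sensitivity), which is what
    lets the added noise provide a differential privacy guarantee.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = []
    for g in teacher_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    agg = np.sum(clipped, axis=0)
    return agg + rng.normal(0.0, sigma * clip_norm, size=agg.shape)

grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
noiseless = private_aggregate(grads, clip_norm=1.0, sigma=0.0)  # sigma=0: clipping only
```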

NeurIPS Conference 2021 Conference Paper

Integrated Latent Heterogeneity and Invariance Learning in Kernel Space

  • Jiashuo Liu
  • Zheyuan Hu
  • Peng Cui
  • Bo Li
  • Zheyan Shen

The ability to generalize under distributional shifts is essential to reliable machine learning, yet models optimized with empirical risk minimization usually fail on non-i.i.d. testing data. Recently, invariant learning methods for out-of-distribution (OOD) generalization have proposed to find causally invariant relationships across multiple environments. However, modern datasets are frequently multi-sourced without explicit source labels, rendering many invariant learning methods inapplicable. In this paper, we propose the Kernelized Heterogeneous Risk Minimization (KerHRM) algorithm, which achieves both latent heterogeneity exploration and invariant learning in kernel space, and then gives feedback to the original neural network by assigning an invariant gradient direction. We theoretically justify our algorithm and empirically validate its effectiveness with extensive experiments.

IJCAI Conference 2021 Conference Paper

Mechanism Design for Facility Location Problems: A Survey

  • Hau Chan
  • Aris Filos-Ratsikas
  • Bo Li
  • Minming Li
  • Chenhao Wang

The study of approximate mechanism design for facility location has been at the center of research at the intersection of artificial intelligence and economics for the last decade, largely due to its practical importance in various domains, such as social planning and clustering. At a high level, the goal is to select a number of locations on which to build a set of facilities, aiming to optimize some social objective based on the preferences of strategic agents, who may have incentives to misreport their private information. This paper presents a comprehensive survey of the significant progress that has been made since the introduction of the problem, highlighting the different variants and methodologies, as well as the most interesting directions for future research.

IJCAI Conference 2021 Conference Paper

MG-DVD: A Real-time Framework for Malware Variant Detection Based on Dynamic Heterogeneous Graph Learning

  • Chen Liu
  • Bo Li
  • Jun Zhao
  • Ming Su
  • Xu-Dong Liu

Detecting newly emerging malware variants in real time is crucial for mitigating cyber risks and proactively blocking intrusions. In this paper, we propose MG-DVD, a novel detection framework based on dynamic heterogeneous graph learning, to detect malware variants in real time. In particular, MG-DVD first models the fine-grained execution event streams of malware variants as dynamic heterogeneous graphs and investigates real-world meta-graphs between malware objects, which can effectively characterize more discriminative malicious evolutionary patterns between malware and their variants. Then, MG-DVD presents two dynamic walk-based heterogeneous graph learning methods to learn more comprehensive representations of malware variants, significantly reducing the cost of retraining the entire graph. As a result, MG-DVD is equipped to detect malware variants in real time, and it offers better interpretability by introducing meaningful meta-graphs. Comprehensive experiments on large-scale samples show that MG-DVD outperforms state-of-the-art methods in detecting malware variants in terms of both effectiveness and efficiency.

AAMAS Conference 2021 Conference Paper

Multi-Robot Task Allocation: Complexity and Approximation

  • Haris Aziz
  • Hau Chan
  • Ágnes Cseh
  • Bo Li
  • Fahimeh Ramezani
  • Chenhao Wang

Multi-robot task allocation is one of the most fundamental classes of problems in robotics and is crucial for various real-world robotic applications such as search, rescue, and area exploration. We consider the Single-Task robots and Multi-Robot tasks Instantaneous Assignment (ST-MR-IA) setting, where each task requires at least a certain number of robots, and each robot can work on at most one task and incurs an operational cost for each task. We study the natural computational problem of allocating robots to complete the maximum number of tasks subject to budget constraints. We consider budget constraints of three different kinds: (1) total budget, (2) task budget, and (3) robot budget. We provide a detailed complexity analysis, including results on approximations as well as polynomial-time algorithms for the general setting and important restricted settings.

AAAI Conference 2021 Conference Paper

Multi-view Inference for Relation Extraction with Uncertain Knowledge

  • Bo Li
  • Wei Ye
  • Canming Huang
  • Shikun Zhang

Knowledge graphs (KGs) are widely used to facilitate relation extraction (RE) tasks. While most previous RE methods focus on leveraging deterministic KGs, uncertain KGs, which assign a confidence score to each relation instance, can provide prior probability distributions of relational facts as valuable external knowledge for RE models. This paper proposes to exploit uncertain knowledge to improve relation extraction. Specifically, we introduce ProBase, an uncertain KG that indicates to what extent a target entity belongs to a concept, into our RE architecture. We then design a novel multi-view inference framework to systematically integrate local context and global knowledge across three views: mention-, entity-, and concept-view. Experimental results show that our model achieves competitive performance on both sentence- and document-level relation extraction, verifying the effectiveness of introducing uncertain knowledge and of the multi-view inference framework we design.

ICRA Conference 2021 Conference Paper

Reduced Dynamics and Control for an Autonomous Bicycle

  • Jiaming Xiong
  • Bo Li
  • Ruihan Yu
  • Daolin Ma
  • Wei Wang 0034
  • Caishan Liu

In this paper, we propose the reduced model for the full dynamics of a bicycle and analyze its nonlinear behavior under a proportional control law for steering. Based on the Gibbs-Appell equations for the Whipple bicycle, we obtain a second-order nonlinear ordinary differential equation (ODE) that governs the bicycle’s controlled motion. Two types of equilibrium points for the governing equation are found, which correspond to the bicycle’s uniform straight forward and circular motions, respectively. By applying the Hurwitz criterion to the linearized equation, we find that the steer coefficient must be negative, consistent with the human’s intuition of turning toward a fall. Under this condition, a critical angular velocity of the rear wheel exists, above which the uniform straight forward motion is stable, and slightly below which a pair of symmetrical stable uniform circular motions will occur. These theoretical findings are verified by both numerical simulations and experiments performed on a powered autonomous bicycle.

IJCAI Conference 2021 Conference Paper

Rescuing Deep Hashing from Dead Bits Problem

  • Shu Zhao
  • Dayan Wu
  • Yucan Zhou
  • Bo Li
  • Weiping Wang

Deep hashing methods have shown great retrieval accuracy and efficiency in large-scale image retrieval. How to optimize discrete hash bits is always the focus in deep hashing methods. A common strategy in these methods is to adopt an activation function, e.g., sigmoid() or tanh(), and minimize a quantization loss to approximate discrete values. However, this paradigm may cause more and more hash bits to become stuck in the wrong saturated area of the activation functions, never to escape. We call this problem the "Dead Bits Problem (DBP)". Besides, the existing quantization loss will aggravate DBP as well. In this paper, we propose a simple but effective gradient amplifier which acts before activation functions to alleviate DBP. Moreover, we devise an error-aware quantization loss to further alleviate DBP. It avoids the negative effect of quantization loss based on the similarity between two images. The proposed gradient amplifier and error-aware quantization loss are compatible with a variety of deep hashing methods. Experimental results on three datasets demonstrate the effectiveness of the proposed gradient amplifier and the error-aware quantization loss.

AAAI Conference 2021 Conference Paper

Stable Adversarial Learning under Distributional Shifts

  • Jiashuo Liu
  • Zheyan Shen
  • Peng Cui
  • Linjun Zhou
  • Kun Kuang
  • Bo Li
  • Yishi Lin

Machine learning algorithms with empirical risk minimization are vulnerable under distributional shifts due to the greedy adoption of all the correlations found in training data. Recently, robust learning methods have targeted this problem by minimizing the worst-case risk over an uncertainty set. However, they treat all covariates equally when forming the decision sets, regardless of the stability of their correlations with the target, resulting in an overwhelmingly large set and low confidence of the learner. In this paper, we propose the Stable Adversarial Learning (SAL) algorithm, which leverages heterogeneous data sources to construct a more practical uncertainty set and conduct differentiated robustness optimization, where covariates are differentiated according to the stability of their correlations with the target. We theoretically show that our method is tractable for stochastic gradient-based optimization and provide performance guarantees for our method. Empirical studies on both simulation and real datasets validate the effectiveness of our method in terms of uniformly good performance across unknown distributional shifts.

NeurIPS Conference 2021 Conference Paper

TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness

  • Zhuolin Yang
  • Linyi Li
  • Xiaojun Xu
  • Shiliang Zuo
  • Qian Chen
  • Pan Zhou
  • Benjamin Rubinstein
  • Ce Zhang

Adversarial transferability is an intriguing property: an adversarial perturbation crafted against one model is often also effective against another model, even when the two models come from different model families or training processes. To better protect ML systems against adversarial attacks, several questions are raised: what are the sufficient conditions for adversarial transferability, and how can it be bounded? Is there a way to reduce adversarial transferability in order to improve the robustness of an ensemble ML model? To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; we then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness. Our theoretical analysis shows that promoting orthogonality between gradients of base models alone is not enough to ensure low transferability; meanwhile, model smoothness is an important factor in controlling transferability. We also provide lower and upper bounds of adversarial transferability under certain conditions. Inspired by our theoretical analysis, we propose an effective Transferability Reduced Smooth (TRS) ensemble training strategy to train a robust ensemble with low transferability by enforcing both gradient orthogonality and model smoothness between base models. We conduct extensive experiments on TRS and compare it with 6 state-of-the-art ensemble baselines against 8 white-box attacks on different datasets, demonstrating that the proposed TRS outperforms all baselines significantly.

JAIR Journal 2021 Journal Article

Two-facility Location Games with Minimum Distance Requirement

  • Xinping Xu
  • Bo Li
  • Minming Li
  • Lingjie Duan

We study the mechanism design problem of a social planner for locating two facilities on a line interval [0, 1], where a set of n strategic agents report their locations and a mechanism determines the locations of the two facilities. We consider the requirement of a minimum distance 0 ≤ d ≤ 1 between the two facilities. Given the two facilities are heterogeneous, we model the cost/utility of an agent as the sum of his distances to both facilities. In the heterogeneous two-facility location game to minimize the social cost, we show that the optimal solution can be computed in polynomial time and prove that carefully choosing one optimal solution as output is strategyproof. We also design a strategyproof mechanism minimizing the maximum cost. Given the two facilities are homogeneous, we model the cost/utility of an agent as his distance to the closer facility. In the homogeneous two-facility location game for minimizing the social cost, we show that any deterministic strategyproof mechanism has unbounded approximation ratio. Moreover, in the obnoxious heterogeneous two-facility location game for maximizing the social utility, we propose new deterministic group strategyproof mechanisms with provable approximation ratios and establish a lower bound (7 − d)/6 for any deterministic strategyproof mechanism. We also design a strategyproof mechanism maximizing the minimum utility. In the obnoxious homogeneous two-facility location game for maximizing the social utility, we propose deterministic group strategyproof mechanisms with provable approximation ratios and establish a lower bound 4/3. Besides, in the two-facility location game with triple-preference, where each facility may be favorable, obnoxious, indifferent for any agent, we further motivate agents to report both their locations and preferences towards the two facilities truthfully, and design a deterministic group strategyproof mechanism with an approximation ratio 4.

NeurIPS Conference 2021 Conference Paper

What Would Jiminy Cricket Do? Towards Agents That Behave Morally

  • Dan Hendrycks
  • Mantas Mazeika
  • Andy Zou
  • Sahil Patel
  • Christine Zhu
  • Jesus Navarro
  • Dawn Song
  • Bo Li

When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong, to behave morally. By contrast, artificial agents may behave immorally when trained on environments that ignore moral concerns, such as violent video games. With the advent of generally capable agents that pretrain on many environments, mitigating inherited biases towards immoral behavior will become necessary. However, prior work on aligning agents with human values and morals focuses on small-scale settings lacking in semantic complexity. To enable research in larger, more realistic settings, we introduce Jiminy Cricket, an environment suite of 25 text-based adventure games with thousands of semantically rich, morally salient scenarios. Via dense annotations for every possible action, Jiminy Cricket environments robustly evaluate whether agents can act morally while maximizing reward. To improve moral behavior, we leverage language models with commonsense moral knowledge and develop strategies to mediate this knowledge into actions. In extensive experiments, we find that our artificial conscience approach can steer agents towards moral behavior without sacrificing performance.

NeurIPS Conference 2020 Conference Paper

Counterfactual Prediction for Bundle Treatment

  • Hao Zou
  • Peng Cui
  • Bo Li
  • Zheyan Shen
  • Jianxin Ma
  • Hongxia Yang
  • Yue He

Estimating counterfactual outcome of different treatments from observational data is an important problem to assist decision making in a variety of fields. Among the various forms of treatment specification, bundle treatment has been widely adopted in many scenarios, such as recommendation systems and online marketing. The bundle treatment usually can be abstracted as a high dimensional binary vector, which makes it more challenging for researchers to remove the confounding bias in observational data. In this work, we assume the existence of low dimensional latent structure underlying bundle treatment. Via the learned latent representations of treatments, we propose a novel variational sample re-weighting (VSR) method to eliminate confounding bias by decorrelating the treatments and confounders. Finally, we conduct extensive experiments to demonstrate that the predictive model trained on this re-weighted dataset can achieve more accurate counterfactual outcome prediction.

ICLR Conference 2020 Conference Paper

Efficient Probabilistic Logic Reasoning with Graph Neural Networks

  • Yuyu Zhang
  • Xinshi Chen
  • Yuan Yang
  • Arun Ramamurthy
  • Bo Li
  • Yuan (Alan) Qi
  • Le Song

Markov Logic Networks (MLNs), which elegantly combine logic rules and probabilistic graphical models, can be used to address many knowledge graph problems. However, inference in MLN is computationally intensive, making the industrial-scale application of MLN very difficult. In recent years, graph neural networks (GNNs) have emerged as efficient and effective tools for large-scale graph problems. Nevertheless, GNNs do not explicitly incorporate prior logic rules into the models, and may require many labeled examples for a target task. In this paper, we explore the combination of MLNs and GNNs, and use graph neural networks for variational inference in MLN. We propose a GNN variant, named ExpressGNN, which strikes a nice balance between the representation power and the simplicity of the model. Our extensive experiments on several benchmark datasets demonstrate that ExpressGNN leads to effective and efficient probabilistic logic reasoning.

AAAI Conference 2020 Conference Paper

Facility Location Problem with Capacity Constraints: Algorithmic and Mechanism Design Perspectives

  • Haris Aziz
  • Hau Chan
  • Barton Lee
  • Bo Li
  • Toby Walsh

We consider the facility location problem in the one-dimensional setting where each facility can serve a limited number of agents, from the algorithmic and mechanism design perspectives. From the algorithmic perspective, we prove that the corresponding optimization problem, where the goal is to locate facilities so as to minimize either the total cost to all agents or the maximum cost of any agent, is NP-hard. However, we show that the problem is fixed-parameter tractable, and the optimal solution can be computed in polynomial time whenever the number of facilities is bounded, or when all facilities have identical capacities. We then consider the problem from a mechanism design perspective where the agents are strategic and need not reveal their true locations. We show that several natural mechanisms studied in the uncapacitated setting either lose strategyproofness or a bound on the solution quality for the total or maximum cost objective. We then propose new mechanisms that are strategyproof and achieve approximation guarantees that almost match the lower bounds.

AAAI Conference 2020 Conference Paper

Frame-Guided Region-Aligned Representation for Video Person Re-Identification

  • Zengqun Chen
  • Zhiheng Zhou
  • Junchu Huang
  • Pengyu Zhang
  • Bo Li

Pedestrians in videos are usually in a moving state, resulting in serious spatial misalignment like scale variations and pose changes, which makes the video-based person re-identification problem more challenging. To address the above issue, in this paper, we propose a Frame-Guided Region-Aligned model (FGRA) for discriminative representation learning in two steps in an end-to-end manner. Firstly, based on a frame-guided feature learning strategy and a nonparametric alignment module, a novel alignment mechanism is proposed to extract well-aligned region features. Secondly, in order to form a sequence representation, an effective feature aggregation strategy that utilizes temporal alignment score and spatial attention is adopted to fuse region features in the temporal and spatial dimensions, respectively. Experiments are conducted on benchmark datasets to demonstrate the effectiveness of the proposed method to solve the misalignment problem and the superiority of the proposed method to the existing video-based person re-identification methods.

IROS Conference 2020 Conference Paper

GP-SLAM+: real-time 3D lidar SLAM based on improved regionalized Gaussian process map reconstruction

  • Jianyuan Ruan
  • Bo Li
  • Yingqiang Wang
  • Zhou Fang

This paper presents a 3D lidar SLAM system based on improved regionalized Gaussian process (GP) map reconstruction to provide both low-drift state estimation and mapping in real-time for robotics applications. We utilize spatial GP regression to model the environment. This tool enables us to recover surfaces, including those in sparsely scanned areas, and obtain uniform samples with uncertainty. Those properties facilitate robust data association and map updating in our scan-to-map registration scheme, especially when working with sparse range data. Compared with previous GP-SLAM, this work overcomes the prohibitive computational complexity of GP and redesigns the registration strategy to meet the accuracy requirements in 3D scenarios. For large-scale tasks, a two-thread framework is employed to suppress the drift further. Aerial and ground-based experiments demonstrate that our method allows robust odometry and precise mapping in real-time. It also outperforms state-of-the-art lidar SLAM systems in our tests with lightweight sensors.

NeurIPS Conference 2020 Conference Paper

On Convergence of Nearest Neighbor Classifiers over Feature Transformations

  • Luka Rimanic
  • Cedric Renggli
  • Bo Li
  • Ce Zhang

The k-Nearest Neighbors (kNN) classifier is a fundamental non-parametric machine learning algorithm. However, it is well known that it suffers from the curse of dimensionality, which is why in practice one often applies a kNN classifier on top of a (pre-trained) feature transformation. From a theoretical perspective, most, if not all, theoretical results aimed at understanding the kNN classifier are derived for the raw feature space. This leads to an emerging gap between our theoretical understanding of kNN and its practical applications. In this paper, we take a first step towards bridging this gap. We provide a novel analysis on the convergence rates of a kNN classifier over transformed features. This analysis requires in-depth understanding of the properties that connect both the transformed space and the raw feature space. More precisely, we build our convergence bound upon two key properties of the transformed space: (1) safety -- how well can one recover the raw posterior from the transformed space, and (2) smoothness -- how complex this recovery function is. Based on our result, we are able to explain why some (pre-trained) feature transformations are better suited for a kNN classifier than others. We empirically validate that both properties have an impact on the kNN convergence on 30 feature transformations with 6 benchmark datasets spanning from the vision to the text domain.

AAAI Conference 2020 Conference Paper

Reinforcement Learning with Perturbed Rewards

  • Jingkang Wang
  • Yang Liu
  • Bo Li

Recent studies have shown that reinforcement learning (RL) models are vulnerable in various noisy scenarios. For instance, the observed reward channel is often subject to noise in practice (e.g., when rewards are collected through sensors), and is therefore not credible. In addition, for applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated to produce arbitrary errors by receiving corrupted rewards. In this paper, we consider noisy RL problems with perturbed rewards, which can be approximated with a confusion matrix. We develop a robust RL framework that enables agents to learn in noisy environments where only perturbed rewards are observed. Our solution framework builds on existing RL/DRL algorithms and is the first to address the biased noisy reward setting without any assumptions on the true distribution (e.g., the zero-mean Gaussian noise assumed in previous works). The core ideas of our solution include estimating a reward confusion matrix and defining a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that trained policies based on our estimated surrogate reward can achieve higher expected rewards, and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm is able to obtain 84.6% and 80.8% improvements on average score for five Atari games, with error rates of 10% and 30%, respectively.

AAAI Conference 2020 Conference Paper

Reinforcement-Learning Based Portfolio Management with Augmented Asset Movement Prediction States

  • Yunan Ye
  • Hengzhi Pei
  • Boxin Wang
  • Pin-Yu Chen
  • Yada Zhu
  • Ju Xiao
  • Bo Li

Portfolio management (PM) is a fundamental financial planning task that aims to achieve investment goals such as maximal profits or minimal risks. Its decision process involves continuous derivation of valuable information from various data sources and sequential decision optimization, which is a prospective research direction for reinforcement learning (RL). In this paper, we propose SARL, a novel State-Augmented RL framework for PM. Our framework aims to address two unique challenges in financial PM: (1) data heterogeneity – the collected information for each asset is usually diverse, noisy and imbalanced (e.g., news articles); and (2) environment uncertainty – the financial market is versatile and non-stationary. To incorporate heterogeneous data and enhance robustness against environment uncertainty, our SARL augments the asset information with their price movement prediction as additional states, where the prediction can be solely based on financial data (e.g., asset prices) or derived from alternative sources such as news. Experiments on two real-world datasets, (i) the Bitcoin market and (ii) the HighTech stock market with 7 years of Reuters news articles, validate the effectiveness of SARL over existing PM approaches, both in terms of accumulated profits and risk-adjusted profits. Moreover, extensive simulations are conducted to demonstrate the importance of our proposed state augmentation, providing new insights and boosting performance significantly over standard RL-based PM methods and other baselines.

NeurIPS Conference 2020 Conference Paper

Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations

  • Huan Zhang
  • Hongge Chen
  • Chaowei Xiao
  • Bo Li
  • Mingyan Liu
  • Duane Boning
  • Cho-Jui Hsieh

A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises. Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions. Several works have shown this vulnerability via adversarial attacks, but how to improve the robustness of DRL under this setting has not been well studied. We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, are ineffective for many RL tasks. We propose the state-adversarial Markov decision process (SA-MDP) to study the fundamental properties of this problem, and develop a theoretically principled policy regularization which can be applied to a large family of DRL algorithms, including deep deterministic policy gradient (DDPG), proximal policy optimization (PPO) and deep Q networks (DQN), for both discrete and continuous action control problems. We significantly improve the robustness of DDPG, PPO and DQN agents under a suite of strong white box adversarial attacks, including two new attacks of our own. Additionally, we find that a robust policy noticeably improves DRL performance in a number of environments.

AAAI Conference 2020 Conference Paper

Rule-Guided Compositional Representation Learning on Knowledge Graphs

  • Guanglin Niu
  • Yongfei Zhang
  • Bo Li
  • Peng Cui
  • Si Liu
  • Jingyang Li
  • Xiaowei Zhang

Representation learning on a knowledge graph (KG) is to embed entities and relations of a KG into low-dimensional continuous vector spaces. Early KG embedding methods only pay attention to structured information encoded in triples, which would cause limited performance due to the structure sparseness of KGs. Some recent attempts consider path information to expand the structure of KGs but lack explainability in the process of obtaining the path representations. In this paper, we propose a novel Rule and Path-based Joint Embedding (RPJE) scheme, which takes full advantage of the explainability and accuracy of logic rules, the generalization of KG embedding as well as the supplementary semantic structure of paths. Specifically, logic rules of different lengths (the number of relations in the rule body) in the form of Horn clauses are first mined from the KG and elaborately encoded for representation learning. Then, the rules of length 2 are applied to compose paths accurately while the rules of length 1 are explicitly employed to create semantic associations among relations and constrain relation embeddings. Moreover, the confidence level of each rule is also considered in optimization to guarantee the availability of applying the rule to representation learning. Extensive experimental results illustrate that RPJE outperforms other state-of-the-art baselines on the KG completion task, which also demonstrates the superiority of utilizing logic rules as well as paths for improving the accuracy and explainability of representation learning.

AAAI Conference 2020 Conference Paper

Single Camera Training for Person Re-Identification

  • Tianyu Zhang
  • Lingxi Xie
  • Longhui Wei
  • Yongfei Zhang
  • Bo Li
  • Qi Tian

Person re-identification (ReID) aims at finding the same person in different cameras. Training such systems usually requires a large amount of cross-camera pedestrians to be annotated from surveillance videos, which is labor-consuming especially when the number of cameras is large. Differently, this paper investigates ReID in an unexplored single-camera-training (SCT) setting, where each person in the training set appears in only one camera. To the best of our knowledge, this setting was never studied before. SCT enjoys the advantage of low-cost data collection and annotation, and thus makes it easier to train ReID systems in a brand-new environment. However, it raises major challenges due to the lack of cross-camera person occurrences, which conventional approaches heavily rely on to extract discriminative features. The key to dealing with the challenges in the SCT setting lies in designing an effective mechanism to complement cross-camera annotation. We start with a regular deep network for feature extraction, upon which we propose a novel loss function named multi-camera negative loss (MCNL). This is a metric learning loss motivated by probability, suggesting that in a multi-camera system, one image is more likely to be closer to the most similar negative sample in other cameras than to the most similar negative sample in the same camera. In experiments, MCNL significantly boosts ReID accuracy in the SCT setting, which paves the way for fast deployment of ReID systems with good performance on new target scenes.

AAAI Conference 2020 Conference Paper

Stable Prediction with Model Misspecification and Agnostic Distribution Shift

  • Kun Kuang
  • Ruoxuan Xiong
  • Peng Cui
  • Susan Athey
  • Bo Li

For many machine learning algorithms, two main assumptions are required to guarantee performance. One is that the test data are drawn from the same distribution as the training data, and the other is that the model is correctly specified. In real applications, however, we often have little prior knowledge of the test data and of the underlying true model. Under model misspecification, agnostic distribution shift between training and test data leads to inaccuracy of parameter estimation and instability of prediction across unknown test data. To address these problems, we propose a novel Decorrelated Weighting Regression (DWR) algorithm which jointly optimizes a variable decorrelation regularizer and a weighted regression model. The variable decorrelation regularizer estimates a weight for each sample such that variables are decorrelated on the weighted training data. Then, these weights are used in the weighted regression to improve the accuracy of estimation on the effect of each variable, thus helping to improve the stability of prediction across unknown test data. Extensive experiments clearly demonstrate that our DWR algorithm can significantly improve the accuracy of parameter estimation and stability of prediction with model misspecification and agnostic distribution shift.

IJCAI Conference 2019 Conference Paper

Approximately Maximizing the Broker's Profit in a Two-sided Market

  • Jing Chen
  • Bo Li
  • Yingkai Li

We study how to maximize the broker's (expected) profit in a two-sided market, where she buys items from a set of sellers and resells them to a set of buyers. Each seller has a single item to sell and holds a private value on her item, and each buyer has a valuation function over the bundles of the sellers' items. We consider the Bayesian setting where the agents' values/valuations are independently drawn from prior distributions, and aim at designing dominant-strategy incentive-compatible (DSIC) mechanisms that are approximately optimal. Production-cost markets, where each item has a publicly-known cost to be produced, provide a platform for us to study two-sided markets. Briefly, we show how to convert a mechanism for production-cost markets into a mechanism for the broker, whenever the former satisfies cost-monotonicity. This reduction holds even when buyers have general combinatorial valuation functions. When the buyers' valuations are additive, we generalize an existing mechanism to production-cost markets in an approximation-preserving way. We then show that the resulting mechanism is cost-monotone and thus can be converted into an 8-approximation mechanism for two-sided markets.

AAAI Conference 2019 Conference Paper

Community Focusing: Yet Another Query-Dependent Community Detection

  • Zhuo Wang
  • Weiping Wang
  • Chaokun Wang
  • Xiaoyan Gu
  • Bo Li
  • Dan Meng

As a major kind of query-dependent community detection, community search finds a densely connected subgraph containing a set of query nodes. As density is the major consideration of community search, most methods of community search often find a dense subgraph with many vertices far from the query nodes, which are not very related to the query nodes. Motivated by this, a new problem called community focusing (CF) is studied. It finds a community where the members are close and densely connected to the query nodes. A distance-sensitive dense subgraph structure called β-attention-core is proposed to remove the vertices loosely connected to or far from the query nodes, and a combinational density is designed to guarantee the density of a subgraph. Then CF is formalized as finding a subgraph with the largest combinational density among the β-attention-core subgraphs containing the query nodes with the largest β. Thereafter, effective methods are devised for CF. Furthermore, a speed-up strategy is developed to make the methods scalable to large networks. Extensive experimental results on real and synthetic networks demonstrate the performance of our methods.

IJCAI Conference 2019 Conference Paper

Detecting Robust Co-Saliency with Recurrent Co-Attention Neural Network

  • Bo Li
  • Zhengxing Sun
  • Lv Tang
  • Yunhan Sun
  • Jinlong Shi

Effective feature representations, which should not only express each image's individual properties but also reflect the interactions among images in a group, are crucial for robust co-saliency detection. This paper proposes a novel deep learning co-saliency detection approach which simultaneously learns single image properties and robust group features in a recurrent manner. Specifically, our network first extracts the semantic features of each image. Then, a specially designed Recurrent Co-Attention Unit (RCAU) explores all images in the group recurrently to generate the final group representation using the co-attention between images, while suppressing noisy information. The group feature, which contains complementary synergetic information, is later merged with the single image features, which express the unique properties, to infer robust co-saliency. We also propose a novel co-perceptual loss to make full use of the interactive relationships of all images in the training group as the supervision in our end-to-end training process. Extensive experimental results demonstrate the superiority of our approach in comparison with the state-of-the-art methods.

AAMAS Conference 2019 Conference Paper

Heterogeneous Two-facility Location Games with Minimum Distance Requirement

  • Lingjie Duan
  • Bo Li
  • Minming Li
  • Xinping Xu

We study the mechanism design problem of a social planner for locating two heterogeneous facilities on a line interval [0, 1], where a set of n strategic agents report their locations and a mechanism determines the locations of the two facilities. Unlike prior work on two-facility location games, we consider the requirement of a minimum distance d between the two facilities. As the two facilities are heterogeneous and have additive effects on agents, we model the cost of an agent as the sum of his distances to both facilities, and the social cost as the total cost of all agents. In the two-facility location game to minimize the social cost, we show that the optimal solution can be computed in polynomial time and prove that carefully choosing one optimal solution as output is strategyproof. In the obnoxious two-facility location game for maximizing the social utility, a mechanism outputting the optimal solution is not strategyproof, and we propose new deterministic group strategyproof mechanisms with provable approximation ratios. Moreover, we establish a lower bound (7 − d)/6 for the approximation ratio achievable by deterministic strategyproof mechanisms. Finally, we study the two-facility location game with triple-preference, where each of the two facilities may be favorable, obnoxious, or indifferent for any agent. We further allow each agent to misreport his location and preference towards the two facilities and design a deterministic group strategyproof mechanism with approximation ratio 4.

AAMAS Conference 2019 Conference Paper

How You Act Tells a Lot: Privacy-Leaking Attack on Deep Reinforcement Learning

  • Xinlei Pan
  • Weiyao Wang
  • Xiaoshuai Zhang
  • Bo Li
  • Jinfeng Yi
  • Dawn Song

Machine learning has been widely applied to various applications, some of which involve training with privacy-sensitive data. A modest number of data breaches have been studied, including credit card information in natural language data and identities from face datasets. However, most of these studies focus on supervised learning models. As deep reinforcement learning (DRL) has been deployed in a number of real-world systems, such as indoor robot navigation, whether trained DRL policies can leak private information requires in-depth study. To explore such privacy breaches in general, we propose two main methods: environment dynamics search via genetic algorithm and candidate inference based on shadow policies. We conduct extensive experiments to demonstrate such privacy vulnerabilities in DRL under various settings. We leverage the proposed algorithms to infer floor plans from trained Grid World navigation DRL agents with LiDAR perception. The proposed algorithm can correctly infer most of the floor plans and reaches an average recovery rate of 95.83% using policy gradient trained agents. In addition, we are able to recover the robot configuration in continuous control environments and an autonomous driving simulator with high accuracy. To the best of our knowledge, this is the first work to investigate privacy leakage in DRL settings, and we show that DRL-based agents do potentially leak privacy-sensitive information from the trained policies.

AAMAS Conference 2019 Conference Paper

Maximin-Aware Allocations of Indivisible Goods

  • Hau Chan
  • Jing Chen
  • Bo Li
  • Xiaowei Wu

We study envy-free allocations of indivisible goods to agents in settings where each agent is unaware of the bundles (or allocated goods) of other agents. In particular, we propose the maximin aware (MMA) fairness measure, which guarantees that every agent, given the bundle allocated to her, is aware that she does not get the worst bundle, even if she does not know how the other goods are distributed. We also introduce two of its relaxations, MMA1 and MMAX. We show that MMA1 and MMAX potentially have stronger egalitarian guarantees than EF1 and are easier to achieve than MMS and EFX. Finally, we present a polynomial-time algorithm, which computes an allocation such that every agent is either 1/2-approximate MMA or exactly MMAX. Interestingly, the returned allocation is also 1/2-approximate EFX when all agents have subadditive valuations, which answers an open question left in [Plaut and Roughgarden, SODA 2018].
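The maximin-style guarantees above are all defined relative to the classic maximin share (MMS). As a small illustrative sketch (assuming additive valuations; this is not the paper's polynomial-time algorithm), MMS can be computed by brute force for tiny instances: it is the best worst-bundle value an agent can guarantee by partitioning all goods into n bundles herself.

```python
# Illustrative brute-force maximin share (MMS) for additive valuations.
# Exponential in the number of goods -- only for tiny examples.
from itertools import product

def maximin_share(values, n):
    """values[g]: the agent's value for good g; n: number of agents/bundles."""
    best = 0
    # Enumerate every assignment of goods to the n bundles.
    for assignment in product(range(n), repeat=len(values)):
        bundles = [0] * n
        for good, bundle in enumerate(assignment):
            bundles[bundle] += values[good]
        best = max(best, min(bundles))  # keep the best worst-bundle value
    return best

# With goods worth 3, 2, 2, 1 and two agents, the best split is {3,1} vs {2,2}.
print(maximin_share([3, 2, 2, 1], 2))
```

MMA then weakens this: the agent need only be sure, from her own bundle, that she did not receive the worst bundle, whatever the unseen distribution of the remaining goods.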

IJCAI Conference 2019 Conference Paper

Maximin-Aware Allocations of Indivisible Goods

  • Hau Chan
  • Jing Chen
  • Bo Li
  • Xiaowei Wu

We study envy-free allocations of indivisible goods to agents in settings where each agent is unaware of the goods allocated to other agents. In particular, we propose the maximin aware (MMA) fairness measure, which guarantees that every agent, given the bundle allocated to her, is aware that she does not envy at least one other agent, even if she does not know how the other goods are distributed among other agents. We also introduce two of its relaxations, and discuss their egalitarian guarantee and existence. Finally, we present a polynomial-time algorithm, which computes an allocation that approximately satisfies MMA or its relaxations. Interestingly, the returned allocation is also 1/2-approximate EFX when all agents have subadditive valuations, which improves the algorithm in [Plaut and Roughgarden, 2018].

AAMAS Conference 2019 Conference Paper

Maxmin Share Fair Allocation of Indivisible Chores to Asymmetric Agents

  • Haris Aziz
  • Hau Chan
  • Bo Li

We initiate the study of indivisible chore allocation for agents with asymmetric shares. The fairness concepts we focus on are natural generalizations of maxmin share: WMMS fairness and OWMMS fairness. We first highlight the fact that commonly-used algorithms that work well for allocation of goods to asymmetric agents, and even for chores to symmetric agents, do not provide good approximations for allocation of chores to asymmetric agents under WMMS. As a consequence, we present a novel polynomial-time constant-approximation algorithm, via linear program, for OWMMS. For two special cases, the binary valuation case and the 2-agent case, we provide exact or better constant-approximation algorithms.

AAMAS Conference 2019 Conference Paper

Mechanism Design with Unstructured Beliefs

  • Bo Li

Mechanism design is the task of designing algorithms, toward desired objectives, that are robust to potential manipulation by strategic players. Traditionally, it is assumed that the mechanism designer and the players in the economy share some common knowledge. However, as pointed out by Wilson, such common knowledge is “rarely present in experiments and never in practice”, and “only by repeated weakening of common knowledge assumptions will the theory approximate reality.” In this work, we focus on designing resilient mechanisms that work properly even in such a less foreseeable environment. Bayesian auction design is a flourishing topic in the field of mechanism design, where an important simplifying assumption is that both the seller and the players know the exact distributions of all players’ valuations. In this work we first consider the query complexity of Bayesian mechanisms, where we only allow the seller limited oracle access to the players’ value distributions via simple queries. Then we further weaken the assumption by considering an information structure where the knowledge about the distributions can be arbitrarily scattered among the players. In both of these unstructured information settings, we design mechanisms that are constant approximations to the optimal Bayesian mechanisms with full information. Finally, we study an envy-free allocation problem in which unstructured beliefs need to be taken into consideration. In particular, we model an environment where each player is unaware of the bundles (or allocated items) of other players, but still knows he does not receive the worst bundle. We present both conceptual and algorithmic results for this new envy-free allocation domain.

NeurIPS Conference 2019 Conference Paper

Multi-source Domain Adaptation for Semantic Segmentation

  • Sicheng Zhao
  • Bo Li
  • Xiangyu Yue
  • Yang Gu
  • Pengfei Xu
  • Runbo Hu
  • Hua Chai
  • Kurt Keutzer

Simulation-to-real domain adaptation for semantic segmentation has been actively studied for various applications such as autonomous driving. Existing methods mainly focus on a single-source setting, which cannot easily handle a more practical scenario of multiple sources with different distributions. In this paper, we propose to investigate multi-source domain adaptation for semantic segmentation. Specifically, we design a novel framework, termed Multi-source Adversarial Domain Aggregation Network (MADAN), which can be trained in an end-to-end manner. First, we generate an adapted domain for each source with dynamic semantic consistency while aligning at the pixel-level cycle-consistently towards the target. Second, we propose sub-domain aggregation discriminator and cross-domain cycle discriminator to make different adapted domains more closely aggregated. Finally, feature-level alignment is performed between the aggregated domain and target domain while training the segmentation network. Extensive experiments from synthetic GTA and SYNTHIA to real Cityscapes and BDDS datasets demonstrate that the proposed MADAN model outperforms state-of-the-art approaches. Our source code is released at: https://github.com/Luodian/MADAN.

IROS Conference 2019 Conference Paper

On Enhancing Ground Surface Detection from Sparse Lidar Point Cloud

  • Bo Li

Ground surface detection in point cloud is widely used as a key module in autonomous driving systems. Different from previous approaches which are mostly developed for lidars with high beam resolution, e.g., Velodyne HDL-64, this paper proposes ground detection techniques applicable to much sparser point cloud captured by lidars with low beam resolution, e.g., Velodyne VLP-16. The approach is based on the RANSAC scheme of plane fitting. Inlier verification for plane hypotheses is enhanced by exploiting the point-wise tangent, which is a local feature available to compute regardless of the density of lidar beams. Ground surface which is not perfectly planar is fitted by multiple (specifically 4 in our implementation) disjoint plane regions. By assuming these plane regions to be rectangular and exploiting the integral image technique, our approach approximately finds the optimal region partition and plane hypotheses under the RANSAC scheme with real-time computational complexity.
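The RANSAC plane-fitting scheme this abstract builds on can be sketched in a few lines. This is a generic, illustrative version only: the paper's method additionally verifies inliers with point-wise tangents and fits multiple disjoint plane regions, neither of which is shown here.

```python
# Minimal RANSAC plane fitting: repeatedly fit a plane through 3 random
# points and keep the hypothesis with the most inliers.
import random

def plane_from_points(p, q, r):
    """Unit-normal plane (a, b, c, d) through three points, ax + by + cz + d = 0."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])
    norm = sum(comp * comp for comp in n) ** 0.5
    if norm == 0:
        return None  # degenerate (collinear) sample
    a, b, c = (comp / norm for comp in n)
    return a, b, c, -(a*p[0] + b*p[1] + c*p[2])

def ransac_plane(points, iters=200, thresh=0.05, seed=0):
    rng = random.Random(seed)
    best_plane, best_inliers = None, 0
    for _ in range(iters):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        inliers = sum(abs(a*x + b*y + c*z + d) < thresh for x, y, z in points)
        if inliers > best_inliers:
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# Mostly-flat ground (z = 0) plus a couple of off-ground outliers.
pts = [(x * 0.1, y * 0.1, 0.0) for x in range(10) for y in range(10)]
pts += [(0.5, 0.5, 2.0), (0.2, 0.8, 1.5)]
plane, count = ransac_plane(pts)
```

On this toy input the winning hypothesis is the ground plane z = 0 with all 100 ground points as inliers; the sparse-lidar difficulty the paper addresses arises precisely when such dense inlier support is unavailable.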

IJCAI Conference 2019 Conference Paper

Robustra: Training Provable Robust Neural Networks over Reference Adversarial Space

  • Linyi Li
  • Zexuan Zhong
  • Bo Li
  • Tao Xie

Machine learning techniques, especially deep neural networks (DNNs), have been widely adopted in various applications. However, DNNs have recently been found to be vulnerable to adversarial examples, i.e., maliciously perturbed inputs that can mislead the models into making arbitrary prediction errors. Empirical defenses have been studied, but many of them can be adaptively attacked again. Provable defenses provide a provable error bound for DNNs, but such bounds are so far far from satisfactory. To address this issue, in this paper we present our approach, named Robustra, for effectively improving the provable error bound of DNNs. We leverage the adversarial space of a reference model as the feasible region to solve the min-max game between the attackers and defenders. We solve its dual problem by linearly approximating the attackers' best strategy and utilizing the monotonicity of the slack variables introduced by the reference model. The evaluation results show that our approach can provide significantly better provable adversarial error bounds on the MNIST and CIFAR10 datasets, compared to the state-of-the-art results. In particular, bounded by L∞, with ε = 0.1, on MNIST we reduce the error bound from 2.74% to 2.09%; with ε = 0.3, we reduce the error bound from 24.19% to 16.91%.

IJCAI Conference 2019 Conference Paper

Strategyproof and Approximately Maxmin Fair Share Allocation of Chores

  • Haris Aziz
  • Bo Li
  • Xiaowei Wu

We initiate the work on fair and strategyproof allocation of indivisible chores. The fairness concept we consider in this paper is maxmin share (MMS) fairness. We consider three previously studied models of information elicited from the agents: the ordinal model, the cardinal model, and the public ranking model in which the ordinal preferences are publicly known. We present both positive and negative results on the level of MMS approximation that can be guaranteed if we require the algorithm to be strategyproof. Our results uncover some interesting contrasts between the approximation ratios achieved for chores versus goods.

AAAI Conference 2019 Conference Paper

SuperVAE: Superpixelwise Variational Autoencoder for Salient Object Detection

  • Bo Li
  • Zhengxing Sun
  • Yuqi Guo

Image saliency detection has recently witnessed rapid progress due to deep neural networks. However, there still exist many important problems in the existing deep learning based methods. Pixel-wise convolutional neural network (CNN) methods suffer from blurry boundaries due to the convolutional and pooling operations, while region-based deep learning methods lack spatial consistency since they deal with each region independently. In this paper, we propose a novel salient object detection framework using a superpixelwise variational autoencoder (SuperVAE) network. We first use VAE to model the image background and then separate salient objects from the background through the reconstruction residuals. To better capture semantic and spatial context information, we also propose a perceptual loss that takes advantage of deep pre-trained CNNs to train our SuperVAE network. Without the supervision of mask-level annotated data, our method generates high quality saliency results which can better preserve object boundaries and maintain spatial consistency. Extensive experiments on five widely-used benchmark datasets show that the proposed method achieves superior or competitive performance compared to other algorithms, including the very recent state-of-the-art supervised methods.

AAAI Conference 2019 Conference Paper

Uncovering Specific-Shape Graph Anomalies in Attributed Graphs

  • Nannan Wu
  • Wenjun Wang
  • Feng Chen
  • Jianxin Li
  • Bo Li
  • Jinpeng Huai

As networks are ubiquitous in the modern era, point anomalies have given way to graph anomalies in terms of anomaly shapes. However, specific-shape priors about anomalous subgraphs of interest are seldom considered by traditional approaches when detecting subgraphs in attributed graphs (e.g., computer networks, Bitcoin networks, etc.). This paper proposes a nonlinear approach to specific-shape graph anomaly detection. The nonlinear approach focuses on optimizing a broad class of nonlinear cost functions via specific-shape constraints in attributed graphs. Our approach can be applied to many different graph anomaly settings. Traditional approaches can only support linear cost functions (e.g., an aggregation function for the summation of node weights), whereas our approach can employ more powerful nonlinear cost functions, and enjoys a rigorous theoretical guarantee on the near-optimal solution with a geometric convergence rate.

IJCAI Conference 2019 Conference Paper

Weighted Maxmin Fair Share Allocation of Indivisible Chores

  • Haris Aziz
  • Hau Chan
  • Bo Li

We initiate the study of indivisible chore allocation for agents with asymmetric shares. The fairness concepts we focus on are natural weighted generalizations of maxmin share: WMMS fairness and OWMMS fairness. We first highlight the fact that commonly-used algorithms that work well for allocation of goods to asymmetric agents, and even for chores to symmetric agents, do not provide good approximations for allocation of chores to asymmetric agents under WMMS. As a consequence, we present a novel polynomial-time constant-approximation algorithm, via linear program, for OWMMS. For two special cases, the binary valuation case and the 2-agent case, we provide exact or better constant-approximation algorithms.

AAMAS Conference 2019 Conference Paper

Well-behaved Online Load Balancing Against Strategic Jobs

  • Bo Li
  • Minming Li
  • Xiaowei Wu

In the online load balancing problem on related machines, we have a set of jobs (with different sizes) arriving online, and we need to assign each job to a machine immediately upon its arrival, so as to minimize the makespan, i.e., the maximum completion time. In classic mechanism design problems, we assume that the jobs are controlled by selfish agents, with the sizes being their private information. Each job (agent) aims at minimizing its own cost, which is its completion time plus the payment charged by the mechanism. Truthful mechanisms guaranteeing that every job minimizes its cost by reporting its true size have been well-studied [Aspnes et al., JACM 1997; Feldman et al., EC 2017]. In this paper, we study truthful online load balancing mechanisms that are well-behaved [Epstein et al., MOR 2016]. Well-behavedness is important as it guarantees fairness between machines, and implies truthfulness in some cases when machines are controlled by selfish agents. Unfortunately, existing truthful online load balancing mechanisms are not well-behaved. We first show that to guarantee producing a well-behaved schedule, any online algorithm (even non-truthful) has a competitive ratio of at least Ω(√m), where m is the number of machines. Then we propose a mechanism that guarantees truthfulness of the online jobs, and produces a schedule that is almost well-behaved. We show that our algorithm has a competitive ratio of O(log m). Moreover, for the case when the sizes of online jobs are bounded, the competitive ratio of our algorithm improves to O(1). Interestingly, we show several cases for which our mechanism is actually truthful against selfish machines.
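For context, the classic (non-truthful, not necessarily well-behaved) baseline for this problem is greedy online list scheduling: assign each arriving job to the machine on which it would finish earliest. A minimal sketch, with function and parameter names being illustrative assumptions rather than anything from the paper:

```python
# Greedy online list scheduling on related machines: each job is assigned,
# on arrival, to the machine that would complete it earliest.

def greedy_schedule(jobs, speeds):
    """jobs: sizes in arrival order; speeds: machine speeds.
    Returns (assignment, makespan)."""
    loads = [0.0] * len(speeds)
    assignment = []
    for size in jobs:
        # Completion time of this job on each machine.
        finish = [loads[m] + size / speeds[m] for m in range(len(speeds))]
        m = min(range(len(speeds)), key=lambda i: finish[i])
        loads[m] = finish[m]
        assignment.append(m)
    return assignment, max(loads)

print(greedy_schedule([2, 2, 2, 6], [1.0, 1.0]))
```

The mechanism-design difficulty the abstract describes starts from here: a job may misreport its size to get an earlier slot, and naively greedy schedules can also be badly unbalanced across machines, which is exactly what the well-behavedness requirement rules out.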

AAAI Conference 2019 Conference Paper

“Bilingual Expert” Can Find Translation Errors

  • Kai Fan
  • Jiayi Wang
  • Bo Li
  • Fengming Zhou
  • Boxing Chen
  • Luo Si

The performance of machine translation (MT) systems is usually evaluated by the metric BLEU when golden references are provided. However, in the case of model inference or production deployment, golden references are usually expensive to obtain, requiring, for instance, human annotation with bilingual expertise. In order to address the issue of translation quality estimation (QE) without reference, we propose a general framework for automatic evaluation of translation output for the QE task in the Conference on Statistical Machine Translation (WMT). We first build a conditional target language model with a novel bidirectional transformer, named the neural bilingual expert model, which is pre-trained on large parallel corpora for feature extraction. For QE inference, the bilingual expert model can simultaneously produce the joint latent representation between the source and the translation, and real-valued measurements of possibly erroneous tokens based on the prior knowledge learned from parallel data. Subsequently, the features are fed into a simple Bi-LSTM predictive model for quality estimation. The experimental results show that our approach achieves state-of-the-art performance on most publicly available datasets of the WMT 2017/2018 QE task.

IJCAI Conference 2018 Conference Paper

Dynamic Fair Division Problem with General Valuations

  • Bo Li
  • Wenyang Li
  • Yingkai Li

In this paper, we focus on how to dynamically allocate a divisible resource fairly among n players who arrive and depart over time. The players may have general heterogeneous valuations over the resource. It is known that exact envy-free and proportional allocations may not exist in the dynamic setting [Walsh, 2011]. Thus, we study to what extent we can guarantee fairness in the dynamic setting. We first design two algorithms which are O(log n)-proportional and O(n)-envy-free for the setting with general valuations, and by constructing adversary instances such that all dynamic algorithms must be at least Ω(1)-proportional and Ω(n/log n)-envy-free, we show that the bounds are tight up to a logarithmic factor. Moreover, we introduce the setting where the players' valuations are uniform on the resource but with different demands, which generalizes the setting of [Friedman et al., 2015]. We prove an O(log n) upper bound and a tight lower bound for this case.

IJCAI Conference 2018 Conference Paper

Generating Adversarial Examples with Adversarial Networks

  • Chaowei Xiao
  • Bo Li
  • Jun-Yan Zhu
  • Warren He
  • Mingyan Liu
  • Dawn Song

Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rates under state-of-the-art defenses compared to other attacks. Our attack placed first with 92.76% accuracy on a public MNIST black-box attack challenge.

AAAI Conference 2018 Conference Paper

Orthogonal Weight Normalization: Solution to Optimization Over Multiple Dependent Stiefel Manifolds in Deep Neural Networks

  • Lei Huang
  • Xianglong Liu
  • Bo Lang
  • Adams Yu
  • Yongliang Wang
  • Bo Li

Orthogonal matrices have shown advantages in training Recurrent Neural Networks (RNNs), but such matrices are limited to being square for the hidden-to-hidden transformation in RNNs. In this paper, we generalize such square orthogonal matrices to orthogonal rectangular matrices and formulate this problem in feed-forward Neural Networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM). We show that the orthogonal rectangular matrix can stabilize the distribution of network activations and regularize FNNs. We propose a novel orthogonal weight normalization method to solve OMDSM. Particularly, it constructs an orthogonal transformation over proxy parameters to ensure the weight matrix is orthogonal. To guarantee stability, we minimize the distortions between proxy parameters and canonical weights over all tractable orthogonal transformations. In addition, we design an orthogonal linear module (OLM) to learn orthogonal filter banks in practice, which can be used as an alternative to the standard linear module. Extensive experiments demonstrate that by simply substituting OLM for the standard linear module without revising any experimental protocols, our method improves the performance of state-of-the-art networks, including Inception and residual networks, on the CIFAR and ImageNet datasets.

IROS Conference 2017 Conference Paper

3D fully convolutional network for vehicle detection in point cloud

  • Bo Li

2D fully convolutional networks have recently been successfully applied to the object detection problem on images. In this paper, we extend fully convolutional network based detection techniques to 3D and apply them to point cloud data. The proposed approach is verified on the task of vehicle detection from lidar point cloud for autonomous driving. Experiments on the KITTI dataset show significant performance improvement over previous point cloud based detection approaches.

AAAI Conference 2017 Conference Paper

Engineering Agreement: The Naming Game with Asymmetric and Heterogeneous Agents

  • Jie Gao
  • Bo Li
  • Grant Schoenebeck
  • Fang-Yi Yu

Being popular in language evolution, cognitive science, and culture dynamics, the Naming Game has been widely used to analyze how agents reach global consensus via communications in multi-agent systems. Most prior work considered networks that are symmetric and homogeneous (e.g., vertex transitive). In this paper we consider asymmetric or heterogeneous settings that complement the current literature: 1) we show that increasing asymmetry in network topology can improve convergence rates; the star graph empirically converges faster than all previously studied graphs. 2) We consider graph topologies that are particularly challenging for the naming game, such as disjoint cliques or multi-level trees, and ask how much extra homogeneity (random edges) is required to allow convergence or fast convergence; we provide theoretical analysis, confirmed by simulations. 3) We analyze how consensus can be manipulated when stubborn nodes are introduced at different points of the process: early introduction of stubborn nodes can easily influence the outcome in certain families of networks, while late introduction of stubborn nodes has much less power.
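The Naming Game dynamics referenced above can be simulated in a few lines. This is a minimal sketch of the standard protocol as commonly described in the literature (a random speaker proposes a name to a random neighbor; on success both collapse their inventories to that name, otherwise the listener adds it); the paper's asymmetric and stubborn-node variants are not modeled here.

```python
# Minimal Naming Game simulation on an undirected graph.
import random

def naming_game(edges, n, max_steps=10000, seed=1):
    rng = random.Random(seed)
    neighbors = {v: [] for v in range(n)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    inventory = [{f"word{v}"} for v in range(n)]  # each node starts with its own name
    for step in range(max_steps):
        speaker = rng.randrange(n)
        listener = rng.choice(neighbors[speaker])
        word = rng.choice(sorted(inventory[speaker]))  # sorted for determinism
        if word in inventory[listener]:   # success: both collapse to this word
            inventory[speaker] = {word}
            inventory[listener] = {word}
        else:                             # failure: listener learns the word
            inventory[listener].add(word)
        if all(len(inv) == 1 and inv == inventory[0] for inv in inventory):
            return True, step + 1         # global consensus reached
    return False, max_steps

# Star graph on 5 nodes, the topology the abstract highlights.
ok, steps = naming_game([(0, i) for i in range(1, 5)], 5)
```

Running this on star graphs versus, say, cycles of the same size is a quick way to observe the convergence-rate effect of topological asymmetry the abstract describes.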

IJCAI Conference 2017 Conference Paper

Query-Driven Discovery of Anomalous Subgraphs in Attributed Graphs

  • Nannan Wu
  • Feng Chen
  • Jianxin Li
  • Jinpeng Huai
  • Bo Li

For a detection problem, a user often has some prior knowledge about the structure-specific subgraphs of interest, but few traditional approaches are capable of employing this knowledge. The main technical challenge is that few approaches can efficiently model the space of connected subgraphs that are isomorphic to a query graph. We present a novel, efficient approach for optimizing a generic nonlinear cost function subject to a query-specific structural constraint. Our approach enjoys strong theoretical guarantees: convergence to a nearly optimal solution and low time complexity. As a case study, we specialize the nonlinear function to several well-known graph scan statistics for anomalous subgraph discovery. Empirical evidence demonstrates that our method is superior to state-of-the-art methods in several real-world anomaly detection tasks.

AAAI Conference 2017 Conference Paper

Treatment Effect Estimation with Data-Driven Variable Decomposition

  • Kun Kuang
  • Peng Cui
  • Bo Li
  • Meng Jiang
  • Shiqiang Yang
  • Fei Wang

One fundamental problem in causal inference is treatment effect estimation in observational studies when variables are confounded. Control for confounding effects is generally handled by the propensity score, but it treats all observed variables as confounders and ignores the adjustment variables, which have no influence on treatment but are predictive of the outcome. Recently, it has been demonstrated that adjustment variables are effective in reducing the variance of the estimated treatment effect. However, how to automatically separate the confounders and adjustment variables in observational studies is still an open problem, especially in scenarios with high dimensional variables, which are common in the big data era. In this paper, we propose a Data-Driven Variable Decomposition (D2VD) algorithm, which can 1) automatically separate confounders and adjustment variables with a data-driven approach, and 2) simultaneously estimate treatment effect in observational studies with high dimensional variables. Under standard assumptions, we show experimentally that our D2VD algorithm can automatically separate the variables precisely, and estimate treatment effect more accurately and with tighter confidence intervals than the state-of-the-art methods on both synthetic data and a real online advertising dataset.
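For readers unfamiliar with the propensity-score baseline this abstract contrasts against, the standard inverse-propensity-weighting (IPW) estimator of the average treatment effect can be written as a one-liner. This is the generic textbook estimator, not the D2VD algorithm, and the propensity values below are assumed known rather than estimated.

```python
# Standard IPW estimator of average treatment effect (ATE):
#   ATE_hat = (1/n) * sum_i [ T_i*Y_i/e_i - (1-T_i)*Y_i/(1-e_i) ]
# where T is the treatment indicator, Y the outcome, e the propensity score.

def ipw_ate(outcomes, treated, propensity):
    n = len(outcomes)
    total = 0.0
    for y, t, e in zip(outcomes, treated, propensity):
        total += t * y / e - (1 - t) * y / (1 - e)
    return total / n

# Toy example with a constant propensity of 0.5 (as in a randomized trial).
print(ipw_ate([3, 1, 5, 2], [1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))
```

Because every observed variable enters the propensity model in this baseline, variance can be needlessly inflated; D2VD's point is that routing adjustment variables out of the propensity model and into the outcome model tightens the estimate.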

AAAI Conference 2016 Conference Paper

Behavioral Experiments in Email Filter Evasion

  • Liyiming Ke
  • Bo Li
  • Yevgeniy Vorobeychik

Despite decades of effort to combat spam, unwanted and even malicious emails, such as phish which aim to deceive recipients into disclosing sensitive information, still routinely find their way into one’s mailbox. To be sure, email filters manage to stop a large fraction of spam emails from ever reaching users, but spammers and phishers have mastered the art of filter evasion, or manipulating the content of email messages to avoid being filtered. We present a unique behavioral experiment designed to study email filter evasion. Our experiment is framed in somewhat broader terms: given the widespread use of machine learning methods for distinguishing spam and non-spam, we investigate how human subjects manipulate a spam template to evade a classification-based filter. We find that adding a small amount of noise to a filter significantly reduces the ability of subjects to evade it, observing that noise does not merely have a short-term impact, but also degrades evasion performance in the longer term. Moreover, we find that greater coverage of an email template by the classifier (filter) features significantly increases the difficulty of evading it. This observation suggests that aggressive feature reduction—a common practice in applied machine learning—can actually facilitate evasion. In addition to the descriptive analysis of behavior, we develop a synthetic model of human evasion behavior which closely matches observed behavior and effectively replicates experimental findings in simulation.

NeurIPS Conference 2016 Conference Paper

Data Poisoning Attacks on Factorization-Based Collaborative Filtering

  • Bo Li
  • Yining Wang
  • Aarti Singh
  • Yevgeniy Vorobeychik

Recommendation and collaborative filtering systems are important in modern information and e-commerce applications. As these systems are becoming increasingly popular in industry, their outputs could affect business decision making, introducing incentives for an adversarial party to compromise the availability or integrity of such systems. We introduce a data poisoning attack on collaborative filtering systems. We demonstrate how a powerful attacker with full knowledge of the learner can generate malicious data so as to maximize his/her malicious objectives, while at the same time mimicking normal user behaviors to avoid being detected. While the complete knowledge assumption seems extreme, it enables a robust assessment of the vulnerability of collaborative filtering schemes to highly motivated attacks. We present efficient solutions for two popular factorization-based collaborative filtering algorithms: the alternating minimization formulation and the nuclear norm minimization method. Finally, we test the effectiveness of our proposed algorithms on real-world data and discuss potential defensive strategies.
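The alternating minimization learner targeted by the attack can be illustrated in its simplest rank-1 form. This sketch is only the victim model, not the poisoning attack, and it assumes a fully observed rating matrix for brevity:

```python
# Rank-1 alternating minimization for matrix factorization: fit M ≈ u v^T by
# alternately solving the closed-form least-squares problem for u and for v.

def rank1_als(M, iters=50):
    rows, cols = len(M), len(M[0])
    v = [1.0] * cols
    u = [0.0] * rows
    for _ in range(iters):
        s = sum(x * x for x in v)
        u = [sum(M[i][j] * v[j] for j in range(cols)) / s for i in range(rows)]
        s = sum(x * x for x in u)
        v = [sum(M[i][j] * u[i] for i in range(rows)) / s for j in range(cols)]
    return u, v

# A rank-1 ratings matrix is recovered exactly.
u, v = rank1_als([[2, 4], [1, 2]])
```

A poisoning attacker in the paper's setting injects fake user rows into M so that the factors (u, v) learned by exactly this kind of procedure shift toward the attacker's objective while the fake rows still look like plausible users.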

NeurIPS Conference 2014 Conference Paper

Feature Cross-Substitution in Adversarial Classification

  • Bo Li
  • Yevgeniy Vorobeychik

The success of machine learning, particularly in supervised settings, has led to numerous attempts to apply it in adversarial settings such as spam and malware detection. The core challenge in this class of applications is that adversaries are not static data generators, but make a deliberate effort to evade the classifiers deployed to detect them. We investigate both the problem of modeling the objectives of such adversaries, as well as the algorithmic problem of accounting for rational, objective-driven adversaries. In particular, we demonstrate severe shortcomings of feature reduction in adversarial settings using several natural adversarial objective functions, an observation that is particularly pronounced when the adversary is able to substitute across similar features (for example, replace words with synonyms or replace letters in words). We offer a simple heuristic method for making learning more robust to feature cross-substitution attacks. We then present a more general approach based on mixed-integer linear programming with constraint generation, which implicitly trades off overfitting and feature selection in an adversarial setting using a sparse regularizer along with an evasion model. Our approach is the first method for combining an adversarial classification algorithm with a very general class of models of adversarial classifier evasion. We show that our algorithmic approach significantly outperforms state-of-the-art alternatives.

IROS Conference 2010 Conference Paper

Keyframe detection for appearance-based visual SLAM

  • Hong Zhang 0013
  • Bo Li
  • Dan Yang 0001

This paper is concerned with the problem of keyframe detection in appearance-based visual SLAM. Appearance SLAM models a robot's environment topologically by a graph whose nodes represent strategically interesting places that have been visited by the robot and whose arcs represent spatial connectivity between these places. Specifically, we discuss and compare various methods for identifying the next location that is sufficiently different visually from the previously visited location or node in the map graph, in order to decide whether a new node should be created. We survey existing techniques for keyframe detection in image retrieval and video analysis. Using experimental results obtained from visual SLAM datasets, we conclude that the feature matching method offers the best performance among five representative methods in terms of accurately measuring the amount of appearance change between the robot's views, and thus can serve as a simple and effective metric for detecting keyframes. This study fills an important but missing step in current appearance SLAM research.

ICRA Conference 1987 Conference Paper

Optimal design of multiple arithmetic processor-based robot controllers

  • Shaheen Ahmad
  • Bo Li

In this paper we discuss preliminary design considerations for the optimal design of multiple-APU (arithmetic processing unit) based robot controllers. We justify this design in terms of its ability to adapt to the various control, kinematic, and trajectory computation methods being developed. We then show that with eight APUs, it is possible to compute the inverse kinematics, inverse dynamics, and the trajectory for the PUMA arm in less than 3 ms. In this design we assume the floating-point processing times of a relatively slow 16.7 MHz 68881.