Arrow Research search

Author name cluster

Long Bai

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers

7

AAAI Conference 2026 Conference Paper

EndoIR: Degradation-Agnostic All-in-One Endoscopic Image Restoration via Noise-Aware Routing Diffusion

  • Tong Chen
  • Xinyu Ma
  • Long Bai
  • Wenyang Wang
  • Yue Sun
  • Luping Zhou

Endoscopic images often suffer from diverse and co-occurring degradations such as low lighting, smoke, and bleeding, which obscure critical clinical details. Existing restoration methods are typically task-specific and often require prior knowledge of the degradation type, limiting their robustness in real-world clinical use. We propose EndoIR, an all-in-one, degradation-agnostic diffusion-based framework that restores multiple degradation types using a single model. EndoIR introduces a Dual-Domain Prompter that extracts joint spatial–frequency features, coupled with an adaptive embedding that encodes both shared and task-specific cues as conditioning for denoising. To mitigate feature confusion in conventional concatenation-based conditioning, we design a Dual-Stream Diffusion architecture that processes clean and degraded inputs separately, with a Rectified Fusion Block integrating them in a structured, degradation-aware manner. Furthermore, a Noise-Aware Routing Block improves efficiency by dynamically selecting only noise-relevant features during denoising. Experiments on the SegSTRONG-C and CEC datasets demonstrate that EndoIR achieves state-of-the-art performance across multiple degradation scenarios while using fewer parameters than strong baselines, and downstream segmentation experiments confirm its clinical utility.

AAAI Conference 2026 Conference Paper

Where It Moves, It Matters: Referring Surgical Instrument Segmentation via Motion

  • Meng Wei
  • Kun Yuan
  • Shi Li
  • Yue Zhou
  • Long Bai
  • Nassir Navab
  • Hongliang Ren
  • Hong Joo Lee

Enabling intuitive, language-driven interaction with surgical scenes is a critical step toward intelligent operating rooms and autonomous surgical robotic assistance. However, the task of referring segmentation, localizing surgical instruments based on natural language descriptions, remains underexplored in surgical videos, with existing approaches struggling to generalize due to reliance on static visual cues and predefined instrument names. In this work, we introduce SurgRef, a novel motion-guided framework that grounds free-form language expressions in instrument motion, capturing how tools move and interact across time, rather than what they look like. This allows models to understand and segment instruments even under occlusion, ambiguity, or unfamiliar terminology. To train and evaluate SurgRef, we present Ref-IMotion, a diverse, multi-institutional video dataset with dense spatiotemporal masks and rich motion-centric expressions. SurgRef achieves state-of-the-art accuracy and generalization across surgical procedures, setting a new benchmark for robust, language-driven surgical video segmentation.

EAAI Journal 2026 Journal Article

X-ray/CT image registration based on triple-cycle modal unification network

  • Yuanxi Sun
  • Chuan Tang
  • Xin Zhong
  • Xiaohong Chen
  • Jia Zheng
  • Long Bai

Accurate registration between X-ray and Computed Tomography (CT) images plays a vital role in many clinical and surgical applications. However, deep learning-based two-dimensional/three-dimensional (2D/3D) registration methods often rely on digitally reconstructed radiographs generated through ray casting, where the substantial modality gap between X-ray and Digitally Reconstructed Radiograph (DRR) images can significantly reduce registration accuracy. This study proposes a triple-cycle modality unification network that introduces an intermediate bias-neutral modality to bridge the domain discrepancy between the two modalities. Both X-ray and DRR images are transformed into this unified modality, effectively minimizing modality bias and reducing information loss associated with one-way translation. Furthermore, an efficient hybrid registration network is designed by integrating the global context modeling capability of Transformer architectures with the local feature extraction strength of convolutional neural networks, supported by a multi-scale feature extraction strategy. Experimental results demonstrate that the proposed method achieves an average rotation error of approximately 0.5° and translation errors of about 0.5 mm in the X and Z directions and less than 1 mm in the Y direction. These findings indicate that the proposed framework achieves high-precision registration and has strong potential for practical application in computer-assisted diagnosis and image-guided interventions.

AAAI Conference 2024 Conference Paper

Tree-of-Reasoning Question Decomposition for Complex Question Answering with Large Language Models

  • Kun Zhang
  • Jiali Zeng
  • Fandong Meng
  • Yuanzhuo Wang
  • Shiqi Sun
  • Long Bai
  • Huawei Shen
  • Jie Zhou

Large language models (LLMs) have recently demonstrated remarkable performance across various Natural Language Processing tasks. In the field of multi-hop reasoning, the Chain-of-Thought (CoT) prompting method has emerged as a paradigm, using curated stepwise reasoning demonstrations to enhance LLMs' ability to reason and produce coherent reasoning pathways. To ensure the accuracy, reliability, and traceability of the generated answers, many studies have incorporated information retrieval (IR) to provide LLMs with external knowledge. However, existing CoT-with-IR methods decompose questions into sub-questions based on a single compositionality type, which limits their effectiveness for questions involving multiple compositionality types. Additionally, these methods suffer from inefficient retrieval, as complex questions often contain abundant information, leading to the retrieval of irrelevant information inconsistent with the query's intent. In this work, we propose a novel question decomposition framework called TRQA for multi-hop question answering, which addresses these limitations. Our framework introduces a reasoning tree (RT) to represent the structure of complex questions. It consists of four components: the Reasoning Tree Constructor (RTC), the Question Generator (QG), the Retrieval and LLM Interaction Module (RAIL), and the Answer Aggregation Module (AAM). Specifically, the RTC predicts diverse sub-question structures to construct the reasoning tree, allowing a more comprehensive representation of complex questions. The QG generates sub-questions for leaf nodes in the reasoning tree, and we explore two methods for QG: prompt-based and T5-based approaches. The IR module retrieves documents aligned with sub-questions, while the LLM formulates answers based on the retrieved information. Finally, the AAM aggregates answers along the reasoning tree, producing a definitive response from bottom to top.

AAAI Conference 2023 Conference Paper

Rich Event Modeling for Script Event Prediction

  • Long Bai
  • Saiping Guan
  • Zixuan Li
  • Jiafeng Guo
  • Xiaolong Jin
  • Xueqi Cheng

Script is a kind of structured knowledge extracted from texts, which contains a sequence of events. Based on such knowledge, script event prediction aims to predict the subsequent event. To do so, two aspects should be considered for events, namely, event description (i.e., what the events should contain) and event encoding (i.e., how they should be encoded). Most existing methods describe an event by a verb together with a few core arguments (i.e., subject, object, and indirect object), which are not precise enough. In addition, existing event encoders are limited to a fixed number of arguments, which are not flexible enough to deal with extra information. Thus, in this paper, we propose the Rich Event Prediction (REP) framework for script event prediction. Fundamentally, it is based on the proposed rich event description, which enriches the existing ones with three kinds of important information, namely, the senses of verbs, extra semantic roles, and types of participants. REP contains an event extractor to extract such information from texts. Based on the extracted rich information, a predictor then selects the most probable subsequent event. The core component of the predictor is a transformer-based event encoder that integrates the above information flexibly. Experimental results on the widely used Gigaword Corpus show the effectiveness of the proposed framework.

AIIM Journal 2023 Journal Article

Two-stage contextual transformer-based convolutional neural network for airway extraction from CT images

  • Yanan Wu
  • Shuiqing Zhao
  • Shouliang Qi
  • Jie Feng
  • Haowen Pang
  • Runsheng Chang
  • Long Bai
  • Mengqi Li

Accurate airway segmentation from computed tomography (CT) images is critical for planning navigation bronchoscopy and realizing a quantitative assessment of airway-related chronic obstructive pulmonary disease (COPD). Existing methods face difficulty in airway segmentation, particularly for the small branches of the airway. These difficulties arise due to the constraints of limited labeling and failure to meet clinical use requirements in COPD. We propose a two-stage framework with a novel 3D contextual transformer for segmenting the overall airway and small airway branches using CT images. The method consists of two training stages sharing the same modified 3D U-Net network. The novel 3D contextual transformer block is integrated into both the encoder and decoder path of the network to effectively capture contextual and long-range information. In the first training stage, the proposed network segments the overall airway with the overall airway mask. To improve the performance of the segmentation result, we generate the intrapulmonary airway branch label, and train the network to focus on producing small airway branches in the second training stage. Extensive experiments were performed on in-house and multiple public datasets. Quantitative and qualitative analyses demonstrate that our proposed method extracts significantly more branches and longer lengths of the airway tree while accomplishing state-of-the-art airway segmentation performance. The code is available at https://github.com/zhaozsq/airway_segmentation.

AAAI Conference 2020 Short Paper

Entity Type Enhanced Neural Model for Distantly Supervised Relation Extraction (Student Abstract)

  • Long Bai
  • Xiaolong Jin
  • Chuanzhi Zhuang
  • Xueqi Cheng

Distantly Supervised Relation Extraction (DSRE) has been widely studied, since it can automatically extract relations from very large corpora. However, existing DSRE methods use only limited semantic information about entities, such as entity type information. Thus, in this paper, we propose a method for integrating entity type information into a neural network based DSRE model. It also adopts two attention mechanisms, namely, sentence attention and type attention. The former selects the representative sentences for a sentence bag, while the latter selects appropriate type information for entities. Experimental comparison with existing methods on a benchmark dataset demonstrates its merits.