Arrow Research search

Author name cluster

Jiang Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

DeLightMono: Enhancing Self-Supervised Monocular Depth Estimation in Endoscopy by Decoupling Uneven Illumination

  • Mingyang Ou
  • Haojin Li
  • Yifeng Zhang
  • Ke Niu
  • Zhongxi Qiu
  • Heng Li
  • Jiang Liu

Self-supervised monocular depth estimation serves as a key task in the development of endoscopic navigation systems. However, performance degradation persists due to uneven illumination inherent in endoscopic images, particularly in low-intensity regions. Existing low-light enhancement techniques fail to effectively guide the depth network. Furthermore, solutions from other fields, like autonomous driving, require well-lit images, making them unsuitable and increasing data collection burdens. To this end, we present DeLightMono - a novel self-supervised monocular depth estimation framework with illumination decoupling. Specifically, endoscopic images are represented by a designed illumination-reflectance-depth model, and are decomposed with auxiliary networks. Moreover, a self-supervised joint-optimizing framework with novel losses leveraging the decoupled components is proposed to mitigate the effects of uneven illumination on depth estimation. The effectiveness of the proposed methods was rigorously verified through extensive comparisons and an ablation study performed on two public datasets.

TMLR Journal 2026 Journal Article

Learning from Online Videos at Inference Time for Computer-Use Agents

  • Yujian Liu
  • Ze Wang
  • Hao Chen
  • Ximeng Sun
  • Xiaodong Yu
  • Jialian Wu
  • Jiang Liu
  • Emad Barsoum

Computer-use agents can operate computers and automate laborious tasks, but despite recent rapid progress, they still lag behind human users, especially when tasks require domain-specific procedural knowledge about particular applications, platforms, and multi-step workflows. Humans can bridge this gap by watching video tutorials: we search, skim, and selectively imitate short segments that match our current subgoal. In this paper, we study how to enable computer-use agents to learn from online videos at inference time effectively. We propose a framework that retrieves and filters tutorial videos, converts them into structured demonstration trajectories, and dynamically selects trajectories as in-context guidance during execution. Particularly, using a VLM, we infer UI actions, segment videos into short subsequences of actions, and assign each subsequence a textual objective. At inference time, a two-stage selection mechanism dynamically chooses a single trajectory to add in context at each step, focusing the agent on the most helpful local guidance for its next decision. Experiments on two widely used benchmarks show that our framework consistently outperforms strong base agents and variants that use only textual tutorials or transcripts. Analyses highlight the importance of trajectory segmentation and selection, action filtering, and visual information, suggesting that abundant online videos can be systematically distilled into actionable guidance that improves computer-use agents at inference time.

JBHI Journal 2026 Journal Article

Online Bayesian Approximation Based Uncertainty Aware Model for Ophthalmic Image Segmentation

  • Yinglin Zhang
  • Risa Higashita
  • Lingxi Zeng
  • Jialin Li
  • Ruiling Xi
  • Tianhang Liu
  • Huazhu Fu
  • Dave Towey

The robust segmentation of different targets in multiple modality images is challenging due to factors such as low contrast, variations in target size and shape, and interference from diseases, which may lead to segmentation ambiguity. In addition, the assessment of the reliability of artificial intelligence is crucial for its clinical application. This paper proposes the Online Bayesian approximation based Uncertainty-aware Network (OBU-Net) for robust ophthalmic image segmentation. Our approach introduces an efficient online Bayesian method to update a spatial uncertainty map during training continuously. Then, the Spatial Uncertainty Aware Block (SUA-B) leverages the uncertainty map to localize and prioritize attention to ambiguous regions. Additionally, we extract pixel-wise confidence from multi-scale predictions to integrate hierarchical predictions. We compare OBU-Net with state-of-the-art (SOTA) methods on six datasets. The experimental results demonstrate that our method achieves the best overall performance across different modalities and segmentation tasks, highlighting the robustness of our approach. Additionally, metamorphic testing experiments were conducted, exploring the algorithm’s stability against random perturbations. Lastly, we propose an image-level uncertainty score and demonstrate its effectiveness for evaluating the model’s segmentation reliability.

AAAI Conference 2026 Conference Paper

Yours or Mine? Overwriting Attacks Against Neural Audio Watermarking

  • Lingfeng Yao
  • Chenpei Huang
  • Shengyao Wang
  • Junpei Xue
  • Hanqing Guo
  • Jiang Liu
  • Phone Lin
  • Tomoaki Ohtsuki

As generative audio models are rapidly evolving, AI-generated audios increasingly raise concerns about copyright infringement and misinformation spread. Audio watermarking, as a proactive defense, can embed secret messages into audio for copyright protection and source verification. However, current neural audio watermarking methods focus primarily on the imperceptibility and robustness of watermarking, while ignoring its vulnerability to security attacks. In this paper, we develop a simple yet powerful attack: the overwriting attack that overwrites the legitimate audio watermark with a forged one and makes the original legitimate watermark undetectable. Based on the audio watermarking information that the adversary has, we propose three categories of overwriting attacks, i.e., white-box, gray-box, and black-box attacks. We also thoroughly evaluate the proposed attacks on state-of-the-art neural audio watermarking methods. Experimental results demonstrate that the proposed overwriting attacks can effectively compromise existing watermarking schemes across various settings and achieve a nearly 100% attack success rate. The practicality and effectiveness of the proposed overwriting attacks expose security flaws in existing neural audio watermarking systems, underscoring the need to enhance security in future audio watermarking designs.

EAAI Journal 2025 Journal Article

A heterogeneous transfer learning method for fault prediction of railway track circuit

  • Lan Na
  • Baigen Cai
  • Chongzhen Zhang
  • Jiang Liu
  • Zhengjiao Li

Prediction and identification of faults in track circuits are crucial for improving the safety and efficiency of railway transportation. However, due to the absence of real data, track circuit fault prediction through deep learning methods faces significant challenges. This paper proposes a novel heterogeneous transfer learning network structure for deep learning-based track circuit fault prediction. The proposed transfer learning network can reduce the reliance on track circuit data when training deep learning models by utilizing public datasets from other similar tasks. First, an index describing the data distribution is used to demonstrate the transferability between heterogeneous data. Then a heterogeneous transfer learning network structure is proposed to help train the deep learning model on the track circuit fault prediction task. Finally, the effect of transfer learning is comprehensively examined. The simulation experimental results show that the proposed heterogeneous transfer learning network structure can transfer useful knowledge from other similar fields to tasks in track circuit fault prediction, and the resulting model can distinguish between nine different classes with an accuracy over 99% on the test dataset while reducing the amount of required training data to 10% of that needed by traditional training methods.

AAAI Conference 2025 Conference Paper

AIF-SFDA: Autonomous Information Filter Driven Source-Free Domain Adaptation for Medical Image Segmentation

  • Haojin Li
  • Heng Li
  • Jianyu Chen
  • Rihan Zhong
  • Ke Niu
  • Huazhu Fu
  • Jiang Liu

Decoupling domain-variant information (DVI) from domain-invariant information (DII) serves as a prominent strategy for mitigating domain shifts in the practical implementation of deep learning algorithms. However, in medical settings, concerns surrounding data collection and privacy often restrict access to both training and test data, hindering the empirical decoupling of information by existing methods. To tackle this issue, we propose an Adaptive Information Filter-driven Source-free Domain Adaptation (AIF-SFDA) algorithm, which leverages a frequency-based learnable information filter to autonomously decouple DVI and DII. Information Bottleneck (IB) and Self-supervision (SS) are incorporated to optimize the learnable frequency filter. The IB governs the information flow within the filter to diminish redundant DVI, while SS preserves DII in alignment with the specific task and image modality. Thus, the adaptive information filter can overcome domain shifts relying solely on target data. A series of experiments covering various medical image modalities and segmentation tasks were conducted to demonstrate the benefits of AIF-SFDA through comparisons with leading algorithms and ablation studies.

JBHI Journal 2025 Journal Article

AIPNet: Action-Instance Progressive Learning Network for Instrument-Tissue Interaction Detection

  • Wenjun Lin
  • Yan Hu
  • Luoying Hao
  • Huazhu Fu
  • Cheekong Chui
  • Jiang Liu

Instrument-tissue interaction detection, a task aimed at understanding surgical scenes from videos, holds immense importance in constructing computer-assisted surgery systems. Existing methods for this task consist of two stages: instance detection and interaction prediction. This sequential and separate model structure limits both effectiveness and efficiency, making it difficult to deploy on surgical robotic platforms. In this paper, we propose an end-to-end Action-Instance Progressive Learning Network (AIPNet) for the task. The model operates in three steps: action detection, instance detection, and action class refinement. Starting with coarse-scale proposals, the model progressively refines them into coarse-grained actions, which then serve as proposals for instance detection. The action prediction results are further refined using instance features through late fusion. These progressive learning processes improve the performance of the end-to-end model. Additionally, we introduce Dynamic Proposal Generators (DPG) to create dynamic adaptive learnable proposals for each video frame. To address the training challenges of this multi-task model, semantic supervised training is introduced to transfer prior language knowledge, and a training label strategy is proposed to generate unrelated instrument-tissue pair labels for enhanced supervision. Experimental results on PhacoQ and CholecQ datasets show that the proposed method achieves superior accuracy and faster processing speed than state-of-the-art models.

TMLR Journal 2025 Journal Article

DiffNat: Exploiting the Kurtosis Concentration Property for Image quality improvement

  • Aniket Roy
  • Maitreya Suin
  • Anshul Shah
  • Ketul Shah
  • Jiang Liu
  • Rama Chellappa

Diffusion models have significantly advanced generative AI in terms of creating and editing natural images. However, improving the image quality of generated images is still of paramount interest. In this context, we propose a generic kurtosis concentration (KC) loss that can be readily applied to any standard diffusion model pipeline to improve image quality. Our motivation stems from the projected kurtosis concentration property of natural images, which states that natural images have nearly constant kurtosis values across different band-pass filtered versions of the image. To improve the image quality of generated images, we reduce the gap between the highest and lowest kurtosis values across the band-pass filtered versions (e.g., Discrete Wavelet Transform (DWT)) of images. In addition, we also propose a novel condition-agnostic perceptual guidance strategy during inference to further improve the quality. We validate the proposed approach on four diverse tasks, viz., (1) personalized few-shot finetuning using text guidance, (2) unconditional image generation, (3) image super-resolution, and (4) blind face-restoration. Integrating the proposed KC loss and perceptual guidance has improved the perceptual quality in all these tasks in terms of FID, MUSIQ score, and user evaluation. Code: https://github.com/aniket004/DiffNat.git
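To make the quantity named in the abstract concrete, the sketch below computes the gap between the highest and lowest kurtosis over band-pass versions of an image. It is an illustration only and not the DiffNat implementation: the one-level Haar transform, the use of its three detail subbands as the band-pass versions, and every function name here are assumptions.

```python
from typing import List

Image = List[List[float]]


def haar_detail_subbands(img: Image) -> List[List[float]]:
    """One-level 2D Haar transform; returns three detail subbands, flattened.

    Assumes the image has an even number of rows and columns.
    """
    row_diff, col_diff, diag_diff = [], [], []
    for r in range(0, len(img), 2):
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            row_diff.append((a + b - d - e) / 2.0)   # top row minus bottom row
            col_diff.append((a - b + d - e) / 2.0)   # left column minus right column
            diag_diff.append((a - b - d + e) / 2.0)  # diagonal difference
    return [row_diff, col_diff, diag_diff]


def kurtosis(xs: List[float]) -> float:
    """Pearson kurtosis: fourth central moment over squared variance."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    # A constant subband has no spread; report 0.0 instead of dividing by zero.
    return m4 / (m2 * m2) if m2 > 0 else 0.0


def kurtosis_gap(img: Image) -> float:
    """Highest minus lowest kurtosis across the detail subbands."""
    ks = [kurtosis(band) for band in haar_detail_subbands(img)]
    return max(ks) - min(ks)
```

A training loss would need a differentiable version of this computation applied to generated images; the sketch only shows what is being measured.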

ECAI Conference 2025 Conference Paper

Dual-Space Contrastive Learning with Abnormal Edge Suppression for Graph Anomaly Detection

  • Mark Junjie Li
  • Shiyang He
  • Jun Li
  • Jinren Li
  • Gen Zhao
  • Jiang Liu
  • Sunjie Huang

Graph Anomaly Detection identifies nodes in a graph that deviate from normal behavior and finds wide applications in finance, social networks, and cybersecurity. Recent studies focus on capturing rich contrastive information between positive and negative samples by constructing multi-view contrast patterns through data augmentation, achieving notable performance gains. Nevertheless, existing methods often suffer from abnormal information diffusion, where anomalies propagate along abnormal edges and contaminate neighboring nodes, ultimately compromising the semantic consistency between the target node and its positive subgraph. Furthermore, most existing approaches learn node representations solely in Euclidean space, limiting their ability to capture the hierarchical structure prevalent in real-world graphs. To address these challenges, we propose a novel Dual-space Contrastive Learning Framework with Abnormal Edge Suppression, named DC-AES. By incorporating hyperbolic space, our framework preserves the hierarchical structure of the graph, while the abnormal edge suppression module mitigates anomaly diffusion by filtering out anomalous edges. Extensive experiments on six real datasets demonstrate the effectiveness of our approach compared to existing SOTA methods, with a maximum improvement of 6.63% in AUC.

EAAI Journal 2025 Journal Article

Fuzzification-back propagation neural network-based model prediction for robotic arm positioning error reduction

  • Jiang Liu
  • Jianwei Wu
  • Jiansheng Pan
  • Pengyue Zhao

Robotic arms are pervasively used in critical manufacturing fields, but their absolute positioning accuracy cannot be controlled because of random errors. Although some studies have used back propagation neural networks to predict the robotic arm's absolute positioning error, they suffer from poor convergence and low prediction accuracy because the input parameters contain unavoidable measurement errors. This paper proposes an error prediction model based on fuzzification and a back propagation neural network. The rotation angle and direction of robotic joints are employed as training samples for the back propagation neural network, and are converted into error contributions using fuzzification to eliminate the influence of measurement errors in the input parameters. The input parameters are simplified, which enables the training process of the back propagation neural network to be optimized. Experimental results showed that the training time of the model was reduced by two times or more, and the mean square error was decreased by roughly 2.94%. Meanwhile, the average absolute positioning error of the robotic arm with the prediction model was reduced by 59.22%. The model can be easily transplanted into embedded systems to provide a methodology for the design of new robotic arm error compensators.

JBHI Journal 2025 Journal Article

GlanceSeg: Real-Time Microaneurysm Lesion Segmentation With Gaze-Map-Guided Foundation Model for Early Detection of Diabetic Retinopathy

  • Hongyang Jiang
  • Mengdi Gao
  • Zirong Liu
  • Chen Tang
  • Xiaoqing Zhang
  • Shuai Jiang
  • Wu Yuan
  • Jiang Liu

Early-stage diabetic retinopathy (DR) presents challenges in clinical diagnosis due to inconspicuous and minute microaneurysms (MAs), resulting in limited research in this area. Additionally, the potential of emerging foundation models, such as the segment anything model (SAM), in medical scenarios remains rarely explored. In this work, we propose a human-in-the-loop, label-free early DR diagnosis framework called GlanceSeg, based on SAM. GlanceSeg enables real-time segmentation of MA lesions as ophthalmologists review fundus images. Our human-in-the-loop framework integrates the ophthalmologist's gaze maps, allowing for rough localization of minute lesions in fundus images. Subsequently, a saliency map is generated based on the located region of interest, which provides prompt points to assist the foundation model in efficiently segmenting MAs. Finally, a domain knowledge filtering (DKF) module refines the segmentation of minute lesions. We conducted experiments on two newly-built public datasets, i.e., IDRiD and Retinal-Lesions, and validated the feasibility and superiority of GlanceSeg through visualized illustrations and quantitative measures. Additionally, we demonstrated that GlanceSeg improves annotation efficiency for clinicians and further enhances segmentation performance through fine-tuning using annotations. The clinician-friendly GlanceSeg is able to segment small lesions in real-time, showing potential for clinical applications.

JBHI Journal 2025 Journal Article

Score Prior Guided Iterative Solver for Speckles Removal in Optical Coherent Tomography Images

  • Sanqian Li
  • Risa Higashita
  • Huazhu Fu
  • Bing Yang
  • Jiang Liu

Optical coherence tomography (OCT) is a widely used non-invasive imaging modality for ophthalmic diagnosis. However, the inherent speckle noise is the leading cause of degraded OCT image quality, and efficient speckle removal algorithms can improve image readability and benefit automated clinical analysis. As speckle removal is an ill-posed inverse problem, it is of utmost importance to learn suitable priors. In this work, we develop a score prior guided iterative solver (SPIS) with logarithmic space to remove speckles in OCT images. Specifically, we model the posterior distribution of raw OCT images as a data consistency term and transform the speckle removal from a nonlinear into a linear inverse problem in the logarithmic domain. Subsequently, the learned prior distribution through the score function from the diffusion model is utilized as a constraint for the data consistency term into the linear inverse optimization, resulting in an iterative speckle removal procedure that alternates between the score prior predictor and the subsequent non-expansive data consistency corrector. Experimental results on the private and public OCT datasets demonstrate that the proposed SPIS has an excellent performance in speckle removal and out-of-distribution (OOD) generalization. Further downstream automatic analysis on the OCT images verifies that the proposed SPIS can benefit clinical applications.

NeurIPS Conference 2025 Conference Paper

Unleashing Hour-Scale Video Training for Long Video-Language Understanding

  • Jingyang Lin
  • Jialian Wu
  • Ximeng Sun
  • Ze Wang
  • Jiang Liu
  • Yusheng Su
  • Xiaodong Yu
  • Hao Chen

Recent long-form video-language understanding benchmarks have driven progress in video large multimodal models (Video-LMMs). However, the scarcity of well-annotated long videos has left the training of hour-long Video-LMMs underexplored. To close this gap, we present VideoMarathon, a large-scale hour-long video instruction-following dataset. This dataset includes around 9,700 hours of long videos sourced from diverse domains, ranging from 3 to 60 minutes per video. Specifically, it contains 3.3M high-quality QA pairs, spanning six fundamental topics: temporality, spatiality, object, action, scene, and event. Compared to existing video instruction datasets, VideoMarathon significantly extends training video durations up to 1 hour, and supports 22 diverse tasks requiring both short- and long-term video comprehension. Building on VideoMarathon, we propose Hour-LLaVA, a powerful and efficient Video-LMM for hour-scale video-language modeling. It enables hour-long video training and inference at 1-FPS sampling by leveraging a memory augmentation module, which adaptively integrates question-relevant and spatiotemporally informative semantics from the cached full video context. In our experiments, Hour-LLaVA achieves the best performance on multiple representative long video-language benchmarks, demonstrating the high quality of the VideoMarathon dataset and the superiority of the Hour-LLaVA model.

NeurIPS Conference 2024 Conference Paper

Accelerating Non-Maximum Suppression: A Graph Theory Perspective

  • King-Siong Si
  • Lu Sun
  • Weizhan Zhang
  • Tieliang Gong
  • Jiahao Wang
  • Jiang Liu
  • Hao Sun

Non-maximum suppression (NMS) is an indispensable post-processing step in object detection. With the continuous optimization of network models, NMS has become the "last mile" to enhance the efficiency of object detection. This paper systematically analyzes NMS from a graph theory perspective for the first time, revealing its intrinsic structure. Consequently, we propose two optimization methods, namely QSI-NMS and BOE-NMS. The former is a fast recursive divide-and-conquer algorithm with negligible mAP loss, and its extended version (eQSI-NMS) achieves optimal complexity of $\mathcal{O}(n\log n)$. The latter, concentrating on the locality of NMS, achieves an optimization at a constant level without an mAP loss penalty. Moreover, to facilitate rapid evaluation of NMS methods for researchers, we introduce NMS-Bench, the first benchmark designed to comprehensively assess various NMS methods. Taking the YOLOv8-N model on MS COCO 2017 as the benchmark setup, our method QSI-NMS provides $6.2\times$ speed of original NMS on the benchmark, with a $0.1\%$ decrease in mAP. The optimal eQSI-NMS, with only a $0.3\%$ mAP decrease, achieves $10.7\times$ speed. Meanwhile, BOE-NMS exhibits $5.1\times$ speed with no compromise in mAP.
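For context, the "original NMS" that the abstract compares against is usually the classic greedy procedure, sketched below in Python. This is not QSI-NMS, eQSI-NMS, or BOE-NMS; the `(x1, y1, x2, y2)` box format, the function names, and the 0.5 IoU threshold are illustrative assumptions.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the kept boxes, ordered by descending score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # A box survives only if no higher-scoring kept box overlaps it too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)  # the second box overlaps the first and is dropped
```

In the worst case this greedy loop compares every pair of boxes, which is the quadratic cost that faster NMS variants aim to avoid.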

AIIM Journal 2024 Journal Article

Efficient pyramid channel attention network for pathological myopia recognition with pretraining-and-finetuning

  • Xiaoqing Zhang
  • Jilu Zhao
  • Yan Li
  • Hao Wu
  • Xiangtian Zhou
  • Jiang Liu

Pathological myopia (PM) is the leading ocular disease for impaired vision worldwide. Clinically, the characteristics of pathology distribution in PM are global-local on the fundus image, which plays a significant role in assisting clinicians in diagnosing PM. However, most existing deep neural networks focused on designing complex architectures but rarely explored the pathology distribution prior of PM. To tackle this issue, we propose an efficient pyramid channel attention (EPCA) module, which fully leverages the potential of the clinical pathology prior of PM with pyramid pooling and multi-scale context fusion. Then, we construct EPCA-Net for automatic PM recognition based on fundus images by stacking a sequence of EPCA modules. Moreover, motivated by the recent pretraining-and-finetuning paradigm, we attempt to adapt pre-trained natural image models for PM recognition by freezing them and treating the EPCA and other attention modules as adapters. In addition, we construct a PM recognition benchmark termed PM-fundus by collecting fundus images of PM from publicly available datasets. The comprehensive experiments demonstrate the superiority of EPCA-Net over state-of-the-art methods in the PM recognition task. For example, EPCA-Net achieves 97.56% accuracy and outperforms ViT by 2.85% accuracy on the PM-fundus dataset. The results also show that our method based on the pretraining-and-finetuning paradigm achieves competitive performance through comparisons to part of previous methods based on traditional fine-tuning paradigm with fewer tunable parameters, which has the potential to leverage more natural image foundation models to address the PM recognition task in limited medical data regime.

AAAI Conference 2024 Conference Paper

Scale Optimization Using Evolutionary Reinforcement Learning for Object Detection on Drone Imagery

  • Jialu Zhang
  • Xiaoying Yang
  • Wentao He
  • Jianfeng Ren
  • Qian Zhang
  • Yitian Zhao
  • Ruibin Bai
  • Xiangjian He

Object detection in aerial imagery presents a significant challenge due to large scale variations among objects. This paper proposes an evolutionary reinforcement learning agent, integrated within a coarse-to-fine object detection framework, to optimize the scale for more effective detection of objects in such images. Specifically, a set of patches potentially containing objects are first generated. A set of rewards measuring the localization accuracy, the accuracy of predicted labels, and the scale consistency among nearby patches are designed in the agent to guide the scale optimization. The proposed scale-consistency reward ensures similar scales for neighboring objects of the same category. Furthermore, a spatial-semantic attention mechanism is designed to exploit the spatial semantic relations between patches. The agent employs the proximal policy optimization strategy in conjunction with the evolutionary strategy, effectively utilizing both the current patch status and historical experience embedded in the agent. The proposed model is compared with state-of-the-art methods on two benchmark datasets for object detection on drone imagery. It significantly outperforms all the compared methods. Code is available at https://github.com/UNNC-CV/EvOD/.

AIIM Journal 2024 Journal Article

Value function assessment to different RL algorithms for heparin treatment policy of patients with sepsis in ICU

  • Jiang Liu
  • Yihao Xie
  • Xin Shu
  • Yuwen Chen
  • Yizhu Sun
  • Kunhua Zhong
  • Hao Liang
  • Yujie Li

Heparin is a critical aspect of managing sepsis after abdominal surgery, which can improve microcirculation, protect organ function, and reduce mortality. However, there is no clinical evidence to support decision-making for heparin dosage. This paper proposes a model called SOFA-MDP, which utilizes SOFA scores as states of MDP, to investigate clinic policies. Different algorithms provide different value functions, making it challenging to determine which value function is more reliable. Due to ethical restrictions, we cannot test all policies on patients. To address this issue, we proposed two value function assessment methods: action similarity rate and relative gain. We experimented with heparin treatment policies for sepsis patients after abdominal surgery using MIMIC-IV. In the experiments, TD(0) shows the most reliable performance. Using the action similarity rate and relative gain to assess AI policy from TD(0), the agreement rates between AI policy and “good” physician’s actual treatment are 64.6% and 73.2%, while the agreement rates between AI policy and “bad” physician’s actual treatment are 44.1% and 35.8%, the gaps are 20.5% and 37.4%, respectively. External validation using action similarity rate and relative gain based on eICU resulted in agreement rates of 61.5% and 69.1% with the “good” physician’s treatment, and 45.2% and 38.3% with the “bad” physician’s treatment, with gaps of 16.3% and 30.8%, respectively. In conclusion, the model provides instructive support for clinical decisions, and the evaluation methods accurately distinguish reliable and unreasonable outcomes.
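The abstract reports TD(0) as the most reliable value estimator. As a reminder of what tabular TD(0) policy evaluation does, a minimal Python sketch follows. The state labels, rewards, step size, and discount are made-up illustrations; they are not the SOFA-MDP, its reward design, or any MIMIC-IV or eICU data.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# One step of experience: (state, reward, next_state, done)
Transition = Tuple[Hashable, float, Hashable, bool]


def td0_evaluate(episodes: Iterable[List[Transition]],
                 alpha: float = 0.1,
                 gamma: float = 0.9) -> Dict[Hashable, float]:
    """Estimate state values from recorded episodes with tabular TD(0)."""
    values: Dict[Hashable, float] = {}
    for episode in episodes:
        for state, reward, next_state, done in episode:
            # Bootstrap from the current estimate of the next state (0 if terminal).
            v_next = 0.0 if done else values.get(next_state, 0.0)
            td_error = reward + gamma * v_next - values.get(state, 0.0)
            values[state] = values.get(state, 0.0) + alpha * td_error
    return values


# Hypothetical two-step episode, replayed twice.
episode = [("a", 0.0, "b", False), ("b", 1.0, "end", True)]
values = td0_evaluate([episode, episode], alpha=0.5, gamma=1.0)
```

Each update moves the value of the visited state toward the observed reward plus the discounted estimate of the next state, so the reward at the end of the episode propagates backward one step per pass.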

JBHI Journal 2023 Journal Article

Multi-Learner Based Deep Meta-Learning for Few-Shot Medical Image Classification

  • Hongyang Jiang
  • Mengdi Gao
  • Heng Li
  • Richu Jin
  • Hanpei Miao
  • Jiang Liu

Few-shot learning (FSL) is promising in the field of medical image analysis due to the high cost of establishing high-quality medical datasets. Many FSL approaches have been proposed in natural image scenes. However, present FSL methods are rarely evaluated on medical images, and the FSL technology applicable to medical scenarios needs to be further developed. Meta-learning has supplied an optional framework to address the challenging FSL setting. In this paper, we propose a novel multi-learner based FSL method for multiple medical image classification tasks, combining meta-learning with transfer-learning and metric-learning. Our designed model is composed of three learners: an auto-encoder, a metric-learner, and a task-learner. In transfer-learning, all the learners are trained on the base classes. In the ensuing meta-learning, we leverage multiple novel tasks to fine-tune the metric-learner and task-learner in order to adapt quickly to unseen tasks. Moreover, to further boost the learning efficiency of our model, we devised real-time data augmentation and a dynamic Gaussian disturbance soft label (GDSL) scheme as effective generalization strategies for few-shot classification tasks. We have conducted experiments for three-class few-shot classification tasks on three newly-built challenging medical benchmarks, BLOOD, PATH and CHEST. Extensive comparisons to related works validated that our method achieved top performance on both homogeneous medical datasets and cross-domain datasets.

AAAI Conference 2022 Conference Paper

Unified Named Entity Recognition as Word-Word Relation Classification

  • Jingye Li
  • Hao Fei
  • Jiang Liu
  • Shengqiong Wu
  • Meishan Zhang
  • Chong Teng
  • Donghong Ji
  • Fei Li

So far, named entity recognition (NER) has been involved with three major types, including flat, overlapped (aka nested), and discontinuous NER, which have mostly been studied individually. Recently, a growing interest has been built for unified NER, tackling the above three jobs concurrently with one single model. Current best-performing methods mainly include span-based and sequence-to-sequence models, where unfortunately the former merely focus on boundary identification and the latter may suffer from exposure bias. In this work, we present a novel alternative by modeling the unified NER as word-word relation classification, namely W²NER. The architecture resolves the kernel bottleneck of unified NER by effectively modeling the neighboring relations between entity words with Next-Neighboring-Word (NNW) and Tail-Head-Word-* (THW-*) relations. Based on the W²NER scheme we develop a neural framework, in which the unified NER is modeled as a 2D grid of word pairs. We then propose multi-granularity 2D convolutions for better refining the grid representations. Finally, a co-predictor is used to sufficiently reason the word-word relations. We perform extensive experiments on 14 widely-used benchmark datasets for flat, overlapped, and discontinuous NER (8 English and 6 Chinese datasets), where our model beats all the current top-performing baselines, pushing the state-of-the-art performances of unified NER.

JBHI Journal 2021 Journal Article

Combating Ambiguity for Hash-Code Learning in Medical Instance Retrieval

  • Jiansheng Fang
  • Huazhu Fu
  • Dan Zeng
  • Xiao Yan
  • Yuguang Yan
  • Jiang Liu

When encountering a dubious diagnostic case, medical instance retrieval can help radiologists make evidence-based diagnoses by finding images containing instances similar to a query case from a large image database. The similarity between the query case and retrieved similar cases is determined by visual features extracted from pathologically abnormal regions. However, the manifestation of these regions often lacks specificity, i.e., different diseases can have the same manifestation, and different manifestations may occur at different stages of the same disease. To combat the manifestation ambiguity in medical instance retrieval, we propose a novel deep framework called Y-Net, encoding images into compact hash-codes generated from convolutional features by feature aggregation. Y-Net can learn highly discriminative convolutional features by unifying the pixel-wise segmentation loss and classification loss. The segmentation loss allows exploring subtle spatial differences for good spatial-discriminability while the classification loss utilizes class-aware semantic information for good semantic-separability. As a result, Y-Net can enhance the visual features in pathologically abnormal regions and suppress the disturbing of the background during model training, which could effectively embed discriminative features into the hash-codes in the retrieval stage. Extensive experiments on two medical image datasets demonstrate that Y-Net can alleviate the ambiguity of pathologically abnormal regions and its retrieval performance outperforms the state-of-the-art method by an average of 9.27% on the returned list of 10.

JBHI Journal 2020 Journal Article

Automatic Segmentation and Visualization of Choroid in OCT with Knowledge Infused Deep Learning

  • Huihong Zhang
  • Jianlong Yang
  • Kang Zhou
  • Fei Li
  • Yan Hu
  • Yitian Zhao
  • Ce Zheng
  • Xiulan Zhang

The choroid provides oxygen and nourishment to the outer retina and is thus related to the pathology of various ocular diseases. Optical coherence tomography (OCT) is advantageous in visualizing and quantifying the choroid in vivo. However, its application in the study of the choroid is still limited for two reasons. (1) The lower boundary of the choroid (choroid-sclera interface) in OCT is fuzzy, which makes automatic segmentation difficult and inaccurate. (2) The visualization of the choroid is hindered by the vessel shadows from the superficial layers of the inner retina. In this paper, we propose to incorporate medical and imaging prior knowledge with deep learning to address these two problems. We propose a biomarker-infused global-to-local network (Bio-Net) for choroid segmentation, which not only regularizes the segmentation via predicted choroid thickness, but also leverages a global-to-local segmentation strategy to provide global structure information and suppress overfitting. To eliminate the retinal vessel shadows, we propose a deep-learning pipeline that first locates the shadows using their projection on the retinal pigment epithelium layer and then predicts the contents of the choroidal vasculature at the shadow locations with an edge-to-texture generative adversarial inpainting network. The results show that our method outperforms the existing methods on both tasks. We further apply the proposed method in a clinical prospective study for understanding the pathology of glaucoma, which demonstrates its capacity in detecting the structural and vascular changes of the choroid related to the elevation of intra-ocular pressure.

AIIM Journal 2020 Journal Article

Speckle reduction of OCT via super resolution reconstruction and its application on retinal layer segmentation

  • Qifeng Yan
  • Bang Chen
  • Yan Hu
  • Jun Cheng
  • Yan Gong
  • Jianlong Yang
  • Jiang Liu
  • Yitian Zhao

Optical coherence tomography (OCT) is a rapidly developing non-invasive three-dimensional imaging approach, and it has been widely used in the examination and diagnosis of eye diseases. However, speckle noise is often inherited from the image acquisition process and may obscure anatomical structures such as the retinal layers. In this paper, we propose a novel method to reduce the speckle noise in 3D OCT scans by introducing a new super-resolution approach. It uses a multi-frame fusion mechanism that merges multiple scans of the same scene and utilizes the movements of sub-pixels to recover missing signals in one pixel, which significantly improves the image quality. To evaluate the effectiveness of the proposed speckle noise reduction method, we have applied it to retinal layer segmentation. Results show that the proposed method produces promising enhancement performance and enables deep learning-based methods to obtain more accurate retinal layer segmentation results.

JBHI Journal 2018 Journal Article

Left Atrial Appendage Segmentation Using Fully Convolutional Neural Networks and Modified Three-Dimensional Conditional Random Fields

  • Cheng Jin
  • Jianjiang Feng
  • Lei Wang
  • Heng Yu
  • Jiang Liu
  • Jiwen Lu
  • Jie Zhou

Thrombosis has become a global disease threatening human health. The left atrial appendage (LAA) is a major source of thrombosis in patients with atrial fibrillation (AF). Positive correlation exists between LAA volume and AF risk. LAA morphology has been suggested to influence thromboembolic risk in AF patients and to help predict thromboembolic events in low-risk patient groups. Automatic segmentation of LAA can greatly help physicians diagnose AF. In consideration of the large anatomical variations of the LAA, we proposed a robust method for automatic LAA segmentation on computed tomographic angiography (CTA) data using fully convolutional neural networks with three-dimensional (3-D) conditional random fields (CRFs). After manual localization of ROI of LAA, we adopted the FCN in natural image segmentation and transferred their learned models by fine-tuning the networks to segment each 2-D LAA slice. Subsequently, we used a modified dense 3-D CRF that accounts for the 3-D spatial information and larger contextual information to refine the segmentations of all slices. Our method was evaluated on 150 sets of CTA data using five-fold cross validation. Compared with manual annotation, we obtained a mean dice overlap of 94.76% and a mean volume overlap of 91.10% with a computation time of less than 40 s per volume. Experimental results demonstrated the robustness of our method in dealing with large anatomical variations and computational efficiency for adoption in a daily clinical routine.
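
For readers unfamiliar with the Dice overlap reported above, the following minimal sketch computes the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), between a predicted and a manually annotated binary mask. The function name `dice_overlap` and the toy six-voxel masks are hypothetical and are not from the paper; the abstract does not define its separate "volume overlap" measure, so that one is not sketched here.

```python
def dice_overlap(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 3 foreground voxels each, 2 of them shared.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
score = dice_overlap(pred, truth)  # 2 * 2 / (3 + 3)
```

A real 3-D segmentation would be flattened to a single sequence before the call, and a mean over volumes would average the per-volume scores.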

AAAI Conference 2016 Conference Paper

Two-Stream Contextualized CNN for Fine-Grained Image Classification

  • Jiang Liu
  • Chenqiang Gao
  • Deyu Meng
  • Wangmeng Zuo

The human cognition system suggests that context information provides a potentially powerful clue when recognizing objects. However, for fine-grained image classification, the contribution of context may vary over different images, and sometimes the context even confuses the classification result. To alleviate this problem, in our work, we develop a novel approach, the two-stream contextualized Convolutional Neural Network, which provides a simple but efficient context-content joint classification model under a deep learning framework. The network merely requires the raw image and a coarse segmentation as input to extract both content and context features without the need for human interaction. Moreover, our network adopts a weighted fusion scheme to combine the content and the context classifiers, while a subnetwork is introduced to adaptively determine the weight for each image. According to our experiments on public datasets, our approach achieves considerably high recognition accuracy without any tedious human involvement, as compared with the state-of-the-art approaches.