Arrow Research search

Author name cluster

Jie Wen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

45 papers
2 author rows

Possible papers

45

AAAI Conference 2026 Conference Paper

Detecting Fake News in Short Videos Through Multi-View Aggregation

  • Nuo Li
  • Yuan Xiong
  • Chengliang Liu
  • Jie Wen
  • Chao Huang

The increasing prominence of short video platforms has positioned them as a primary channel for public awareness of current events, while also facilitating the widespread dissemination of fake news, thus highlighting the critical need for automated detection technologies. In contrast to fake news confined to text and images, short video news encompasses multiple modalities and extensive information, presenting heightened challenges. Most existing research emphasizes the analysis of news content or user comments alone, while overlooking the crucial role of publishers, leading to poor model performance when handling fake news lacking obvious false signals. Therefore, we propose a Publisher Profiling Module to identify new false signals. To enable a more comprehensive detection of misinformation, we design a Multi-View Aggregation (MVA) model, simultaneously evaluating news from three distinct perspectives: sentiment analysis, content understanding, and publisher profiling. Late fusion is applied at the decision level to leverage the complementary strengths of these perspectives, addressing the limitations of single-view methods. Our experiments conducted on the FakeSV and FVC datasets demonstrate the superior performance of the proposed method.

AAAI Conference 2026 Conference Paper

Incomplete Multi-view Diabetic Retinopathy Grading via Self-Supervised Inter- and Intra-View Restoration

  • Zhihao Wu
  • Yuxin Lin
  • Jie Wen
  • Wuzhen Shi
  • Linlin Shen

Multi-view diabetic retinopathy (DR) grading has achieved remarkable performance by capturing more comprehensive pathological features than single-view methods. However, complete multi-view fundus images are often difficult to obtain in clinical practice, and the performance degrades significantly when fewer views are available. To overcome this limitation, we propose the first incomplete multi-view DR grading framework, aiming to provide accurate diagnosis regardless of the number of available views. It introduces two novel modules. First, cross-view spatial correlation attention (CSCA) captures region correlations across views, automatically identifying and fusing diagnostically relevant spatial features to improve feature representation. Second, self-supervised mask consistency learning (SMCL) formulates a novel pretext task of missing-view information reconstruction by strategically masking inter- and intra-view regions, enabling the model to infer complete features from incomplete views. Benefiting from CSCA and SMCL, our method enhances structural feature consistency across views and effectively compensates for missing information during DR grading. Extensive experiments demonstrate that our method achieves state-of-the-art grading performance, particularly under realistic conditions where some views are unavailable.

AAAI Conference 2026 Conference Paper

Prototype-Based Semantic Consistency Alignment for Domain Adaptive Retrieval

  • Tianle Hu
  • Weijun Lv
  • Na Han
  • Xiaozhao Fang
  • Jie Wen
  • Jiaxing Li
  • Guoxu Zhou

Domain adaptive retrieval aims to transfer knowledge from a labeled source domain to an unlabeled target domain, enabling effective retrieval while mitigating domain discrepancies. However, existing methods encounter several fundamental limitations: 1) neglecting class-level semantic alignment and excessively pursuing pair-wise sample alignment; 2) lacking either pseudo-label reliability consideration or geometric guidance for assessing label correctness; 3) directly quantizing original features affected by domain shift, undermining the quality of learned hash codes. In view of these limitations, we propose Prototype-based Semantic Consistency Alignment (PSCA), a two-stage framework for effective domain adaptive retrieval. In the first stage, a set of orthogonal prototypes directly establishes class-level semantic connections, maximizing inter-class separability while gathering intra-class samples. During the prototype learning, geometric proximity provides a reliability indicator for semantic consistency alignment through adaptive weighting of pseudo-label confidences. The resulting membership matrix and prototypes facilitate feature reconstruction, ensuring quantization on reconstructed rather than original features, thereby improving subsequent hash coding quality and seamlessly connecting both stages. In the second stage, domain-specific quantization functions process the reconstructed features under mutual approximation constraints, generating unified binary hash codes across domains. Extensive experiments validate PSCA's superior performance across multiple datasets.

AAAI Conference 2026 Conference Paper

Quality-aware and Soft Consistency Driven Representation Fusion for Incomplete Multi-view Multi-label Classification

  • Yadong Liu
  • Waikeung Wong
  • Yulong Chen
  • Jie Wen

Multi-view multi-label classification aims to utilize the rich information contained in multiple views for accurate classification. However, in real-world applications, its performance is often severely constrained by the concurrent missingness of both views and labels. To address this problem, this paper first targets the drawback of representation degradation in traditional feature disentanglement methods caused by strong consistency constraints and proposes a soft consistency constraint. This constraint not only effectively aligns the shared information and maximally avoids the compression of information beneficial to the classification task, but also enhances the aggregation effect of high-quality representations on other representations. Furthermore, to address the coarse granularity of traditional fusion strategies, we design a quality assessment network that achieves instance-level dynamic weighted fusion in a data-driven manner. Extensive experiments on multiple benchmark datasets demonstrate that our method achieves state-of-the-art performance in both incomplete and complete data scenarios, showcasing its robustness and generality.

NeurIPS Conference 2025 Conference Paper

A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers

  • Zhixiao Wu
  • Yao Lu
  • Jie Wen
  • Hao Sun
  • Qi Zhou
  • Guangming Lu

Poison-only Clean-label Backdoor Attacks (PCBAs) aim to covertly inject attacker-desired behavior into DNNs by merely poisoning the dataset without changing the labels. To effectively implant a backdoor, multiple triggers have been proposed for various attack requirements of Attack Success Rate (ASR) and stealthiness. Additionally, sample selection enhances clean-label backdoor attacks' ASR by meticulously selecting "hard" samples instead of random samples to poison. Current methods, however, 1) usually handle sample selection and triggers in isolation, leading to severely limited improvements in both ASR and stealthiness. Consequently, attacks exhibit unsatisfactory performance on evaluation metrics when converted to PCBAs via a mere stacking of methods. Therefore, we explore the bi-directional collaborative relations between sample selection and triggers to address this dilemma. 2) Due to the strong specificity of triggers, a simple combination of sample selection and triggers fails to substantially enhance both evaluation metrics while preserving generalization across various attacks. Therefore, we propose a set of components that significantly improve both stealthiness and ASR based on the commonalities of attacks. Specifically, Component A ascertains two critical selection factors and combines them appropriately based on the trigger scale to select more reasonable "hard" samples for improving ASR. Component B selects samples similar to relevant trigger-implanted samples to promote stealthiness. Component C reassigns trigger poisoning intensity across RGB colors according to the human visual system's distinct sensitivity to each channel for higher ASR, with stealthiness ensured by sample selection including Component B. Furthermore, all components can be strategically integrated into diverse PCBAs, enabling tailored solutions that balance ASR and stealthiness enhancement for specific attack requirements. Extensive experiments demonstrate the superiority of our components in stealthiness, ASR, and generalization. Our code will be released as soon as possible.

IJCAI Conference 2025 Conference Paper

AdaptPFL: Unlocking Cross-Device Palmprint Recognition via Adaptive Personalized Federated Learning with Feature Decoupling

  • Zirui Zhang
  • Donghai Guan
  • Çetin Kaya Koç
  • Jie Wen
  • Qi Zhu

Contactless palmprint recognition has recently emerged as a promising biometric technology. However, traditional methods that require sharing user data introduce substantial security risks. While federated learning offers privacy-preserving solutions, it often compromises recognition accuracy due to feature distribution drift caused by external factors such as lighting and devices. To address this issue, we propose an adaptive personalized federated learning framework (AdaptPFL). The central innovation lies in decomposing palmprint features into identity-related and contextual-related components using a feature decoupling mechanism. This design isolates the influence of external environmental factors on identity recognition through de-entanglement. Furthermore, two adaptive aggregation strategies are introduced to correct client drift: (1) Intra-Local Adaptive Aggregation (ILAA), which addresses intra-client drift by adaptively combining the two decoupled feature types; (2) Global-Local Adaptive Aggregation (GLAA), which corrects inter-client drift by adaptively aggregating model parameters. Experimental results demonstrate that AdaptPFL achieves superior performance compared to existing state-of-the-art methods.

AAAI Conference 2025 Conference Paper

ALRMR-GEC: Adjusting Learning Rate Based on Memory Rate to Optimize the Edit Scorer for Grammatical Error Correction

  • Zhixiao Wu
  • Yao Lu
  • Jie Wen
  • Guangming Lu

Edit-based approaches for Grammatical Error Correction (GEC) have attracted considerable attention due to their transparent explanations of the correction process and rapid inference. Through exploring the characteristics of generalized and specific knowledge learning for GEC, we discover that efficiently training GEC systems with satisfactory generalization capacity favors more generalized knowledge over specific knowledge. Current gradient-based methods for training GEC systems, however, usually prioritize minimizing training loss over generalization loss. This paper proposes the strategy of Adjusting Learning Rate Based on Memory Rate to optimize the edit-based GEC scorer (ALRMR-GEC). Specifically, we introduce the memory rate, a novel metric, to provide an explicit indicator of the model's state of learning generalized and specific knowledge, which can effectively guide the GEC system to adjust the learning rate in a timely manner. Extensive experiments, conducted by optimizing the published edit scorer on the BEA2019 dataset, show that our ALRMR-GEC significantly enhances the model's generalization ability with stable and satisfactory performance nearly irrespective of the initial learning rate selection. Moreover, our method can accelerate training more than tenfold in certain cases. Finally, the experiments indicate that the memory rate introduced in our ALRMR-GEC guides the GEC edit scorer to learn more generalized knowledge.

NeurIPS Conference 2025 Conference Paper

Confidence-Aware With Prototype Alignment for Partial Multi-label Learning

  • Weijun Lv
  • Yu Chen
  • Xiaozhao Fang
  • Xuhuan Zhu
  • Jie Wen
  • Guoxu Zhou
  • Sixian Chan

Label prototype learning has emerged as an effective paradigm in Partial Multi-Label Learning (PML), providing a distinctive framework for modeling structured representations of label semantics while naturally filtering noise through prototype-based label confidence estimation. However, existing prototype-based methods face a critical limitation: class prototypes are biased estimates due to noisy candidate labels, particularly when positive samples are scarce. To this end, we first propose a mutual class prototype alignment strategy that bypasses noise interference by introducing two different transformation matrices, which mutually aligns the class prototypes learned from fuzzy clustering and from the candidate label set so that they correct each other. This alignment is in turn passed on to the fuzzy membership labels. In addition, to eliminate noise interference from the candidate label set during classifier learning, we use the learned permutation matrix to transform the fuzzy membership labels and learn a label reliability indicator matrix together with the candidate label set. This prevents the indicator matrix from assigning nonzero values to non-candidate labels while suppressing the introduction of incorrect labels as much as possible. The resulting indicator matrix guides a robust multi-label classifier training process, jointly optimizing label confidence and classifier parameters. Extensive experiments demonstrate that our proposed model exhibits significant performance advantages over state-of-the-art PML approaches.

AAAI Conference 2025 Conference Paper

Deep Hierarchies and Invariant Disease-Indicative Feature Learning for Computer Aided Diagnosis of Multiple Fundus Diseases

  • Yuxin Lin
  • Wei Wang
  • Xiaoling Luo
  • Zhihao Wu
  • Chengliang Liu
  • Jie Wen
  • Yong Xu

With the advancement of computer vision, numerous models have been proposed for the screening of fundus diseases. However, the recognition of multiple fundus diseases is often hampered by the simultaneous presence of multiple disease types and the confluence of lesion types in fundus images. This paper addresses these challenges by conceptualizing them as multi-level feature fusion and self-supervised disease-indicative feature learning problems. We decode fundus images at various levels of granularity to delineate scenarios wherein multiple diseases and lesions co-occur. To effectively integrate these features, we introduce a hierarchical vision transformer (HVT) that adeptly captures both inter-level and intra-level dependencies. A novel forward-attention module is proposed to enhance the integration of lower-level semantic information into higher semantic layers, thereby enriching the representation of complex features. Additionally, we introduce a novel self-supervised mask-consistent feature learner (MCFL). Unlike traditional masked autoencoders that reconstruct original images using encoder-decoder structures, MCFL utilizes a teacher-student framework to reconstruct mask-consistent feature maps. In this setup, exponential moving averaging is employed to derive classification-guided features, which serve as labels for reconstruction rather than merely reconstructing the original images. This approach facilitates the extraction of disease-indicative features. Extensive experiments demonstrate that our method significantly outperforms existing state-of-the-art models.

IJCAI Conference 2025 Conference Paper

Deep Opinion-Unaware Blind Image Quality Assessment by Learning and Adapting from Multiple Annotators

  • Zhihua Wang
  • Xuelin Liu
  • Jiebin Yan
  • Jie Wen
  • Wei Wang
  • Chao Huang

Existing deep neural network (DNN)-based blind image quality assessment (BIQA) methods primarily rely on human-rated datasets for training. However, collecting human labels is extremely time-consuming and labor-intensive, posing a significant bottleneck for practical applications. To address this challenge, we propose a Deep opinion-Unaware BIQA model by learning and adapting from Multiple Annotators, termed DUBMA, thereby eliminating the need for human annotations. Specifically, we first generate a large-scale set of distorted image pairs and then assign relative quality rankings using existing full-reference IQA models. The resulting dataset is subsequently employed for training our DUBMA. Due to the inherent discrepancies between synthetic and real-world distortions, a domain shift may occur. To address this, we propose an outlier-robust unsupervised domain adaptation approach leveraging optimal transport. This strategy effectively reduces the gap between synthetic and real-world distortion domains, thereby boosting the model’s adaptability and overall performance. Extensive experiments show that DUBMA outperforms existing opinion-unaware BIQA methods in terms of prediction accuracy across multiple datasets.

AAAI Conference 2025 Conference Paper

DiffusionREC: Diffusion Model with Adaptive Condition for Referring Expression Comprehension

  • Jingcheng Ke
  • Waikeung Wong
  • Jia Wang
  • Mu Li
  • Lunke Fei
  • Jie Wen

The objective of referring expression comprehension (REC) is to accurately identify the object in an image described by a given expression. Existing REC methods, including transformer-based and graph-based approaches among others, have shown robust performance on REC tasks. In this study, we present a groundbreaking framework named DiffusionREC for the REC task. This framework reimagines the REC task as a text-guided bounding box denoising diffusion process, through which noisy bounding boxes are refined and distilled to pinpoint the target box. Throughout the training process, the bounding box of the target object diffuses from its ground-truth position towards a random distribution. Simultaneously, a filtering-based object decoder is introduced to reverse this diffusion of noise, conditioned on the provided expression, the result from the previous denoising step, and the interaction between the expression and the image. At the inference stage, we begin by randomly generating a collection of boxes. Subsequently, the filtering-based object decoder is iteratively employed to refine and prune these bounding boxes, conditioned on the given expression, the results from the previous denoising step, and the interaction between the expression and the image. Extensive experiments conducted on six datasets demonstrate that DiffusionREC outperforms previous REC methods, yielding superior performance.

IJCAI Conference 2025 Conference Paper

Enhancing Multimodal Protein Function Prediction Through Dual-Branch Dynamic Selection with Reconstructive Pre-Training

  • Xiaoling Luo
  • Peng Chen
  • Chengliang Liu
  • Xiaopeng Jin
  • Jie Wen
  • Yumeng Liu
  • Junsong Wang

Multimodal protein features play a crucial role in protein function prediction. However, these features encompass a wide range of information, ranging from structural data and sequence features to protein attributes and interaction networks, making it challenging to decipher their complex interconnections. In this work, we propose a multimodal protein function prediction method (DSRPGO) by utilizing dynamic selection and reconstructive pre-training mechanisms. To acquire complex protein information, we introduce reconstructive pre-training to mine more fine-grained information with low semantic levels. Moreover, we put forward the Bidirectional Interaction Module (BInM) to facilitate interactive learning among multimodal features. Additionally, to address the difficulty of hierarchical multi-label classification in this task, a Dynamic Selection Module (DSM) is designed to select the feature representation that is most conducive to current protein function prediction. Our proposed DSRPGO model improves significantly in BPO, MFO, and CCO on human datasets, thereby outperforming other benchmark models.

ICML Conference 2025 Conference Paper

Federated Incomplete Multi-view Clustering with Globally Fused Graph Guidance

  • Guoqing Chao
  • Zhenghao Zhang
  • Lei Meng
  • Jie Wen
  • Dianhui Chu

Federated multi-view clustering has been proposed to mine the valuable information within multi-view data distributed across different devices and has achieved impressive results while preserving privacy. Despite great progress, most federated multi-view clustering methods use only global pseudo-labels to guide the downstream clustering process and fail to exploit the global information when extracting features. In addition, the missing-data problem in the federated multi-view clustering task is less explored. To address these problems, we propose a novel Federated Incomplete Multi-view Clustering method with globally Fused Graph guidance (FIMCFG). Specifically, we design a dual-head graph convolutional encoder at each client to extract two kinds of underlying features containing global and view-specific information. Subsequently, under the guidance of the fused graph, the two underlying features are fused into high-level features, on which clustering is conducted under the supervision of pseudo-labeling. Finally, the high-level features are uploaded to the server to refine the graph fusion and pseudo-label computation. Extensive experimental results demonstrate the effectiveness and superiority of FIMCFG. Our code is publicly available at https://github.com/PaddiHunter/FIMCFG.

AAAI Conference 2025 Conference Paper

Federated Weakly Supervised Video Anomaly Detection with Multimodal Prompt

  • Benfeng Wang
  • Chao Huang
  • Jie Wen
  • Wei Wang
  • Yabo Liu
  • Yong Xu

Video anomaly detection (VAD) aims at locating abnormal events in videos. Recently, weakly supervised VAD (WSVAD), which requires only video-level annotations during training, has made great progress. In practical applications, different institutions may have different types of abnormal videos. However, these abnormal videos cannot be circulated on the internet due to privacy protection. To train a more generalized anomaly detector that can identify various anomalies, it is reasonable to introduce federated learning into WSVAD. In this paper, we propose Global and Local Context-driven Federated Learning, a new paradigm for privacy-protected weakly supervised video anomaly detection. Specifically, we utilize the vision-language association of CLIP to detect whether a video frame is abnormal. Instead of leveraging handcrafted text prompts for CLIP, we propose a text prompt generator. The generated prompt is simultaneously influenced by both textual and visual information. On the one hand, the text provides global context related to anomalies, which improves the model's generalization ability. On the other hand, the visual input provides personalized local context, because different clients may have videos with different types of anomalies or scenes. The generated prompt thus ensures global generalization while accommodating personalized data from different clients. Extensive experiments show that the proposed method achieves remarkable performance.

NeurIPS Conference 2025 Conference Paper

Hierarchical Information Aggregation for Incomplete Multimodal Alzheimer's Disease Diagnosis

  • Chengliang Liu
  • Que Yuanxi
  • Qihao Xu
  • Yabo Liu
  • Jie Wen
  • Jinghua Wang
  • Xiaoling Luo

Alzheimer's Disease (AD) poses a significant health threat to the aging population, underscoring the critical need for early diagnosis to delay disease progression and improve patient quality of life. Recent advances in heterogeneous multimodal artificial intelligence (AI) have facilitated comprehensive joint diagnosis, yet practical clinical scenarios frequently encounter incomplete modalities due to factors like high acquisition costs or radiation risks. Moreover, traditional convolution-based architectures face inherent limitations in capturing long-range dependencies and handling heterogeneous medical data efficiently. To address these challenges, in our proposed heterogeneous multimodal diagnostic framework (HAD), we develop a multi-view Hilbert curve-based Mamba block and a hierarchical spatial feature extraction module to simultaneously capture local spatial features and global dependencies, effectively alleviating spatial discontinuities introduced by voxel serialization. Furthermore, to balance semantic consistency and modal specificity, we build a unified mutual information learning objective in the heterogeneous multimodal embedding space, which maintains effective learning of modality-specific information to avoid modality collapse caused by model preference. Extensive experiments demonstrate that our HAD significantly outperforms state-of-the-art methods in various modality-missing scenarios, providing an efficient and reliable solution for early-stage AD diagnosis.

IJCAI Conference 2025 Conference Paper

High-Confident Local Structure Guided Consensus Graph Learning For Incomplete Multi-view Clustering

  • Shuping Zhao
  • Lunke Fei
  • Qi Lai
  • Jie Wen
  • Jinrong Cui
  • Tingting Chai

Existing clustering methods for handling incomplete multi-view data primarily concentrate on learning a common representation or graph from the available views, while overlooking the latent information contained in the missing views and the imbalance of information among different views. Furthermore, instances with weak discriminative features usually degrade the precision of the consistent representation or graph across all views. To address these problems, in this paper we propose a simple but efficient method, called high-confident local structure guided consensus graph learning for incomplete multi-view clustering (HLSCG_IMC). Specifically, this method can adaptively learn a strict block diagonal structure from the available samples using a block diagonal representation regularizer. Different from existing methods that use a simple pairwise affinity graph for structure construction, we consider the influence of instances located at the edge of two clusters on the construction of the graph for each view. By harnessing the proposed high-confident strict block diagonal structures, the approach directly guides the learning of a robust consensus graph. A number of experiments have been conducted to verify the efficacy of our approach.

YNIMG Journal 2025 Journal Article

Impaired glymphatic transport in hypoxic-ischemic encephalopathy

  • Jieyi Shen
  • Ying Yang
  • Fangfang Chen
  • Yang Zuo
  • Yidong Yang
  • Wei Wei
  • Ying Liu
  • Jie Wen

Hypoxic-ischemic encephalopathy (HIE) is a major cause of neonatal brain injury. The glymphatic system aids in waste clearance via perivascular pathways and is crucial in maintaining brain functions. While studies have shown that diseases such as stroke and traumatic brain injury disrupt glymphatic function, the impact of HIE on this system remains largely unexplored. We utilized an HIE mouse model with dynamic contrast-enhanced MRI (DCE-MRI) to conduct both qualitative and quantitative assessment of glymphatic transport dysfunction in different brain regions. Fluorescent cerebrospinal fluid (CSF) tracers were used to investigate the effects of HIE on glymphatic system development. Mouse brain sections were subjected to Aquaporin-4 (AQP4) immunohistochemical staining, allowing for detailed morphological assessment of AQP4 polarization in affected brain regions. HIE mice exhibited delayed glymphatic transport dynamics, with prolonged time-to-peak tracer enhancement and increased retention in the olfactory bulb, basal forebrain, and hypothalamus regions. Quantitative kinetic analysis showed significant reductions in Kf (CSF-to-perivascular space transfer constants) and Ks (perivascular-to-parenchyma transfer constants), alongside elevated Vf (perivascular volume fractions) across cortical and subcortical structures. Fluorescent CSF tracer analysis indicates that HIE impaired glymphatic system maturation in neonatal mice. This impairment progressed to persistent glymphatic dysfunction. Histologically validated via immunofluorescence, HIE-induced astrocytic AQP4 mis-polarization directly correlates with glymphatic transport dysfunction, underscoring AQP4's critical role in glymphatic system integrity. Our multimodal imaging study combining DCE-MRI and CSF tracer analysis indicates that HIE can cause regional impairments of glymphatic function and adversely affect brain development.

NeurIPS Conference 2025 Conference Paper

Learning from Disjoint Views: A Contrastive Prototype Matching Network for Fully Incomplete Multi-View Clustering

  • Yiming Wang
  • Qun Li
  • Dongxia Chang
  • Jie Wen
  • Hua Dai
  • Fu Xiao
  • Yao Zhao

Multi-view clustering aims to enhance clustering performance by leveraging information from diverse sources. However, its practical application is often hindered by a barrier: the lack of correspondences across views. This paper focuses on the understudied problem of fully incomplete multi-view clustering (FIMC), a scenario where existing methods fail due to their reliance on partial alignment. To address this problem, we introduce the Contrastive Prototype Matching Network (CPMN), a novel framework that establishes a new paradigm for cross-view alignment based on matching high-level categorical structures. Instead of aligning individual instances, CPMN performs a more robust cluster prototype alignment. CPMN first employs a correspondence-free graph contrastive learning approach, leveraging mutual $k$-nearest neighbors (MNN) to uncover intrinsic data structures and establish initial prototypes from entirely unpaired views. Building on the prototypes, we introduce a cross-view prototype graph matching stage to resolve category misalignment and forge a unified clustering structure. Finally, guided by this alignment, we devise a prototype-aware contrastive learning mechanism to promote semantic consistency, replacing the reliance on the initial MNN-based structural similarity. Extensive experiments on benchmark datasets demonstrate that our method significantly outperforms various baselines and ablation variants, validating its effectiveness.

AAAI Conference 2025 Conference Paper

Lightweight Contrastive Distilled Hashing for Online Cross-modal Retrieval

  • Jiaxing Li
  • Lin Jiang
  • Zeqi Ma
  • Kaihang Jiang
  • Xiaozhao Fang
  • Jie Wen

Deep online cross-modal hashing has recently gained much attention from researchers owing to its promising applications, offering low storage requirements, fast retrieval efficiency, and cross-modality adaptivity. However, there still exist several technical hurdles that hinder its applications, e.g., 1) how to extract the coexistent semantic relevance of cross-modal data, 2) how to achieve competitive performance when handling real-time data streams, and 3) how to transfer knowledge learned offline to online training in a lightweight manner. To address these problems, this paper proposes lightweight contrastive distilled hashing (LCDH) for cross-modal retrieval, which innovatively bridges offline and online cross-modal hashing via similarity matrix approximation in a knowledge distillation framework. Specifically, in the teacher network, LCDH first extracts cross-modal features with CLIP, which are further fed into an attention module for representation enhancement after feature fusion. The output of the attention module is then fed into an FC layer to obtain hash codes, aligning the sizes of the similarity matrices for online and offline training. In the student network, LCDH extracts visual and textual features with lightweight models, and these features are fed into an FC layer to generate binary codes. Finally, by approximating the similarity matrices, the performance of online hashing in the lightweight student network can be enhanced under the supervision of the coexistent semantic relevance distilled from the teacher network. Experimental results on three widely used datasets demonstrate that LCDH outperforms some state-of-the-art methods.

AAAI Conference 2025 Conference Paper

Multi-view Evidential Learning-based Medical Image Segmentation

  • Chao Huang
  • Yushu Shi
  • Waikeung Wong
  • Chengliang Liu
  • Wei Wang
  • Zhihua Wang
  • Jie Wen

Medical image segmentation provides useful information about the shape and size of organs, which benefits diagnosis, analysis, and treatment. Although traditional deep learning-based models can extract domain-specific knowledge, they face a generalization bottleneck due to their limited embedded knowledge scope. Vision foundation models have been shown to be effective at extracting generalizable knowledge, but they cannot extract domain-specific knowledge without fine-tuning. In this work, we propose a novel multi-view evidential learning-based framework that extracts both domain-specific and generalizable knowledge from multi-view features by combining the advantages of traditional models and vision foundation models. Specifically, a novel multi-view state space model (MV-SSM) is designed to extract task-related knowledge while removing redundant information within multi-view features. The proposed MV-SSM utilizes Mamba, a state space model, to model cross-view contextual dependencies between domain-specific and generalizable features. Additionally, evidential learning is adopted to quantify the model's segmentation uncertainty at boundaries. In particular, a variational Dirichlet distribution is introduced to characterize the distribution of the result probabilities, parameterized by the collected evidence to quantify uncertainty. As a result, the model can reduce boundary segmentation uncertainty by optimizing the parameters of the Dirichlet distribution. Experimental results on three datasets show that our method obtains superior segmentation performance.

NeurIPS Conference 2025 Conference Paper

NeuroH-TGL: Neuro-Heterogeneity Guided Temporal Graph Learning Strategy for Brain Disease Diagnosis

  • Shengrong Li
  • Qi Zhu
  • Chunwei Tian
  • Xinyang Zhang
  • WEI SHAO
  • Jie Wen
  • Daoqiang Zhang

Dynamic functional brain networks (DFBNs) are powerful tools in neuroscience research. Recent studies reveal that DFBNs contain heterogeneous neural nodes with more extensive connections and more drastic temporal changes, which play pivotal roles in coordinating the reorganization of the brain. Moreover, the spatio-temporal patterns of these nodes are modulated by the brain's historical states. However, existing methods not only ignore the spatio-temporal heterogeneity of neural nodes, but also fail to effectively encode the temporal propagation mechanism of heterogeneous activities. These limitations hinder the deep exploration of spatio-temporal relationships within DFBNs, preventing the capture of abnormal neural heterogeneity caused by brain diseases. To address these challenges, this paper proposes a neuro-heterogeneity guided temporal graph learning strategy (NeuroH-TGL). Specifically, we first develop a spatio-temporal pattern decoupling module to disentangle DFBNs into topological consistency networks and temporal trend networks that align with the brain's operational mechanisms. Then, we introduce a heterogeneity mining module to identify pivotal heterogeneity nodes that drive brain reorganization from the two decoupled networks. Finally, we design a temporal propagation graph convolution to simulate the influence of the historical states of heterogeneity nodes on the current topology, thereby flexibly extracting heterogeneous spatio-temporal information from the brain. Experiments show that our method surpasses several state-of-the-art methods and can identify abnormal heterogeneous nodes caused by brain diseases.

IJCAI Conference 2025 Conference Paper

Omni-Dimensional State Space Model-driven SAM for Pixel-level Anomaly Detection

  • Chao Huang
  • Qianyi Li
  • Jie Wen
  • Bob Zhang

Pixel-level anomaly detection is indispensable in industrial defect detection and medical diagnosis. Recently, the Segment Anything Model (SAM) has achieved promising results on many vision tasks. However, directly applying SAM to pixel-level anomaly detection yields unsatisfactory performance, and SAM requires manual prompts. Although some automatically prompted variants of SAM have been proposed, these automated prompting approaches merely utilize partial image features as prompts and fail to incorporate crucial features, such as multi-scale image features, to generate more suitable prompts. In this paper, we propose a novel Omni Dimensional State Space Model-driven SAM (ODS-SAM) for pixel-level anomaly detection. Specifically, the proposed method adopts the SAM architecture, ensuring easy implementation and avoiding the need for fine-tuning. A state-space-model-based residual Omni Dimensional module is designed to automatically generate suitable prompts. This module effectively leverages multi-scale and global information, facilitating an iterative search for optimal prompts in the prompt space. The identified optimal prompts are then fed into SAM as high-dimensional tensors. Experimental results demonstrate that the proposed ODS-SAM outperforms state-of-the-art models on both industrial and medical image datasets.

IJCAI Conference 2025 Conference Paper

Towards VLM-based Hybrid Explainable Prompt Enhancement for Zero-Shot Industrial Anomaly Detection

  • Weichao Cai
  • Weiliang Huang
  • Yunkang Cao
  • Chao Huang
  • Fei Yuan
  • Bob Zhang
  • Jie Wen

Zero-Shot Industrial Anomaly Detection (ZSIAD) aims to identify and localize anomalies in industrial images from unseen categories. Owing to their powerful generalization capabilities, Vision-Language Models (VLMs) have attracted growing interest in ZSIAD. To guide the model toward understanding and localizing semantically complex industrial anomalies, existing VLM-based methods attempt to provide additional prompts to the model through learnable text prompt templates. However, these zero-shot methods lack detailed descriptions of specific anomalies, making it difficult to accurately classify and segment the diverse range of industrial anomalies. To address this issue, we first propose a multi-stage prompt generation agent for ZSIAD. Specifically, we leverage a Multi-modal Large Language Model (MLLM) to articulate detailed differential information between normal and test samples, which provides detailed text prompts to the model through further refinement and an anti-false-alarm constraint. Moreover, we introduce a Visual Foundation Model (VFM) to generate anomaly-related attention prompts for more accurate localization of anomalies with varying sizes and shapes. Extensive experiments on seven real-world industrial anomaly detection datasets show that the proposed method not only outperforms recent SOTA methods, but also that its explainable prompts provide the model with a more intuitive basis for anomaly identification.

NeurIPS Conference 2025 Conference Paper

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought

  • Chao Huang
  • Benfeng Wang
  • Wei Wang
  • Jie Wen
  • Chengliang Liu
  • Li Shen
  • Xiaochun Cao

Recent advancements in the reasoning capability of Multimodal Large Language Models (MLLMs) demonstrate their effectiveness in tackling complex visual tasks. However, existing MLLM-based Video Anomaly Detection (VAD) methods remain limited to shallow anomaly descriptions without deep reasoning. In this paper, we propose a new task named Video Anomaly Reasoning (VAR), which aims to enable deep analysis and understanding of anomalies in videos by requiring MLLMs to think explicitly before answering. To this end, we propose Vad-R1, an end-to-end MLLM-based framework for VAR. Specifically, we design a Perception-to-Cognition Chain-of-Thought (P2C-CoT) that simulates the human process of recognizing anomalies, guiding the MLLMs to reason about anomalies step by step. Based on the structured P2C-CoT, we construct Vad-Reasoning, a dedicated dataset for VAR. Furthermore, we propose an improved reinforcement learning algorithm, AVA-GRPO, which explicitly incentivizes the anomaly reasoning capability of MLLMs through a self-verification mechanism with limited annotations. Experimental results demonstrate that Vad-R1 achieves superior performance, outperforming both open-source and proprietary models on VAD and VAR tasks.

AAAI Conference 2024 Conference Paper

A Two-Stage Information Extraction Network for Incomplete Multi-View Multi-Label Classification

  • Xin Tan
  • Ce Zhao
  • Chengliang Liu
  • Jie Wen
  • Zhanyan Tang

Recently, multi-view multi-label classification (MvMLC) has received a significant amount of research interest, and many methods have been proposed based on the assumptions of view completeness and label completeness. However, in real-world scenarios, multi-view multi-label data tends to be incomplete due to various uncertainties involved in data collection and manual annotation. As a result, conventional MvMLC methods fail. In this paper, we propose a new two-stage MvMLC network to solve this incomplete MvMLC issue with partially missing views and missing labels. Different from existing works, our method attempts to leverage the diverse information of the partially missing data based on information theory. Specifically, our method aims to minimize task-irrelevant information while maximizing task-relevant information through the principles of information bottleneck theory and mutual information extraction. The first stage of our network involves training view-specific classifiers to concentrate the task-relevant information. Subsequently, in the second stage, the hidden states of these classifiers serve as input for an alignment model, an autoencoder-based mutual information extraction framework, and a weighted fusion classifier to make the final prediction. Extensive experiments performed on five datasets validate that our method outperforms other state-of-the-art methods. Code is available at https://github.com/KevinTan10/TSIEN.

AAAI Conference 2024 Conference Paper

Attention-Induced Embedding Imputation for Incomplete Multi-View Partial Multi-Label Classification

  • Chengliang Liu
  • Jinlong Jia
  • Jie Wen
  • Yabo Liu
  • Xiaoling Luo
  • Chao Huang
  • Yong Xu

As a combination of emerging multi-view learning methods and traditional multi-label classification tasks, multi-view multi-label classification has shown broad application prospects. The diverse semantic information contained in heterogeneous data effectively enables the further development of multi-label classification. However, the widespread incompleteness problem in multi-view features and labels greatly hinders the practical application of multi-view multi-label classification. Therefore, in this paper, we propose an attention-induced missing-instance imputation technique to enhance the generalization ability of the model. Different from existing incomplete multi-view completion methods, we attempt to approximate the latent features of missing instances in the embedding space according to cross-view joint attention, instead of recovering missing views in the kernel space or the original feature space. Accordingly, the completed multi-view features are dynamically weighted by the confidence derived from joint attention in the late fusion phase. In addition, we propose a multi-view multi-label classification framework based on label-semantic feature learning, utilizing the statistical weak-label correlation matrix and a graph attention network to guide the learning process of label-specific features. Finally, our model is compatible with missing multi-view and partial multi-label data simultaneously, and extensive experiments on five datasets confirm the advancement and effectiveness of our embedding imputation method and multi-view multi-label classification model.

AAAI Conference 2024 Conference Paper

Deep Variational Incomplete Multi-View Clustering: Exploring Shared Clustering Structures

  • Gehui Xu
  • Jie Wen
  • Chengliang Liu
  • Bing Hu
  • Yicheng Liu
  • Lunke Fei
  • Wei Wang

Incomplete multi-view clustering (IMVC) aims to reveal shared clustering structures within multi-view data, where only partial views of the samples are available. Existing IMVC methods primarily suffer from two issues: 1) Imputation-based methods inevitably introduce inaccurate imputations, which in turn degrade clustering performance; 2) Imputation-free methods are susceptible to unbalanced information among views and fail to fully exploit shared information. To address these issues, we propose a novel method based on variational autoencoders. Specifically, we adopt multiple view-specific encoders to extract information from each view and utilize the Product-of-Experts approach to efficiently aggregate information to obtain the common representation. To enhance the shared information in the common representation, we introduce a coherence objective to mitigate the influence of information imbalance. By incorporating the Mixture-of-Gaussians prior information into the latent representation, our proposed method is able to learn the common representation with clustering-friendly structures. Extensive experiments on four datasets show that our method achieves competitive clustering performance compared with state-of-the-art methods.

AAAI Conference 2024 Conference Paper

HACDR-Net: Heterogeneous-Aware Convolutional Network for Diabetic Retinopathy Multi-Lesion Segmentation

  • Qihao Xu
  • Xiaoling Luo
  • Chao Huang
  • Chengliang Liu
  • Jie Wen
  • Jialei Wang
  • Yong Xu

Diabetic Retinopathy (DR), the leading cause of blindness in diabetic patients, is diagnosed from the condition of multiple retinal lesions. As a difficult task in medical image segmentation, DR multi-lesion segmentation faces the following main concerns. On the one hand, retinal lesions vary in location, shape, and size. On the other hand, because some lesions occupy only a very small part of the entire fundus image, the high proportion of background leads to difficulties in lesion segmentation. To solve the above problems, we propose a heterogeneous-aware convolutional network (HACDR-Net) that composes heterogeneous cross-convolution, heterogeneous modulated deformable convolution, and optional near-far-aware convolution. Our network introduces an adaptive aggregation module to summarize the heterogeneous feature maps and capture diverse lesion areas in the heterogeneous receptive field along the channel and spatial dimensions. In addition, to address the highly imbalanced proportion of focal areas, we design a new medical image segmentation loss function, Noise Adjusted Loss (NALoss). NALoss balances the predictive feature distribution of background and lesions by combining Gaussian noise with hard example mining, thus enhancing awareness of lesions. We conduct experiments on the public datasets IDRiD and DDR, and the experimental results show that the proposed method achieves better performance than other state-of-the-art methods. The code is open-sourced at github.com/xqh180110910537/HACDR-Net.

IJCAI Conference 2024 Conference Paper

Long Short-Term Dynamic Prototype Alignment Learning for Video Anomaly Detection

  • Chao Huang
  • Jie Wen
  • Chengliang Liu
  • Yabo Liu

Video anomaly detection (VAD) is the core problem of intelligent video surveillance. Previous methods commonly adopt the unsupervised paradigm of frame reconstruction or prediction. However, the limited mining of temporal dependencies and diversified event patterns within videos restricts the performance of existing methods. To tackle these problems, we propose a novel prototype-guided, dynamics-aware long-distance frame prediction paradigm for VAD. Specifically, we develop a prototype-guided dynamics matching network (PDM-Net) to enhance the discriminability and robustness of the anomaly detector. To explore temporal contexts, we equip PDM-Net with a long short-term dynamic prototype alignment learning mechanism, which stores long-term dynamic prototypes in a memory bank and learns how to recall them with short-term dynamics. As a result, short input sequences can recall the long-term dynamic prototypes stored in the memory bank to achieve the task of long-distance frame prediction. Besides, a feature discrimination module is adopted to extract the representative dynamic features of various normal events while preserving the diversity of normal patterns. Experimental results on four public datasets demonstrate the superiority of our method.

NeurIPS Conference 2024 Conference Paper

Optimal Transport-based Labor-free Text Prompt Modeling for Sketch Re-identification

  • Rui Li
  • Tingting Ren
  • Jie Wen
  • Jinxing Li

Sketch Re-identification (Sketch Re-ID), which aims to retrieve target person from an image gallery based on a sketch query, is crucial for criminal investigation, law enforcement, and missing person searches. Existing methods aim to alleviate the modality gap by employing semantic metrics constraints or auxiliary modal guidance. However, they incur expensive labor costs and inevitably omit fine-grained modality-consistent information due to the abstraction of sketches. To address this issue, this paper proposes a novel $\textit{Optimal Transport-based Labor-free Text Prompt Modeling}$ (OLTM) network, which hierarchically extracts coarse- and fine-grained similarity representations guided by textual semantic information without any additional annotations. Specifically, multiple target attributes are flexibly obtained by a pre-trained visual question answering (VQA) model. Subsequently, a text prompt reasoning module employs learnable prompt strategy and optimal transport algorithm to extract discriminative global and local text representations, which serve as a bridge for hierarchical and multi-granularity modal alignment between sketch and image modalities. Additionally, instead of measuring the similarity of two samples by only computing their distance, a novel triplet assignment loss is further proposed, in which the whole data distribution also contributes to optimizing the inter/intra-class distances. Extensive experiments conducted on two public benchmarks consistently demonstrate the robustness and superiority of our OLTM over state-of-the-art methods.

AAAI Conference 2023 Conference Paper

DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification

  • Chengliang Liu
  • Jie Wen
  • Xiaoling Luo
  • Chao Huang
  • Zhihao Wu
  • Yong Xu

In recent years, multi-view multi-label learning has attracted extensive research interest. However, multi-view multi-label data in the real world is commonly incomplete due to the uncertain factors of data collection and manual annotation, which means that not only are multi-view features often missing, but label completeness is also difficult to satisfy. To deal with the double incomplete multi-view multi-label classification problem, we propose a deep instance-level contrastive network, namely DICNet. Different from conventional methods, our DICNet focuses on leveraging deep neural networks to exploit the high-level semantic representations of samples rather than shallow-level features. First, we utilize stacked autoencoders to build an end-to-end multi-view feature extraction framework to learn the view-specific representations of samples. Furthermore, to improve the consensus representation ability, we introduce an incomplete instance-level contrastive learning scheme to guide the encoders to better extract the consensus information of multiple views, and use a multi-view weighted fusion module to enhance the discrimination of semantic features. Overall, our DICNet is adept at capturing consistent discriminative representations of multi-view multi-label data and avoiding the negative effects of missing views and missing labels. Extensive experiments performed on five datasets validate that our method outperforms other state-of-the-art methods.

YNIMG Journal 2023 Journal Article

In vivo labeling and quantitative imaging of neuronal populations using MRI

  • Shana Li
  • Xiang Xu
  • Canjun Li
  • Ziyan Xu
  • Ke Wu
  • Qiong Ye
  • Yan Zhang
  • Xiaohua Jiang

The study of neural circuits, which underlies perception, cognition, emotion, and behavior, is essential for understanding the mammalian brain, a complex organ consisting of billions of neurons. To study the structure and function of the brain, in vivo neuronal labeling and imaging techniques are crucial as they provide true physiological information that ex vivo methods cannot offer. In this paper, we present a new strategy for in vivo neuronal labeling and quantification using MRI. We demonstrate the efficacy of this method by delivering the oatp1a1 gene to the target neurons using rAAV2-retro virus. OATP1A1 protein expression on the neuronal membrane increased the uptake of a specific MRI contrast agent (Gd-EOB-DTPA), leading to hyperintense signals on T1W images of labeled neuronal populations. We also used dynamic contrast enhancement-based methods to obtain quantitative information on labeled neuronal populations in vivo.

AAAI Conference 2023 Conference Paper

Incomplete Multi-View Multi-Label Learning via Label-Guided Masked View- and Category-Aware Transformers

  • Chengliang Liu
  • Jie Wen
  • Xiaoling Luo
  • Yong Xu

As is well known, multi-view data is more expressive than single-view data, and multi-label annotation enjoys richer supervision information than single-label annotation, which makes multi-view multi-label learning widely applicable to various pattern recognition tasks. In this complex representation learning problem, three main challenges can be characterized as follows: i) How to learn consistent representations of samples across all views? ii) How to exploit and utilize the category correlations of multi-label data to guide inference? iii) How to avoid the negative impact resulting from the incompleteness of views or labels? To cope with these problems, we propose a general multi-view multi-label learning framework named label-guided masked view- and category-aware transformers in this paper. First, we design two transformer-style modules for cross-view feature aggregation and multi-label classification, respectively. The former aggregates information from different views in the process of extracting view-specific features, and the latter learns subcategory embeddings to improve classification performance. Second, considering the imbalance of expressive power among views, an adaptively weighted view fusion module is proposed to obtain view-consistent embedding features. Third, we impose a label manifold constraint in sample-level representation learning to maximize the utilization of supervised information. Last but not least, all the modules are designed under the premise of incomplete views and labels, which makes our method adaptable to arbitrary multi-view and multi-label data. Extensive experiments on five datasets confirm that our method has clear advantages over other state-of-the-art methods.

NeurIPS Conference 2023 Conference Paper

Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning

  • Chengliang Liu
  • Jie Wen
  • Yabo Liu
  • Chao Huang
  • Zhihao Wu
  • Xiaoling Luo
  • Yong Xu

Multi-view learning has become a popular research topic in recent years, but research on the cross-application of classic multi-label classification and multi-view learning is still in its early stages. In this paper, we focus on the complex yet highly realistic task of incomplete multi-view weak multi-label learning and propose a masked two-channel decoupling framework based on deep neural networks to solve this problem. The core innovation of our method lies in decoupling the single-channel view-level representation, which is common in deep multi-view learning methods, into a shared representation and a view-proprietary representation. We also design a cross-channel contrastive loss to enhance the semantic property of the two channels. Additionally, we exploit supervised information to design a label-guided graph regularization loss, helping the extracted embedding features preserve the geometric structure among samples. Inspired by the success of masking mechanisms in image and text analysis, we develop a random fragment masking strategy for vector features to improve the learning ability of encoders. Finally, it is important to emphasize that our model is fully adaptable to arbitrary view and label absences while also performing well on the ideal full data. We have conducted sufficient and convincing experiments to confirm the effectiveness and advancement of our model.

AAAI Conference 2023 Conference Paper

MVCINN: Multi-View Diabetic Retinopathy Detection Using a Deep Cross-Interaction Neural Network

  • Xiaoling Luo
  • Chengliang Liu
  • Waikeung Wong
  • Jie Wen
  • Xiaopeng Jin
  • Yong Xu

Diabetic retinopathy (DR) is the main cause of irreversible blindness for working-age adults. The previous models for DR detection have difficulties in clinical application. The main reason is that most of the previous methods only use single-view data, and the single field of view (FOV) only accounts for about 13% of the FOV of the retina, resulting in the loss of most lesion features. To alleviate this problem, we propose a multi-view model for DR detection, which takes full advantage of multi-view images covering almost all of the retinal field. To be specific, we design a Cross-Interaction Self-Attention based Module (CISAM) that interfuses local features extracted from convolutional blocks with long-range global features learned from transformer blocks. Furthermore, considering the pathological association in different views, we use the feature jigsaw to assemble and learn the features of multiple views. Extensive experiments on the latest public multi-view MFIDDR dataset with 34,452 images demonstrate the superiority of our method, which performs favorably against state-of-the-art models. To the best of our knowledge, this work is the first study on the public large-scale multi-view fundus images dataset for DR detection.

AAAI Conference 2023 Conference Paper

Tensorized Incomplete Multi-View Clustering with Intrinsic Graph Completion

  • Shuping Zhao
  • Jie Wen
  • Lunke Fei
  • Bob Zhang

Most of the existing incomplete multi-view clustering (IMVC) methods focus on attaining a consensus representation from different views but ignore the important information hidden in the missing views and the latent intrinsic structures in each view. To tackle these issues, in this paper, a unified and novel framework, named tensorized incomplete multi-view clustering with intrinsic graph completion (TIMVC_IGC), is proposed. Firstly, owing to the effectiveness of low-rank representation in revealing the inherent structure of data, we exploit it to infer the missing instances and construct the complete graph for each view. Afterwards, inspired by structural consistency, a between-view consistency constraint is imposed to guarantee the similarity of the graphs from different views. More importantly, TIMVC_IGC simultaneously learns the low-rank structures of the different views and explores the correlations of the different graphs in a latent manifold sub-space using a low-rank tensor constraint, such that the intrinsic graphs of the different views can be obtained. Finally, a consensus representation for each sample is obtained with a co-regularization term for final clustering. Experimental results on several real-world databases illustrate that the proposed method outperforms other state-of-the-art methods for incomplete multi-view clustering.

YNIMG Journal 2021 Journal Article

In vivo evaluation of heme and non-heme iron content and neuronal density in human basal ganglia

  • Dmitriy A Yablonskiy
  • Jie Wen
  • Satya V.V.N. Kothapalli
  • Alexander L Sukstanskii

Non-heme iron is an important element supporting the structure and functioning of biological tissues. Imbalance in non-heme iron can lead to different neurological disorders. Several MRI approaches have been developed for iron quantification, relying either on the relaxation properties of the MRI signal or on measuring tissue magnetic susceptibility. Specific quantification of non-heme iron can, however, be confounded by the presence of heme iron in deoxygenated blood and the contribution of cellular composition. The goal of this paper is to introduce the theoretical background and an experimental MRI method for disentangling the contributions of heme and non-heme iron, simultaneously with evaluation of tissue neuronal density, in the iron-rich basal ganglia. Our approach is based on the quantitative Gradient Recalled Echo (qGRE) MRI technique, which separates the total R2* metric characterizing the decay of the GRE signal into a tissue-specific contribution (R2t*) and a baseline blood-oxygen-level-dependent (BOLD) contribution. A combination with QSM data (also available from the qGRE signal phase) allowed further separation of the tissue-specific R2t* metric into cell-specific and non-heme-iron-specific contributions. It is shown that the non-heme iron contribution to R2t* relaxation can be described with the previously developed Gaussian Phase Approximation (GPA) approach. qGRE data were obtained from 22 healthy control participants (ages 26–63 years). Results suggest that the ferritin complexes are aggregated in clusters with an average radius of about 100 nm, comprising approximately 2600 individual ferritin units. It is also demonstrated that the concentrations of heme and non-heme iron tend to increase with age. The strongest age effect was seen in the pallidum region, where the highest age-related non-heme iron accumulation was observed.

AAAI Conference 2021 Conference Paper

Unified Tensor Framework for Incomplete Multi-view Clustering and Missing-view Inferring

  • Jie Wen
  • Zheng Zhang
  • Zhao Zhang
  • Lei Zhu
  • Lunke Fei
  • Bob Zhang
  • Yong Xu

In this paper, we propose a novel method, referred to as incomplete multi-view tensor spectral clustering with missing-view inferring (IMVTSC-MVI), to address the challenging multi-view clustering problem with missing views. Different from existing methods, which commonly focus on exploring the available information of the observed views while ignoring both the hidden information of the missing views and the intra-view information of the data, IMVTSC-MVI seeks to recover the missing views and explore the full information of the recovered and available views for data clustering. In particular, IMVTSC-MVI incorporates feature-space-based missing-view inferring and manifold-space-based similarity graph learning into a unified framework. In this way, IMVTSC-MVI allows these two learning tasks to facilitate each other and can well explore the hidden information of the missing views. Moreover, IMVTSC-MVI introduces a low-rank tensor constraint to capture the high-order correlations of multiple views. Experimental results on several datasets demonstrate the effectiveness of IMVTSC-MVI for incomplete multi-view clustering.

IJCAI Conference 2020 Conference Paper

CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network

  • Jie Wen
  • Zheng Zhang
  • Yong Xu
  • Bob Zhang
  • Lunke Fei
  • Guo-Sen Xie

In recent years, incomplete multi-view clustering, which studies the challenging multi-view clustering problem with missing views, has received growing research interest. Although a series of methods have been proposed to address this issue, the following problems still exist: 1) almost all of the existing methods are based on shallow models, making it difficult to obtain discriminative common representations; 2) these methods are generally sensitive to noise or outliers, since negative samples are treated the same as important samples. In this paper, we propose a novel incomplete multi-view clustering network, called Cognitive Deep Incomplete Multi-view Clustering Network (CDIMC-net), to address these issues. Specifically, it captures the high-level features and local structure of each view by incorporating view-specific deep encoders and a graph embedding strategy into one framework. Moreover, based on human cognition, i.e., learning from easy to hard, it introduces a self-paced strategy to select the most confident samples for model training, which can reduce the negative influence of outliers. Experimental results on several incomplete datasets show that CDIMC-net outperforms the state-of-the-art incomplete multi-view clustering methods.

AAAI Conference 2019 Conference Paper

Unified Embedding Alignment with Missing Views Inferring for Incomplete Multi-View Clustering

  • Jie Wen
  • Zheng Zhang
  • Yong Xu
  • Bob Zhang
  • Lunke Fei
  • Hong Liu

Multi-view clustering aims to partition data collected from diverse sources based on the assumption that all views are complete. However, this prior assumption is hardly satisfied in many real-world applications, giving rise to the incomplete multi-view learning problem. Existing attempts on this problem still have the following limitations: 1) the underlying semantic information of the missing views is commonly ignored; 2) the local structure of the data is not well explored; 3) the importance of different views is not effectively evaluated. To address these issues, this paper proposes a Unified Embedding Alignment Framework (UEAF) for robust incomplete multi-view clustering. In particular, a locality-preserved reconstruction term is introduced to infer the missing views such that all views can be naturally aligned. A consensus graph is adaptively learned and embedded via reverse graph regularization to guarantee a common local structure across multiple views, which in turn can further align the incomplete and inferred views. Moreover, an adaptive weighting strategy is designed to capture the importance of different views. Extensive experimental results show that the proposed method can significantly improve clustering performance in comparison with state-of-the-art methods.

EAAI Journal 2018 Journal Article

An interactively constrained discriminative dictionary learning algorithm for image classification

  • Zhengming Li
  • Zheng Zhang
  • Zizhu Fan
  • Jie Wen

Research demonstrates that profiles (the row vectors of the coding coefficient matrix) can be used to select and update atoms. However, profiles are seldom used to construct discriminative terms in dictionary learning. In this paper, we propose an interactively constrained discriminative dictionary learning (IC-DDL) algorithm for image classification. First, we give a lemma on the relation between profiles and atoms: similar profiles lead to corresponding atoms that are also similar, and vice versa. Second, we construct a profile-constrained term using the profiles and the Laplacian graph of the atoms. Third, we exploit the atoms and the Laplacian graph of the profiles to construct an atom-constrained term. By alternately and interactively updating the profiles and atoms, the two proposed constrained terms not only inherit the structure information of the training samples but also preserve the structure information of the atoms and profiles simultaneously. Moreover, the atom-constrained model can also minimize the incoherence of the atoms. Experimental results demonstrate that the IC-DDL algorithm achieves better performance than several state-of-the-art dictionary learning algorithms on six image databases.

YNIMG Journal 2017 Journal Article

In vivo detection of microstructural correlates of brain pathology in preclinical and early Alzheimer Disease with magnetic resonance imaging

  • Yue Zhao
  • Marcus E. Raichle
  • Jie Wen
  • Tammie L. Benzinger
  • Anne M. Fagan
  • Jason Hassenstab
  • Andrei G. Vlassenko
  • Jie Luo

Background: Alzheimer disease (AD) affects at least 5 million individuals in the USA alone, stimulating an intense search for disease prevention and treatment therapies, as well as for diagnostic techniques that allow early identification of AD during its long pre-symptomatic period, which could be used to initiate prevention trials of disease-modifying therapies in asymptomatic individuals. Methods: Our approach to developing such techniques is based on the Gradient Echo Plural Contrast Imaging (GEPCI) technique, which provides quantitative in vivo measurements of several brain-tissue-specific characteristics of the gradient echo MRI signal (GEPCI metrics) that depend on the integrity of brain tissue cellular structure. Preliminary data were obtained from 34 participants selected from the studies of aging and dementia at the Knight Alzheimer's Disease Research Center at Washington University in St. Louis. Cognitive status was operationalized with the Clinical Dementia Rating (CDR) scale. The participants, assessed as cognitively normal (CDR=0; n=23) or with mild AD dementia (CDR=0.5 or 1; n=11), underwent GEPCI MRI, a battery of cognitive performance tests, and measurement of the CSF amyloid (Aβ) biomarker Aβ42. A subset of 19 participants also underwent PET PiB studies to assess their brain Aβ burden. According to Aβ status, cognitively normal participants were divided into normal (Aβ negative; n=13) and preclinical (Aβ positive; n=10) groups. Results: GEPCI quantitative measurements demonstrated significant differences between all the groups: normal and preclinical, normal and mild AD, and preclinical and mild AD. GEPCI quantitative metrics characterizing tissue cellular integrity in the hippocampus demonstrated much stronger correlations with psychometric tests than hippocampal atrophy. Importantly, GEPCI-determined changes in hippocampal tissue cellular integrity were detected even in hippocampal areas not affected by atrophy. Our studies also uncovered strong correlations between GEPCI brain tissue metrics and beta-amyloid (Aβ) burden defined by positron emission tomography (PET), the current in vivo gold standard for detection of cortical Aβ, thus supporting GEPCI as a potential surrogate marker for Aβ imaging, a known biomarker of early AD. Remarkably, the data show significant correlations not only in areas of high Aβ accumulation (e.g., the precuneus) but also in some areas of the medial temporal lobe (e.g., the parahippocampal cortex), where Aβ accumulation is relatively low. Conclusion: We have demonstrated that GEPCI provides a new approach for the in vivo evaluation of AD-related tissue pathology in the preclinical and early symptomatic stages of AD. Since MRI is a widely available technology, the GEPCI surrogate markers of AD pathology have the potential to improve the quality of AD diagnostics and the evaluation of new disease-modifying therapies.

YNIMG Journal 2016 Journal Article

On the relationship between cellular and hemodynamic properties of the human brain cortex throughout adult lifespan

  • Yue Zhao
  • Jie Wen
  • Anne H. Cross
  • Dmitriy A. Yablonskiy

Establishing baseline MRI biomarkers of normal brain aging is significant and valuable for separating normal changes in brain structure and function from those of neurological diseases. In this paper, for the first time, we have simultaneously measured a variety of tissue-specific contributions defining the R2* relaxation of the gradient recalled echo (GRE) MRI signal in the brains of healthy adults (ages 22 to 74 years) and related these measurements to tissue structural and functional properties. This was accomplished by separating tissue (R2t*) and extravascular BOLD contributions to the total tissue-specific GRE MRI signal decay (R2*), using an advanced version of the previously developed Gradient Echo Plural Contrast Imaging (GEPCI) approach together with acquisition and post-processing methods that minimize artifacts related to macroscopic magnetic field inhomogeneities and physiological fluctuations. Our data (20 healthy subjects) show that in most cortical regions R2t* increases with age, while tissue hemodynamic parameters, i.e., relative oxygen extraction fraction (OEFrel), deoxygenated cerebral blood volume (dCBV), and tissue concentration of deoxyhemoglobin (Cdeoxy), remain practically constant. We also found important correlations characterizing the relationships between brain structural and hemodynamic properties in different brain regions. Specifically, thicker cortical regions have lower R2t*, and these regions have lower OEF. The comparison between GEPCI-derived tissue-specific structural and functional metrics and the literature suggests that (a) brain regions characterized by higher R2t* contain a higher concentration of neurons with less developed cellular processes (dendrites, spines, etc.), (b) brain regions characterized by lower R2t* have a lower concentration of neurons but more developed cellular processes, and (c) the age-related increases in cortical R2t* mostly reflect age-related increases in cellular packing density. The baseline GEPCI-based biomarkers obtained herein could serve to help distinguish age-related changes in brain cellular and hemodynamic properties from changes that occur due to neurodegenerative diseases.

YNICL Journal 2015 Journal Article

Detection and quantification of regional cortical gray matter damage in multiple sclerosis utilizing gradient echo MRI

  • Jie Wen
  • Dmitriy A. Yablonskiy
  • Jie Luo
  • Samantha Lancia
  • Charles Hildebolt
  • Anne H. Cross

Cortical gray matter (GM) damage is now widely recognized in multiple sclerosis (MS). Standard MRI does not reliably detect cortical GM lesions, although cortical volume loss can be measured. In this study, we demonstrate that gradient echo MRI can reliably and quantitatively assess cortical GM damage in MS patients using standard clinical scanners. High-resolution multi-gradient echo MRI was used for regional mapping of tissue-specific MRI signal transverse relaxation rate values (R2*) in 10 subjects each with relapsing-remitting, primary-progressive, and secondary-progressive MS. A voxel spread function method was used to correct artifacts induced by background field gradients. R2* values from healthy controls (HCs) of varying ages were obtained to establish baseline data and to calculate ΔR2* values, the age-adjusted differences between MS patients and HCs. The thickness of cortical regions was also measured in all subjects. In cortical regions, ΔR2* values of MS patients were further adjusted for changes in cortical thickness. The Symbol Digit Modalities Test (SDMT) and Paced Auditory Serial Addition Test (PASAT) neurocognitive tests, as well as Expanded Disability Status Scale, 25-foot timed walk, and nine-hole peg test results, were also obtained for all MS subjects. We found that ΔR2* values were lower in multiple cortical GM and normal-appearing white matter (NAWM) regions in MS compared with HCs. ΔR2* values of global cortical GM and several specific cortical regions showed significant (p < 0.05) correlations with SDMT and PASAT scores, and showed better correlations than volumetric measures of the same regions. Neurological tests not focused on cognition (Expanded Disability Status Scale, 25-foot timed walk, and nine-hole peg tests) showed no correlation with cortical GM ΔR2* values. The technique presented here is robust and reproducible. It requires less than 10 minutes and can be implemented on any MRI scanner. Our results show that quantitative tissue-specific R2* values can serve as biomarkers of tissue injury due to MS in the brain, including the cerebral cortex, an area that has been difficult to evaluate using standard MRI.

IS Journal 2011 Journal Article

Cyber-Individual Meets Brain Informatics

  • Jianhua Ma
  • Jie Wen
  • Runhe Huang
  • Benxiong Huang

To help people live better in today's digitally explosive environment, the authors envision a Cyber-Individual (Cyber-I) that is the counterpart of a real individual in the physical world.