Arrow Research search

Author name cluster

Jun Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

269 papers
2 author rows

Possible papers

269

AAAI Conference 2026 Conference Paper

A General Anchor-Based Framework for Scalable Fair Clustering

  • Shengfei Wei
  • Suyuan Liu
  • Jun Wang
  • Ke Liang
  • Miaomiao Li
  • Lei Luo

Fair clustering is crucial for mitigating bias in unsupervised learning, yet existing algorithms often suffer from quadratic or super-quadratic computational complexity, rendering them impractical for large-scale datasets. To bridge this gap, we introduce the Anchor-based Fair Clustering Framework (AFCF), a novel, general, and plug-and-play framework that empowers arbitrary fair clustering algorithms with linear-time scalability. Our approach first selects a small but representative set of anchors using a novel fair sampling strategy. Then, any off-the-shelf fair clustering algorithm can be applied to this small anchor set. The core of our framework lies in a novel anchor graph construction module, where we formulate an optimization problem to propagate labels while preserving fairness. This is achieved through a carefully designed group-label joint constraint, which we prove theoretically ensures that the fairness of the final clustering on the entire dataset matches that of the anchor clustering. We solve this optimization efficiently using an ADMM-based algorithm. Extensive experiments on multiple large-scale benchmarks demonstrate that AFCF drastically accelerates state-of-the-art methods, reducing computation time by orders of magnitude while maintaining strong clustering performance and fairness guarantees.

AAAI Conference 2026 Conference Paper

Bridging the Copyright Gap: Do Large Vision-Language Models Recognize and Respect Copyrighted Content?

  • Naen Xu
  • Jinghuai Zhang
  • Changjiang Li
  • Hengyu An
  • Chunyi Zhou
  • Jun Wang
  • Boyu Xu
  • Yuyuan Li

Large vision-language models (LVLMs) have achieved remarkable advancements in multimodal reasoning tasks. However, their widespread accessibility raises critical concerns about potential copyright infringement. Will LVLMs accurately recognize and comply with copyright regulations when encountering copyrighted content (i.e., user input, retrieved documents) in the context? Failure to comply with copyright regulations may lead to serious legal and ethical consequences, particularly when LVLMs generate responses based on copyrighted materials (e.g., retrieved book excerpts, news reports). In this paper, we present a comprehensive evaluation of various LVLMs, examining how they handle copyrighted content, such as book excerpts, news articles, music lyrics, and code documentation, when it is presented as visual input. To systematically measure copyright compliance, we introduce a large-scale benchmark dataset comprising 50,000 multimodal query-content pairs designed to evaluate how effectively LVLMs handle queries that could lead to copyright infringement. Given that real-world copyrighted content may or may not include a copyright notice, the dataset includes query-content pairs in two distinct scenarios: with and without a copyright notice. For the former, we extensively cover four types of copyright notices to account for different cases. Our evaluation reveals that even state-of-the-art closed-source LVLMs exhibit significant deficiencies in recognizing and respecting copyrighted content, even when presented with a copyright notice. To address this limitation, we introduce a novel tool-augmented defense framework for copyright compliance, which reduces infringement risks in all scenarios. Our findings underscore the importance of developing copyright-aware LVLMs to ensure the responsible and lawful use of copyrighted content.

AAAI Conference 2026 Conference Paper

Cancer Survival Prediction by Cyclic Generation and Multi-grained Alignment

  • Yongqi Bu
  • Qinggang Niu
  • Zhen Li
  • Yanyu Xu
  • Jun Wang
  • Guoxian Yu

Cancer survival analysis with multimodal data is crucial for precise treatments and patient benefits. However, the following challenges hinder the integration of histopathology and genomics: (i) multimodal data are not always complete, especially for the more costly genomics data; (ii) intricate interactions between different modalities are difficult to capture and understand. In response, we propose an end-to-end framework (CIMA) that coordinates Cyclic modality generation and Multi-grained multimodal Alignment. Specifically, CIMA designs a cyclic modality reconstruction module to reciprocally impute missing modalities and infer the interactions between them. Next, it introduces a multi-grained alignment module over the imputed data and interactions to mine fine-grained alignments between histopathology (slide patches) and genomics (biological pathways). CIMA then constructs an adaptive fusion module to leverage multimodal data and alignments for survival prediction. Extensive experiments on cancer benchmark datasets demonstrate that CIMA outperforms existing methods and exhibits good interpretability, providing valuable insights into intricate relationships between pathological phenotypes and biological pathways. Our code is released in the supplementary materials.

AAAI Conference 2026 Conference Paper

Compression Artifacts Removal for VVC with Frequency Domain Mixture of Experts Network

  • Qijun Wang
  • Kang Wang
  • Jun Wang

In recent years, lossy compression algorithms such as H.264/AVC, H.265/HEVC, and H.266/VVC have been proposed and widely applied in image and video encoding. However, these compression algorithms inevitably introduce various complex types of compression artifacts, which severely degrade image quality. Although existing methods have attempted to remove artifacts through filter design or probabilistic prior modeling, they are often effective only for specific types of artifacts, lacking generalization and adaptability. To address this, we propose a novel image compression artifact removal model, ARMoE, which combines multiple frequency domain transformations with a Mixture of Experts (MoE). Considering the differences in frequency and energy distributions across images, we introduce various frequency domain transformations as expert branches and use a Sparse Activation Strategy to adaptively select the optimal frequency domain expert to suppress compression artifacts, yielding an efficient artifact removal method. Furthermore, we re-encode and decode multiple original uncompressed high-quality datasets, including DF2K and Kodak24, using the VTM-20.0 codec under the H.266/VVC standard, constructing a more challenging artifact dataset. Rigorous comparative experiments against current state-of-the-art image restoration methods demonstrate that ARMoE exhibits outstanding image restoration capability.

AAAI Conference 2026 Conference Paper

Counterfactual Fairness with Imperfect Causal Graphs

  • Cong Su
  • Qiaoyu Tan
  • Carlotta Domeniconi
  • Lizhen Cui
  • Jun Wang
  • Guoxian Yu

Fairness-aware machine learning aims to build predictive models that comply with fairness requirements, particularly concerning sensitive attributes such as race, gender, and age. Among causality-based fairness notions, counterfactual fairness is widely adopted for its individual-level guarantees, requiring that an individual’s predicted outcome remains unchanged in a counterfactual world where its sensitive attribute is altered. However, existing methods critically assume that the true causal graph is fully known, which is rarely the case in practice. Moreover, counterfactual fairness suffers from inherent identifiability limitations, as counterfactual quantities cannot always be uniquely estimated from observational data, especially under incomplete causal knowledge. To address these challenges, we propose a principled framework (CF-ICG) for counterfactual fairness under imperfectly known causal graphs, e.g., Completed Partially Directed Acyclic Graphs (CPDAGs). We first introduce a criterion to determine the identifiability, and bound the counterfactual quantities under CPDAGs. Building upon this, we develop an efficient local algorithm that avoids the exhaustive enumeration of all DAGs, ensuring robustness against worst-case fairness violations. Experimental results on synthetic and real-world datasets demonstrate the practical effectiveness and theoretical soundness of CF-ICG.

AAAI Conference 2026 Conference Paper

DMCAR: Disentangled Mixture-of-Experts with Context-Aware Routing for Multi-View Clustering

  • Baili Xiao
  • Ke Liang
  • Jiaqi Jin
  • Jun Wang
  • Yinbo Xu
  • Siwei Wang
  • En Zhu

Multi-View Clustering (MVC) aims to enhance clustering performance by integrating multi-source complementary information. However, existing deep MVC methods face inherent challenges in balancing the learning of shared consensus representations with the preservation of view-specific information: independent encoders hinder effective cross-view collaboration, while a single shared encoder tends to sacrifice representation diversity. Although the recently introduced Mixture-of-Experts (MoE) model offers a novel approach to facilitating view collaboration, its flattened expert pool design often leads to entanglement between shared and specific information, and its routing mechanism limits collaboration potential by neglecting cross-view context. To address these challenges, this paper proposes a novel deep multi-view clustering framework, Decoupled Mixture-of-Experts with Context-Aware Routing for Multi-View Clustering (DMCAR-MVC). At its core is an innovative Decoupled MoE (D-MoE) architecture. We establish a public expert pool to learn cross-view shared representations while equipping each view with an independent private expert pool to capture its unique information, thereby structurally enforcing the decoupling of shared and specific representations. Building on this, we further design a Context-Aware Hierarchical Routing (CAHR) mechanism. When routing for the public expert pool, this mechanism introduces a global context vector to guide expert selection, enabling more efficient and globally informed cross-view collaboration. Finally, to optimize the model, we adopt a multi-level contrastive learning paradigm: on one hand, a cross-view alignment loss ensures semantic consistency in shared representations; on the other, an orthogonality constraint is imposed to further enhance separability between shared and specific representations. Extensive experiments on multiple benchmark datasets demonstrate that DMCAR-MVC significantly outperforms state-of-the-art methods across key clustering metrics. Additionally, comprehensive ablation studies thoroughly validate the effectiveness and necessity of each proposed component.

AAAI Conference 2026 Conference Paper

GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation

  • Xuan Zhao
  • Zhongyu Zhang
  • Yuge Huang
  • Yuxi Mi
  • Guodong Mu
  • Shouhong Ding
  • Jun Wang
  • Rizen Guo

Existing state-of-the-art image tokenization methods leverage diverse semantic features from pre-trained vision models for additional supervision, to expand the distribution of latent representations and thereby improve the quality of image reconstruction and generation. These methods employ a locally supervised approach for semantic supervision, which limits the uniformity of semantic distribution. However, VA-VAE proves that a more uniform feature distribution yields better generation performance. In this work, we introduce a Global Perspective Tokenizer (GloTok), which utilizes global relational information to model a more uniform semantic distribution of tokenized features. Specifically, a codebook-wise histogram relation learning method is proposed to transfer the semantics, which are modeled by pre-trained models on the entire dataset, to the semantic codebook. Then, we design a residual learning module which recovers the fine-grained details to minimize the reconstruction error caused by quantization. Through the above design, GloTok delivers more uniformly distributed semantic latent representations, which facilitates the training of autoregressive (AR) models for generating high-quality images without requiring direct access to pre-trained models during the training process. Experiments on the standard ImageNet-1k benchmark clearly show that our proposed method achieves state-of-the-art reconstruction performance and generation quality.

JBHI Journal 2026 Journal Article

Improving Medical Visual Representation Learning With Pathological-Level Cross-Modal Alignment and Correlation Exploration

  • Jun Wang
  • Lixing Zhu
  • Xiaohan Yu
  • Abhir Bhalerao
  • Yulan He

Learning medical visual representations from image-report pairs through joint learning has garnered increasing research attention due to its potential for transferring acquired knowledge to various downstream medical tasks. Previous works have predominantly focused on instance-wise or token-wise cross-modal alignment, often neglecting the importance of pathological-level consistency. This paper presents PLACE, a novel framework that promotes Pathological-Level Alignment and enriches fine-grained details via Correlation Exploration without additional human annotations. Specifically, we propose a novel pathological-level cross-modal alignment (PCMA) approach to maximize the consistency of pathology observations from both images and reports. To facilitate this, a Visual Pathology Observation Extractor is introduced to extract visual pathological observation representations from localized tokens. The PCMA module operates independently of any external disease annotations, enhancing the generalizability and robustness of our method. Furthermore, we design a proxy task that enforces the model to identify correlations among image patches, thereby enriching the fine-grained details crucial for various downstream tasks. Experimental results demonstrate that our proposed framework achieves new state-of-the-art performance on multiple downstream tasks, including classification, image-to-text retrieval, semantic segmentation, object detection and report generation.

AAAI Conference 2026 Conference Paper

LSAP-PV: High-Fidelity Palm Vein Image Synthesis via Layered Spectral Absorption Projection-Guided Diffusion Model

  • Sheng Shang
  • Chenglong Zhao
  • Ruixin Zhang
  • Jianlong Jin
  • Jingyun Zhang
  • Jun Wang
  • Yang Zhao
  • Shouhong Ding

Palm vein recognition has emerged as a promising biometric technology, yet its development remains constrained by the scarcity of large-scale publicly available datasets. Several methods of palm vein image generation have been proposed to address this issue. These methods usually focus on the anatomical realism of palm vein patterns, but overlook the biophysical correlation between identities and vein patterns, particularly in simulating identity-specific vein contrast. To tackle this limitation, we propose a novel biophysics-driven synthesis method. Our method constructs a 3D palm vascular tree via an established modeling method. Then, a projection model is proposed to map the 3D tree into 2D space to derive palm vein patterns. The projection model is based on skin spectral absorption and simulates the natural attenuation of light passing through the skin using a layer integration method. For different identities, we sample different skin parameters, resulting in varying degrees of attenuation. This method effectively simulates the variation in vein contrast across different identities. Furthermore, we introduce a conditional diffusion model that uses the projected patterns as identity conditions to generate palm vein images. To the best of our knowledge, this is the first palm vein generation method based on the diffusion model. Experimental results demonstrate that our method not only outperforms existing methods, but also enables a recognition model trained on our synthetic data to achieve superior performance compared to a model trained on real-world data at a scale of 2,000 IDs, under an open-set 1:1 protocol measured by TAR at FAR=1e-4.

AAAI Conference 2026 Conference Paper

MLLM Enriched Explainable Multiple Clustering

  • Shan Zhang
  • Liangrui Ren
  • Qiaoyu Tan
  • Carlotta Domeniconi
  • Wei Du
  • Jun Wang
  • Guoxian Yu

Multiple clustering aims to uncover diverse latent structures within the data, enabling a more comprehensive understanding of complex datasets. However, existing approaches either heavily rely on user-supplied keywords or disregard user-interested clustering types, limiting the ability to discover the full range of explainable clusterings of interest, particularly in high-dimensional settings. Furthermore, existing methods insufficiently leverage the rich textual semantics and fall short in fully integrating multi-modal information. To address these challenges, we propose MLLM enriched Multiple Clustering (MLLMMC), a novel framework that leverages a multi-modal large language model (MLLM) to explore explainable non-redundant clustering. Specifically, MLLMMC first employs the MLLM to generate sample descriptions, which serve as input for an LLM to perform prompt-driven reasoning and infer latent clustering types, and then merges them with user-interested types to obtain diverse and explainable clustering types. For each selected type, MLLMMC utilizes the MLLM to generate sample-level textual descriptions and aligns them with corresponding visual features through a cross-attention fusion module, which produces a semantically aligned and enriched representation for the target clustering type. Extensive experiments on six benchmark datasets from diverse domains demonstrate that MLLMMC achieves diverse, explainable, and high-quality clustering outcomes, outperforming state-of-the-art multiple clustering methods by a large margin.

JBHI Journal 2026 Journal Article

Privacy Preserved Blood Glucose Level Cross-Prediction: An Asynchronous Decentralized Federated Learning Approach

  • Chengzhe Piao
  • Taiyu Zhu
  • Yu Wang
  • Stephanie E Baldeweg
  • Paul Taylor
  • Pantelis Georgiou
  • Jiahao Sun
  • Jun Wang

Newly diagnosed Type 1 Diabetes (T1D) patients often struggle to obtain effective Blood Glucose (BG) prediction models due to the lack of sufficient BG data from Continuous Glucose Monitoring (CGM), presenting a significant “cold start” problem in patient care. Utilizing population models to address this challenge is a potential solution, but collecting patient data for training population models in a privacy-conscious manner is challenging, especially given that such data is often stored on personal devices. Considering the privacy protection and addressing the “cold start” problem in diabetes care, we propose “GluADFL”, blood Glucose prediction by Asynchronous Decentralized Federated Learning. We compared GluADFL with eight baseline methods using four distinct T1D datasets, comprising 298 participants, which demonstrated its superior performance in accurately predicting BG levels for cross-patient analysis. Furthermore, patients’ data might be stored and shared across various communication networks in GluADFL, ranging from highly interconnected (e.g., random, which performs best among them) to more structured topologies (e.g., cluster and ring), suitable for various social networks. The asynchronous training framework supports flexible participation. By adjusting the ratio of inactive participants, we found that performance remains stable as long as fewer than 70% are inactive. Our results confirm that GluADFL offers a practical, privacy-preserved solution for BG prediction in T1D, significantly enhancing the quality of diabetes management.

AAAI Conference 2026 Conference Paper

Proactive Constrained Policy Optimization with Preemptive Penalty

  • Ning Yang
  • Pengyu Wang
  • Guoqing Liu
  • Haifeng Zhang
  • Pin Lyu
  • Jun Wang

Safe Reinforcement Learning (RL) often faces significant issues such as constraint violations and instability, necessitating the use of constrained policy optimization, which seeks optimal policies while ensuring adherence to specific constraints like safety. Typically, constrained optimization problems are addressed by the Lagrangian method, a post-violation remedial approach that may result in oscillations and overshoots. Motivated by this, we propose a novel method named Proactive Constrained Policy Optimization (PCPO) that incorporates a preemptive penalty mechanism. This mechanism integrates barrier terms into the objective function as the policy nears the boundary, imposing a cost. Meanwhile, we introduce a constraint-aware intrinsic reward to guide boundary-aware exploration, which is activated only when the policy approaches the constraint boundary. We establish theoretical upper and lower bounds for the duality gap and the performance of the PCPO update, shedding light on the method's convergence characteristics. Additionally, to enhance the optimization performance, we adopt a policy iteration approach. An interesting finding is that PCPO demonstrates significant stability in experiments. Experimental results indicate that the PCPO framework provides a robust solution for policy optimization under constraints, with important implications for future research and practical applications.

TMLR Journal 2026 Journal Article

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

  • Guibin Zhang
  • Hejia Geng
  • Xiaohang Yu
  • Zhenfei Yin
  • Zaibin Zhang
  • Zelin Tan
  • Heng Zhou
  • Zhong-Zhi Li

The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM RL with the temporally extended Partially Observable Markov Decision Processes (POMDPs) that define Agentic RL. Building on this foundation, we propose a comprehensive twofold taxonomy: one organized around core agentic capabilities, including planning, tool use, memory, reasoning, self-improvement, and perception, and the other around their applications across diverse task domains. Central to our thesis is that reinforcement learning serves as the critical mechanism for transforming these capabilities from static, heuristic modules into adaptive, robust agentic behavior. To support and accelerate future research, we consolidate the landscape of open-source environments, benchmarks, and frameworks into a practical compendium. By synthesizing over five hundred recent works, this survey charts the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose AI agents.

AAAI Conference 2026 Conference Paper

TVChain: Leveraging Textual-Visual Prompt Chains for Jailbreaking Large Vision-Language Models

  • Hao Yu
  • Ke Liang
  • Junxian Duan
  • Jun Wang
  • Siwei Wang
  • Chuan Ma
  • Xinwang Liu

Large Vision-Language Models (LVLMs) enhance the capabilities of Large Language Models by integrating visual inputs, thereby enabling advanced multimodal reasoning across diverse applications. However, these enhanced reasoning capabilities introduce new security risks, particularly susceptibility to jailbreaking attacks that bypass built-in safety mechanisms to elicit harmful or unauthorized outputs. While recent efforts have explored adversarial and typographic prompts, most existing attacks suffer from three key limitations: reliance on auxiliary models, limited effectiveness in black-box scenarios, and inadequate exploitation of the LVLMs' intrinsic reasoning abilities. In this work, we propose TVChain, a novel black-box jailbreaking framework that explicitly intervenes in both the visual and textual reasoning processes of LVLMs. TVChain decomposes malicious prompts into a sequence of semantically meaningful sub-images that represent relevant objects and behaviors, thereby circumventing direct exposure of illicit content. In parallel, a carefully designed chain-of-thought (CoT) textual prompt is employed to steer the model's reasoning toward reconstructing the intended activity in a covert yet effective manner. We demonstrate that this compositional prompting strategy reduces the likelihood of triggering safety mechanisms while preserving attack efficacy. Extensive evaluations on eleven LVLMs (seven open-source and four commercial) across two benchmark datasets and three state-of-the-art defenses validate the effectiveness and robustness of TVChain.

AAAI Conference 2026 Conference Paper

Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach

  • Biao Wu
  • Meng Fang
  • Ling Chen
  • Ke Xu
  • Tao Cheng
  • Jun Wang

Recent advances in vision-language models have opened up new possibilities for reasoning-driven image geolocalization. However, existing approaches often rely on synthetic reasoning annotations or external image retrieval, which can limit interpretability and generalizability. In this paper, we present Geo-R, a retrieval-free framework that uncovers structured reasoning paths from existing ground-truth coordinates and optimizes geolocation accuracy via reinforcement learning. We propose the Chain of Region, a rule-based hierarchical reasoning paradigm that generates precise, interpretable supervision by mapping GPS coordinates to geographic entities (e.g., country, province, city) without relying on model-generated or synthetic labels. Building on this, we introduce a lightweight reinforcement learning strategy with coordinate-aligned rewards based on Haversine distance, enabling the model to refine predictions through spatially meaningful feedback. Our approach bridges structured geographic reasoning with direct spatial supervision, yielding improved localization accuracy, stronger generalization, and more transparent inference. Experimental results across multiple benchmarks confirm the effectiveness of Geo-R, establishing a new retrieval-free paradigm for scalable and interpretable image geolocalization. To facilitate further research and ensure reproducibility, both the model and code will be made publicly available.
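The coordinate-aligned reward described in this abstract is built on the Haversine great-circle distance, which is standard and easy to sketch. A minimal Python illustration follows; the `geo_reward` shaping and its `scale_km` parameter are hypothetical, introduced only to show how a distance can be turned into a smooth reward, and are not taken from the paper:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def geo_reward(pred, target, scale_km=500.0):
    """Hypothetical reward shaping: decays smoothly as the prediction drifts from the target."""
    return math.exp(-haversine_km(*pred, *target) / scale_km)
```

Under this (assumed) exponential shaping, an exact hit scores 1.0 and the reward decays continuously with distance, giving the spatially meaningful feedback the abstract refers to.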

AAAI Conference 2025 Conference Paper

4D Diffusion for Dynamic Protein Structure Prediction with Reference and Motion Guidance

  • Kaihui Cheng
  • Ce Liu
  • Qingkun Su
  • Jun Wang
  • Liwei Zhang
  • Yining Tang
  • Yao Yao
  • Siyu Zhu

Protein structure prediction is pivotal for understanding the structure-function relationship of proteins, advancing biological research, and facilitating pharmaceutical development and experimental design. While deep learning methods and the expanded availability of experimental 3D protein structures have accelerated structure prediction, the dynamic nature of protein structures has received limited attention. This study introduces an innovative 4D diffusion model incorporating molecular dynamics (MD) simulation data to learn dynamic protein structures. Our approach is distinguished by the following components: (1) a unified diffusion model capable of generating dynamic protein structures, including both the backbone and side chains, utilizing atomic grouping and side-chain dihedral angle predictions; (2) a reference network that enhances structural consistency by integrating the latent embeddings of the initial 3D protein structures; and (3) a motion alignment module aimed at improving temporal structural coherence across multiple time steps. To our knowledge, this is the first diffusion-based model aimed at predicting protein trajectories across multiple time steps simultaneously. Validation on benchmark datasets demonstrates that our model exhibits high accuracy in predicting dynamic 3D structures of proteins containing up to 256 amino acids over 32 time steps, effectively capturing both local flexibility in stable states and significant conformational changes.

NeurIPS Conference 2025 Conference Paper

A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning

  • Anjie Liu
  • Jianhong Wang
  • Samuel Kaski
  • Jun Wang
  • Mengyue Yang

Steering cooperative multi-agent reinforcement learning (MARL) towards desired outcomes is challenging, particularly when the global guidance from a human on the whole multi-agent system is impractical in a large-scale MARL. On the other hand, designing external mechanisms (e.g., intrinsic rewards and human feedback) to coordinate agents mostly relies on empirical studies, lacking an easy-to-use research tool. In this work, we employ multi-agent influence diagrams (MAIDs) as a graphical framework to address the above issues. First, we introduce the concept of MARL interaction paradigms (orthogonal to MARL learning paradigms), using MAIDs to analyze and visualize both unguided self-organization and global guidance mechanisms in MARL. Then, we design a new MARL interaction paradigm, referred to as the targeted intervention paradigm, that is applied to only a single targeted agent, so the problem of global guidance can be mitigated. In implementation, we introduce a causal inference technique, referred to as Pre-Strategy Intervention (PSI), to realize the targeted intervention paradigm. Since MAIDs can be regarded as a special class of causal diagrams, a composite desired outcome that integrates the primary task goal and an additional desired outcome can be achieved by maximizing the corresponding causal effect through the PSI. Moreover, the bundled relevance graph analysis of MAIDs provides a tool to identify whether an MARL learning paradigm is workable under the design of an MARL interaction paradigm. In experiments, we demonstrate the effectiveness of our proposed targeted intervention, and verify the result of the relevance graph analysis.

JBHI Journal 2025 Journal Article

A Trustworthy Curriculum Learning Guided Multi-Target Domain Adaptation Network for Autism Spectrum Disorder Classification

  • Jiale Dun
  • Jun Wang
  • Juncheng Li
  • Qianhui Yang
  • Wenlong Hang
  • Xiaofeng Lu
  • Shihui Ying
  • Jun Shi

Domain adaptation has demonstrated success in classification of multi-center autism spectrum disorder (ASD). However, current domain adaptation methods primarily focus on classifying data in a single target domain with the assistance of one or multiple source domains, lacking the capability to address the clinical scenario of identifying ASD in multiple target domains. In response to this limitation, we propose a Trustworthy Curriculum Learning Guided Multi-Target Domain Adaptation (TCL-MTDA) network for identifying ASD in multiple target domains. To effectively handle varying degrees of data shift in multiple target domains, we propose a trustworthy curriculum learning procedure based on the Dempster-Shafer (D-S) Theory of Evidence. Additionally, a domain-contrastive adaptation method is integrated into the TCL-MTDA process to align data distributions between source and target domains, facilitating the learning of domain-invariant features. The proposed TCL-MTDA method is evaluated on 437 subjects (including 220 ASD patients and 217 NCs) from the Autism Brain Imaging Data Exchange (ABIDE). Experimental results validate the effectiveness of our proposed method in multi-target ASD classification, achieving an average accuracy of 71.46% (95% CI: 68.85%-74.06%) across four target domains, significantly outperforming most baseline methods (p<0.05).

IJCAI Conference 2025 Conference Paper

Aligning Contrastive Multiple Clusterings with User Interests

  • Shan Zhang
  • Liangrui Ren
  • Jun Wang
  • Yanyu Xu
  • Carlotta Domeniconi
  • Guoxian Yu

Multiple clustering approaches aim to partition complex data in different ways. These methods often exhibit a one-to-many relationship in their results, and relying solely on the data context may be insufficient to capture the patterns relevant to the user. The user's expectations are key to the multiple clustering task. Two main challenges exist: identifying the significant features to represent user interests and aligning those interests with the clustering results. To address these challenges, we propose Contrastive Multiple Clusterings (CMClusts), which extends contrastive learning to multiple clustering by elevating traditional instance-level contrast to clustering-level contrast. Furthermore, CMClusts integrates user expectations or interests by extracting desired features through tailored data augmentations, enabling the model to effectively capture user-relevant clustering features. Experimental results on benchmark datasets show that CMClusts can generate interpretable and high-quality clusterings, which reflect different user interests.

ICLR Conference 2025 Conference Paper

Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization

  • Wei Liu 0144
  • Zhiying Deng
  • Zhongyu Niu
  • Jun Wang
  • Haozhao Wang
  • Zhigang Zeng
  • Ruixuan Li 0001

Extracting a small subset of crucial rationales from the full input is a key problem in explainability research. The most widely used fundamental criterion for rationale extraction is the maximum mutual information (MMI) criterion. In this paper, we first demonstrate that MMI suffers from diminishing marginal returns. Once part of the rationale has been identified, finding the remaining portions contributes only marginally to increasing the mutual information, making it difficult to use MMI to locate the rest. In contrast to MMI, which aims to reproduce the prediction, we seek to identify the parts of the input that the network can actually utilize. This is achieved by comparing how different rationale candidates match the capability space of the weight matrix. The weight matrix of a neural network is typically low-rank, meaning that the linear combinations of its column vectors can only cover part of the directions in a high-dimensional space (where the dimension is that of an input vector). If an input is fully utilized by the network, it generally matches these directions (e.g., a portion of a hypersphere), resulting in a representation with a high norm. Conversely, if an input primarily falls outside (orthogonal to) these directions, its representation norm will approach zero, behaving like noise that the network cannot effectively utilize. Building on this, we propose using the norms of rationale candidates as an alternative objective to MMI. Through experiments on four text classification datasets and one graph classification dataset using three network architectures (GRUs, BERT, and GCN), we show that our method outperforms MMI and its improved variants in identifying better rationales. We also compare our method with a representative LLM (llama-3.1-8b-instruct) and find that our simple method achieves comparable results and can sometimes even outperform it.
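The norm argument in the abstract above rests on a simple linear-algebra fact: a low-rank weight matrix maps inputs inside its row space to high-norm representations and inputs orthogonal to that space to near-zero ones. A minimal NumPy sketch of that fact (illustrative only, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

# A low-rank "weight matrix": its rows and columns span only a
# 5-dimensional subspace of a 64-dimensional input space.
d, r = 64, 5
W = rng.normal(size=(d, r)) @ rng.normal(size=(r, d))

# An input the network can "utilize": a vector inside the row space of W.
basis = np.linalg.svd(W)[2][:r]          # top-r right singular vectors
inside = basis.T @ rng.normal(size=r)

# An input the network cannot utilize: project a random vector out of
# the row space, leaving only directions orthogonal to it.
x = rng.normal(size=d)
outside = x - basis.T @ (basis @ x)

# Representation norms: the "utilized" input keeps a large norm, while
# the orthogonal one is mapped essentially to zero.
norm_inside = np.linalg.norm(W @ inside)
norm_outside = np.linalg.norm(W @ outside)
print(norm_inside, norm_outside)
```

The paper's criterion scores rationale candidates by exactly this kind of representation norm instead of by mutual information with the label.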

NeurIPS Conference 2025 Conference Paper

Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning

  • Xiangning Yu
  • Zhuohan Wang
  • Linyi Yang
  • Haoxuan Li
  • Anjie Liu
  • Xiao Xue
  • Jun Wang
  • Mengyue Yang

Chain-of-Thought (CoT) prompting plays an indispensable role in endowing large language models (LLMs) with complex reasoning capabilities. However, CoT currently faces two fundamental challenges: (1) Sufficiency, which ensures that the generated intermediate inference steps comprehensively cover and substantiate the final conclusion; and (2) Necessity, which identifies the inference steps that are truly indispensable for the soundness of the resulting answer. We propose a causal framework that characterizes CoT reasoning through the dual lenses of sufficiency and necessity. Incorporating causal Probability of Sufficiency and Necessity allows us not only to determine which steps are logically sufficient or necessary to the prediction outcome, but also to quantify their actual influence on the final reasoning outcome under different intervention scenarios, thereby enabling the automated addition of missing steps and the pruning of redundant ones. Extensive experimental results on various mathematical and commonsense reasoning benchmarks confirm substantial improvements in reasoning efficiency and reduced token usage without sacrificing accuracy. Our work provides a promising direction for improving LLM reasoning performance and cost-effectiveness. The code will be publicly available upon acceptance at: https://anonymous.4open.science/r/causalmath-1CEF.
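For reference, the causal quantities named in this abstract are usually instantiated by Pearl's probabilities of necessity and sufficiency; the paper's exact formulation may differ. For a binary treatment $X$ (here, a reasoning step being included) and outcome $Y$ (the answer being correct):

```latex
% Probability of necessity: given the step was present and the answer
% correct, would removing the step have broken the answer?
\mathrm{PN} = P\big(Y_{X=0}=0 \mid X=1,\; Y=1\big)

% Probability of sufficiency: given the step was absent and the answer
% wrong, would adding the step have fixed the answer?
\mathrm{PS} = P\big(Y_{X=1}=1 \mid X=0,\; Y=0\big)
```

High PN flags steps whose removal should prune accuracy (candidates to keep), while high PS flags missing steps worth adding.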

NeurIPS Conference 2025 Conference Paper

CMoB: Modality Valuation via Causal Effect for Balanced Multimodal Learning

  • Jun Wang
  • Fuyuan Cao
  • Zhixin Xue
  • Xingwang Zhao
  • Jiye Liang

Existing early and late fusion frameworks in multimodal learning are confronted with the fundamental challenge of modality imbalance, wherein disparities in representational capacities induce inter-modal competition during training. Current research methodologies primarily rely on modality-level contribution assessments to measure gaps in representational capabilities and enhance poorly learned modalities, overlooking the dynamic variations of modality contributions across individual samples. To address this, we propose a Causal-aware Modality valuation approach for Balanced multimodal learning (CMoB). We define a benefit function based on Shannon's theory of informational uncertainty to evaluate the changes in the importance of samples across different stages of multimodal training. Inspired by human cognitive science, we propose a causal-aware modality contribution quantification method from a causal perspective to capture fine-grained changes in modality contribution degrees within samples. In the iterative training of multimodal learning, we develop targeted modal enhancement strategies that dynamically select and optimize modalities based on real-time evaluation of their contribution variations across training samples. Our method enhances the discriminative ability of key modalities and the learning capacity of weak modalities while achieving fine-grained balance in multimodal learning. Extensive experiments on benchmark multimodal datasets and multimodal frameworks demonstrate the superiority of our CMoB approach for balanced multimodal learning.

NeurIPS Conference 2025 Conference Paper

Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning

  • Xueqi Ma
  • Jun Wang
  • Yanbei Jiang
  • Sarah Erfani
  • Tongliang Liu
  • James Bailey

Large language models (LLMs) have achieved state-of-the-art performance in a variety of tasks, but remain largely opaque in terms of their internal mechanisms. Understanding these mechanisms is crucial to improve their reasoning abilities. Drawing inspiration from the interplay between neural processes and human cognition, we propose a novel interpretability framework to systematically analyze the roles and behaviors of attention heads, which are key components of LLMs. We introduce CogQA, a dataset that decomposes complex questions into step-by-step subquestions with a chain-of-thought design, each associated with specific cognitive functions such as retrieval or logical reasoning. By applying a multi-label probing method, we identify the attention heads responsible for these functions. Our analysis across multiple LLM families reveals that attention heads exhibit functional specialization, characterized as cognitive heads. These cognitive heads exhibit several key properties: they are universally sparse, and vary in number and distribution across different cognitive functions, and they display interactive and hierarchical structures. We further show that cognitive heads play a vital role in reasoning tasks—removing them leads to performance degradation, while augmenting them enhances reasoning accuracy. These insights offer a deeper understanding of LLM reasoning and suggest important implications for model design, training and fine-tuning strategies.

AAAI Conference 2025 Conference Paper

Coherency Improved Explainable Recommendation via Large Language Model

  • Shijie Liu
  • Ruixin Ding
  • Weihai Lu
  • Jun Wang
  • Mo Yu
  • Xiaoming Shi
  • Wei Zhang

Explainable recommender systems are designed to elucidate the explanation behind each recommendation, enabling users to comprehend the underlying logic. Previous works perform rating prediction and explanation generation in a multi-task manner. However, these works suffer from incoherence between predicted ratings and explanations. To address the issue, we propose a novel framework that employs a large language model (LLM) to generate a rating, transforms it into a rating vector, and finally generates an explanation based on the rating vector and user-item information. Moreover, we propose utilizing publicly available LLMs and pre-trained sentiment analysis models to automatically evaluate the coherence without human annotations. Extensive experimental results on three datasets of explainable recommendation show that the proposed framework is effective, outperforming state-of-the-art baselines with improvements of 7.3% in explainability and 4.4% in text quality.

NeurIPS Conference 2025 Conference Paper

Curious Causality-Seeking Agents in Open-ended Worlds

  • Zhiyu Zhao
  • Haoxuan Li
  • Haifeng Zhang
  • Jun Wang
  • Francesco Faccio
  • Jürgen Schmidhuber
  • Mengyue Yang

When building a world model, a common assumption is that the environment has a single, unchanging underlying causal rule, like applying Newton's laws to every situation. However, in truly open-ended environments, the apparent causal mechanism may drift over time because the agent continually encounters novel contexts and operates within a limited observational window. This creates a problem: even subtle shifts in policy or environment states can alter the very causal mechanisms the world model observes. In this work, we introduce the Meta-Causal Graph as a world model for open-ended environments, a minimal unified representation that efficiently encodes the transformation rules governing how causal structures shift across different latent world states. A single Meta-Causal Graph is composed of multiple causal subgraphs, each triggered by a meta state in the latent state space. Building on this representation, we introduce a Causality-Seeking Agent whose objectives are to (1) identify the meta states that trigger each subgraph, (2) discover the corresponding causal relationships through a curiosity-driven intervention policy, and (3) iteratively refine the Meta-Causal Graph through ongoing curiosity-driven exploration and agent experiences. Experiments on both synthetic tasks and a challenging robot arm manipulation task demonstrate that our method robustly captures shifts in causal dynamics and generalizes effectively to previously unseen contexts.

JBHI Journal 2025 Journal Article

DC-ASTGCN: EEG Emotion Recognition Based on Fusion Deep Convolutional and Adaptive Spatio-Temporal Graph Convolutional Networks

  • Xiaodong Yang
  • Zhengping Zhu
  • Guangkang Jiang
  • Dandan Wu
  • Aijun He
  • Jun Wang

Thanks to advancements in artificial intelligence and brain-computer interface (BCI) research, there has been increasing attention towards emotion recognition techniques based on electroencephalogram (EEG) recently. The complexity of EEG data poses a challenge when it comes to accurately classifying emotions by integrating time, frequency, and spatial domain features. To address this challenge, this paper proposes a fusion model called DC-ASTGCN, which combines the strengths of deep convolutional neural network (DCNN) and adaptive spatio-temporal graph convolutional neural network (ASTGCN) to comprehensively analyze and understand EEG signals. The DCNN focuses on extracting frequency-domain and local spatial features from EEG signals to identify brain region activity patterns, while the ASTGCN, with its spatio-temporal attention mechanism and adaptive brain topology layer, reveals the functional connectivity features between brain regions in different emotional states. This integration significantly enhances the model's ability to understand and recognize emotional states. Extensive experiments conducted on the DEAP and SEED datasets demonstrate that the DC-ASTGCN model outperforms existing state-of-the-art methods in terms of emotion recognition accuracy.

IJCAI Conference 2025 Conference Paper

Deep Learning for Multivariate Time Series Imputation: A Survey

  • Jun Wang
  • Wenjie Du
  • Yiyuan Yang
  • Linglong Qian
  • Wei Cao
  • Keli Zhang
  • Wenjia Wang
  • Yuxuan Liang

Missing values are ubiquitous in multivariate time series (MTS) data, posing significant challenges for accurate analysis and downstream applications. In recent years, deep learning-based methods have successfully handled missing data by leveraging complex temporal dependencies and learned data distributions. In this survey, we provide a comprehensive summary of deep learning approaches for multivariate time series imputation (MTSI) tasks. We propose a novel taxonomy that categorizes existing methods based on two key perspectives: imputation uncertainty and neural network architecture. Furthermore, we summarize existing MTSI toolkits with a particular emphasis on the PyPOTS Ecosystem, which provides an integrated and standardized foundation for MTSI research. Finally, we discuss key challenges and future research directions, which give insight for further MTSI research. This survey aims to serve as a valuable resource for researchers and practitioners in the field of time series analysis and missing data imputation tasks. A well-maintained MTSI paper and tool list is available at https://github.com/WenjieDu/Awesome_Imputation.

NeurIPS Conference 2025 Conference Paper

DGCBench: A Deep Graph Clustering Benchmark

  • Benyu Wu
  • Yue Liu
  • Qiaoyu Tan
  • Xinwang Liu
  • Wei Du
  • Jun Wang
  • Guoxian Yu

Deep graph clustering (DGC) aims to partition graph nodes into distinct clusters in an unsupervised manner. Despite rapid advancements in this field, DGC remains inherently challenging due to the absence of ground-truth, which complicates the design of effective algorithms and impedes the establishment of standardized benchmarks. The lack of unified datasets, evaluation protocols, and metrics further exacerbates these challenges, making it difficult to systematically assess and compare DGC methods. To address these limitations, we introduce $\texttt{DGCBench}$, the first comprehensive and unified benchmark for DGC methods. It evaluates 12 state-of-the-art DGC methods across 12 datasets from diverse domains and scales, spanning 6 critical dimensions: $\textbf{discriminability}$, $\textbf{effectiveness}$, $\textbf{scalability}$, $\textbf{efficiency}$, $\textbf{stability}$, and $\textbf{robustness}$. Additionally, we develop $\texttt{PyDGC}$, an open-source Python library that standardizes the DGC training and evaluation paradigm. Through systematic experiments, we reveal persistent limitations in existing methods, specifically regarding the homophily bottleneck, training instability, vulnerability to perturbations, efficiency plateau, scalability challenges, and poor discriminability, thereby offering actionable insights for future research. We hope that $\texttt{DGCBench}$, $\texttt{PyDGC}$, and our analyses will collectively accelerate the progress in the DGC community. The code is available at https://github.com/Marigoldwu/PyDGC.

IJCAI Conference 2025 Conference Paper

DUQ: Dual Uncertainty Quantification for Text-Video Retrieval

  • Xin Liu
  • Shibai Yin
  • Jun Wang
  • Jiaxin Zhu
  • Xingyang Wang
  • Yee-Hong Yang

Text-video retrieval establishes accurate similarity relationships between text and video through feature enhancement and granularity alignment. However, relying solely on similarity to associate intra-pair features and distinguish inter-pair features is insufficient, e.g., when querying a multi-scene video with sparse text or selecting the most relevant video from many similar candidates. In this paper, we propose a novel Dual Uncertainty Quantification (DUQ) model that separately handles uncertainties in intra-pair interaction and inter-pair exclusion. Specifically, to enhance intra-pair interaction, we propose an intra-pair similarity uncertainty module to provide similarity-based trustworthy predictions and explicitly model this uncertainty. To increase inter-pair exclusion, we propose an inter-pair distance uncertainty module to construct a distance-based diversity probability embedding, thereby widening the gap between similar features. The two components work synergistically, jointly improving the calculation of similarity between features. We evaluate our model on six benchmark datasets: MSRVTT (51.2%), DiDeMo, MSVD, LSMDC, Charades, and VATEX, achieving state-of-the-art retrieval performance.

NeurIPS Conference 2025 Conference Paper

EconGym: A Scalable AI Testbed with Diverse Economic Tasks

  • Qirui Mi
  • Qipeng Yang
  • Zijun Fan
  • Wentian Fan
  • Heyang Ma
  • Chengdong Ma
  • Siyu Xia
  • Bo An

Artificial intelligence (AI) has become a powerful tool for economic research, enabling large-scale simulation and policy optimization. However, applying AI effectively requires simulation platforms for scalable training and evaluation—yet existing environments remain limited to simplified, narrowly scoped tasks, falling short of capturing complex economic challenges such as demographic shifts, multi-government coordination, and large-scale agent interactions. To address this gap, we introduce EconGym, a scalable and modular testbed that connects diverse economic tasks with AI algorithms. Grounded in rigorous economic modeling, EconGym implements 11 heterogeneous role types (e.g., households, firms, banks, governments), their interaction mechanisms, and agent models with well-defined observations, actions, and rewards. Users can flexibly compose economic roles with diverse agent algorithms to simulate rich multi-agent trajectories across 25+ economic tasks for AI-driven policy learning and analysis. Experiments show that EconGym supports diverse and cross-domain tasks—such as coordinating fiscal, pension, and monetary policies—and enables benchmarking across AI, economic methods, and hybrids. Results indicate that richer task composition and algorithm diversity expand the policy space, while AI agents guided by classical economic methods perform best in complex settings. EconGym also scales to 100k agents with high realism and efficiency.

NeurIPS Conference 2025 Conference Paper

Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs

  • Guoliang He
  • Youhe Jiang
  • Wencong Xiao
  • Jiang Kaihua
  • Shuguang Wang
  • Jun Wang
  • Du Zixian
  • Zhuo Jiang

The scaling law for large language models (LLMs) indicates that the path towards machine intelligence necessitates training at large scale. Thus, companies continuously build large-scale GPU clusters and launch training jobs that span thousands of computing nodes. However, LLM pre-training presents unique challenges due to its complex communication patterns, where GPUs exchange data in sparse yet high-volume bursts within specific groups. Inefficient resource scheduling exacerbates bandwidth contention, leading to suboptimal training performance. This paper presents Arnold, a scheduling system summarizing our experience in effectively aligning LLM communication patterns to data center topology at scale. An in-depth characterization study is performed to identify the impact of physical network topology on LLM pre-training jobs. Based on the insights, we develop a scheduling algorithm to effectively align communication patterns to physical network topology in data centers. Through simulation experiments, we show the effectiveness of our algorithm in reducing the maximum spread of communication groups by up to 1.67x. In production training, our scheduling system improves end-to-end performance by 10.6% when training with more than 9600 Hopper GPUs, a significant improvement for our training pipeline.

ICLR Conference 2025 Conference Paper

Efficient Reinforcement Learning with Large Language Model Priors

  • Xue Yan
  • Yan Song
  • Xidong Feng
  • Mengyue Yang
  • Haifeng Zhang
  • Haitham Bou-Ammar
  • Jun Wang

In sequential decision-making (SDM) tasks, methods like reinforcement learning (RL) and heuristic search have made notable advances in specific cases. However, they often require extensive exploration and face challenges in generalizing across diverse environments due to their limited grasp of the underlying decision dynamics. In contrast, large language models (LLMs) have recently emerged as powerful general-purpose tools, due to their capacity to maintain vast amounts of domain-specific knowledge. To harness this rich prior knowledge for efficiently solving complex SDM tasks, we propose treating LLMs as prior action distributions and integrating them into RL frameworks through Bayesian inference methods, making use of variational inference and direct posterior sampling. The proposed approaches facilitate the seamless incorporation of fixed LLM priors into both policy-based and value-based RL frameworks. Our experiments show that incorporating LLM-based action priors significantly reduces exploration and optimization complexity, substantially improving sample efficiency compared to traditional RL techniques, e.g., using LLM priors decreases the number of required samples by over 90\% in offline learning scenarios.
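One simple instance of the "LLM as prior action distribution" idea in the abstract above is a Boltzmann policy reweighted by the prior: the posterior over actions is proportional to prior(a) * exp(Q(s, a) / temperature). A hedged NumPy sketch (function and variable names are illustrative, not the paper's API):

```python
import numpy as np

def posterior_action_dist(prior_logits, q_values, temperature=1.0):
    """Combine an LLM-style prior over actions with learned Q-values.

    Returns a distribution proportional to prior(a) * exp(Q(s, a) / T),
    i.e. Bayesian reweighting of a softmax-Q policy by the prior.
    """
    log_post = np.asarray(prior_logits) + np.asarray(q_values) / temperature
    log_post -= log_post.max()            # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Toy example: the prior strongly favors action 0, the Q-values favor
# action 2; the posterior trades the two sources of evidence off.
prior_logits = np.log(np.array([0.7, 0.2, 0.1]))
q_values = np.array([0.0, 0.0, 2.0])
p = posterior_action_dist(prior_logits, q_values)
```

With a sharper prior (or higher temperature) the posterior stays near the LLM's suggestion; with stronger Q-evidence it moves toward the learned values, which is the sample-efficiency mechanism the abstract describes.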

AAAI Conference 2025 Conference Paper

Emergence-Inspired Multi-Granularity Causal Learning

  • Hanwen Luo
  • Guoxian Yu
  • Jun Wang
  • Yanyu Xu
  • Yongqing Zheng
  • Qingzhong Li

Existing causal learning algorithms focus on micro-level causal discovery, confronting significant challenges in identifying the influence of macro systems, composed of micro-level variables, on other variables. This difficulty arises because the causal relationships in macro systems are often mediated through micro-level causal interactions, which, when dispersed, can lead to erroneous causal discovery or omission. To address this issue, we propose the Emergence-inspired Multi-granularity Causal learning (EMCausal) method. Inspired by the emergent phenomenon of micro-level variables aggregating into macro-level representations, EMCausal introduces a progressive mapping encoder to simulate this process, thus capturing the causal relationships driven by these macro entities. Next, it introduces a causal consistency constraint to collaboratively reconstruct micro variables using macro-level representations, enabling the learning of a multi-granular causal structure. Experimental results on both synthetic and real datasets demonstrate that EMCausal can identify causal graphs under the influence of causal emergence, outperforming competitive baselines in terms of accuracy and robustness.

IROS Conference 2025 Conference Paper

Enabling On-Chip Adaptive Linear Optimal Control via Linearized Gaussian Process

  • Yuan Gao
  • Yinyi Lai
  • Jun Wang
  • Yini Fang

Unpredictable and complex aerodynamic effects pose significant challenges to achieving precise flight control, emphasizing the necessity of adaptive control via data-driven models. Moreover, real hardware usually requires high-frequency control and has limited on-chip computation, making it challenging to balance model complexity and computational cost. To address these challenges, we incorporate a linearized Gaussian process (GP) to model the external aerodynamics and combine it with linear model predictive control, enabling real-time computability. More importantly, to compensate for the control performance sacrificed by GP linearization and reduce on-chip GP computations, we design active data collection strategies using Bayesian optimization with an additive GP, reducing the performance sacrifice as much as possible. Specifically, we decompose the performance into force and trajectory partitions, where the force model serves the downstream controller and the trajectory model guides collection. Experimental results show that we can achieve tracking errors comparable to a full GP (not real-time computable) while remaining real-time computable on real Crazyflies.

AAAI Conference 2025 Conference Paper

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles

  • Tian-Hao Zhang
  • Jiawei Zhang
  • Jun Wang
  • Xinyuan Qian
  • Xu-Cheng Yin

Humans can perceive speakers’ characteristics (e.g., identity, gender, personality and emotion) by their appearance, which are generally aligned to their voice style. Recently, vision-driven text-to-speech (TTS) scholars have grounded their investigations on real-person faces, thereby restricting effective speech synthesis from being applied to vast potential usage scenarios with diverse characters and image styles. To solve this issue, we introduce a novel FaceSpeak approach. It extracts salient identity characteristics and emotional representations from a wide variety of image styles. Meanwhile, it mitigates the extraneous information (e.g., background, clothing, and hair color), resulting in synthesized speech closely aligned with a character’s persona. Furthermore, to overcome the scarcity of multi-modal TTS data, we have devised an innovative dataset, namely Expressive Multi-Modal TTS (EM2TTS), which is diligently curated and annotated to facilitate research in this domain. The experimental results demonstrate our proposed FaceSpeak can generate portrait-aligned voice with satisfactory naturalness and quality.

AAAI Conference 2025 Conference Paper

FedGOG: Federated Graph Out-of-Distribution Generalization with Diffusion Data Exploration and Latent Embedding Decorrelation

  • Pengyang Zhou
  • Chaochao Chen
  • Weiming Liu
  • Xinting Liao
  • Wenkai Shen
  • Jiahe Xu
  • Zhihui Fu
  • Jun Wang

Federated graph learning (FGL) has emerged as a promising approach to enable collaborative training of graph models while preserving data privacy. However, current FGL methods overlook the out-of-distribution (OOD) shifts that occur in real-world scenarios. The distribution shifts between training and testing datasets in each client impact the FGL performance. To address this issue, we propose federated graph OOD generalization framework FedGOG, which includes two modules, i.e., diffusion data exploration (DDE) and latent embedding decorrelation (LED). In DDE, all clients jointly train score models to accurately estimate the global graph data distribution and sufficiently explore sample space using score-based graph diffusion with conditional generation. In LED, each client models a global invariant GNN and a personalized spurious GNN. LED aims to decorrelate spuriousness from invariant relationships by minimizing the mutual information between two categories of latent embeddings from different GNN models. Extensive experiments on six benchmark datasets demonstrate the superiority of FedGOG.

NeurIPS Conference 2025 Conference Paper

GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning

  • Shutong Ding
  • Ke Hu
  • Shan Zhong
  • Haoyang Luo
  • Weinan Zhang
  • Jingya Wang
  • Jun Wang
  • Ye Shi

Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline RL and off-policy RL settings, integrating diffusion policies into on-policy frameworks like PPO remains underexplored. This gap is particularly significant given the widespread use of large-scale parallel GPU-accelerated simulators, such as IsaacLab, which are optimized for on-policy RL algorithms and enable rapid training of complex robotic tasks. A key challenge lies in computing state-action log-likelihoods under diffusion policies, which is straightforward for Gaussian policies but intractable for flow-based models due to irreversible forward-reverse processes and discretization errors (e.g., Euler-Maruyama approximations). To bridge this gap, we propose GenPO, a generative policy optimization framework that leverages exact diffusion inversion to construct invertible action mappings. GenPO introduces a novel doubled dummy action mechanism that enables invertibility via alternating updates, resolving log-likelihood computation barriers. Furthermore, we also use the action log-likelihood for unbiased entropy and KL divergence estimation, enabling KL-adaptive learning rates and entropy regularization in on-policy updates. Extensive experiments on eight IsaacLab benchmarks, including legged locomotion (Ant, Humanoid, Anymal-D, Unitree H1, Go2), dexterous manipulation (Shadow Hand), aerial control (Quadcopter), and robotic arm tasks (Franka), demonstrate GenPO’s superiority over existing RL baselines. Notably, GenPO is the first method to successfully integrate diffusion policies into on-policy RL, unlocking their potential for large-scale parallelized training and real-world robotic deployment.
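The log-likelihood obstacle described in this abstract is, at bottom, the change-of-variables identity for densities: once the action map is invertible, the policy likelihood becomes tractable. In its generic form (the abstract's doubled dummy-action mechanism is what makes $f$ invertible in practice; the exact objective may differ):

```latex
% For an invertible action map a = f(z), z ~ p_z(. | s):
\log \pi(a \mid s)
  = \log p_z\!\big(f^{-1}(a) \mid s\big)
  + \log \left| \det \frac{\partial f^{-1}(a)}{\partial a} \right|
```

With this quantity available, the PPO ratio, entropy bonus, and KL estimates mentioned in the abstract can all be computed exactly rather than approximated.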

JBHI Journal 2025 Journal Article

How Deep is Your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation

  • Linglong Qian
  • Hugh Logan Ellis
  • Tao Wang
  • Jun Wang
  • Robin Mitra
  • Richard Dobson
  • Zina Ibrahim

We present a comprehensive analysis of deep learning approaches for Electronic Health Record (EHR) time-series imputation, examining how the interplay between architectural and framework design decisions gives rise to higher-level properties of a given deep imputer model and distinct biases towards complex data characteristics. Our investigation reveals the varying capabilities of deep imputers in capturing complex spatio-temporal dependencies within EHRs, and that the effectiveness of a model depends on how its combined biases align with the characteristics of the medical time series. Our experimental evaluation challenges common assumptions about model complexity, demonstrating that larger models do not necessarily improve performance. Rather, carefully designed architectures can better capture the complex patterns inherent in clinical data. The study highlights the need for imputation approaches that prioritise clinically meaningful data reconstruction over statistical accuracy. Our experiments further reveal variations of up to 20% in imputation performance depending on preprocessing and implementation choices, emphasising the need for standardised benchmarking methodologies. Finally, we identify critical gaps between current deep imputation methods and medical requirements, highlighting the importance of integrating clinical insights to achieve more reliable imputation approaches for healthcare applications.

IJCAI Conference 2025 Conference Paper

Imputation-free Incomplete Multi-view Clustering via Knowledge Distillation

  • Benyu Wu
  • Wei Du
  • Jun Wang
  • Guoxian Yu

Incomplete multi-view data presents a significant challenge for multi-view clustering (MVC). Existing incomplete MVC solutions commonly rely on data imputation to convert incomplete data into complete data. However, this paradigm suffers from the risk of error accumulation when clustering unreliable imputed data, causing suboptimal clustering performance. Moreover, using imputation to fulfill missing data is inefficient, while inferring data categories based solely on the existing views is extremely challenging. To this end, we propose an Imputation-free Incomplete MVC (I2MVC) via pseudo-supervised knowledge distillation. Specifically, I2MVC decomposes the incomplete MVC problem into two tasks: an MVC task for complete data and a pseudo-supervised classification task for fully incomplete data. A self-supervised simple contrastive Teacher network is trained for clustering complete data, and its knowledge is distilled into a lightweight pseudo-supervised Student network. The Student network, unrestricted by view completeness, further guides the clustering of fully incomplete data. Finally, the clustering results from both tasks are merged to generate the final clustering outcome. Experimental results on benchmark datasets demonstrate the effectiveness of I2MVC.

NeurIPS Conference 2025 Conference Paper

InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions

  • Liangjian Wen
  • Qun Dai
  • Jianzhuang Liu
  • Jiangtao Zheng
  • Yong Dai
  • Dongkai Wang
  • Zhao Kang
  • Jun Wang

In multimodal representation learning, synergistic interactions between modalities not only provide complementary information but also create unique outcomes through specific interaction patterns that no single modality could achieve alone. Existing methods may struggle to effectively capture the full spectrum of synergistic information, leading to suboptimal performance in tasks where such interactions are critical. This is particularly problematic because synergistic information constitutes the fundamental value proposition of multimodal representation. To address this challenge, we introduce InfMasking, a contrastive synergistic information extraction method designed to enhance synergistic information through an Infinite Masking strategy. InfMasking stochastically occludes most features from each modality during fusion, preserving only partial information to create representations with varied synergistic patterns. Unmasked fused representations are then aligned with masked ones through mutual information maximization to encode comprehensive synergistic information. This infinite masking strategy enables capturing richer interactions by exposing the model to diverse partial modality combinations during training. As computing mutual information estimates with infinite masking is computationally prohibitive, we derive an InfMasking loss to approximate this calculation. Through controlled experiments, we demonstrate that InfMasking effectively enhances synergistic information between modalities. In evaluations on large-scale real-world datasets, InfMasking achieves state-of-the-art performance across seven benchmarks. Code is released at https://github.com/brightest66/InfMasking.

JBHI Journal 2025 Journal Article

MDD2DG-IRA: Multivariate Degree Distribution to Dynamic Graph With Inter-Channel Relevance Attention Mechanism for Multi-Channel Myocardial Infarction ECG Analysis

  • Xiaodong Yang
  • Guangkang Jiang
  • Zhengping Zhu
  • Dandan Wu
  • Aijun He
  • Jun Wang

We introduce a novel methodology, Multivariate Degree Distribution to Dynamic Graph (MDD2DG) with an Inter-channel Relevance Attention (IRA) mechanism, to analyze multi-channel Electrocardiogram (ECG) signals and explore signal connections across different channels. Our methodology comprises three main steps. First, multi-channel cardiac signals are transformed into multi-channel visual graphs to extract crucial degree distribution features. Then, degree distributions are mapped into dynamic graphs using a neural network with an IRA mechanism. After that, critical features are extracted within dynamic graphs utilizing Graph Convolutional Neural Networks (GCNNs), and classification is subsequently performed using a multilayer perceptron. In this model, a method of multi-scale position embedding was introduced, which significantly enhanced the processing efficiency of the model by providing a simpler yet sufficiently effective feature representation. Compared to traditional complex network methods, our approach replaces fixed formula-calculated features with dynamic graph models, resulting in improved recognition accuracy. In the experiments, we achieved an impressive 99.94% classification accuracy for distinguishing ECG signals from five distinct myocardial infarction (MI) locations (AMI, ASMI, ALMI, IMI and ILMI) as well as those of the healthy controls (HC). This work contributes to the analysis of complex physiological signals in the field of multi-channel ECG sequences, and provides a robust approach with promising implications for improving clinical medicine and the early detection of cardiac diseases.

AAMAS Conference 2025 Conference Paper

Mean Field Correlated Imitation Learning

  • Zhiyu Zhao
  • Chengdong Ma
  • Qirui Mi
  • Ning Yang
  • Xue Yan
  • Mengyue Yang
  • Haifeng Zhang
  • Jun Wang

Modeling the behaviors of many-agent games is crucial for capturing the dynamics of large-scale complex systems. This is typically achieved by recovering policies from demonstrations within the Mean Field Game Imitation Learning (MFGIL) framework. However, most MFGIL methods assume that demonstrations are collected from Mean Field Nash Equilibrium (MFNE), implying that agents make decisions independently. When directly applied to situations where agents’ decisions are coordinated, such as publicly routed traffic networks, these techniques often fall short. In this paper, we propose the Adaptive Mean Field Correlated Equilibrium (AMFCE), which introduces a generalized assumption that effectively integrates the correlated behaviors common in real-world systems. We prove the existence of AMFCE under mild conditions and theoretically show that MFNE is a special case of AMFCE. Building upon this, we introduce a new Mean Field Correlated Imitation Learning (MFCIL) algorithm, which recovers expert policy more accurately in scenarios where agents’ decisions are coordinated. We also provide a theoretical upper bound for the error in recovering the expert policy, which is tighter than that of existing methods. Empirical results on real-world traffic flow prediction and large-scale economic simulations demonstrate that MFCIL significantly improves the predictive performance of large populations’ behaviors compared to existing MFGIL baselines. This improvement highlights the potential of MFCIL to model real-world multi-agent systems.

AAAI Conference 2025 Conference Paper

MEATRD: Multimodal Anomalous Tissue Region Detection Enhanced with Spatial Transcriptomics

  • Kaichen Xu
  • Qilong Wu
  • Yan Lu
  • Yinan Zheng
  • Wenlin Li
  • Xingjie Tang
  • Jun Wang
  • Xiaobo Sun

The detection of anomalous tissue regions (ATRs) within affected tissues is crucial in clinical diagnosis and pathological studies. Conventional automated ATR detection methods, primarily based on histology images alone, falter in cases where ATRs and normal tissues have subtle visual differences. The recent spatial transcriptomics (ST) technology profiles gene expressions across tissue regions, offering a molecular perspective for detecting ATRs. However, there is a dearth of ATR detection methods that effectively harness complementary information from both histology images and ST. To address this gap, we propose MEATRD, a novel ATR detection method that integrates histology image and ST data. MEATRD is trained to reconstruct image patches and gene expression profiles of normal tissue spots (inliers) from their multimodal embeddings, followed by learning a one-class classification AD model based on latent multimodal reconstruction errors. This strategy harmonizes the strengths of reconstruction-based and one-class classification approaches. At the heart of MEATRD is an innovative masked graph dual-attention transformer (MGDAT) network, which not only facilitates cross-modality and cross-node information sharing but also addresses the model over-generalization issue commonly seen in reconstruction-based AD methods. Additionally, we demonstrate that modality-specific, task-relevant information is collated and condensed in multimodal bottleneck encoding generated in MGDAT, marking the first theoretical analysis of the informational properties of multimodal bottleneck encoding. Extensive evaluations across eight real ST datasets reveal MEATRD's superior performance in ATR detection, surpassing various state-of-the-art AD methods. Remarkably, MEATRD also proves adept at discerning ATRs that only show slight visual deviations from normal tissues.

AAAI Conference 2025 Conference Paper

MeRino: Entropy-Driven Design for Generative Language Models on IoT Devices

  • Youpeng Zhao
  • Ming Lin
  • Huadong Tang
  • Qiang Wu
  • Jun Wang

Generative Large Language Models (LLMs) stand as a revolutionary advancement in the modern era of artificial intelligence (AI). However, scaling down LLMs for resource-constrained hardware, such as Internet-of-Things (IoT) devices, requires non-trivial efforts and domain knowledge. In this paper, we propose a novel information-entropy framework for designing mobile-friendly generative language models. The whole design procedure involves solving a mathematical programming (MP) problem, which can be done on the CPU within minutes, making it nearly zero-cost. We evaluate our designed models, termed MeRino, across fourteen NLP downstream tasks, showing their competitive performance against the state-of-the-art autoregressive transformer models under the mobile setting. Notably, MeRino achieves similar or better performance on both language modeling and zero-shot learning tasks, compared to the 350M parameter OPT while being 4.9x faster on NVIDIA Jetson Nano with 5.5x reduction in model size.

NeurIPS Conference 2025 Conference Paper

MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework

  • Qirui Mi
  • Mengyue Yang
  • Xiangning Yu
  • Zhiyu Zhao
  • Cheng Deng
  • Bo An
  • Haifeng Zhang
  • Xu Chen

Simulating collective decision-making involves more than aggregating individual behaviors; it emerges from dynamic interactions among individuals. While large language models (LLMs) offer strong potential for social simulation, achieving quantitative alignment with real-world data remains a key challenge. To bridge this gap, we propose the Mean-Field LLM (MF-LLM) framework, the first to incorporate mean field theory into LLM-based social simulation. MF-LLM models bidirectional interactions between individuals and the population through an iterative process, generating population signals to guide individual decisions, which in turn update the signals. This interplay produces coherent trajectories of collective behavior. To improve alignment with real-world data, we introduce IB-Tune, a novel fine-tuning method inspired by the Information Bottleneck principle, which retains population signals most predictive of future actions while filtering redundant history. Evaluated on a real-world social dataset, MF-LLM reduces KL divergence to human population distributions by 47% compared to non-mean-field baselines, enabling accurate trend forecasting and effective intervention planning. Generalizing across 7 domains and 4 LLM backbones, MF-LLM provides a scalable, high-fidelity foundation for social simulation.

JBHI Journal 2025 Journal Article

MFRC-Net: Multi-Scale Feature Residual Convolutional Neural Network for Motor Imagery Decoding

  • Xiao Li
  • Zhuowei Yang
  • Xikai Tu
  • Jun Wang
  • Jian Huang

Motor imagery (MI) decoding is the basis of external device control via electroencephalogram (EEG). However, the majority of studies prioritize enhancing the accuracy of decoding methods, often overlooking the magnitude and computational resource demands of deep learning models. In this study, we propose a novel lightweight Multi-Scale Feature Residual Convolutional Neural Network (MFRC-Net). MFRC-Net primarily consists of two blocks: temporal multi-scale residual convolution blocks and cross-domain dual-stream spatial convolution blocks. The former captures dynamic changes in EEG signals across various time scales through multi-scale grouped convolution and backbone temporal convolution skip connections; the latter improves local spatial feature extraction and calibrates feature mapping through the introduction of cross-domain spatial filtering layers. Furthermore, by specifically optimizing the loss function, MFRC-Net effectively reduces sensitivity to outliers. Experiment results on the BCI Competition IV 2a dataset and the SHU dataset demonstrate that, with a parameter size of only 13K, MFRC-Net achieves accuracies of 85.1% and 69.3%, respectively, surpassing current state-of-the-art models. The integration of temporal multi-scale residual convolution blocks and cross-domain dual-stream spatial convolution blocks in lightweight models significantly boosts performance, as evidenced by ablation studies and visualizations.

NeurIPS Conference 2025 Conference Paper

MobileUse: A Hierarchical Reflection-Driven GUI Agent for Autonomous Mobile Operation

  • Ning Li
  • Xiangmou Qu
  • Jiamu Zhou
  • Muning Wen
  • Kounianhua Du
  • Xingyu Lou
  • Qiuying Peng
  • Jun Wang

Recent advances in Multimodal Large Language Models (MLLMs) have enabled the development of mobile agents that can understand visual inputs and follow user instructions, unlocking new possibilities for automating complex tasks on mobile devices. However, applying these models to real-world mobile scenarios remains a significant challenge due to the long-horizon task execution, difficulty in error recovery, and the cold-start problem in unfamiliar environments. To address these challenges, we propose MobileUse, a GUI agent designed for robust and adaptive mobile task execution. To improve resilience in long-horizon tasks and dynamic environments, we introduce a hierarchical reflection architecture that enables the agent to self-monitor, detect, and recover from errors across multiple temporal scales—ranging from individual actions to overall task completion—while maintaining efficiency through a Reflection-on-Demand strategy. To tackle cold-start issues, we further introduce a proactive exploration module, which enriches the agent’s understanding of the environment through self-planned exploration. Evaluations on the AndroidWorld and AndroidLab benchmarks demonstrate that MobileUse establishes new state-of-the-art performance, achieving success rates of 62.9% and 44.2%, respectively. To facilitate real-world applications, we release an out-of-the-box toolkit for automated task execution on physical mobile devices, which is available at https://github.com/MadeAgents/mobile-use.

AAAI Conference 2025 Conference Paper

MTGA: Multi-View Temporal Granularity Aligned Aggregation for Event-Based Lip-Reading

  • Wenhao Zhang
  • Jun Wang
  • Yong Luo
  • Lei Yu
  • Wei Yu
  • Zheng He
  • Jialie Shen

Lip-reading is to utilize the visual information of the speaker’s lip movements to recognize words and sentences. Existing event-based lip-reading solutions integrate different frame rate branches to learn spatio-temporal features of varying granularities. However, aggregating events into event frames inevitably leads to the loss of fine-grained temporal information within frames. To remedy this drawback, we propose a novel framework termed Multi-view Temporal Granularity aligned Aggregation (MTGA). Specifically, we first present a novel event representation method, namely time-segmented voxel graph list, where the most significant local voxels are temporally connected into a graph list. Then we design a spatio-temporal fusion module based on temporal granularity alignment, where the global spatial features extracted from event frames, together with the local relative spatial and temporal features contained in voxel graph list are effectively aligned and integrated. Finally, we design a temporal aggregation module that incorporates positional encoding, which enables the capture of local absolute spatial and global temporal information. Experiments demonstrate that our method outperforms both the event-based and video-based lip-reading counterparts.

AAMAS Conference 2025 Conference Paper

Negotiated Reasoning: On Provably Addressing Relative Over-Generalization

  • Junjie Sheng
  • Wenhao Li
  • Bo Jin
  • Hongyuan Zha
  • Jun Wang
  • Xiangfeng Wang

We focus on the relative over-generalization (RO) issue in fully cooperative multi-agent reinforcement learning (MARL). Existing methods show that endowing agents with reasoning can help mitigate RO empirically, but there is little theoretical insight. We first prove that RO is avoided when agents satisfy a consistent reasoning requirement. We then propose a new negotiated reasoning framework connecting reasoning and RO with theoretical guarantees. Based on it, we develop an algorithm called Stein variational negotiated reasoning (SVNR), which uses Stein variational gradient descent to form a negotiation policy that provably bypasses RO under maximum-entropy policy iteration. SVNR is further parameterized with neural networks for computational efficiency. Experiments demonstrate that SVNR significantly outperforms baselines on RO-challenged tasks, confirming its advantage in achieving better cooperation.

AAAI Conference 2025 Conference Paper

Noise-Injected Spiking Graph Convolution for Energy-Efficient 3D Point Cloud Denoising

  • Zikuan Li
  • Qiaoyun Wu
  • Jialin Zhang
  • Kaijun Zhang
  • Jun Wang

Spiking neural networks (SNNs), inspired by the inherent spiking computation paradigm of the biological neural systems, have exhibited superior energy efficiency in 2D classification tasks over traditional artificial neural networks (ANNs). However, the regression potential of SNNs has not been well explored, especially in 3D point cloud processing. In this paper, we propose noise-injected spiking graph convolutional networks to leverage the full regression potential of SNNs in 3D point cloud denoising. Specifically, we first emulate the noise-injected neuronal dynamics to build noise-injected spiking neurons. On this basis, we design noise-injected spiking graph convolution for promoting disturbance-aware spiking representation learning on 3D points. Starting from the spiking graph convolution, we build two SNN-based denoising networks. One is a purely spiking graph convolutional network, which achieves low accuracy loss compared with some ANN-based alternatives, while resulting in significantly reduced energy consumption on two benchmark datasets, PU-Net and PC-Net. The other is a hybrid architecture, which integrates some ANN-based learning operations and exhibits a high performance-efficiency trade-off with only a few time steps. Our work lights up SNN’s potential for 3D point cloud denoising, injecting new perspectives of exploring the deployment on neuromorphic chips while paving the way for developing energy-efficient 3D data acquisition devices.

ICRA Conference 2025 Conference Paper

Plug-and-Play Physics-Informed Learning Using Uncertainty Quantified Port-Hamiltonian Models

  • Kaiyuan Tan
  • Peilun Li
  • Jun Wang
  • Thomas Beckers 0001

The ability to predict trajectories of surrounding agents and obstacles is a crucial component in many robotic applications. Data-driven approaches are commonly adopted for state prediction in scenarios where the underlying dynamics are unknown. However, the performance, reliability, and uncertainty of data-driven predictors become compromised when encountering out-of-distribution observations relative to the training data. In this paper, we introduce a Plug-and-Play Physics-Informed Machine Learning (PnP-PIML) framework to address this challenge. Our method employs conformal prediction to identify outlier dynamics and, in that case, switches from a nominal predictor to a physics-consistent model, namely distributed Port-Hamiltonian systems (dPHS). We leverage Gaussian processes to model the energy function of the dPHS, enabling not only the learning of system dynamics but also the quantification of predictive uncertainty through its Bayesian nature. In this way, the proposed framework produces reliable physics-informed predictions even for the out-of-distribution scenarios.

NeurIPS Conference 2025 Conference Paper

Quantifying Distributional Invariance in Causal Subgraph for IRM-Free Graph Generalization

  • Yang Qiu
  • Yixiong Zou
  • Jun Wang
  • Wei Liu
  • Xiangyu Fu
  • Ruixuan Li

Out-of-distribution generalization under distributional shifts remains a critical challenge for graph neural networks. Existing methods generally adopt the Invariant Risk Minimization (IRM) framework, requiring costly environment annotations or heuristically generated synthetic splits. To circumvent these limitations, in this work, we aim to develop an IRM-free method for capturing causal subgraphs. We first identify that causal subgraphs exhibit substantially smaller distributional variations than non-causal components across diverse environments, which we formalize as the Invariant Distribution Criterion and theoretically prove in this paper. Building on this criterion, we systematically uncover the quantitative relationship between distributional shift and representation norm for identifying the causal subgraph, and investigate its underlying mechanisms in depth. Finally, we propose an IRM-free method by introducing a norm-guided invariant distribution objective for causal subgraph discovery and prediction. Extensive experiments on two widely used benchmarks demonstrate that our method consistently outperforms state-of-the-art methods in graph generalization. Code is available at https://github.com/anders1123/IDG.

ICLR Conference 2025 Conference Paper

Recovery of Causal Graph Involving Latent Variables via Homologous Surrogates

  • Xiu-Chuan Li
  • Jun Wang
  • Tongliang Liu

Causal discovery with latent variables is an important and challenging problem. To identify latent variables and infer their causal relations, most existing works rely on the assumption that latent variables have pure children. Considering that this assumption is potentially restrictive in practice and not strictly necessary in theory, in this paper, by introducing the concept of homologous surrogate, we eliminate the need for pure children in the context of causal discovery with latent variables. The homologous surrogate fundamentally differs from the pure child in the sense that the latter is characterized by having strictly restricted parents while the former allows for much more flexible parents. We formulate two assumptions involving homologous surrogates and develop theoretical results under each assumption. Under the weaker assumption, our theoretical results imply that we can determine each variable's ancestors, that is, partially recover the causal graph. The stronger assumption further enables us to determine each variable's parents exactly, that is, fully recover the causal graph. Building on these theoretical results, we derive an algorithm that fully leverages the properties of homologous surrogates for causal graph recovery. Also, we validate its efficacy through experiments. Our work broadens the applicability of causal discovery. Our code is available at: https://github.com/XiuchuanLi/ICLR2025-CDHS

NeurIPS Conference 2025 Conference Paper

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

  • Ziyu Wan
  • Yunxiang Li
  • Xiaoyu Wen
  • Yan Song
  • Hanjing Wang
  • Linyi Yang
  • Mark Schmidt
  • Jun Wang

Recent research on Reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking—enabling models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Through iterative reinforcement learning with aligned objectives, these agents explore and learn collaboration, leading to improved generalization and robustness. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competitive-level mathematical benchmarks and LLM-as-a-Judge benchmarks. Additionally, we further extend ReMA to multi-turn interaction settings, leveraging turn-level ratio and parameter sharing to improve efficiency. Comprehensive ablation studies further illustrate the evolving dynamics of each distinct agent, providing valuable insights into how the meta-thinking reasoning process enhances the reasoning capabilities of LLMs.

NeurIPS Conference 2025 Conference Paper

Risk-aware Direct Preference Optimization under Nested Risk Measure

  • Lijun Zhang
  • Lin Li
  • Yajie Qi
  • Huizhong Song
  • Yaodong Yang
  • Jun Wang
  • Wei Wei

When fine-tuning pre-trained Large Language Models (LLMs) to align with human values and intentions, maximizing the estimated reward can lead to superior performance, but it also introduces potential risks due to deviations from the reference model's intended behavior. Most existing methods typically introduce KL divergence to constrain deviations between the trained model and the reference model; however, this may not be sufficient in certain applications that require tight risk control. In this paper, we introduce Risk-aware Direct Preference Optimization (Ra-DPO), a novel approach that incorporates risk-awareness by employing a class of nested risk measures. This approach formulates a constrained risk-aware advantage function maximization problem and then converts the Bradley-Terry model into a token-level representation. The objective function maximizes the likelihood of the policy while suppressing the deviation between a trained model and the reference model using a sequential risk ratio, thereby enhancing the model's risk-awareness. Experimental results across three open-source datasets: IMDb Dataset, Anthropic HH Dataset, and AlpacaEval, demonstrate the proposed method's superior performance in balancing alignment performance and model drift.

NeurIPS Conference 2025 Conference Paper

Scalable Cross-View Sample Alignment for Multi-View Clustering with View Structure Similarity

  • Jun Wang
  • Zhenglai Li
  • Chang Tang
  • Suyuan Liu
  • Hao Yu
  • Chuan Tang
  • Miaomiao Li
  • Xinwang Liu

Most existing multi-view clustering methods aim to generate a consensus partition across all views, based on the assumption that all views share the same sample arrangement. However, in real-world scenarios, the collected data across different views is often unsynchronized, making it difficult to ensure consistent sample correspondence between views. To address this issue, we propose a scalable sample-alignment-based multi-view clustering method, referred to as SSA-MVC. Specifically, we first employ a cluster-label matching (CLM) algorithm to select the view whose clustering labels best match those of the others as the benchmark view. Then, for each of the remaining views, we construct representations of non-aligned samples by computing their similarities with aligned samples. Based on these representations, we build a similarity graph between the non-aligned samples of each view and those in the benchmark view, which serves as the alignment criterion. This alignment criterion is then integrated into a late-fusion framework to enable clustering without requiring aligned samples. Notably, the learned sample alignment matrix can be used to enhance existing multi-view clustering methods in scenarios where sample correspondence is unavailable. The effectiveness of the proposed SSA-MVC algorithm is validated through extensive experiments conducted on eight real-world multi-view datasets.

NeurIPS Conference 2025 Conference Paper

Self-Evolving Pseudo-Rehearsal for Catastrophic Forgetting with Task Similarity in LLMs

  • Jun Wang
  • Liang Ding
  • Shuai Wang
  • Hongyu Li
  • Yong Luo
  • Huangxuan Zhao
  • Han Hu
  • Bo Du

Continual learning for large language models (LLMs) demands a precise balance between plasticity (the ability to absorb new tasks) and stability (the preservation of previously learned knowledge). Conventional rehearsal methods, which replay stored examples, are limited by long-term data inaccessibility; earlier pseudo-rehearsal methods require additional generation modules, while self-synthesis approaches often generate samples that poorly align with real tasks, suffer from unstable outputs, and ignore task relationships. We present Self-Evolving Pseudo-Rehearsal for Catastrophic Forgetting with Task Similarity (SERS), a lightweight framework that 1) decouples pseudo-input synthesis from label creation, using semantic masking and template guidance to produce diverse, task-relevant prompts without extra modules; 2) applies label self-evolution, blending base-model priors with fine-tuned outputs to prevent over-specialization; and 3) introduces a dynamic regularizer driven by the Wasserstein distance between task distributions, automatically relaxing or strengthening constraints in proportion to task similarity. Experiments across diverse tasks on different LLMs show that our SERS reduces forgetting by over 2 percentage points against strong pseudo-rehearsal baselines, by ensuring efficient data utilization and wisely transferring knowledge. The code will be released at https://github.com/JerryWangJun/LLM_CL_SERS/.

IJCAI Conference 2025 Conference Paper

Self-supervised End-to-end ToF Imaging Based on RGB-D Cross-modal Dependency

  • Weihang Wang
  • Jun Wang
  • Fei Wen

Time-of-Flight (ToF) imaging systems are susceptible to various noise and degradation, which can severely affect image quality. Traditional sequential imaging pipelines often suffer from error accumulation due to separate multi-stage processing. Existing end-to-end methods typically rely on noisy-clean depth image pairs for supervised learning. However, acquiring ground-truth is challenging in real-world scenarios due to factors such as Multi-Path Interference (MPI), phase wrapping, and complex noise patterns. In this paper, we propose a self-supervised learning framework for end-to-end ToF imaging, which does not require any noisy-clean pairs yet generalizes well across various off-the-shelf cameras. Our framework leverages the cross-modal dependencies between RGB and depth data as implicit supervision to effectively suppress noise and maintain image fidelity. Additionally, the loss function integrates the statistical characteristics of raw measurement data, enhancing robustness against noise and artifacts. Extensive experiments on both synthetic and real-world data demonstrate that our approach achieves performance comparable to supervised methods, without requiring paired noisy-clean data for training. Furthermore, our method consistently delivers strong performance across all evaluated cameras, highlighting its generalization capabilities. The code is available at https://github.com/WeihangWANG/RGBD_imaging.

NeurIPS Conference 2025 Conference Paper

Self-Verifying Reflection Helps Transformers with CoT Reasoning

  • Zhongwei Yu
  • Wannian Xia
  • Xue Yan
  • Bo Xu
  • Haifeng Zhang
  • Yali Du
  • Jun Wang

Advanced large language models (LLMs) frequently reflect in reasoning chain-of-thoughts (CoTs), where they self-verify the correctness of current solutions and explore alternatives. However, given recent findings that LLMs detect limited errors in CoTs, how reflection contributes to empirical improvements remains unclear. To analyze this issue, in this paper, we present a minimalistic reasoning framework to support basic self-verifying reflection for small transformers without natural language, which ensures analytic clarity and reduces the cost of comprehensive experiments. Theoretically, we prove that self-verifying reflection guarantees improvements if verification errors are properly bounded. Experimentally, we show that tiny transformers, with only a few million parameters, benefit from self-verification in both training and reflective execution, reaching remarkable LLM-level performance in integer multiplication and Sudoku. Similar to LLM results, we find that reinforcement learning (RL) improves in-distribution performance and incentivizes frequent reflection for tiny transformers, yet RL mainly optimizes shallow statistical patterns without faithfully reducing verification errors. In conclusion, integrating generative transformers with discriminative verification inherently facilitates CoT reasoning, regardless of scaling and natural language.

AAAI Conference 2025 Conference Paper

Sim4Rec: Data-Free Model Extraction Attack on Sequential Recommendation

  • Yihao Wang
  • Jiajie Su
  • Chaochao Chen
  • Meng Han
  • Chi Zhang
  • Jun Wang

Model extraction attacks show promising performance in revealing sequential recommendation (SeqRec) robustness, e.g., as an upstream task of transfer-based attack to provide optimization feedback for downstream attacks. However, existing work either heavily relies on impractical prior knowledge or delivers limited attack performance. In this paper, we focus on data-free model extraction attack on SeqRec, which aims to efficiently train a surrogate model that closely imitates the target model in a practical setting. Conducting such an attack is challenging. First, imitating sequential training data for accurate model extraction is hard without prior knowledge. Second, limited queries to the target model require the attack to be efficient. To address these challenges, we propose a novel adversarial framework, Sim4Rec, which includes two modules, i.e., controllable sequence generation and reinforced adversarial distillation. The former allows a sequential generator to produce synthetic data similar to training data through pre-training with controllable generated samples. The latter efficiently extracts the target model via reinforced adversarial knowledge distillation. Extensive experiments demonstrate the effectiveness of Sim4Rec.

AAAI Conference 2025 Conference Paper

STAIR: Manipulating Collaborative and Multimodal Information for E-Commerce Recommendation

  • Cong Xu
  • Yunhang He
  • Jun Wang
  • Wei Zhang

While the mining of modalities is the focus of most multimodal recommendation methods, we believe that how to fully utilize both collaborative and multimodal information is pivotal in e-commerce scenarios where, as clarified in this work, the user behaviors are rarely determined entirely by multimodal features. In order to combine the two distinct types of information, some additional challenges are encountered: 1) Modality erasure: Vanilla graph convolution, which proves rather useful in collaborative filtering, however erases multimodal information; 2) Modality forgetting: Multimodal information tends to be gradually forgotten as the recommendation loss essentially facilitates the learning of collaborative information. To this end, we propose a novel approach named STAIR, which employs a novel stepwise graph convolution to enable a co-existence of collaborative and multimodal information in e-commerce recommendation. Besides, it starts with the raw multimodal features as an initialization, and the forgetting problem can be significantly alleviated through constrained embedding updates. As a result, STAIR achieves state-of-the-art recommendation performance on three public e-commerce datasets with minimal computational and memory costs.

NeurIPS Conference 2025 Conference Paper

Succeed or Learn Slowly: Sample Efficient Off-Policy Reinforcement Learning for Mobile App Control

  • Georgios Papoudakis
  • Thomas Coste
  • Jianye Hao
  • Jun Wang
  • Kun Shao

Reinforcement learning (RL) using foundation models for policy approximations in multi-turn tasks remains challenging. We identify two main limitations related to sparse reward settings and policy gradient updates, based on which we formulate a key insight: updates from positive samples with high returns typically do not require policy regularisation, whereas updates from negative samples, reflecting undesirable behaviour, can harm model performance. This paper introduces Succeed or Learn Slowly (SoLS), a novel off-policy RL algorithm evaluated on mobile app control tasks. SoLS improves sample efficiency when fine-tuning foundation models for user interface navigation via a modified off-policy actor-critic approach, applying direct policy updates for positive samples and conservative, regularised updates for negative ones to prevent model degradation. We augment SoLS with Successful Transition Replay (STR), which prioritises learning from successful interactions, further improving sample efficiency. We evaluate SoLS on the AndroidWorld benchmark, where it significantly outperforms existing methods (at least 17% relative increase), including prompt-engineering and RL approaches, while requiring substantially fewer computational resources than GPT-4o-based methods with 5-60x faster inference.
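
The asymmetric update rule at the heart of SoLS can be sketched as a per-sample loss; the drift penalty and its coefficient below are illustrative assumptions, not the paper's exact objective:

```python
# Sketch: direct policy-gradient updates for positive-return samples,
# conservative regularised updates for negative ones (illustrative form).
def sols_style_loss(log_prob, ref_log_prob, advantage, kl_coef=0.5):
    pg_loss = -advantage * log_prob        # standard policy-gradient term
    if advantage > 0:
        return pg_loss                     # "succeed": unregularised update
    drift = log_prob - ref_log_prob        # "learn slowly": penalise drift
    return pg_loss + kl_coef * drift       # damped update for failures

loss_pos = sols_style_loss(log_prob=-1.0, ref_log_prob=-1.2, advantage=1.0)
loss_neg = sols_style_loss(log_prob=-1.0, ref_log_prob=-1.2, advantage=-1.0)
```

Negative samples incur an extra penalty whenever the policy drifts from a reference model, which damps the harmful updates the paper identifies.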

NeurIPS Conference 2025 Conference Paper

Switchable Token-Specific Codebook Quantization For Face Image Compression

  • Yongbo Wang
  • Haonan Wang
  • Guodong Mu
  • Ruixin Zhang
  • Jiaqi Chen
  • Jingyun Zhang
  • Jun Wang
  • Yuan Xie

With the ever-increasing volume of visual data, efficient and lossless transmission, along with subsequent interpretation and understanding, has become a critical bottleneck in modern information systems. Emerging codebook-based solutions utilize a globally shared codebook to quantize and dequantize each token, controlling the bpp by adjusting the number of tokens or the codebook size. However, for facial images, which are rich in attributes, such global codebook strategies overlook both the category-specific correlations within images and the semantic differences among tokens, resulting in suboptimal performance, especially at low bpp. Motivated by these observations, we propose a Switchable Token-Specific Codebook Quantization for face image compression, which learns distinct codebook groups for different image categories and assigns an independent codebook to each token. By recording the codebook group to which each token belongs with a small number of bits, our method can reduce the loss incurred when decreasing the size of each codebook group. This enables a larger total number of codebooks under a lower overall bpp, thereby enhancing the expressive capability and improving reconstruction performance. Owing to its generalizable design, our method can be integrated into any existing codebook-based representation learning approach and has demonstrated its effectiveness on face recognition datasets, achieving an average accuracy of 93.51% for reconstructed images at 0.05 bpp.
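
The token-specific idea can be illustrated with a deliberately tiny sketch. The 1-D codes and nearest-neighbour assignment here are simplifying assumptions; the actual method quantizes high-dimensional token embeddings and additionally spends a few bits recording codebook-group membership:

```python
# Each token position owns its own small codebook; only the chosen index
# needs to be transmitted, plus a few bits identifying the codebook group.
def quantize(value, codebook):
    # Nearest-neighbour assignment within this token's codebook.
    idx = min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))
    return idx, codebook[idx]

token_codebooks = [
    [0.0, 0.5, 1.0],     # codebook for token position 0
    [-1.0, 0.0, 1.0],    # codebook for token position 1
]
tokens = [0.62, -0.8]
coded = [quantize(v, cb) for v, cb in zip(tokens, token_codebooks)]
# coded holds (index, reconstructed value) pairs, one per token.
```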

AAMAS Conference 2025 Conference Paper

Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction

  • Taher Jafferjee
  • Juliusz Ziomek
  • Tianpei Yang
  • Zipeng Dai
  • Jianhong Wang
  • Matthew E. Taylor
  • Kun Shao
  • Jun Wang

Multi-agent reinforcement learning (MARL) enables systems of autonomous agents to solve complex tasks from jointly gathered experiences of the environment. Many MARL algorithms perform centralized training (CT), often in a simulated environment, where at each time-step the critic makes use of a single sample of the agents' joint-action for training. Yet, as agents update their policies during training, these single samples may poorly represent the agents' joint-policy, leading to high-variance gradient estimates that hinder learning. In this paper, we examine the effect on MARL estimators of allowing the number of joint-action samples taken at each time-step to be greater than 1 in training. Our theoretical analysis shows that even modestly increasing the number of joint-action samples shown to the critic leads to TD updates that closely approximate the true expected value under the current joint-policy. In particular, we prove this reduces variance in value estimates similar to that of decentralized training while maintaining the learning benefits of CT. We describe how such a protocol can be seamlessly realized by sharing policy parameters between the agents during training and apply the technique to induce lower variance in estimates in MARL methods within a general apparatus which we call Performance Enhancing Reinforcement Learning Apparatus (PERLA). Lastly, we demonstrate PERLA's performance improvements and estimator variance reduction capabilities in a range of environments including Multi-agent Mujoco and StarCraft II.
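
The variance argument is easy to reproduce numerically; the Gaussian toy "joint-action target" below is an assumption standing in for the critic's TD targets:

```python
import random, statistics

def target_estimate(k, rng):
    # Average k noisy joint-action samples of a value whose true mean is 1.0.
    return sum(1.0 + rng.gauss(0.0, 1.0) for _ in range(k)) / k

rng = random.Random(0)
# Estimator variance with a single sample vs. 10 samples per time-step.
var_single = statistics.variance(target_estimate(1, rng) for _ in range(5000))
var_multi = statistics.variance(target_estimate(10, rng) for _ in range(5000))
# Averaging k samples shrinks estimator variance by roughly a factor of k.
```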

NeurIPS Conference 2025 Conference Paper

ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

  • Shulin Huang
  • Linyi Yang
  • Yan Song
  • Shawn Chen
  • Leyang Cui
  • Ziyu Wan
  • Qingcheng Zeng
  • Ying Wen

Evaluating large language models (LLMs) poses significant challenges, particularly due to issues of data contamination and the leakage of correct answers. To address these challenges, we introduce ThinkBench, a novel evaluation framework designed to robustly evaluate the reasoning capability of LLMs. ThinkBench proposes a dynamic data generation method for constructing out-of-distribution (OOD) datasets and offers an OOD dataset that contains 2,912 samples drawn from reasoning tasks. ThinkBench unifies the evaluation of reasoning models and non-reasoning models. We evaluate 16 LLMs and 4 PRMs under identical experimental conditions and show that most LLMs are far from robust and face a certain level of data leakage. By dynamically generating OOD datasets, ThinkBench effectively provides a reliable evaluation of LLMs and reduces the impact of data contamination. Our data and codes are available at https://github.com/huangshulin123/ThinkBench.

JBHI Journal 2025 Journal Article

Topological GCN Guided Improved Conformer for Detection of Hip Landmarks From Ultrasound Images

  • Tianxiang Huang
  • Jing Shi
  • Ge Jin
  • Juncheng Li
  • Jun Wang
  • Qian Wang
  • Jun Du
  • Jun Shi

The B-mode ultrasound based computer-aided diagnosis (CAD) has shown its effectiveness for diagnosis of Developmental Dysplasia of the Hip (DDH) in infants within 6 months. Hip landmark detection is a feasible way for the CAD of DDH according to Graf's method. However, existing landmark detection algorithms mainly focus on designing special models to capture the features from hip ultrasound images, but generally ignore the important spatial relations among different landmarks. To this end, a novel weakly supervised learning-based algorithm, the Topological Graph Convolutional Network (TGCN) guided Improved Conformer (TGCN-ICF), is proposed for detecting landmarks from hip ultrasound images. The TGCN-ICF includes two subnetworks: an Improved Conformer (ICF) subnetwork to generate heatmaps and constraint vectors from ultrasound images, and a TGCN subnetwork to additionally explore topological relations among hip landmarks with the guidance of class labels for further refining and improving the detection accuracy. Moreover, a new Mutual Modulation Fusion (MMF) module is developed to fully exchange and fuse the extracted feature information from the convolutional neural network (CNN) and Transformer branches in ICF. Meanwhile, a novel Mutual Supervision Constraint (MSC) strategy is designed to provide a constraint for detection of each hip landmark. The experimental results on two real-world DDH datasets demonstrate that the TGCN-ICF outperforms all the compared algorithms, suggesting its potential applications.

NeurIPS Conference 2025 Conference Paper

Uncertainty-quantified Rollout Policy Adaptation for Unlabelled Cross-domain Video Temporal Grounding

  • Jian Hu
  • Zixu Cheng
  • Shaogang Gong
  • Isabel Guan
  • Jianye Hao
  • Jun Wang
  • Kun Shao

Video Temporal Grounding (TG) aims to temporally locate video segments matching a natural language description (a query) in a long video. While Vision-Language Models (VLMs) are effective at holistic semantic matching, they often struggle with fine-grained temporal localisation. Recently, Group Relative Policy Optimisation (GRPO) reformulates the inference process as a reinforcement learning task, enabling fine-grained grounding and achieving strong in-domain performance. However, GRPO relies on labelled data, making it unsuitable in unlabelled domains. Moreover, because videos are large and expensive to store and process, performing full-scale adaptation introduces prohibitive latency and computational overhead, making it impractical for real-time deployment. To overcome both problems, we introduce a Data-Efficient Unlabelled Cross-domain Temporal Grounding method, in which a model is first trained on a labelled source domain, then adapted to a target domain using only a small number of unlabelled videos from the target domain. This approach eliminates the need for target annotation and keeps both computational and storage overhead low enough to run in real time. Specifically, we introduce Uncertainty-quantified Rollout Policy Adaptation (URPA) for cross-domain knowledge transfer in learning video temporal grounding without target labels. URPA generates multiple candidate predictions using GRPO rollouts, averages them to form a pseudo label, and estimates confidence from the variance across these rollouts. This confidence then weights the training rewards, guiding the model to focus on reliable supervision. Experiments on three datasets across six cross-domain settings show that URPA generalises well using only a few unlabelled target videos. Codes are given in the supplemental materials.
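
The URPA recipe described above (average rollout predictions into a pseudo label, turn rollout variance into a confidence weight) can be sketched directly; the particular confidence mapping below is an illustrative assumption:

```python
import statistics

def urpa_pseudo_label(rollout_preds):
    # rollout_preds: predicted (start, end) segments from multiple GRPO rollouts.
    starts = [s for s, _ in rollout_preds]
    ends = [e for _, e in rollout_preds]
    pseudo = (statistics.mean(starts), statistics.mean(ends))
    # Higher variance across rollouts -> lower confidence in the pseudo label.
    spread = statistics.pvariance(starts) + statistics.pvariance(ends)
    confidence = 1.0 / (1.0 + spread)
    return pseudo, confidence

# Three tightly clustered rollouts yield a confident pseudo label.
pseudo, conf = urpa_pseudo_label([(10.0, 20.0), (10.5, 19.5), (9.5, 20.5)])
```

The confidence would then scale the training reward, so the model learns mostly from pseudo labels the rollouts agree on.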

AAMAS Conference 2025 Conference Paper

Unlocking the Potential of Decentralized LLM-based MAS: Privacy Preservation and Monetization in Collective Intelligence

  • Yingxuan Yang
  • Qiuying Peng
  • Jun Wang
  • Ying Wen
  • Weinan Zhang

Recent advances in large language models (LLMs) have enabled the development of LLM agents—autonomous systems capable of perceiving their environment, reasoning about tasks, and taking actions using external tools. While existing LLM-based Multi-Agent Systems (LaMAS) have shown promising results, they are predominantly centralized, operating within specific tasks or scenarios. These centralized designs simplify coordination but are fundamentally constrained by the limited data and knowledge available within a single entity. As LLM agents see broader deployment, the complexity of tasks increasingly requires collaboration across multiple organizations and data domains. Since organizations cannot and will not fully share their proprietary data, the next frontier of artificial intelligence lies in collective intelligence through decentralized LLM-based Multi-Agent Systems (LaMAS), where LLM agents, each accessing proprietary knowledge and tools, collaborate to solve complex tasks. This paradigm is becoming not just possible but necessary with the growing adoption of LLM agents across diverse organizations. This paper explores the transformative potential of decentralized LaMAS. In decentralized settings, two key issues arise: (1) privacy-preserving mechanisms that enable meaningful collaboration while safeguarding proprietary data and knowledge, and (2) monetization and credit attribution mechanisms that incentivize continuous improvement of agent capabilities and ensure fair value distribution among participants. Our analysis reveals that addressing these challenges can unlock a new paradigm of artificial collective intelligence that overcomes the limitations. This work contributes to decentralized AI by proposing a practical framework for mechanism design that advances both technological innovation and economic sustainability in decentralized LLM Agent networks.

JBHI Journal 2025 Journal Article

Unsupervised Feature Selection-Driven Active Learning for Semi-Supervised Automatic ECG Analysis

  • Xiao Li
  • Yongkang Zhou
  • Songyang An
  • Yu Zeng
  • Xinqi Zhang
  • Jun Wang
  • Yizhe Huang
  • Fan Lin

Automatic analysis methods of electrocardiograms (ECGs) usually require large-scale annotated training data, but the annotation process is extremely time-consuming. While semi-supervised learning can leverage unlabeled data, its performance depends heavily on the quality of the initial labeled subset. Active learning has been used to identify the most informative samples for annotation, but conventional approaches face three critical limitations: (1) dependency on manual intervention for iterative query design, (2) prohibitive computational costs during sample selection, and (3) limited compatibility with semi-supervised learning frameworks. To address these limitations, we proposed an Unsupervised Active Feature-selective Semi-Supervised Learning (UAFSSL) framework for ECG analysis, including an unsupervised feature selection-based active learning module and a semi-supervised learning module. UAFSSL captures latent data distributions via unsupervised feature extraction, selects diverse and representative samples using pseudo-label clustering, and integrates seamlessly with semi-supervised learning to eliminate human intervention. We validated our algorithm on an ECG waveform segmentation task and an atrial fibrillation detection task. In the waveform segmentation task, our method improved the F1-score for P-wave delineation by 2.4% compared to random sampling, using only 5% of labeled samples. For the atrial fibrillation detection task, we evaluated our method on both the AFDB and a 24-hour dataset collected from 500 atrial fibrillation patients. Using only 200 labeled samples for model training, our method achieved AUC improvements of 2.5% and 2.2% over random sampling in five-fold cross validation. This is the first study to integrate unsupervised active learning with semi-supervised learning for automatic ECG analysis, offering a robust, automated solution to reduce annotation costs while enhancing clinical applicability.

JBHI Journal 2025 Journal Article

Ψ-Net: Triple-Branch Network with Cross-Branch Alternately Updated Fusion for Diagnosis of Bicuspid Aortic Valve Via Dual-View Echocardiography

  • Jiayan Chen
  • Zibire Fulati
  • Lina Luan
  • Xueying Zhou
  • Juncheng Li
  • Jun Wang
  • Haiyan Chen
  • Jun Shi

Bicuspid Aortic Valve (BAV) can be diagnosed by Transthoracic Echocardiography (TTE), particularly on the parasternal short-axis view. In this work, a Triple-Branch Network (named Ψ-Net) is proposed as a Computer-Aided Diagnosis (CAD) model for BAV based on the paired TTE images of aortic valve. This Ψ-shaped triple-branch network effectively learns both the view-common and view-specific features from the paired TTE images for improving feature representation. Moreover, a novel cross-branch alternately updated fusion block is developed by implementing an alternately updated clique mechanism across multiple branches, which maximizes cross-branch feature interaction within the Ψ-Net to enhance multi-view feature fusion. On the other hand, a multi-task self-supervised learning framework is developed to capture inherent properties from limited dual-view TTE samples by integrating the dual-view masked image modelling and Disentangled Representation Learning (DRL) into a unified framework. Specifically, an additional view classification task is designed and embedded into this framework for predicting which view a specific feature belongs to, so as to further promote the disentanglement learning of view-common and view-specific features by DRL. Moreover, a Shapley Value based weight adjustment strategy is designed to automatically assign weights to individual losses in the objective function, which can dynamically balance the contribution of each loss term. The experimental results on two BAV TTE datasets demonstrate that Ψ-Net outperforms all the compared algorithms, suggesting its effectiveness in the diagnosis of BAV.

AAMAS Conference 2024 Conference Paper

A Summary of Online Markov Decision Processes with Non-oblivious Strategic Adversary

  • Le Cong Dinh
  • David Henry Mguni
  • Long Tran-Thanh
  • Jun Wang
  • Yaodong Yang

We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external-regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries, can still apply and achieve a policy regret bound of O(√(T log L) + τ²·√(T log|A|)), where L is the size of the adversary's pure strategy set and |A| denotes the size of the agent's action space. Considering real-world games where the support size of a NE is small, we further propose a new algorithm, MDP-Online Oracle Expert (MDP-OOE), that achieves a policy regret bound of O(√(T log L) + τ²·√(T k log k)), where k depends only on the support size of the NE. MDP-OOE leverages the key benefit of Double Oracle in game theory and thus can solve games with a prohibitively large action space. Finally, to better understand the learning dynamics of no-regret methods, under the same setting of a no-external-regret adversary in OMDPs, we introduce an algorithm that achieves a last-round convergence result to a NE. To the best of our knowledge, this is the first work leading to a last-iteration result in OMDPs.

ICRA Conference 2024 Conference Paper

An LLM-driven Framework for Multiple-Vehicle Dispatching and Navigation in Smart City Landscapes

  • Ruiqing Chen
  • Wenbin Song
  • Weiqin Zu
  • ZiXin Dong
  • Ze Guo
  • Fanglei Sun
  • Zheng Tian
  • Jun Wang

In the context of smart cities, autonomous vehicles, such as unmanned delivery vehicles and taxis, are gradually gaining acceptance. However, their application scenarios remain significantly fragmented. Typically, an Autonomous Multi-Functional Vehicle (AMFV) is not engaged in other scenarios when idle in a specific one. Currently, a unified system capable of coordinating and using these resources efficiently is lacking. Moreover, there is an absence of an advanced navigation algorithm for facilitating coordinated navigation among Heterogeneous Vehicles (HVs). To address these issues, we propose the LLM-driven Multi-vehicle Dispatching and navigation (LiMeda) framework. It comprises an LLM-driven scheduling module that facilitates efficient allocation considering task scenarios and vehicle information, which addresses the issue of incompatible vehicle resources across various smart city scenarios, and a navigation module, founded on the Heterogeneous Agent Reinforcement Learning (HARL) framework we previously proposed, which can effectively perform cooperative navigation tasks among heterogeneous agents, assisting cooperative task completion by HVs in a smart city. Experimental results show our method outperforms both traditional scheduling algorithms and reinforcement learning navigation algorithms across evaluation metrics. Additionally, it shows remarkable scalability and generalization under varying city scales, vehicle numbers, and task numbers.

AAMAS Conference 2024 Conference Paper

Boosting Studies of Multi-Agent Reinforcement Learning on Google Research Football Environment: The Past, Present, and Future

  • Yan Song
  • He Jiang
  • Haifeng Zhang
  • Zheng Tian
  • Weinan Zhang
  • Jun Wang

Even though Google Research Football (GRF) was initially benchmarked and studied as a single-agent environment in its original paper [19], recent years have witnessed an increasing focus on its multi-agent nature by researchers utilizing it as a testbed for Multi-Agent Reinforcement Learning (MARL), especially in cooperative scenarios. However, the absence of standardized environment settings and unified evaluation metrics for multi-agent scenarios hampers the consistent understanding of various studies. Furthermore, the challenging 5 vs 5 and 11 vs 11 full-game scenarios have received limited thorough examination due to their substantial training complexities. To address these gaps, this paper extends the original environment by not only standardizing the environment settings and benchmarking cooperative learning algorithms across different scenarios, including the most challenging full-game scenarios, but also by discussing approaches to enhance football AI from diverse perspectives and introducing related research tools for learning beyond multi-agent cooperation. Specifically, we provide a distributed and asynchronous population-based self-play framework with diverse pre-trained policies for faster training, two football-specific analytical tools for deeper investigation, and an online leaderboard for broader evaluation. The overall expectation of this work is to advance the study of Multi-Agent Reinforcement Learning both on and with the Google Research Football environment, with the ultimate goal of deploying these technologies to real-world applications, such as sports analysis.

IJCAI Conference 2024 Conference Paper

Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling

  • Jinmin Li
  • Tao Dai
  • Jingyun Zhang
  • Kang Liu
  • Jun Wang
  • Shaoming Wang
  • Shu-Tao Xia
  • Rizen Guo

Recently developed generative methods, including invertible rescaling network (IRN) based and generative adversarial network (GAN) based methods, have demonstrated exceptional performance in image rescaling. However, IRN-based methods tend to produce over-smoothed results, while GAN-based methods easily generate fake details, which thus hinders their real applications. To address this issue, we propose Boundary-aware Decoupled Flow Networks (BDFlow) to generate realistic and visually pleasing results. Unlike previous methods that model high-frequency information as standard Gaussian distribution directly, our BDFlow first decouples the high-frequency information into semantic high-frequency that adheres to a Boundary distribution and non-semantic high-frequency counterpart that adheres to a Gaussian distribution. Specifically, to capture semantic high-frequency parts accurately, we use Boundary-aware Mask (BAM) to constrain the model to produce rich textures, while non-semantic high-frequency part is randomly sampled from a Gaussian distribution. Comprehensive experiments demonstrate that our BDFlow significantly outperforms other state-of-the-art methods while maintaining lower complexity. Notably, our BDFlow improves the PSNR by 4.4 dB and the SSIM by 0.1 on average over GRAIN, utilizing only 74% of the parameters and 20% of the computation. The code will be available at https://github.com/THU-Kingmin/BAFlow.

IJCAI Conference 2024 Conference Paper

Bridge to Non-Barrier Communication: Gloss-Prompted Fine-Grained Cued Speech Gesture Generation with Diffusion Model

  • Wentao Lei
  • Li Liu
  • Jun Wang

Cued Speech (CS) is an advanced visual phonetic encoding system that integrates lip reading with hand codings, enabling people with hearing impairments to communicate efficiently. CS video generation aims to produce specific lip and gesture movements of CS from audio or text inputs. The main challenge is that, given limited CS data, we strive to simultaneously generate fine-grained hand and finger movements as well as lip movements, while the two kinds of movements need to be asynchronously aligned. Existing CS generation methods are fragile and prone to poor performance due to template-based statistical models and careful hand-crafted pre-processing required to fit the models. Therefore, we propose a novel Gloss-prompted Diffusion-based CS Gesture generation framework (called GlossDiff). Specifically, to integrate additional linguistic rule knowledge into the model, we first introduce a bridging instruction called Gloss, which is an automatically generated descriptive text that establishes a direct and more delicate semantic connection between spoken language and CS gestures. Moreover, we are the first to suggest that rhythm is an important paralinguistic feature of CS that improves communication efficacy. Therefore, we propose a novel Audio-driven Rhythmic Module (ARM) to learn rhythm that matches audio speech. In addition, we design, record, and publish the first Chinese CS dataset with four CS cuers. Extensive experiments demonstrate that our method quantitatively and qualitatively outperforms current state-of-the-art (SOTA) methods. We will release the code and data at glossdiff.github.io/.

JBHI Journal 2024 Journal Article

CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation

  • Jun Wang
  • Abhir Bhalerao
  • Terry Yin
  • Simon See
  • Yulan He

Radiology report generation (RRG) has gained increasing research attention because of its huge potential to mitigate medical resource shortages and aid the process of disease decision making by radiologists. Recent advancements in RRG are largely driven by improving a model's capabilities in encoding single-modal feature representations, while few studies explicitly explore the cross-modal alignment between image regions and words. Radiologists typically focus first on abnormal image regions before composing the corresponding text descriptions; thus cross-modal alignment is of great importance for learning an RRG model which is aware of abnormalities in the image. Motivated by this, we propose a Class Activation Map guided Attention Network (CAMANet) which explicitly promotes cross-modal alignment by employing aggregated class activation maps to supervise cross-modal attention learning, and simultaneously enriches the discriminative information. CAMANet contains three complementary modules: a Visual Discriminative Map Generation module to generate the importance/contribution of each visual token; a Visual Discriminative Map Assisted Encoder to learn the discriminative representation and enrich the discriminative information; and a Visual Textual Attention Consistency module to ensure the attention consistency between the visual and textual tokens, to achieve the cross-modal alignment. Experimental results demonstrate that CAMANet outperforms previous SOTA methods on two commonly used RRG benchmarks.

AAAI Conference 2024 Conference Paper

Decoupling Representation and Knowledge for Few-Shot Intent Classification and Slot Filling

  • Jie Han
  • Yixiong Zou
  • Haozhao Wang
  • Jun Wang
  • Wei Liu
  • Yao Wu
  • Tao Zhang
  • Ruixuan Li

Few-shot intent classification and slot filling are important but challenging tasks due to the scarcity of finely labeled data. Therefore, current works first train a model on source domains with sufficiently labeled data, and then transfer the model to target domains where only rarely labeled data is available. However, experience transferring as a whole usually suffers from gaps that exist among source domains and target domains. For instance, transferring domain-specific-knowledge-related experience is difficult. To tackle this problem, we propose a new method that explicitly decouples the transferring of general-semantic-representation-related experience and the domain-specific-knowledge-related experience. Specifically, for domain-specific-knowledge-related experience, we design two modules to capture intent-slot relation and slot-slot relation respectively. Extensive experiments on Snips and FewJoint datasets show that our method achieves state-of-the-art performance. The method improves the joint accuracy metric from 27.72% to 42.20% in the 1-shot setting, and from 46.54% to 60.79% in the 5-shot setting.

IJCAI Conference 2024 Conference Paper

Domain Adaptive and Fine-grained Anomaly Detection for Single-cell Sequencing Data and Beyond

  • Kaichen Xu
  • Yueyang Ding
  • Suyang Hou
  • Weiqiang Zhan
  • Nisang Chen
  • Jun Wang
  • Xiaobo Sun

Fine-grained anomalous cell detection from affected tissues is critical for clinical diagnosis and pathological research. Single-cell sequencing data provide unprecedented opportunities for this task. However, current anomaly detection methods struggle to handle domain shifts prevalent in multi-sample and multi-domain single-cell sequencing data, leading to suboptimal performance. Moreover, these methods fall short of distinguishing anomalous cells into pathologically distinct subtypes. In response, we propose ACSleuth, a novel, reconstruction deviation-guided generative framework that integrates the detection, domain adaptation, and fine-grained annotating of anomalous cells into a methodologically cohesive workflow. Notably, we present the first theoretical analysis of using reconstruction deviations output by generative models for anomaly detection in the presence of domain shifts. This analysis informs the development of a novel and superior maximum mean discrepancy-based anomaly scorer in ACSleuth. Extensive benchmarks over various single-cell data and other types of tabular data demonstrate ACSleuth's superiority over state-of-the-art methods in identifying and subtyping anomalies in multi-sample and multi-domain contexts. Our code is available at https://github.com/Catchxu/ACsleuth.

AAAI Conference 2024 Conference Paper

Federated Causality Learning with Explainable Adaptive Optimization

  • Dezhi Yang
  • Xintong He
  • Jun Wang
  • Guoxian Yu
  • Carlotta Domeniconi
  • Jinglin Zhang

Discovering causality from observational data is a crucial task in various scientific domains. With increasing awareness of privacy, data are not allowed to be exposed, and it is very hard to learn causal graphs from dispersed data, since these data may have different distributions. In this paper, we propose a federated causal discovery strategy (FedCausal) to learn the unified global causal graph from decentralized heterogeneous data. We design a global optimization formula to naturally aggregate the causal graphs from client data and constrain the acyclicity of the global graph without exposing local data. Unlike other federated causal learning algorithms, FedCausal unifies the local and global optimizations into a complete directed acyclic graph (DAG) learning process with a flexible optimization objective. We prove that this optimization objective is highly interpretable and can adaptively handle homogeneous and heterogeneous data. Experimental results on synthetic and real datasets show that FedCausal can effectively deal with non-independently and identically distributed (non-iid) data and delivers superior performance.

NeurIPS Conference 2024 Conference Paper

FOOGD: Federated Collaboration for Both Out-of-distribution Generalization and Detection

  • Xinting Liao
  • Weiming Liu
  • Pengyang Zhou
  • Fengyuan Yu
  • Jiahe Xu
  • Jun Wang
  • Wenjie Wang
  • Chaochao Chen

Federated learning (FL) is a promising machine learning paradigm that collaborates with client models to capture global knowledge. However, deploying FL models in real-world scenarios remains unreliable due to the coexistence of in-distribution data and unexpected out-of-distribution (OOD) data, such as covariate-shift and semantic-shift data. Current FL research typically addresses either covariate-shift data through OOD generalization or semantic-shift data via OOD detection, overlooking the simultaneous occurrence of various OOD shifts. In this work, we propose FOOGD, a method that estimates the probability density of each client and obtains a reliable global distribution as guidance for the subsequent FL process. Firstly, SM3D in FOOGD estimates a score model for arbitrary distributions without prior constraints, and powerfully detects semantic-shift data. Then SAG in FOOGD provides invariant yet diverse knowledge for both local covariate-shift generalization and client performance generalization. In empirical validations, FOOGD enjoys three main advantages: (1) reliably estimating non-normalized decentralized distributions, (2) detecting semantic-shift data via score values, and (3) generalizing to covariate-shift data by regularizing the feature extractor. The project is open-sourced at https://github.com/XeniaLLL/FOOGD-main.git.

NeurIPS Conference 2024 Conference Paper

Graph-enhanced Optimizers for Structure-aware Recommendation Embedding Evolution

  • Cong Xu
  • Jun Wang
  • Jianyong Wang
  • Wei Zhang

Embeddings play a key role in modern recommender systems because they are virtual representations of real-world entities and the foundation for subsequent decision-making models. In this paper, we propose a novel embedding update mechanism, Structure-aware Embedding Evolution (SEvo for short), to encourage related nodes to evolve similarly at each step. Unlike a GNN (Graph Neural Network), which typically serves as an intermediate module, SEvo is able to directly inject graph structural information into embeddings with minimal computational overhead during training. The convergence properties of SEvo along with its potential variants are theoretically analyzed to justify the validity of the designs. Moreover, SEvo can be seamlessly integrated into existing optimizers for state-of-the-art performance. Particularly, SEvo-enhanced AdamW with moment estimate correction demonstrates consistent improvements across a spectrum of models and datasets, suggesting a novel technical route to effectively utilize graph structural information beyond explicit GNN modules.

AAAI Conference 2024 Conference Paper

Human-Guided Moral Decision Making in Text-Based Games

  • Zijing Shi
  • Meng Fang
  • Ling Chen
  • Yali Du
  • Jun Wang

Training reinforcement learning (RL) agents to achieve desired goals while also acting morally is a challenging problem. Transformer-based language models (LMs) have shown some promise in moral awareness, but their use in different contexts is problematic because of the complexity and implicitness of human morality. In this paper, we build on text-based games, which are challenging environments for current RL agents, and propose the HuMAL (Human-guided Morality Awareness Learning) algorithm, which adaptively learns personal values through human-agent collaboration with minimal manual feedback. We evaluate HuMAL on the Jiminy Cricket benchmark, a set of text-based games with various scenes and dense morality annotations, using both simulated and actual human feedback. The experimental results demonstrate that with a small amount of human feedback, HuMAL can improve task performance and reduce immoral behavior in a variety of games and is adaptable to different personal values.

JBHI Journal 2024 Journal Article

Involution Transformer Based U-Net for Landmark Detection in Ultrasound Images for Diagnosis of Infantile DDH

  • Tianxiang Huang
  • Jing Shi
  • Juncheng Li
  • Jun Wang
  • Jun Du
  • Jun Shi

The B-mode ultrasound based computer-aided diagnosis (CAD) has demonstrated its effectiveness for diagnosis of Developmental Dysplasia of the Hip (DDH) in infants, which can conduct the Graf's method by detecting landmarks in hip ultrasound images. However, it is still necessary to explore more valuable information around these landmarks to enhance feature representation and improve the performance of the detection model. To this end, a novel Involution Transformer based U-Net (IT-UNet) network is proposed for hip landmark detection. The IT-UNet integrates the efficient involution operation into the Transformer to develop an Involution Transformer module (ITM), which consists of an involution attention block and a squeeze-and-excitation involution block. The ITM can capture both the spatial-related information and long-range dependencies from hip ultrasound images to effectively improve feature representation. Moreover, an Involution Downsampling block (IDB) is developed to alleviate the issue of feature loss in the encoder modules, which combines involution and convolution for the purpose of downsampling. The experimental results on two DDH ultrasound datasets show that the proposed IT-UNet achieves the best landmark detection performance, suggesting its potential for clinical application.

NeurIPS Conference 2024 Conference Paper

Is the MMI Criterion Necessary for Interpretability? Degenerating Non-causal Features to Plain Noise for Self-Rationalization

  • Wei Liu
  • Zhiying Deng
  • Zhongyu Niu
  • Jun Wang
  • Haozhao Wang
  • YuanKai Zhang
  • Ruixuan Li

An important line of research in the field of explainability is to extract a small subset of crucial rationales from the full input. The most widely used criterion for rationale extraction is the maximum mutual information (MMI) criterion. However, in certain datasets, there are spurious features that are non-causally correlated with the label yet carry high mutual information, complicating the loss landscape of MMI. Although some penalty-based methods have been developed to penalize the spurious features (e.g., invariance penalty, intervention penalty, etc.) to help MMI work better, these are merely remedial measures. In the optimization objectives of these methods, spurious features are still distinguished from plain noise, which hinders the discovery of causal rationales. This paper aims to develop a new criterion that treats spurious features as plain noise, allowing the model to work on datasets rich in spurious features as if it were working on clean datasets, thereby making rationale extraction easier. We theoretically observe that removing either plain noise or spurious features from the input does not alter the conditional distribution of the remaining components relative to the task label. However, significant changes in the conditional distribution occur only when causal features are eliminated. Based on this discovery, the paper proposes a criterion for Maximizing the Remaining Discrepancy (MRD). Experiments on six widely used datasets show that our MRD criterion improves rationale quality (measured by the overlap with human-annotated rationales) by up to 10.4% as compared to several recent competitive MMI variants. Code: https://github.com/jugechengzi/Rationalization-MRD.

AAAI Conference 2024 Conference Paper

Large Language Models Are Neurosymbolic Reasoners

  • Meng Fang
  • Shilong Deng
  • Yudi Zhang
  • Zijing Shi
  • Ling Chen
  • Mykola Pechenizkiy
  • Jun Wang

A wide range of real-world applications are characterized by their symbolic nature, necessitating a strong capability for symbolic reasoning. This paper investigates the potential application of Large Language Models (LLMs) as symbolic reasoners. We focus on text-based games, significant benchmarks for agents with natural language capabilities, particularly in symbolic tasks like math, map reading, sorting, and applying common sense in text-based worlds. To facilitate these agents, we propose an LLM agent designed to tackle symbolic challenges and achieve in-game objectives. We begin by initializing the LLM agent and informing it of its role. The agent then receives observations and a set of valid actions from the text-based games, along with a specific symbolic module. With these inputs, the LLM agent chooses an action and interacts with the game environments. Our experimental results demonstrate that our method significantly enhances the capability of LLMs as automated agents for symbolic reasoning, and our LLM agent is effective in text-based games involving symbolic tasks, achieving an average performance of 88% across all tasks.

NeurIPS Conference 2024 Conference Paper

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

  • Weiyu Ma
  • Qirui Mi
  • Yongcheng Zeng
  • Xue Yan
  • Yuqiao Wu
  • Runji Lin
  • Haifeng Zhang
  • Jun Wang

With the continued advancement of Large Language Model (LLM) agents in reasoning, planning, and decision-making, benchmarks have become crucial in evaluating these skills. However, there is a notable gap in benchmarks for real-time strategic decision-making. StarCraft II (SC2), with its complex and dynamic nature, serves as an ideal setting for such evaluations. To this end, we have developed TextStarCraft II, a specialized environment for assessing LLMs in real-time strategic scenarios within SC2. Addressing the limitations of traditional Chain of Thought (CoT) methods, we introduce the Chain of Summarization (CoS) method, enhancing LLMs' capabilities in rapid and effective decision-making. Our key experiments included: 1. LLM Evaluation: Tested 10 LLMs in TextStarCraft II, most of them defeating the built-in LV5 AI, showcasing effective strategy skills. 2. Commercial Model Knowledge: Evaluated four commercial models on SC2 knowledge; GPT-4 ranked highest by Grandmaster-level experts. 3. Human-AI Matches: Experimental results showed that fine-tuned LLMs performed on par with Gold-level players in real-time matches, demonstrating comparable strategic abilities. All code and data from this study have been made publicly available at https://github.com/histmeisah/Large-Language-Models-play-StarCraftII

NeurIPS Conference 2024 Conference Paper

Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

  • Xuanfa Jin
  • Ziyan Wang
  • Yali Du
  • Meng Fang
  • Haifeng Zhang
  • Jun Wang

Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite the advancements in large language models (LLMs), recent agents built on them often neglect control over discussion tactics, which are essential in communication scenarios and games. As a variant of the famous communication game Werewolf, One Night Ultimate Werewolf (ONUW) requires players to develop strategic discussion policies due to the potential role changes that increase the uncertainty and complexity of the game. In this work, we first establish the existence of Perfect Bayesian Equilibria (PBEs) in two scenarios of the ONUW game: one with discussion and one without. The results showcase that the discussion greatly changes players' utilities by affecting their beliefs, emphasizing the significance of discussion tactics. Based on the insights obtained from the analyses, we propose an RL-instructed language agent framework, where a discussion policy trained by reinforcement learning (RL) is employed to determine appropriate discussion tactics to adopt. Our experimental results on several ONUW game settings demonstrate the effectiveness and generalizability of our proposed framework.

AAAI Conference 2024 Conference Paper

Multi-Dimensional Fair Federated Learning

  • Cong Su
  • Guoxian Yu
  • Jun Wang
  • Hui Li
  • Qingzhong Li
  • Han Yu

Federated learning (FL) has emerged as a promising collaborative and secure paradigm for training a model from decentralized data without compromising privacy. Group fairness and client fairness are two dimensions of fairness that are important for FL. Standard FL can result in disproportionate disadvantages for certain clients, and it still faces the challenge of treating different groups equitably in a population. The problem of privately training fair FL models without compromising the generalization capability of disadvantaged clients remains open. In this paper, we propose a method, called mFairFL, to address this problem and achieve group fairness and client fairness simultaneously. mFairFL leverages differential multipliers to construct an optimization objective for empirical risk minimization with fairness constraints. Before aggregating locally trained models, it first detects conflicts among their gradients, and then iteratively curates the direction and magnitude of gradients to mitigate these conflicts. Theoretical analysis proves mFairFL facilitates the fairness in model development. The experimental evaluations based on three benchmark datasets show significant advantages of mFairFL compared to seven state-of-the-art baselines.

AAAI Conference 2024 Conference Paper

Multi-Granularity Causal Structure Learning

  • Jiaxuan Liang
  • Jun Wang
  • Guoxian Yu
  • Shuyin Xia
  • Guoyin Wang

Unveiling, modeling, and comprehending the causal mechanisms underpinning natural phenomena stand as fundamental endeavors across myriad scientific disciplines. Meanwhile, new knowledge emerges when discovering causal relationships from data. Existing causal learning algorithms predominantly focus on the isolated effects of variables, overlooking the intricate interplay of multiple variables and their collective behavioral patterns. Furthermore, the ubiquity of high-dimensional data exacts a substantial temporal cost from causal algorithms. In this paper, we develop a novel method called MgCSL (Multi-granularity Causal Structure Learning), which first leverages a sparse auto-encoder to explore coarse-graining strategies and causal abstractions from micro-variables to macro-ones. MgCSL then takes multi-granularity variables as inputs to train multilayer perceptrons and to delve into the causality between variables. To enhance the efficacy on high-dimensional data, MgCSL introduces a simplified acyclicity constraint to adeptly search the directed acyclic graph among variables. Experimental results show that MgCSL outperforms competitive baselines, and uncovers explainable causal connections on fMRI datasets.

AAAI Conference 2024 Conference Paper

PointAttN: You Only Need Attention for Point Cloud Completion

  • Jun Wang
  • Ying Cui
  • Dongyan Guo
  • Junxia Li
  • Qingshan Liu
  • Chunhua Shen

Point cloud completion, which refers to completing 3D shapes from partial 3D point clouds, is a fundamental problem for 3D point cloud analysis tasks. Benefiting from the development of deep neural networks, research on point cloud completion has made great progress in recent years. However, the explicit local region partition, like the kNNs involved in existing methods, makes them sensitive to the density distribution of point clouds. Moreover, it provides limited receptive fields that prevent capturing features from long-range context information. To solve these problems, we leverage the cross-attention and self-attention mechanisms to design a novel neural network for point cloud completion with implicit local region partition. Two basic units, Geometric Details Perception (GDP) and Self-Feature Augment (SFA), are proposed to establish the structural relationships directly among points in a simple yet effective way via the attention mechanism. Then, based on GDP and SFA, we construct a new framework with the popular encoder-decoder architecture for point cloud completion. The proposed framework, namely PointAttN, is simple, neat and effective, and can precisely capture the structural information of 3D shapes and predict complete point clouds with detailed geometry. Experimental results demonstrate that our PointAttN outperforms state-of-the-art methods on multiple challenging benchmarks. Code is available at: https://github.com/ohhhyeahhh/PointAttN

NeurIPS Conference 2024 Conference Paper

Policy Learning from Tutorial Books via Understanding, Rehearsing and Introspecting

  • Xiong-Hui Chen
  • Ziyan Wang
  • Yali Du
  • Shengyi Jiang
  • Meng Fang
  • Yang Yu
  • Jun Wang

When humans need to learn a new skill, we can acquire knowledge through written books, including textbooks, tutorials, etc. However, current research for decision-making, like reinforcement learning (RL), has primarily required numerous real interactions with the target environment to learn a skill, failing to utilize the existing knowledge already summarized in text. The success of Large Language Models (LLMs) sheds light on utilizing the knowledge behind such books. In this paper, we discuss a new policy learning problem called Policy Learning from tutorial Books (PLfB), built upon LLM systems, which aims to leverage rich resources such as tutorial books to derive a policy network. Inspired by how humans learn from books, we solve the problem via a three-stage framework: Understanding, Rehearsing, and Introspecting (URI). In particular, it first rehearses decision-making trajectories based on the knowledge derived from understanding the books, then introspects on the imaginary dataset to distill a policy network. We build two benchmarks for PLfB based on Tic-Tac-Toe and Football games. In experiments, URI's policy achieves at least a 44% net win rate against GPT-based agents without any real data; in the Football game, a complex scenario, URI's policy beats the built-in AIs with a 37% winning rate, while the GPT-based agent achieves only a 6% winning rate. The project page: https://plfb-football.github.io.

IJCAI Conference 2024 Conference Paper

Provable Acceleration of Nesterov’s Accelerated Gradient Method over Heavy Ball Method in Training Over-Parameterized Neural Networks

  • Xin Liu
  • Wei Tao
  • Wei Li
  • Dazhi Zhan
  • Jun Wang
  • Zhisong Pan

Due to its simplicity and efficiency, the first-order gradient method has been extensively employed in training neural networks. Although the optimization problem of the neural network is non-convex, recent research has proved that the first-order method is capable of attaining a global minimum when training over-parameterized neural networks, where the number of parameters is significantly larger than that of training instances. Momentum methods, including the heavy ball (HB) method and Nesterov's accelerated gradient (NAG) method, are the workhorse of first-order gradient methods owing to their accelerated convergence. In practice, NAG often exhibits superior performance to HB. However, current theoretical works fail to distinguish their convergence difference in training neural networks. To fill this gap, we consider the training problem of the two-layer ReLU neural network under over-parameterization and random initialization. Leveraging high-resolution dynamical systems and neural tangent kernel (NTK) theory, our result not only establishes tighter upper bounds on the convergence rate for both HB and NAG, but also provides the first theoretical guarantee for the acceleration of NAG over HB in training neural networks. Finally, we validate our theoretical results on three benchmark datasets.

NeurIPS Conference 2024 Conference Paper

Reinforcing LLM Agents via Policy Optimization with Action Decomposition

  • Muning Wen
  • Ziyu Wan
  • Jun Wang
  • Weinan Zhang
  • Ying Wen

Language models as intelligent agents push the boundaries of sequential decision-making agents but struggle with limited knowledge of environmental dynamics and exponentially huge action spaces. Recent efforts like GLAM and TWOSOME manually constrain the action space to a restricted subset and employ reinforcement learning to align agents' knowledge with specific environments. However, they overlook fine-grained credit assignment for intra-action tokens, which is essential for efficient language agent optimization, and rely on human prior knowledge to restrict the action space. This paper proposes decomposing language agent optimization from the action level to the token level, offering finer supervision for each intra-action token and manageable optimization complexity in environments with unrestricted action spaces. Beginning with the simplification of flattening all actions, we theoretically explore the discrepancies between action-level optimization and this naive token-level optimization. We then derive the Bellman backup with Action Decomposition (BAD) to integrate credit assignments for both intra-action and inter-action tokens, effectively eliminating the discrepancies. Implementing BAD within the PPO algorithm, we introduce Policy Optimization with Action Decomposition (POAD). POAD benefits from a finer-grained credit assignment process and lower optimization complexity, leading to enhanced learning efficiency and generalization abilities in aligning language agents with interactive environments. We validate POAD across diverse testbeds, with results affirming the advantages of our approach and the correctness of our theoretical analysis. The source code can be accessed directly at this link: https://github.com/morning9393/ADRL.

AAAI Conference 2024 Conference Paper

Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis

  • Rohan Mitta
  • Hosein Hasanbeig
  • Jun Wang
  • Daniel Kroening
  • Yiannis Kantaros
  • Alessandro Abate

This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL), such that the safety constraint violations are bounded at any point during learning. As enforcing safety during training might severely limit the agent’s exploration, we propose here a new architecture that handles the trade-off between efficient progress and safety during exploration. As the exploration progresses, we update via Bayesian inference Dirichlet-Categorical models of the transition probabilities of the Markov decision process that describes the environment dynamics. We then propose a way to approximate moments of belief about the risk associated with the action selection policy. We demonstrate that this approach can be easily interleaved with RL and we present experimental results to showcase the performance of the overall architecture.

AAMAS Conference 2024 Conference Paper

TaxAI: A Dynamic Economic Simulator and Benchmark for Multi-agent Reinforcement Learning

  • Qirui Mi
  • Siyu Xia
  • Yan Song
  • Haifeng Zhang
  • Shenghao Zhu
  • Jun Wang

Taxation and government spending are crucial tools for governments to promote economic growth and maintain social equity. However, the difficulty in accurately predicting the dynamic strategies of diverse self-interested households presents a challenge for governments to implement effective tax policies. Given its proficiency in modeling other agents in partially observable environments and adaptively learning to find optimal policies, Multi-Agent Reinforcement Learning (MARL) is highly suitable for solving dynamic games between the government and numerous households. Although MARL shows more potential than traditional methods such as the genetic algorithm and dynamic programming, there is a lack of large-scale multi-agent reinforcement learning economic simulators. Therefore, we propose a MARL environment, named TaxAI, for dynamic games involving N households, government, firms, and financial intermediaries based on the Bewley-Aiyagari economic model. Our study benchmarks 2 traditional economic methods against 7 MARL methods on TaxAI, demonstrating the effectiveness and superiority of MARL algorithms. Moreover, TaxAI’s scalability in simulating dynamic interactions between the government and 10,000 households, coupled with real-data calibration, grants it a substantial improvement in scale and realism over existing simulators. Therefore, TaxAI is the most realistic economic simulator for optimal tax policy, which aims to generate feasible recommendations for governments and individuals.

JBHI Journal 2024 Journal Article

Towards Wearable and Portable Spine Motion Analysis Through Dynamic Optimization of Smartphone Videos and IMU Data

  • Wei Wang
  • Yinghu Peng
  • Yilun Sun
  • Jun Wang
  • Guanglin Li

Background: Monitoring spine kinematics is crucial for applications like disease evaluation and ergonomics analysis. However, the small scale of vertebrae and the number of degrees of freedom present significant challenges for noninvasive and convenient spine kinematics estimation. Methods: This study developed a dynamic optimization framework for wearable spine motion tracking at the intervertebral joint level by integrating smartphone videos and Inertial Measurement Units (IMUs) with dynamic constraints from a thoracolumbar spine model. Validation involved motion data from 10 healthy males performing static standing, dynamic upright trunk rotations, and gait. This data included rotations of ten IMUs on vertebrae and virtual landmarks from three smartphone videos preprocessed by OpenCap, an application leveraging computer vision for pose estimation. The kinematic measures derived from the optimized solution were compared against simultaneously collected infrared optical marker-based measurements and in vivo literature data. Solutions based only on IMUs or videos were also compared for accuracy evaluation. Results: The proposed optimization approach closely matched the reference data in the intervertebral or segmental rotation range, demonstrating minimal angular differences across all motions and the highest correlation in 3D rotations (maximal Pearson and intraclass correlation coefficients of 0.92 and 0.94, respectively). Time-series changes of joint angles also aligned well with the optical-marker reference. Conclusion: Dynamic optimization of the spine simulation that integrates IMUs and computer vision outperforms the single-modality method. Significance: This markerless 3D spine motion capture method holds potential for spinal health assessment in large cohorts in real-world settings without dedicated laboratories.

NeurIPS Conference 2023 Conference Paper

An Efficient End-to-End Training Approach for Zero-Shot Human-AI Coordination

  • Xue Yan
  • Jiaxian Guo
  • Xingzhou Lou
  • Jun Wang
  • Haifeng Zhang
  • Yali Du

The goal of zero-shot human-AI coordination is to develop an agent that can collaborate with humans without relying on human data. Prevailing two-stage population-based methods require a diverse population of mutually distinct policies to simulate diverse human behaviors. The necessity of such populations severely limits their computational efficiency. To address this issue, we propose E3T, an Efficient End-to-End Training approach for zero-shot human-AI coordination. E3T employs a mixture of ego policy and random policy to construct the partner policy, making it both coordination-skilled and diverse. In this way, the ego agent is end-to-end trained with this mixture policy without the need of a pre-trained population, thus significantly improving the training efficiency. In addition, a partner modeling module is proposed to predict the partner's action from historical information. With the predicted partner's action, the ego policy is able to adapt its policy and take actions accordingly when collaborating with humans of different behavior patterns. Empirical results on the Overcooked environment show that our method significantly improves the training efficiency while achieving comparable or superior performance to the population-based baselines. Demo videos are available at https://sites.google.com/view/e3t-overcooked.

NeurIPS Conference 2023 Conference Paper

ChessGPT: Bridging Policy Learning and Language Modeling

  • Xidong Feng
  • Yicheng Luo
  • Ziyan Wang
  • Hongrui Tang
  • Mengyue Yang
  • Kun Shao
  • David Mguni
  • Yali Du

When solving decision-making tasks, humans typically depend on information from two key sources: (1) historical policy data, which provides interaction replay from the environment, and (2) analytical insights in natural language form, exposing the invaluable thought process or strategic considerations. Despite this, the majority of preceding research focuses on only one source: they either use historical replay exclusively to directly learn policy or value functions, or engage in language model training using only a language corpus. In this paper, we argue that a powerful autonomous agent should cover both sources. Thus, we propose ChessGPT, a GPT model bridging policy learning and language modeling by integrating data from these two sources in chess games. Specifically, we build a large-scale game and language dataset related to chess. Leveraging the dataset, we showcase two model examples, ChessCLIP and ChessGPT, integrating policy learning and language modeling. Finally, we propose a full evaluation framework for evaluating a language model's chess ability. Experimental results validate our model and dataset's effectiveness. We open source our code, model, and dataset at https://github.com/waterhorse1/ChessGPT.

JBHI Journal 2023 Journal Article

Chromosome Detection in Metaphase Cell Images Using Morphological Priors

  • Jun Wang
  • Chengfeng Zhou
  • Songchang Chen
  • Jianwu Hu
  • Minghui Wu
  • Xudong Jiang
  • Chenming Xu
  • Dahong Qian

Reliable chromosome detection in metaphase cell (MC) images can greatly alleviate the workload of cytogeneticists for karyotype analysis and the diagnosis of chromosomal disorders. However, it is still an extremely challenging task due to the complicated characteristics of chromosomes, e.g., dense distributions, arbitrary orientations, and various morphologies. In this article, we propose a novel rotated-anchor-based detection framework, named DeepCHM, for fast and accurate chromosome detection in MC images. Our framework has three main innovations: 1) A deep saliency map representing chromosomal morphological features is learned end-to-end with semantic features. This not only enhances the feature representations for anchor classification and regression but also guides the anchor setting to significantly reduce redundant anchors. This accelerates the detection and improves the performance; 2) A hardness-aware loss weights the contribution of positive anchors, which effectively reinforces the model to identify hard chromosomes; 3) A model-driven sampling strategy addresses the anchor imbalance issue by adaptively selecting hard negative anchors for model training. In addition, a large-scale benchmark dataset with a total of 624 images and 27,763 chromosome instances was built for chromosome detection and segmentation. Extensive experimental results demonstrate that our method outperforms most state-of-the-art (SOTA) approaches and successfully handles chromosome detection, with an AP score of 93.53%.

IJCAI Conference 2023 Conference Paper

CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization

  • Xiaohan Yu
  • Jun Wang
  • Yongsheng Gao

Ultra-fine-grained visual classification (ultra-FGVC) targets classifying sub-grained categories of fine-grained objects. This inevitably requires discriminative representation learning within a limited training set. Exploring intrinsic features from the object itself, e.g., predicting the rotation of a given image, has demonstrated great progress towards learning discriminative representations. Yet none of these works consider explicit supervision for learning mutual information at the instance level. To this end, this paper introduces CLE-ViT, a novel contrastive learning encoded transformer, to address this fundamental problem in ultra-FGVC. The core design is a self-supervised module that performs self-shuffling and masking and then distinguishes these altered images from other images. This drives the model to learn an optimized feature space that has a large inter-class distance while remaining tolerant to intra-class variations. By incorporating this self-supervised module, the network acquires more knowledge from the intrinsic structure of the input data, which improves generalization ability without requiring extra manual annotations. CLE-ViT achieves strong performance on 7 publicly available datasets, demonstrating its effectiveness in the ultra-FGVC task. The code is available at https://github.com/Markin-Wang/CLEViT.

NeurIPS Conference 2023 Conference Paper

D-Separation for Causal Self-Explanation

  • Wei Liu
  • Jun Wang
  • Haozhao Wang
  • Ruixuan Li
  • Zhiying Deng
  • YuanKai Zhang
  • Yang Qiu

Rationalization aims to strengthen the interpretability of NLP models by extracting a subset of human-intelligible pieces from their input texts. Conventional works generally employ the maximum mutual information (MMI) criterion to find the rationale that is most indicative of the target label. However, this criterion can be influenced by spurious features that correlate with the causal rationale or the target label. Instead of attempting to rectify the issues of the MMI criterion, we propose a novel criterion to uncover the causal rationale, termed the Minimum Conditional Dependence (MCD) criterion, which is grounded on our finding that the non-causal features and the target label are d-separated by the causal rationale. By minimizing the dependence between the non-selected parts of the input and the target label conditioned on the selected rationale candidate, all the causes of the label are compelled to be selected. In this study, we employ a simple and practical measure of dependence, specifically the KL-divergence, to validate our proposed MCD criterion. Empirically, we demonstrate that MCD improves the F1 score by up to 13.7% compared to previous state-of-the-art MMI-based methods. Our code is in an anonymous repository: https://anonymous.4open.science/r/MCD-CE88.
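
The d-separation intuition behind MCD can be illustrated with a KL-based dependence term: if the rationale already carries everything label-relevant, adding the non-selected text should not move the predicted label distribution. The sketch below is a toy formulation with illustrative function names, not the paper's training objective.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for categorical distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def mcd_penalty(p_y_given_rationale, p_y_given_full):
    """Toy MCD-style dependence measure: when the label is d-separated
    from the non-selected text by the rationale, conditioning on the
    full input should not change the label distribution, so this KL
    term vanishes."""
    return kl_divergence(p_y_given_full, p_y_given_rationale)

# a rationale that already carries all label-relevant information ...
good = mcd_penalty([0.8, 0.2], [0.8, 0.2])
# ... versus one whose complement still shifts the prediction
bad = mcd_penalty([0.6, 0.4], [0.9, 0.1])
```

Minimizing such a penalty over rationale candidates pushes the selector to absorb every cause of the label into the selected span.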

AAAI Conference 2023 Short Paper

Enhancing Dynamic GCN for Node Attribute Forecasting with Meta Spatial-Temporal Learning (Student Abstract)

  • Bo Wu
  • Xun Liang
  • Xiangping Zheng
  • Jun Wang

Node attribute forecasting has recently attracted considerable attention. Recent attempts utilize dynamic graph convolutional networks (GCNs) to predict future node attributes. However, few prior works have noticed the complex spatial and temporal interactions between nodes, which hamper the performance of dynamic GCNs. In this paper, we propose a new dynamic GCN model named meta-DGCN, leveraging meta spatial-temporal tasks to enhance the ability of dynamic GCNs to better capture future node attributes. Experiments show that meta-DGCN effectively models comprehensive spatio-temporal correlations between nodes and outperforms state-of-the-art baselines on various real-world datasets.

IJCAI Conference 2023 Conference Paper

HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning

  • Xinting Liao
  • Weiming Liu
  • Chaochao Chen
  • Pengyang Zhou
  • Huabin Zhu
  • Yanchao Tan
  • Jun Wang
  • Yue Qi

Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-identical and independent data distributions (non-IID) among clients hinder the performance of FL due to three issues, i.e., (1) class statistics shifting, (2) insufficient hierarchical information utilization, and (3) inconsistency in aggregating clients. To address these issues, we propose HyperFed, which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data under the supervision of the shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of inconsistent deviations from clients to the server. Extensive studies on four datasets prove that HyperFed is effective in enhancing the performance of FL under the non-IID setting.

JBHI Journal 2023 Journal Article

Immunotherapy Efficacy Prediction for Non-Small Cell Lung Cancer Using Multi-View Adaptive Weighted Graph Convolutional Networks

  • Qiong Wu
  • Jun Wang
  • Zongqiong Sun
  • Lei Xiao
  • Wenhao Ying
  • Jun Shi

Immunotherapy is an effective way to treat non-small cell lung cancer (NSCLC). The efficacy of immunotherapy differs from person to person and may cause side effects, making it important to predict the efficacy of immunotherapy before surgery. Radiomics based on machine learning has been successfully used to predict the efficacy of NSCLC immunotherapy. However, most studies only considered the radiomic features of the individual patient, ignoring the inter-patient correlations. Besides, they usually concatenated different features as the input of a single-view model, failing to consider the complex correlations among features of multiple types. To this end, we propose a multi-view adaptive weighted graph convolutional network (MVAW-GCN) for the prediction of NSCLC immunotherapy efficacy. Specifically, we group the radiomic features into several views according to the type of filtered images they are extracted from. We construct a graph in each view based on the radiomic features and phenotypic information. An attention mechanism is introduced to automatically assign weights to each view. Considering the view-shared and view-specific knowledge of radiomic features, we propose a separable graph convolution that decomposes the output of the last convolution layer into two components, i.e., the view-shared and view-specific outputs. We maximize the consistency and enhance the diversity among different views in the learning procedure. The proposed MVAW-GCN is evaluated on 107 NSCLC patients, including 52 patients with valid efficacy and 55 patients with invalid efficacy. Our method achieved an accuracy of 77.27% and an area under the curve (AUC) of 0.7780, indicating its effectiveness in NSCLC immunotherapy efficacy prediction.

AAAI Conference 2023 Conference Paper

Incentive-Boosted Federated Crowdsourcing

  • Xiangping Kang
  • Guoxian Yu
  • Jun Wang
  • Wei Guo
  • Carlotta Domeniconi
  • Jinglin Zhang

Crowdsourcing is a favorable computing paradigm for processing computer-hard tasks by harnessing human intelligence. However, generic crowdsourcing systems may lead to privacy leakage through the sharing of worker data. To tackle this problem, we propose a novel approach, called iFedCrowd (incentive-boosted Federated Crowdsourcing), to manage the privacy and quality of crowdsourcing projects. iFedCrowd allows participants to locally process sensitive data and only upload encrypted training models, and then aggregates the model parameters to build a shared server model to protect data privacy. To motivate workers to build a high-quality global model in an efficient way, we introduce an incentive mechanism that encourages workers to constantly collect fresh data to train accurate client models and boosts the global model training. We model the incentive-based interaction between the crowdsourcing platform and participating workers as a Stackelberg game, in which each side maximizes its own profit. We derive the Nash Equilibrium of the game to find the optimal solutions for the two sides. Experimental results confirm that iFedCrowd can complete secure crowdsourcing projects with high quality and efficiency.
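
The Stackelberg structure above (platform leads, workers follow) can be illustrated with a toy quadratic payoff model solved by backward induction. The utility functions here are assumptions for illustration, not the paper's actual incentive model.

```python
import numpy as np

def worker_best_response(r, c=1.0):
    """Follower: the worker maximizes r*e - c*e**2 over effort e,
    giving the closed-form best response e* = r / (2c)."""
    return r / (2.0 * c)

def platform_profit(r, V=10.0, c=1.0):
    """Leader: the platform values effort at V per unit and pays the
    posted rate r per unit, anticipating the worker's best response."""
    e = worker_best_response(r, c)
    return (V - r) * e

# backward induction: the leader optimizes over the follower's reaction curve
rates = np.linspace(0.0, 10.0, 10001)
r_star = rates[np.argmax([platform_profit(r) for r in rates])]
effort_star = worker_best_response(r_star)
```

In this toy instance the analytic Stackelberg solution is r* = V/2 (here 5.0), which the grid search recovers; in the paper both sides' profits are derived from the actual crowdsourcing cost and freshness model.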

NeurIPS Conference 2023 Conference Paper

Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach

  • Yudi Zhang
  • Yali Du
  • Biwei Huang
  • Ziyan Wang
  • Jun Wang
  • Meng Fang
  • Mykola Pechenizkiy

A major challenge in reinforcement learning is to determine which state-action pairs are responsible for future rewards that are delayed. Reward redistribution serves as a solution to re-assign credits for each time step from observed sequences. While the majority of current approaches construct the reward redistribution in an uninterpretable manner, we propose to explicitly model the contributions of state and action from a causal perspective, resulting in an interpretable reward redistribution and preserving policy invariance. In this paper, we start by studying the role of causal generative models in reward redistribution by characterizing the generation of Markovian rewards and trajectory-wise long-term return, and further propose a framework, called Generative Return Decomposition (GRD), for policy optimization in delayed reward scenarios. Specifically, GRD first identifies the unobservable Markovian rewards and causal relations in the generative process. Then, GRD makes use of the identified causal generative model to form a compact representation to train the policy over the most favorable subspace of the state space of the agent. Theoretically, we show that the unobservable Markovian reward function is identifiable, as well as the underlying causal structure and causal models. Experimental results show that our method outperforms state-of-the-art methods, and the provided visualization further demonstrates the interpretability of our method. The project page is located at https://reedzyd.github.io/GenerativeReturnDecomposition/.

NeurIPS Conference 2023 Conference Paper

Invariant Learning via Probability of Sufficient and Necessary Causes

  • Mengyue Yang
  • Zhen Fang
  • Yonggang Zhang
  • Yali Du
  • Furui Liu
  • Jean-Francois Ton
  • Jianhong Wang
  • Jun Wang

Out-of-distribution (OOD) generalization is indispensable for learning models in the wild, where the testing distribution is typically unknown and different from the training distribution. Recent methods derived from causality have shown great potential in achieving OOD generalization. However, existing methods mainly focus on the invariance property of causes, while largely overlooking the property of sufficiency and necessity conditions. Namely, a necessary but insufficient cause (feature) is invariant to distribution shift, yet it may not have the required accuracy. By contrast, a sufficient yet unnecessary cause (feature) tends to fit specific data well but may have a risk of adapting to a new domain. To capture the information of sufficient and necessary causes, we employ a classical concept, the probability of sufficient and necessary causes (PNS), which indicates the probability of whether one is the necessary and sufficient cause. To associate PNS with OOD generalization, we propose the PNS risk and formulate an algorithm to learn representations with a high PNS value. We theoretically analyze and prove the generalizability of the PNS risk. Experiments on both synthetic and real-world benchmarks demonstrate the effectiveness of the proposed method. The detailed implementation can be found at the GitHub repository: https://github.com/ymy4323460/CaSN.

AAMAS Conference 2023 Conference Paper

Is Nash Equilibrium Approximator Learnable?

  • Zhijian Duan
  • Wenhan Huang
  • Dinghuai Zhang
  • Yali Du
  • Jun Wang
  • Yaodong Yang
  • Xiaotie Deng

In this paper, we investigate the learnability of the function approximator that approximates Nash equilibrium (NE) for games generated from a distribution. First, we offer a generalization bound using the Probably Approximately Correct (PAC) learning model. The bound describes the gap between the expected loss and the empirical loss of the NE approximator. Afterward, we prove the agnostic PAC learnability of the Nash approximator. In addition to the theoretical analysis, we demonstrate an application of the NE approximator in experiments. The trained NE approximator can be used to warm-start and accelerate classical NE solvers. Together, our results show the practicability of approximating NE through function approximation.

TMLR Journal 2023 Journal Article

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games

  • Yang Li
  • Kun Xiong
  • Yingping Zhang
  • Jiangcheng Zhu
  • Stephen Marcus McAleer
  • Wei Pan
  • Jun Wang
  • Zonghong Dai

This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi. By analyzing over 10,000 records of human Xiangqi play, we highlight the existence of both transitive and non-transitive elements within the game’s strategic structure. To address non-transitivity, we introduce the JiangJun algorithm, an innovative combination of Monte-Carlo Tree Search (MCTS) and Policy Space Response Oracles (PSRO) designed to approximate a Nash equilibrium. We evaluate the algorithm empirically using a WeChat mini program and achieve a Master level with a 99.41% win rate against human players. The algorithm’s effectiveness in overcoming non-transitivity is confirmed by a plethora of metrics, such as relative population performance and visualization results. Our project site is available at https://sites.google.com/view/jiangjun-site/.

AAMAS Conference 2023 Conference Paper

Learning Structured Communication for Multi-Agent Reinforcement Learning

  • Junjie Sheng
  • Xiangfeng Wang
  • Bo Jin
  • Wenhao Li
  • Jun Wang
  • Junchi Yan
  • Tsung-Hui Chang
  • Hongyuan Zha

This paper investigates multi-agent reinforcement learning (MARL) communication mechanisms in large-scale scenarios. We propose a novel framework, Learning Structured Communication (LSC), that leverages a flexible and efficient communication topology. LSC enables adaptive agent grouping to create diverse hierarchical formations over episodes generated through an auxiliary task and a hierarchical routing protocol. We learn a hierarchical graph neural network with the formed topology that facilitates effective message generation and propagation between inter- and intra-group communications. Unlike state-of-the-art communication mechanisms, LSC possesses a detailed and learnable design for hierarchical communication. Numerical experiments on challenging tasks demonstrate that the proposed LSC exhibits high communication efficiency and global cooperation capability.

AAAI Conference 2023 Conference Paper

Learning to Shape Rewards Using a Game of Two Partners

  • David Mguni
  • Taher Jafferjee
  • Jianhong Wang
  • Nicolas Perez-Nieves
  • Wenbin Song
  • Feifei Tong
  • Matthew Taylor
  • Tianpei Yang

Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine which states to add shaping rewards to for more efficient learning, while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that is beneficial to the task, thus ensuring efficient convergence to high-performance policies. We demonstrate ROSA's properties in three didactic experiments and show its superior performance against state-of-the-art RS algorithms in challenging sparse-reward environments.

NeurIPS Conference 2023 Conference Paper

Lending Interaction Wings to Recommender Systems with Conversational Agents

  • Jiarui Jin
  • Xianyu Chen
  • Fanghua Ye
  • Mengyue Yang
  • Yue Feng
  • Weinan Zhang
  • Yong Yu
  • Jun Wang

An intelligent conversational agent (a.k.a., chatbot) could embrace conversational technologies to obtain user preferences online, overcoming the inherent limitations of recommender systems trained over offline historical user behaviors. In this paper, we propose CORE, a new offline-training and online-checking framework to plug a COnversational agent into REcommender systems. Unlike most prior conversational recommendation approaches that systematically combine the conversational and recommender parts through a reinforcement learning framework, CORE bridges the conversational agent and recommender system through a unified uncertainty minimization framework, which can be easily applied to any existing recommendation approach. Concretely, CORE treats a recommender system as an offline estimator that produces an estimated relevance score for each item, while it regards a conversational agent as an online checker that checks these estimated scores in each online session. We define uncertainty as the sum of unchecked relevance scores. In this regard, the conversational agent acts to minimize uncertainty by querying either attributes or items. Towards uncertainty minimization, we derive the certainty gain of querying each attribute and item, and develop a novel online decision tree algorithm to decide what to query at each turn. Our theoretical analysis reveals a bound on the expected number of turns of CORE in the cold-start setting. Experimental results demonstrate that CORE can be seamlessly employed on a variety of recommendation approaches, and can consistently bring significant improvements in both hot-start and cold-start settings.

AAAI Conference 2023 Conference Paper

Long-Tail Cross Modal Hashing

  • Zijun Gao
  • Jun Wang
  • Guoxian Yu
  • Zhongmin Yan
  • Carlotta Domeniconi
  • Jinglin Zhang

Existing Cross Modal Hashing (CMH) methods are mainly designed for balanced data, while imbalanced data with long-tail distributions are more common in the real world. Several long-tail hashing methods have been proposed, but they cannot adapt to multi-modal data due to the complex interplay between labels and the individuality and commonality information of multi-modal data. Furthermore, CMH methods mostly mine the commonality of multi-modal data to learn hash codes, which may override tail labels encoded by the individuality of respective modalities. In this paper, we propose LtCMH (Long-tail CMH) to handle imbalanced multi-modal data. LtCMH first adopts auto-encoders to mine the individuality and commonality of different modalities by minimizing the dependency between the individuality of respective modalities and by enhancing the commonality of these modalities. Then it dynamically combines the individuality and commonality with direct features extracted from respective modalities to create meta features that enrich the representation of tail labels, and binarizes the meta features to generate hash codes. LtCMH significantly outperforms state-of-the-art baselines on long-tail datasets and holds better (or comparable) performance on datasets with balanced labels.

JMLR Journal 2023 Journal Article

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

  • Ming Zhou
  • Ziyu Wan
  • Hanjing Wang
  • Muning Wen
  • Runzhe Wu
  • Ying Wen
  • Yaodong Yang
  • Yong Yu

Population-based multi-agent reinforcement learning (PB-MARL) encompasses a range of methods that merge dynamic population selection with multi-agent reinforcement learning (MARL) algorithms. While PB-MARL has demonstrated notable achievements in complex multi-agent tasks, its sequential execution is plagued by low computational efficiency due to the diversity in computing patterns and policy combinations. We propose a solution involving a stateless central task dispatcher and stateful workers to handle PB-MARL's subroutines, thereby capitalizing on parallelism across various components for efficient problem-solving. In line with this approach, we introduce MALib, a parallel framework that incorporates a task control model, independent data servers, and an abstraction of MARL training paradigms. The framework has undergone extensive testing and is available under the MIT license at https://github.com/sjtu-marl/malib.

JBHI Journal 2023 Journal Article

Multi-Scale Efficient Graph-Transformer for Whole Slide Image Classification

  • Saisai Ding
  • Juncheng Li
  • Jun Wang
  • Shihui Ying
  • Jun Shi

The multi-scale information among whole slide images (WSIs) is essential for cancer diagnosis. Although the existing multi-scale vision Transformer has shown its effectiveness for learning multi-scale image representations, it still cannot work well on gigapixel WSIs due to their extremely large image sizes. To this end, we propose a novel Multi-scale Efficient Graph-Transformer (MEGT) framework for WSI classification. The key idea of MEGT is to adopt two independent efficient Graph-based Transformer (EGT) branches to process the low-resolution and high-resolution patch embeddings (i.e., tokens in a Transformer) of WSIs, respectively, and then fuse these tokens via a multi-scale feature fusion module (MFFM). Specifically, we design an EGT to efficiently learn the local-global information of patch tokens, which integrates the graph representation into the Transformer to capture spatially related information of WSIs. Meanwhile, we propose a novel MFFM to alleviate the semantic gap among different-resolution patches during feature fusion, which creates a non-patch token for each branch as an agent to exchange information with the other branch via a cross-attention mechanism. In addition, to expedite network training, a new token pruning module is developed in EGT to reduce the redundant tokens. Extensive experiments on both TCGA-RCC and CAMELYON16 datasets demonstrate the effectiveness of the proposed MEGT.

JBHI Journal 2023 Journal Article

Multi-View Feature Transformation Based SVM+ for Computer-Aided Diagnosis of Liver Cancers With Ultrasound Images

  • Huili Zhang
  • Lehang Guo
  • Jun Wang
  • Shihui Ying
  • Jun Shi

It is feasible to improve the performance of B-mode ultrasound (BUS) based computer-aided diagnosis (CAD) for liver cancers by transferring knowledge from contrast-enhanced ultrasound (CEUS) images. In this work, we propose a novel feature transformation based support vector machine plus (SVM+) algorithm for this transfer learning task by introducing feature transformation into the SVM+ framework (named FSVM+). Specifically, the transformation matrix in FSVM+ is learned to minimize the radius of the enclosing ball of all samples, while the SVM+ is used to maximize the margin between two classes. Moreover, to capture more transferable information from multiple CEUS phase images, a multi-view FSVM+ (MFSVM+) is further developed, which transfers knowledge from three CEUS images from three phases, i.e., arterial phase, portal venous phase, and delayed phase, to the BUS-based CAD model. MFSVM+ innovatively assigns appropriate weights for each CEUS image by calculating the maximum mean discrepancy between a pair of BUS and CEUS images, which can capture the relationship between source and target domains. The experimental results on a bi-modal ultrasound liver cancer dataset demonstrate that MFSVM+ achieves the best classification accuracy of 88.24±1.28%, sensitivity of 88.32±2.88%, specificity of 88.17±2.91%, suggesting its effectiveness in promoting the diagnostic accuracy of BUS-based CAD.
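
The maximum mean discrepancy (MMD) based view weighting described above can be sketched with a standard RBF-kernel MMD estimator. The synthetic features and the inverse-MMD weighting rule below are assumptions for illustration; the paper's exact weighting scheme may differ.

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Biased estimator of the squared maximum mean discrepancy between
    two samples under an RBF kernel; a smaller value suggests the two
    feature distributions are closer."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
bus = rng.normal(0.0, 1.0, size=(50, 4))             # B-mode features
ceus_phases = {                                      # synthetic CEUS views
    "arterial": rng.normal(0.2, 1.0, size=(50, 4)),  # close to BUS
    "portal":   rng.normal(1.5, 1.0, size=(50, 4)),  # further away
    "delayed":  rng.normal(3.0, 1.0, size=(50, 4)),  # furthest
}
mmds = {name: rbf_mmd2(bus, feats) for name, feats in ceus_phases.items()}

# turn discrepancies into view weights: smaller MMD -> larger weight
inv = {name: 1.0 / (m + 1e-6) for name, m in mmds.items()}
total = sum(inv.values())
weights = {name: v / total for name, v in inv.items()}
```

Under this rule the CEUS phase whose features best match the BUS domain contributes most to the transferred model, mirroring the source-target relationship the abstract describes.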

NeurIPS Conference 2023 Conference Paper

Online PCA in Converging Self-consistent Field Equations

  • Xihan Li
  • Xiang Chen
  • Rasul Tutunov
  • Haitham Bou Ammar
  • Lei Wang
  • Jun Wang

The self-consistent field (SCF) equation is a type of nonlinear eigenvalue problem in which the matrix to be eigen-decomposed is a function of its own eigenvectors. It is of great significance in computational science for its connection to the Schrödinger equation. Traditional fixed-point iteration methods for solving such equations suffer from non-convergence issues. In this work, we present a novel perspective on SCF equations as principal component analysis (PCA) for non-stationary time series, in which a distribution and its own top principal components are mutually updated over time, and the equilibrium state of the model corresponds to the solution of the SCF equations. Under this new perspective, online PCA techniques can be brought in to drive the model towards the equilibrium state, acting as a new set of tools for converging the SCF equations. With several numerical adaptations, we then develop a new algorithm for converging the SCF equation and demonstrate its high convergence capacity with experiments on both synthesized and real electronic structure scenarios.
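
The classical baseline the abstract refers to, damped fixed-point iteration on an SCF equation, can be shown on a toy instance: find v such that v is the lowest eigenvector of H(v). The problem instance and damping constant below are illustrative, not taken from the paper.

```python
import numpy as np

def solve_scf(H_of_v, n, damping=0.5, tol=1e-10, max_iter=500, seed=0):
    """Damped fixed-point iteration for a toy self-consistent field
    problem: repeatedly eigen-decompose H(v), take the lowest
    eigenvector, and mix it with the previous iterate."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)
    v /= np.linalg.norm(v)
    for _ in range(max_iter):
        _, U = np.linalg.eigh(H_of_v(v))
        v_new = U[:, 0]                      # lowest eigenvector of H(v)
        if v_new @ v < 0:                    # fix the sign ambiguity
            v_new = -v_new
        v_next = (1 - damping) * v_new + damping * v
        v_next /= np.linalg.norm(v_next)
        if np.linalg.norm(v_next - v) < tol:
            return v_next
        v = v_next
    return v

# toy SCF operator: a fixed matrix plus a weak density-dependent term
A = np.diag([1.0, 2.0, 3.0])
H = lambda v: A + 0.1 * np.outer(v, v) ** 2
v_star = solve_scf(H, 3)
# eigen-residual ||H(v)v - lambda v|| at the self-consistent solution
residual = np.linalg.norm(H(v_star) @ v_star
                          - (v_star @ H(v_star) @ v_star) * v_star)
```

On this well-conditioned toy problem the damped iteration converges; the non-convergent regimes motivating the paper arise for stiffer couplings, where the online-PCA view is proposed as an alternative.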

AAMAS Conference 2023 Conference Paper

PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination

  • Xingzhou Lou
  • Jiaxian Guo
  • Junge Zhang
  • Jun Wang
  • Kaiqi Huang
  • Yali Du

Zero-shot human-AI coordination holds the promise of collaborating with humans without human data. Prevailing methods try to train the ego agent with a population of partners via self-play. However, these methods suffer from two problems: 1) The diversity of a population with finite partners is limited, thereby limiting the capacity of the trained ego agent to collaborate with a novel human; 2) Current methods only provide a common best response for every partner in the population, which may result in poor zero-shot coordination performance with a novel partner or humans. To address these issues, we first propose the policy ensemble method to increase the diversity of partners in the population, and then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives so that it can take different actions accordingly. In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners. We conduct experiments on the Overcooked environment, and evaluate the zero-shot human-AI coordination performance of our method with both behavior-cloned human proxies and real humans. The results demonstrate that our method significantly increases the diversity of partners and enables ego agents to learn more diverse behaviors than baselines, thus achieving state-of-the-art performance in all scenarios. We also open-source a human-AI coordination study framework on Overcooked for the convenience of future studies. Codes and demo videos are available at https://sites.google.com/view/pecan-overcooked.

JBHI Journal 2023 Journal Article

Reconstruction of Quantitative Susceptibility Mapping from Total Field Maps with Local Field Maps Guided UU-Net

  • Zheng Li
  • Shihui Ying
  • Jun Wang
  • Hongjian He
  • Jun Shi

Quantitative susceptibility mapping (QSM) is an emerging computational technique based on the magnetic resonance imaging (MRI) phase signal, which can provide magnetic susceptibility values of tissues. The existing deep learning-based models mainly reconstruct QSM from local field maps. However, the complicated inconsecutive reconstruction steps not only accumulate errors for inaccurate estimation, but also are inefficient in clinical practice. To this end, a novel local field maps guided UU-Net with Self- and Cross-Guided Transformer (LGUU-SCT-Net) is proposed to reconstruct QSM directly from the total field maps. Specifically, we propose to additionally generate the local field maps as the auxiliary supervision during the training stage. This strategy decomposes the more complicated mapping from total maps to QSM into two relatively easier ones, effectively alleviating the difficulty of direct mapping. Meanwhile, an improved U-Net model, named LGUU-SCT-Net, is further designed to promote the nonlinear mapping ability. The long-range connections are designed between two sequentially stacked U-Nets to bring more feature fusions and facilitate the information flow. The Self- and Cross-Guided Transformer integrated into these connections further captures multi-scale channel-wise correlations and guides the fusion of multi-scale transferred features, assisting in the more accurate reconstruction. The experimental results on an in-vivo dataset demonstrate the superior reconstruction results of our proposed algorithm.

AAAI Conference 2023 Conference Paper

Reinforcement Causal Structure Learning on Order Graph

  • Dezhi Yang
  • Guoxian Yu
  • Jun Wang
  • Zhengtian Wu
  • Maozu Guo

Learning a directed acyclic graph (DAG) that describes the causality of observed data is a very challenging but important task. Due to the limited quantity and quality of observed data, and the non-identifiability of the causal graph, it is almost impossible to infer a single precise DAG. Some methods approximate the posterior distribution of DAGs to explore the DAG space via Markov chain Monte Carlo (MCMC), but since the DAG space grows super-exponentially, accurately characterizing the whole distribution over DAGs is intractable. In this paper, we propose Reinforcement Causal Structure Learning on Order Graph (RCL-OG) that uses an order graph instead of MCMC to model different DAG topological orderings and to reduce the problem size. RCL-OG first defines reinforcement learning with a new reward mechanism to approximate the posterior distribution of orderings in an efficient way, and uses deep Q-learning to update and transfer rewards between nodes. Next, it obtains the probability transition model of nodes on the order graph, and computes the posterior probability of different orderings. In this way, we can sample on this model to obtain the ordering with high probability. Experiments on synthetic and benchmark datasets show that RCL-OG provides accurate posterior probability approximation and achieves better results than competitive causal discovery algorithms.
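
The order graph that RCL-OG operates on can be pictured as the lattice of node subsets, in which every path from the empty set to the full node set is one topological ordering. As a minimal illustration of that representation (not the paper's RL algorithm), the sketch below counts the topological orderings of a small DAG by dynamic programming over this lattice; the function name is hypothetical:

```python
def count_topological_orders(n, edges):
    """Count topological orderings of a DAG with nodes 0..n-1 by DP over
    the 'order graph': the lattice of node subsets, where each path from
    the empty set to the full set corresponds to one ordering."""
    preds = [0] * n                      # preds[v] = bitmask of v's parents
    for u, v in edges:
        preds[v] |= 1 << u
    full = (1 << n) - 1
    ways = [0] * (1 << n)                # ways[S] = #orderings placing exactly S
    ways[0] = 1
    for mask in range(full + 1):
        if ways[mask] == 0:
            continue
        for v in range(n):
            # v can be appended if it is unplaced and all its parents are placed
            if not (mask >> v) & 1 and preds[v] & mask == preds[v]:
                ways[mask | (1 << v)] += ways[mask]
    return ways[full]
```

For a chain 0→1→2 there is a single ordering, while an edgeless 3-node graph admits all 3! = 6; sampling orderings proportionally to these counts is the kind of operation the order-graph view makes tractable.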

AAAI Conference 2023 Conference Paper

Self-Decoupling and Ensemble Distillation for Efficient Segmentation

  • Yuang Liu
  • Wei Zhang
  • Jun Wang

Knowledge distillation (KD) is a promising teacher-student learning paradigm that transfers information from a cumbersome teacher to a student network. To avoid the training cost of a large teacher network, recent studies propose to distill knowledge from the student itself, called Self-KD. However, due to the limitations of the performance and capacity of the student, the soft-labels or features distilled by the student barely provide reliable guidance. Moreover, most Self-KD algorithms are specific to classification tasks based on soft-labels, and are not suitable for semantic segmentation. To alleviate these issues, we revisit the label and feature distillation problem in segmentation, and propose Self-Decoupling and Ensemble Distillation for Efficient Segmentation (SDES). Specifically, we design a decoupled prediction ensemble distillation (DPED) algorithm that generates reliable soft-labels with multiple expert decoders, and a decoupled feature ensemble distillation (DFED) mechanism to utilize more important channel-wise feature maps for encoder learning. The extensive experiments on three public segmentation datasets demonstrate the superiority of our approach and the efficacy of each component in the framework through the ablation study.

JBHI Journal 2023 Journal Article

Two-Stage Self-Supervised Cycle-Consistency Transformer Network for Reducing Slice Gap in MR Images

  • Zhiyang Lu
  • Jian Wang
  • Zheng Li
  • Shihui Ying
  • Jun Wang
  • Jun Shi
  • Dinggang Shen

Magnetic resonance (MR) images are usually acquired with large slice gap in clinical practice, i.e., low resolution (LR) along the through-plane direction. It is feasible to reduce the slice gap and reconstruct high-resolution (HR) images with the deep learning (DL) methods. To this end, the paired LR and HR images are generally required to train a DL model in a popular fully supervised manner. However, since the HR images are hardly acquired in clinical routine, it is difficult to get sufficient paired samples to train a robust model. Moreover, the widely used convolutional neural network (CNN) still cannot capture long-range image dependencies to combine useful information of similar contents, which are often spatially far away from each other across neighboring slices. To this end, a Two-stage Self-supervised Cycle-consistency Transformer Network (TSCTNet) is proposed to reduce the slice gap for MR images in this work. A novel self-supervised learning (SSL) strategy is designed with two stages respectively for robust network pre-training and specialized network refinement based on a cycle-consistency constraint. A hybrid Transformer and CNN structure is utilized to build an interpolation model, which explores both local and global slice representations. The experimental results on two public MR image datasets indicate that TSCTNet achieves superior performance over other compared SSL-based algorithms.

NeurIPS Conference 2023 Conference Paper

UltraRE: Enhancing RecEraser for Recommendation Unlearning via Error Decomposition

  • Yuyuan Li
  • Chaochao Chen
  • Yizhao Zhang
  • Weiming Liu
  • Lingjuan Lyu
  • Xiaolin Zheng
  • Dan Meng
  • Jun Wang

With growing concerns regarding privacy in machine learning models, regulations have committed to granting individuals the right to be forgotten while mandating companies to develop non-discriminatory machine learning systems, thereby fueling the study of the machine unlearning problem. Our attention is directed toward a practical unlearning scenario, i.e., recommendation unlearning. As the state-of-the-art framework, i.e., RecEraser, naturally achieves full unlearning completeness, our objective is to enhance it in terms of model utility and unlearning efficiency. In this paper, we rethink RecEraser from an ensemble-based perspective and focus on its three potential losses, i.e., redundancy, relevance, and combination. Under the theoretical guidance of the above three losses, we propose a new framework named UltraRE, which simplifies and powers RecEraser for recommendation tasks. Specifically, for redundancy loss, we incorporate transport weights in the clustering algorithm to optimize the equilibrium between collaboration and balance while enhancing efficiency; for relevance loss, we ensure that sub-models reach convergence on their respective group data; for combination loss, we simplify the combination estimator without compromising its efficacy. Extensive experiments on three real-world datasets demonstrate the effectiveness of UltraRE.

JBHI Journal 2022 Journal Article

A Convolutional Neural Network and Graph Convolutional Network Based Framework for Classification of Breast Histopathological Images

  • Zhiyang Gao
  • Zhiyang Lu
  • Jun Wang
  • Shihui Ying
  • Jun Shi

The spatial correlation among different tissue components is an essential characteristic for diagnosis of breast cancers based on histopathological images. Graph convolutional network (GCN) can effectively capture this spatial feature representation, and has been successfully applied to the histopathological image based computer-aided diagnosis (CAD). However, the current GCN-based approaches need complicated image preprocessing for graph construction. In this work, we propose a novel CAD framework for classification of breast histopathological images, which integrates both convolutional neural network (CNN) and GCN (named CNN-GCN) into a unified framework, where CNN learns high-level features from histopathological images for further adaptive graph construction, and the generated graph is then fed to GCN to learn the spatial features of histopathological images for the classification task. In particular, a novel clique GCN (cGCN) is proposed to learn more effective graph representation, which can arrange both forward and backward connections between any two graph convolution layers. Moreover, a new group graph convolution is further developed to replace the classical graph convolution of each layer in cGCN, so as to reduce redundant information and implicitly select superior fused feature representation. The proposed clique group GCN (cgGCN) is then embedded in the CNN-GCN framework (named CNN-cgGCN) to promote the learned spatial representation for diagnosis of breast cancers. The experimental results on two public breast histopathological image datasets indicate the effectiveness of the proposed CNN-cgGCN with superior performance to all the compared algorithms.

NeurIPS Conference 2022 Conference Paper

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning

  • Bo Liu
  • Xidong Feng
  • Jie Ren
  • Luo Mai
  • Rui Zhu
  • Haifeng Zhang
  • Jun Wang
  • Yaodong Yang

Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures wherein the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptations. In this paper, we develop a unified framework that describes variations of GMRL algorithms and points out that existing stochastic meta-gradient estimators adopted by GMRL are actually \textbf{biased}. Such meta-gradient bias comes from two sources: 1) the compositional bias incurred by the two-level problem structure, which has an upper bound of $\mathcal{O}\big(K\alpha^{K}\hat{\sigma}_{\text{In}}|\tau|^{-0.5}\big)$ \emph{w.r.t.} inner-loop update step $K$, learning rate $\alpha$, estimate variance $\hat{\sigma}^{2}_{\text{In}}$ and sample size $|\tau|$, and 2) the multi-step Hessian estimation bias $\hat{\Delta}_{H}$ due to the use of autodiff, which has a polynomial impact $\mathcal{O}\big((K-1)(\hat{\Delta}_{H})^{K-1}\big)$ on the meta-gradient bias. We study tabular MDPs empirically and offer quantitative evidence that testifies to our theoretical findings on existing stochastic meta-gradient estimators. Furthermore, we conduct experiments on Iterated Prisoner's Dilemma and Atari games to show how other methods such as off-policy learning and low-bias estimator can help fix the gradient bias for GMRL algorithms in general.

JBHI Journal 2022 Journal Article

AGMB-Transformer: Anatomy-Guided Multi-Branch Transformer Network for Automated Evaluation of Root Canal Therapy

  • Yunxiang Li
  • Guodong Zeng
  • Yifan Zhang
  • Jun Wang
  • Qun Jin
  • Lingling Sun
  • Qianni Zhang
  • Qisi Lian

Accurate evaluation of the treatment result on X-ray images is a significant and challenging step in root canal therapy since the incorrect interpretation of the therapy results will hamper timely follow-up which is crucial to the patients' treatment outcome. Nowadays, the evaluation is performed in a manual manner, which is time-consuming, subjective, and error-prone. In this article, we aim to automate this process by leveraging the advances in computer vision and artificial intelligence, to provide an objective and accurate method for root canal therapy result assessment. A novel anatomy-guided multi-branch Transformer (AGMB-Transformer) network is proposed, which first extracts a set of anatomy features and then uses them to guide a multi-branch Transformer network for evaluation. Specifically, we design a polynomial curve fitting segmentation strategy with the help of landmark detection to extract the anatomy features. Moreover, a branch fusion module and a multi-branch structure including our progressive Transformer and Group Multi-Head Self-Attention (GMHSA) are designed to focus on both global and local features for an accurate diagnosis. To facilitate the research, we have collected a large-scale root canal therapy evaluation dataset with 245 root canal therapy X-ray images, and the experiment results show that our AGMB-Transformer can improve the diagnosis accuracy from 57.96% to 90.20% compared with the baseline network. The proposed AGMB-Transformer can achieve a highly accurate evaluation of root canal therapy. To the best of our knowledge, our work is the first to perform automatic root canal therapy evaluation and has important clinical value to reduce the workload of endodontists.

JBHI Journal 2022 Journal Article

Diagnosis of Infantile Hip Dysplasia With B-Mode Ultrasound via Two-Stage Meta-Learning Based Deep Exclusivity Regularized Machine

  • Bangming Gong
  • Jing Shi
  • Xiangmin Han
  • Huan Zhang
  • Yuemin Huang
  • Liwei Hu
  • Jun Wang
  • Jun Du

The B-mode ultrasound (BUS) based computer-aided diagnosis (CAD) has shown its effectiveness for developmental dysplasia of the hip (DDH) in infants. In this work, a two-stage meta-learning based deep exclusivity regularized machine (TML-DERM) is proposed for the BUS-based CAD of DDH. TML-DERM integrates deep neural network (DNN) and exclusivity regularized machine into a unified framework to simultaneously improve the feature representation and classification performance. Moreover, the first-stage meta-learning is mainly conducted on the DNN module to alleviate the overfitting issue caused by the significantly increased parameters in DNN, and a random sampling strategy is adopted to self-generate the meta-tasks; while the second-stage meta-learning mainly learns the combination of multiple weak classifiers by a weight vector to improve the classification performance, and also optimizes the unified framework again. The experimental results on a DDH ultrasound dataset show the proposed TML-DERM algorithm achieves the superior classification performance with the mean accuracy of 85.89%, sensitivity of 86.54%, and specificity of 85.23%.

NeurIPS Conference 2022 Conference Paper

Enhancing Safe Exploration Using Safety State Augmentation

  • Aivar Sootla
  • Alexander Cowen-Rivers
  • Jun Wang
  • Haitham Bou Ammar

Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations - a phenomenon ideally to be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is nonnegative if and only if the constraint is satisfied. The value of this state also serves as a distance toward constraint violation, while its initial value indicates the available safety budget. This idea allows us to derive policies for scheduling the safety budget during training. We call our approach Simmer (Safe policy IMproveMEnt for RL) to reflect the careful nature of these schedules. We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability one. Our experiments suggest that "simmering" a safe algorithm can improve safety during training for both settings. We further show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
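
The core idea, augmenting the state with a nonnegative safety budget that measures distance to constraint violation, can be sketched as a simple environment wrapper. This is an illustrative sketch under an assumed `reset`/`step` interface that returns a per-step cost, not the authors' implementation; all names are hypothetical:

```python
class SafetyAugmentedEnv:
    """Minimal sketch of safety-state augmentation: observations are
    extended with a remaining safety budget z, decremented by the
    per-step cost, so that z >= 0 iff the cumulative-cost constraint
    still holds. The wrapped `env` is assumed to expose reset() -> obs
    and step(action) -> (obs, reward, cost, done)."""

    def __init__(self, env, safety_budget):
        self.env = env
        self.budget = safety_budget
        self.z = safety_budget

    def reset(self):
        self.z = self.budget              # initial value = available budget
        return (self.env.reset(), self.z)

    def step(self, action):
        obs, reward, cost, done = self.env.step(action)
        self.z -= cost                    # distance toward constraint violation
        return (obs, self.z), reward, cost, done or self.z < 0
```

Scheduling the safety budget during training, the "simmering" the abstract describes, then amounts to varying `safety_budget` across episodes.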

IJCAI Conference 2022 Conference Paper

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

  • Rongjie Huang
  • Max W. Y. Lam
  • Jun Wang
  • Dan Su
  • Dong Yu
  • Yi Ren
  • Zhou Zhao

Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performance in many generative tasks. However, the cost of their inherent iterative sampling process has hindered their application to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions of diverse receptive field patterns to efficiently model long-term time dependencies with adaptive conditions. A noise schedule predictor is also adopted to reduce the sampling steps without sacrificing the generation quality. Based on FastDiff, we design an end-to-end text-to-speech synthesizer, FastDiff-TTS, which generates high-fidelity speech waveforms without any intermediate feature (e.g., Mel-spectrogram). Our evaluation of FastDiff demonstrates state-of-the-art results with higher-quality (MOS 4.28) speech samples. Also, FastDiff enables a sampling speed 58x faster than real-time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time. We further show that FastDiff generalized well to the mel-spectrogram inversion of unseen speakers, and FastDiff-TTS outperformed other competing methods in end-to-end text-to-speech synthesis. Audio samples are available at https://FastDiff.github.io/.
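
The step-reduction idea behind a noise schedule predictor can be seen in a generic DDPM setup: training uses a long discretization, while sampling visits only a short sub-schedule of it. The sketch below shows a plain linear-beta schedule with naive uniform striding, a simplification of FastDiff's learned predictor; the function names and parameter values are illustrative:

```python
import numpy as np

def ddpm_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative signal level alpha_bar,
    as in a generic DDPM (not FastDiff's learned schedule)."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bar = np.cumprod(1.0 - betas)   # strictly decreasing toward 0
    return betas, alpha_bar

def strided_steps(T, n_steps):
    """Pick a short sub-schedule of n_steps timesteps out of T: the
    basic mechanism that makes fewer-step sampling possible."""
    return np.linspace(0, T - 1, n_steps).round().astype(int)
```

With a 1000-step training schedule, `strided_steps(1000, 4)` yields the four timesteps `[0, 333, 666, 999]`; a learned predictor instead places those few steps where they least degrade quality.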

NeurIPS Conference 2022 Conference Paper

FR: Folded Rationalization with a Unified Encoder

  • Wei Liu
  • Haozhao Wang
  • Jun Wang
  • Ruixuan Li
  • Chao Yue
  • YuanKai Zhang

Rationalization aims to strengthen the interpretability of NLP models by extracting a subset of human-intelligible pieces of their input texts. Conventional works generally employ a two-phase model in which a generator selects the most important pieces, followed by a predictor that makes predictions based on the selected pieces. However, such a two-phase model may incur the degeneration problem where the predictor overfits to the noise generated by a not yet well-trained generator and, in turn, leads the generator to converge to a suboptimal model that tends to select senseless pieces. To tackle this challenge, we propose Folded Rationalization (FR) that folds the two phases of the rationale model into one from the perspective of text semantic extraction. The key idea of FR is to employ a unified encoder between the generator and predictor, based on which FR can facilitate a better predictor by access to valuable information blocked by the generator in the traditional two-phase model and thus bring a better generator. Empirically, we show that FR improves the F1 score by up to 10.3% as compared to state-of-the-art methods.

AAMAS Conference 2022 Conference Paper

GCS: Graph-Based Coordination Strategy for Multi-Agent Reinforcement Learning

  • Jingqing Ruan
  • Yali Du
  • Xuantang Xiong
  • Dengpeng Xing
  • Xiyun Li
  • Linghui Meng
  • Haifeng Zhang
  • Jun Wang

Many real-world scenarios involve a team of agents that have to coordinate their policies to achieve a shared goal. Previous studies mainly focus on decentralized control to maximize a common reward and barely consider the coordination among control policies, which is critical in dynamic and complicated environments. In this work, we propose factorizing the joint team policy into a graph generator and graph-based coordinated policy to enable coordinated behaviours among agents. The graph generator adopts an encoder-decoder framework that outputs directed acyclic graphs (DAGs) to capture the underlying dynamic decision structure. We also apply the DAGness-constrained and DAG depth-constrained optimization in the graph generator to balance efficiency and performance. The graph-based coordinated policy exploits the generated decision structure. The graph generator and coordinated policy are trained simultaneously to maximize the discounted return. Empirical evaluations on Collaborative Gaussian Squeeze, Cooperative Navigation, and Google Research Football demonstrate the superiority of the proposed method. The code is available at https://github.com/Amanda-1997/GCS_aamas337.

AAAI Conference 2022 Conference Paper

Generation-Focused Table-Based Intermediate Pre-training for Free-Form Question Answering

  • Peng Shi
  • Patrick Ng
  • Feng Nan
  • Henghui Zhu
  • Jun Wang
  • Jiarong Jiang
  • Alexander Hanbo Li
  • Rishav Chakravarti

Question answering over semi-structured tables has attracted significant attention in the NLP community. However, most of the existing work focuses on questions that can be answered with a short-form answer, i.e., the answer is often a table cell or an aggregation of multiple cells. This can mismatch the intents of users who want to ask more complex questions that require free-form answers such as explanations. To bridge the gap, most recently, pre-trained sequence-to-sequence language models such as T5 are used for generating free-form answers based on the question and table inputs. However, these pre-trained language models have weaker encoding abilities over table cells and schema. To mitigate this issue, in this work, we present an intermediate pre-training framework, Generation-focused Table-based Intermediate Pre-training (GENTAP), that jointly learns representations of natural language questions and tables. GENTAP learns to generate via two training objectives to enhance the question understanding and table representation abilities for complex questions. Based on experimental results, models that leverage the GENTAP framework outperform the existing baselines on the FETAQA benchmark. The pre-trained models are not only useful for free-form question answering, but also for the few-shot data-to-text generation task, thus showing good transfer ability by obtaining new state-of-the-art results.

JAIR Journal 2022 Journal Article

HEBO: Pushing The Limits of Sample-Efficient Hyper-parameter Optimisation

  • Alexander I. Cowen-Rivers
  • Wenlong Lyu
  • Rasul Tutunov
  • Zhi Wang
  • Antoine Grosnit
  • Ryan Rhys Griffiths
  • Alexandre Max Maraval
  • Hao Jianye

In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input and output warping, admits exact marginal log-likelihood optimisation and is robust to the values of learned parameters. We demonstrate HEBO's empirical efficacy on the NeurIPS 2020 Black-Box Optimisation challenge, where HEBO placed first. Upon further analysis, we observe that HEBO significantly outperforms existing black-box optimisers on 108 machine learning hyper-parameter tuning tasks comprising the Bayesmark benchmark. Our findings indicate that the majority of hyper-parameter tuning tasks exhibit heteroscedasticity and non-stationarity, multi-objective acquisition ensembles with Pareto front solutions improve queried configurations, and robust acquisition maximisers afford empirical advantages relative to their non-robust counterparts. We hope these findings may serve as guiding principles for practitioners of Bayesian optimisation.

JBHI Journal 2022 Journal Article

Joint Localization and Classification of Breast Cancer in B-Mode Ultrasound Imaging via Collaborative Learning With Elastography

  • Weichang Ding
  • Jun Wang
  • Weijun Zhou
  • Shichong Zhou
  • Cai Chang
  • Jun Shi

Convolutional neural networks (CNNs) have been successfully applied in the computer-aided ultrasound diagnosis for breast cancer. Up to now, several CNN-based methods have been proposed. However, most of them consider tumor localization and classification as two separate steps, rather than performing them simultaneously. Besides, they suffer from the limited diagnosis information in the B-mode ultrasound (BUS) images. In this study, we develop a novel network ResNet-GAP that incorporates both localization and classification into a unified procedure. To enhance the performance of ResNet-GAP, we leverage stiffness information in the elastography ultrasound (EUS) modality by collaborative learning in the training stage. Specifically, a dual-channel ResNet-GAP network is developed, one channel for BUS and the other for EUS. In each channel, multiple class activity maps (CAMs) are generated using a series of convolutional kernels of different sizes. The multi-scale consistency of the CAMs in both channels are further considered in network optimization. Experiments on 264 patients in this study show that the newly developed ResNet-GAP achieves an accuracy of 88.6%, a sensitivity of 95.3%, a specificity of 84.6%, and an AUC of 93.6% on the classification task, and a 1.0NLF of 87.9% on the localization task, which is better than some state-of-the-art approaches.

AAAI Conference 2022 Conference Paper

Learning to Identify Top Elo Ratings: A Dueling Bandits Approach

  • Xue Yan
  • Yali Du
  • Binxin Ru
  • Jun Wang
  • Haifeng Zhang
  • Xu Chen

The Elo rating system is widely adopted to evaluate the skills of (chess) game and sports players. Recently it has been also integrated into machine learning algorithms in evaluating the performance of computerised AI agents. However, an accurate estimation of the Elo rating (for the top players) often requires many rounds of competitions, which can be expensive to carry out. In this paper, to improve the sample efficiency of the Elo evaluation (for top players), we propose an efficient online match scheduling algorithm. Specifically, we identify and match the top players through a dueling bandits framework and tailor the bandit algorithm to the gradient-based update of Elo. We show that it reduces the per-step memory and time complexity to constant, compared to the traditional likelihood maximization approaches requiring O(t) time. Our algorithm has a regret guarantee of Õ(√T), sublinear in the number of competition rounds and has been extended to the multidimensional Elo ratings for handling intransitive games. We empirically demonstrate that our method achieves superior convergence speed and time efficiency on a variety of gaming tasks.
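
The gradient-based Elo update the scheduling algorithm is tailored to is the standard logistic one: each match moves both ratings by the prediction error. A minimal sketch of that update (the K-factor of 32 is a conventional default, not a value from the paper):

```python
def elo_expected(r_a, r_b):
    """Logistic win probability of player a against player b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32.0):
    """One gradient-style Elo step: shift both ratings by K times the
    prediction error. `score_a` is 1 for a win, 0.5 for a draw, 0 for a loss."""
    err = score_a - elo_expected(r_a, r_b)
    return r_a + k * err, r_b - k * err
```

Two equally rated players have expected score 0.5, so an upset-free win transfers exactly K/2 rating points; the dueling-bandits scheduler decides which pairs to match so these noisy updates concentrate on the top players.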

NeurIPS Conference 2022 Conference Paper

M2N: Mesh Movement Networks for PDE Solvers

  • Wenbin Song
  • Mingrui Zhang
  • Joseph G Wallwork
  • Junpeng Gao
  • Zheng Tian
  • Fanglei Sun
  • Matthew Piggott
  • Junqing Chen

Numerical Partial Differential Equation (PDE) solvers often require discretizing the physical domain by using a mesh. Mesh movement methods provide the capability to improve the accuracy of the numerical solution without introducing extra computational burden to the PDE solver, by increasing mesh resolution where the solution is not well-resolved, whilst reducing unnecessary resolution elsewhere. However, sophisticated mesh movement methods, such as the Monge-Ampère method, generally require the solution of auxiliary equations. These solutions can be extremely expensive to compute when the mesh needs to be adapted frequently. In this paper, we propose, to the best of our knowledge, the first learning-based end-to-end mesh movement framework for PDE solvers. Key requirements of learning-based mesh movement methods are: alleviating mesh tangling, boundary consistency, and generalization to meshes with different resolutions. To achieve these goals, we introduce the neural spline model and the graph attention network (GAT) into our models respectively. While the Neural-Spline based model provides more flexibility for large mesh deformation, the GAT based model can handle domains with more complicated shapes and is better at performing delicate local deformation. We validate our methods on stationary and time-dependent, linear and non-linear equations, as well as regularly and irregularly shaped domains. Compared to the traditional Monge-Ampère method, our approach can greatly accelerate the mesh adaptation process by three to four orders of magnitude, whilst achieving comparable numerical error reduction.

NeurIPS Conference 2022 Conference Paper

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

  • Muning Wen
  • Jakub Kuba
  • Runji Lin
  • Weinan Zhang
  • Ying Wen
  • Jun Wang
  • Yaodong Yang

Large sequence models (SM) such as the GPT series and BERT have displayed outstanding performance and generalization capabilities in natural language processing, vision, and recently reinforcement learning. A natural follow-up question is how to abstract multi-agent decision making as a sequence modeling problem as well and benefit from the prosperous development of SMs. In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into an SM problem wherein the objective is to map agents' observation sequences to agents' optimal action sequences. Our goal is to build the bridge between MARL and SMs so that the modeling power of modern sequence models can be unleashed for MARL. Central to our MAT is an encoder-decoder architecture which leverages the multi-agent advantage decomposition theorem to transform the joint policy search problem into a sequential decision making process; this renders only linear time complexity for multi-agent problems and, most importantly, endows MAT with a monotonic performance improvement guarantee. Unlike prior arts such as Decision Transformer, which fit only pre-collected offline data, MAT is trained by online trial and error from the environment in an on-policy fashion. To validate MAT, we conduct extensive experiments on StarCraftII, Multi-Agent MuJoCo, Dexterous Hands Manipulation, and Google Research Football benchmarks. Results demonstrate that MAT achieves superior performance and data efficiency compared to strong baselines including MAPPO and HAPPO. Furthermore, we demonstrate that MAT is an excellent few-shot learner on unseen tasks regardless of changes in the number of agents. See our project page at https://sites.google.com/view/multi-agent-transformer.

AAAI Conference 2022 Conference Paper

Multi-Knowledge Aggregation and Transfer for Semantic Segmentation

  • Yuang Liu
  • Wei Zhang
  • Jun Wang

As a popular deep neural networks (DNN) compression technique, knowledge distillation (KD) has attracted increasing attention recently. Existing KD methods usually utilize one kind of knowledge in an intermediate layer of DNN for classification tasks to transfer useful information from cumbersome teacher networks to compact student networks. However, this paradigm is not very suitable for semantic segmentation, a comprehensive vision task based on both pixel-level and contextual information, since it cannot provide rich information for distillation. In this paper, we propose a novel multi-knowledge aggregation and transfer (MKAT) framework to comprehensively distill knowledge within an intermediate layer for semantic segmentation. Specifically, the proposed framework consists of three parts: Independent Transformers and Encoders module (ITE), Auxiliary Prediction Branch (APB), and Mutual Label Calibration (MLC) mechanism, which can take advantage of abundant knowledge from intermediate features. To demonstrate the effectiveness of our proposed approach, we conduct extensive experiments on three segmentation datasets: Pascal VOC, Cityscapes, and CamVid, showing that MKAT outperforms the other KD methods.

AAMAS Conference 2022 Conference Paper

Multiagent Q-learning with Sub-Team Coordination

  • Wenhan Huang
  • Kai Li
  • Kun Shao
  • Tianze Zhou
  • Jun Luo
  • Dongge Wang
  • Hangyu Mao
  • Jianye Hao

For cooperative multiagent reinforcement learning tasks, we propose a novel value factorization framework in the popular centralized training with decentralized execution paradigm, called multiagent Q-learning with sub-team coordination (QSCAN). This framework could flexibly exploit local coordination within sub-teams for effective factorization while honoring the individual-global-max (IGM) condition. QSCAN encompasses the full spectrum of sub-team coordination according to sub-team size, ranging from the monotonic value function class to the entire IGM function class, with familiar methods such as QMIX and QPLEX located at the respective extremes of the spectrum. Empirical results show that QSCAN’s performance dominates state-of-the-art methods in predator-prey tasks and the Switch challenge in MA-Gym.

NeurIPS Conference 2022 Conference Paper

Multiagent Q-learning with Sub-Team Coordination

  • Wenhan Huang
  • Kai Li
  • Kun Shao
  • Tianze Zhou
  • Matthew Taylor
  • Jun Luo
  • Dongge Wang
  • Hangyu Mao

In many real-world cooperative multiagent reinforcement learning (MARL) tasks, teams of agents can rehearse together before deployment, but then communication constraints may force individual agents to execute independently when deployed. Centralized training and decentralized execution (CTDE), which focuses mainly on this setting, has become increasingly popular in recent years. In the value-based MARL branch, a credit assignment mechanism is typically used to factorize the team reward into each individual's reward; individual-global-max (IGM) is a condition on the factorization ensuring that agents' action choices coincide with the team's optimal joint action. However, current architectures fail to consider local coordination within sub-teams that should be exploited for more effective factorization, leading to faster learning. We propose a novel value factorization framework, called multiagent Q-learning with sub-team coordination (QSCAN), to flexibly represent sub-team coordination while honoring the IGM condition. QSCAN encompasses the full spectrum of sub-team coordination according to sub-team size, ranging from the monotonic value function class to the entire IGM function class, with familiar methods such as QMIX and QPLEX located at the respective extremes of the spectrum. Experimental results show that QSCAN's performance dominates state-of-the-art methods in matrix games, predator-prey tasks, and the Switch challenge in MA-Gym. Additionally, QSCAN achieves comparable performance to those methods in a selection of StarCraft II micro-management tasks.
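
The monotonic end of that spectrum is easy to see concretely: if the joint value is a positively weighted (hence monotonic) mixture of per-agent utilities, the greedy joint action decomposes into each agent's individual greedy action, which is exactly the IGM condition. A toy numpy check, with arbitrary utilities and illustrative weights (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
q1 = rng.normal(size=3)            # agent 1's utilities over 3 actions
q2 = rng.normal(size=4)            # agent 2's utilities over 4 actions
w = np.array([0.7, 1.3])           # positive weights => monotonic mixing

# Joint value table under a QMIX-style monotonic mixture of utilities
q_tot = w[0] * q1[:, None] + w[1] * q2[None, :]

# IGM: the greedy joint action equals the per-agent greedy actions,
# so decentralized argmaxes recover the centralized optimum.
joint = np.unravel_index(np.argmax(q_tot), q_tot.shape)
assert joint == (np.argmax(q1), np.argmax(q2))
```

Richer factorizations such as QPLEX (and the sub-team classes in between) keep this decomposition property while representing value functions that a purely monotonic mixture cannot.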

IJCAI Conference 2022 Conference Paper

On the Convergence of Fictitious Play: A Decomposition Approach

  • Yurong Chen
  • Xiaotie Deng
  • Chenchen Li
  • David Mguni
  • Jun Wang
  • Xiang Yan
  • Yaodong Yang

Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in n-player games, which builds the foundation for modern multi-agent learning algorithms. Although FP has provable convergence guarantees on zero-sum games and potential games, many real-world problems are often a mixture of both and the convergence property of FP has not been fully studied yet. In this paper, we extend the convergence results of FP to the combinations of such games and beyond. Specifically, we derive new conditions for FP to converge by leveraging game decomposition techniques. We further develop a linear relationship unifying cooperation and competition in the sense that these two classes of games are mutually transferable. Finally, we analyse a non-convergent example of FP, the Shapley game, and develop sufficient conditions for FP to converge.
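For readers unfamiliar with fictitious play, its core loop is short: each player best-responds to the opponent's empirical mixture of past plays. A minimal self-play sketch on matching pennies (a zero-sum game, so the classical convergence guarantee applies; the payoff matrix and iteration budget here are illustrative, not from the paper):

```python
# Matching pennies: the row player wins on a match (zero-sum).
A = [[1, -1],
     [-1, 1]]

def fictitious_play(A, iters=5000):
    n, m = len(A), len(A[0])
    row_counts, col_counts = [0] * n, [0] * m
    row_counts[0], col_counts[0] = 1, 1   # arbitrary initial pure plays
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixture;
        # the column player's payoff is -A.
        row_br = max(range(n),
                     key=lambda i: sum(A[i][j] * col_counts[j] for j in range(m)))
        col_br = max(range(m),
                     key=lambda j: -sum(A[i][j] * row_counts[i] for i in range(n)))
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    tr, tc = sum(row_counts), sum(col_counts)
    return [c / tr for c in row_counts], [c / tc for c in col_counts]

x, y = fictitious_play(A)
```

The empirical frequencies approach the mixed equilibrium (1/2, 1/2) even though actual play keeps cycling; this gap between belief convergence and play convergence is what makes the convergence analysis of FP subtle.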

TMLR Journal 2022 Journal Article

Online Double Oracle

  • Le Cong Dinh
  • Stephen Marcus McAleer
  • Zheng Tian
  • Nicolas Perez-Nieves
  • Oliver Slumbers
  • David Henry Mguni
  • Jun Wang
  • Haitham Bou Ammar

Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) from game theory. Our method---\emph{Online Double Oracle (ODO)}---is provably convergent to a Nash equilibrium (NE). Most importantly, unlike normal DO, ODO is \emph{rational} in the sense that each agent in ODO can exploit a strategic adversary with a regret bound of $\mathcal{O}(\sqrt{ k \log(k)/T})$, where $k$ is not the total number of pure strategies, but rather the size of \emph{effective strategy set}. In many applications, we empirically show that $k$ is linearly dependent on the support size of the NE. On tens of different real-world matrix games, ODO outperforms DO, PSRO, and no-regret algorithms such as Multiplicative Weights Update by a significant margin, both in terms of convergence rate to a NE, and average payoff against strategic adversaries.
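ODO's inner loop runs a no-regret learner over the current effective strategy set. As a hedged illustration of that ingredient only (a plain Multiplicative Weights Update in self-play on rock-paper-scissors, not the ODO algorithm itself; the step size and horizon are arbitrary choices):

```python
import math

# Rock-paper-scissors payoff for the row player (zero-sum).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def softmax(scores, eta):
    top = max(scores)                       # max-subtraction for stability
    exps = [math.exp(eta * (s - top)) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def mwu_self_play(A, T=5000, eta=0.05):
    n, m = len(A), len(A[0])
    sr = [0.1] + [0.0] * (n - 1)            # slight asymmetry leaves the fixed point
    sc = [0.0] * m
    avg_r, avg_c = [0.0] * n, [0.0] * m
    for _ in range(T):
        p, q = softmax(sr, eta), softmax(sc, eta)
        for i in range(n):
            avg_r[i] += p[i] / T
        for j in range(m):
            avg_c[j] += q[j] / T
        # Accumulate each action's payoff against the opponent's mixture.
        for i in range(n):
            sr[i] += sum(A[i][j] * q[j] for j in range(m))
        for j in range(m):
            sc[j] -= sum(A[i][j] * p[i] for i in range(n))
    return avg_r, avg_c

p_bar, q_bar = mwu_self_play(A)
```

The time-averaged strategies approach the uniform Nash equilibrium. A DO-style outer loop would run such a learner on a small restricted set and add best responses on demand, which is why the regret depends on the effective strategy set size $k$ rather than the full action space.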

NeurIPS Conference 2022 Conference Paper

Optimistic Tree Searches for Combinatorial Black-Box Optimization

  • Cedric Malherbe
  • Antoine Grosnit
  • Rasul Tutunov
  • Haitham Bou Ammar
  • Jun Wang

The optimization of combinatorial black-box functions is pervasive in computer science and engineering. However, the combinatorial explosion of the search space and lack of natural ordering pose significant challenges for current techniques from a theoretical and practical perspective, and require new algorithmic ideas. In this paper, we propose to adapt the recent advances in tree searches and partitioning techniques to design and analyze novel black-box combinatorial solvers. A first contribution is the analysis of a tree-search algorithm called Optimistic Lipschitz Tree Search (OLTS), which assumes the Lipschitz constant of the function to be known. Linear convergence rates are provided for this algorithm under specific conditions, improving upon the logarithmic rates of baselines. An adaptive version, called Optimistic Combinatorial Tree Search (OCTS), is then introduced for the more realistic setup where we do not have any information on the Lipschitz constant of the function. Similar theoretical guarantees are shown to hold for OCTS and a numerical assessment is provided to illustrate the potential of tree searches with respect to state-of-the-art methods over typical benchmarks.
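The optimistic principle behind this family of methods can be sketched as a best-first search over binary strings, where a known Lipschitz constant (here 1, with respect to Hamming distance) gives an upper bound on every subtree. This is an illustrative reconstruction under those assumptions, not the paper's OLTS/OCTS algorithm; the objective and the zero-completion representatives are made up:

```python
import heapq

# Toy objective: count bits matching a hidden target. Its Lipschitz
# constant w.r.t. Hamming distance is 1 (flipping one bit changes f by 1).
TARGET = [1, 0, 1, 1, 0, 1]
L_CONST = 1

def f(x):
    return sum(int(a == b) for a, b in zip(x, TARGET))

def optimistic_search(n):
    # A node fixes a prefix of bits; its representative completes the
    # prefix with zeros, and f(rep) + L * (#free bits) upper-bounds the
    # best value reachable anywhere in that subtree.
    def node(prefix):
        rep = prefix + [0] * (n - len(prefix))
        val = f(rep)
        ub = val + L_CONST * (n - len(prefix))
        return (-ub, -val, prefix, rep)

    heap = [node([])]
    best_val, best_x = -1, None
    while heap:
        neg_ub, neg_val, prefix, rep = heapq.heappop(heap)
        if -neg_ub <= best_val:
            break                      # optimism cannot beat the incumbent
        if -neg_val > best_val:
            best_val, best_x = -neg_val, rep
        if len(prefix) < n:            # expand: fix the next bit both ways
            heapq.heappush(heap, node(prefix + [0]))
            heapq.heappush(heap, node(prefix + [1]))
    return best_x, best_val

x_best, v_best = optimistic_search(len(TARGET))
```

A node is only expanded while its optimistic bound exceeds the incumbent, so subtrees that provably cannot contain the maximiser are never evaluated.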

JBHI Journal 2022 Journal Article

Self-Supervised Bi-Channel Transformer Networks for Computer-Aided Diagnosis

  • Ronglin Gong
  • Xiangmin Han
  • Jun Wang
  • Shihui Ying
  • Jun Shi

Self-supervised learning (SSL) can alleviate the issue of small sample size and has shown its effectiveness for computer-aided diagnosis (CAD) models. However, since the conventional SSL methods share the identical backbone in both the pretext and downstream tasks, the pretext network generally cannot be well trained in the pre-training stage, if the pretext task is totally different from the downstream one. In this work, we propose a novel task-driven SSL method, namely Self-Supervised Bi-channel Transformer Networks (SSBTN), to improve the diagnostic accuracy of a CAD model by enhancing SSL flexibility. In SSBTN, we innovatively integrate two different networks for the pretext and downstream tasks, respectively, into a unified framework. Consequently, the pretext task can be flexibly designed based on the data characteristics, and the corresponding designed pretext network thus learns more effective feature representation to be transferred to the downstream network. Furthermore, a transformer-based transfer module is developed to efficiently enhance knowledge transfer by conducting feature alignment between two different networks. The proposed SSBTN is evaluated on two publicly available datasets, namely the full-field digital mammography INbreast dataset and the wireless video capsule CrohnIPI dataset. The experimental results indicate that the proposed SSBTN outperforms all the compared algorithms.

AAAI Conference 2022 Conference Paper

Structural Landmarking and Interaction Modelling: A “SLIM” Network for Graph Classification

  • Yaokang Zhu
  • Kai Zhang
  • Jun Wang
  • Haibin Ling
  • Jie Zhang
  • Hongyuan Zha

Graph neural networks are a promising architecture for learning and inference with graph-structured data. Yet, how to generate informative, fixed-dimensional graph-level features for graphs with varying size and topology can still be challenging. Typically, this is achieved through graph-pooling, which summarizes a graph by compressing all its nodes into a single vector after convolutional operations. Is such a “collapsing-style” graph-pooling the only choice for graph classification? From a complex-systems point of view, properties of a complex system arise largely from the interaction among its components. Therefore, we speculate that preserving the interacting relation between parts, instead of pooling them together, could benefit system-level prediction. To verify this, we propose SLIM, a graph neural network model for Structural Landmarking and Interaction Modelling. The main idea is to compute a set of end-to-end optimizable sub-structure landmarks, so that any input graph can be projected onto these (spatially) local structural representatives for a faithful, global characterization. By doing this, explicit interaction between component parts of a graph can be leveraged directly in generating useful graph-level representations despite significant topological variations. Encouraging results are observed on benchmark datasets for graph classification, demonstrating the value of interaction modelling in the design of graph neural networks.

NeurIPS Conference 2022 Conference Paper

Understanding Square Loss in Training Overparametrized Neural Network Classifiers

  • Tianyang Hu
  • Jun Wang
  • Wenjia Wang
  • Zhenguo Li

Deep learning has achieved many breakthroughs in modern classification tasks. Numerous architectures have been proposed for different data structures but when it comes to the loss function, the cross-entropy loss is the predominant choice. Recently, several alternative losses have seen revived interest for deep classifiers. In particular, empirical evidence seems to promote square loss but a theoretical justification is still lacking. In this work, we contribute to the theoretical understanding of square loss in classification by systematically investigating how it performs for overparametrized neural networks in the neural tangent kernel (NTK) regime. Interesting properties regarding the generalization error, robustness, and calibration error are revealed. We consider two cases, according to whether classes are separable or not. In the general non-separable case, fast convergence rate is established for both misclassification rate and calibration error. When classes are separable, the misclassification rate improves to be exponentially fast. Further, the resulting margin is proven to be lower bounded away from zero, providing theoretical guarantees for robustness. We expect our findings to hold beyond the NTK regime and translate to practical settings. To this end, we conduct extensive empirical studies on practical neural networks, demonstrating the effectiveness of square loss in both synthetic low-dimensional data and real image data. Compared to cross-entropy, square loss has comparable generalization error but noticeable advantages in robustness and model calibration.

AAAI Conference 2021 Conference Paper

Adaptive Pattern-Parameter Matching for Robust Pedestrian Detection

  • Mengyin Liu
  • Chao Zhu
  • Jun Wang
  • Xu-Cheng Yin

Pedestrians with challenging patterns, e.g., small scale or heavy occlusion, appear frequently in practical applications like autonomous driving, which remains a tremendous obstacle to higher robustness of detectors. Although plenty of previous works have been dedicated to these problems, properly matching patterns of pedestrian and parameters of detector, i.e., constructing a detector with proper parameter sizes for certain pedestrian patterns of different complexity, has been seldom investigated intensively. Pedestrian instances are usually handled equally with the same amount of parameters, which in our opinion is inadequate for those with more difficult patterns and leads to unsatisfactory performance. Thus, we propose in this paper a novel detection approach via adaptive pattern-parameter matching. The input pedestrian patterns, especially the complex ones, are first disentangled into simpler patterns for detection head by Pattern Disentangling Module (PDM) with various receptive fields. Then, Gating Feature Filtering Module (GFFM) dynamically decides the spatial positions where the patterns are still not simple enough and need further disentanglement by the next-level PDM. Cooperating with these two key components, our approach can adaptively select the best matched parameter size for the input patterns according to their complexity. Moreover, to further explore the relationship between parameter sizes and their performance on the corresponding patterns, two parameter selection policies are designed: 1) extending parameter size to maximum, aiming at more difficult patterns for different occlusion types; 2) specializing parameter size by group division, aiming at complex patterns for scale variations. Extensive experiments on two popular benchmarks, Caltech and CityPersons, show that our proposed method achieves superior performance compared with other state-of-the-art methods on subsets of different scales and occlusion types.

JMLR Journal 2021 Journal Article

Are We Forgetting about Compositional Optimisers in Bayesian Optimisation?

  • Antoine Grosnit
  • Alexander I. Cowen-Rivers
  • Rasul Tutunov
  • Ryan-Rhys Griffiths
  • Jun Wang
  • Haitham Bou-Ammar

Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied. An open-source implementation is made available at https://github.com/huawei-noah/noah-research/tree/CompBO/BO/HEBO/CompBO.

ICLR Conference 2021 Conference Paper

CT-Net: Channel Tensorization Network for Video Classification

  • Kunchang Li 0002
  • Xianhang Li
  • Yali Wang 0001
  • Jun Wang
  • Yu Qiao 0001

3D convolution is powerful for video classification but often computationally expensive; recent studies mainly focus on decomposing it along spatial-temporal and/or channel dimensions. Unfortunately, most approaches fail to achieve a preferable balance between convolutional efficiency and feature-interaction sufficiency. For this reason, we propose a concise and novel Channel Tensorization Network (CT-Net), by treating the channel dimension of input feature as a multiplication of K sub-dimensions. On one hand, it naturally factorizes convolution in a multi-dimensional way, leading to a light computation burden. On the other hand, it can effectively enhance feature interaction from different channels, and progressively enlarge the 3D receptive field of such interaction to boost classification accuracy. Furthermore, we equip our CT-Module with a Tensor Excitation (TE) mechanism. It can learn to exploit spatial, temporal and channel attention in a high-dimensional manner, to improve the cooperative power of all the feature dimensions in our CT-Module. Finally, we flexibly adapt ResNet as our CT-Net. Extensive experiments are conducted on several challenging video benchmarks, e.g., Kinetics-400, Something-Something V1 and V2. Our CT-Net outperforms a number of recent SOTA approaches, in terms of accuracy and/or efficiency.

AAMAS Conference 2021 Conference Paper

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems

  • Yaodong Yang
  • Jun Luo
  • Ying Wen
  • Oliver Slumbers
  • Daniel Graves
  • Haitham Bou Ammar
  • Jun Wang
  • Matthew E. Taylor

Multiagent reinforcement learning (MARL) has achieved a remarkable amount of success in solving various types of video games. A cornerstone of this success is the auto-curriculum framework, which shapes the learning process by continually creating new challenging tasks for agents to adapt to, thereby facilitating the acquisition of new skills. In order to extend MARL methods to real-world domains outside of video games, we envision in this blue sky paper that maintaining a diversity-aware auto-curriculum is critical for successful MARL applications. Specifically, we argue that behavioural diversity is a pivotal, yet under-explored, component for real-world multiagent learning systems, and that significant work remains in understanding how to design a diversity-aware auto-curriculum. We list four open challenges for auto-curriculum techniques, which we believe deserve more attention from this community. Towards validating our vision, we recommend modelling realistic interactive behaviours in autonomous driving as an important test bed, and recommend the SMARTS/ULTRA benchmark.

AAAI Conference 2021 Conference Paper

Generalized Relation Learning with Semantic Correlation Awareness for Link Prediction

  • Yao Zhang
  • Xu Zhang
  • Jun Wang
  • Hongru Liang
  • Wenqiang Lei
  • Zhe Sun
  • Adam Jatowt
  • Zhenglu Yang

Developing link prediction models to automatically complete knowledge graphs (KGs) has recently been the focus of significant research interest. The current methods for the link prediction task have two natural problems: 1) the relation distributions in KGs are usually unbalanced, and 2) there are many unseen relations that occur in practical situations. These two problems limit the training effectiveness and practical applications of the existing link prediction models. We advocate a holistic understanding of KGs and we propose in this work a unified Generalized Relation Learning framework GRL to address the above two problems, which can be plugged into existing link prediction models. GRL conducts a generalized relation learning, which is aware of semantic correlations between relations that serve as a bridge to connect semantically similar relations. After training with GRL, the closeness of semantically similar relations in vector space and the discrimination of dissimilar relations are improved. We perform comprehensive experiments on six benchmarks to demonstrate the superior capability of GRL in the link prediction task. In particular, GRL is found to enhance the existing link prediction models making them insensitive to unbalanced relation distributions and capable of learning unseen relations.

AAAI Conference 2021 Conference Paper

Generative Semi-supervised Learning for Multivariate Time Series Imputation

  • Xiaoye Miao
  • Yangyang Wu
  • Jun Wang
  • Yunjun Gao
  • Xudong Mao
  • Jianwei Yin

Missing values, which widely exist in multivariate time series data, hinder effective data analysis. Existing time series imputation methods do not make full use of the label information in real-life time series data. In this paper, we propose a novel semi-supervised generative adversarial network model, named SSGAN, for missing value imputation in multivariate time series data. It consists of three players, i.e., a generator, a discriminator, and a classifier. The classifier predicts labels of time series data, and thus it drives the generator to estimate the missing values (or components), conditioned on observed components and data labels at the same time. We introduce a temporal reminder matrix to help the discriminator better distinguish the observed components from the imputed ones. Moreover, we theoretically prove that, SSGAN using the temporal reminder matrix and the classifier does learn to estimate missing values converging to the true data distribution when the Nash equilibrium is achieved. Extensive experiments on three public real-world datasets demonstrate that, SSGAN yields a more than 15% gain in performance, compared with the state-of-the-art methods.

AAAI Conference 2021 Short Paper

LAMS: A Location-aware Approach for Multimodal Summarization (Student Abstract)

  • Zhengkun Zhang
  • Jun Wang
  • Zhe Sun
  • Zhenglu Yang

Multimodal summarization aims to refine salient information from multiple modalities, among which texts and images are two mostly discussed ones. In recent years, many fantastic works have emerged in this field by modeling image-text interactions; however, they neglect the fact that most of multimodal documents have been elaborately organized by their writers. This means that a critical organized factor has long been short of enough attention, that is, image locations, which may carry illuminating information and imply the key contents of a document. To address this issue, we propose a location-aware approach for multimodal summarization (LAMS) based on Transformer. We investigate image locations for multimodal summarization via a stack of multimodal fusion blocks, which can formulate the high-order interactions among images and texts. An extensive experimental study on an extended multimodal dataset validates the superior summarization performance of the proposed model.

AAAI Conference 2021 Conference Paper

Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

  • Peng Shi
  • Patrick Ng
  • Zhiguo Wang
  • Henghui Zhu
  • Alexander Hanbo Li
  • Jun Wang
  • Cicero Nogueira dos Santos
  • Bing Xiang

Most recently, there has been significant interest in learning contextual representations for various NLP tasks, by leveraging large scale text corpora to train large neural language models with self-supervised learning objectives, such as Masked Language Model (MLM). However, based on a pilot study, we observe three issues of existing general-purpose language models when they are applied to text-to-SQL semantic parsers: fail to detect column mentions in the utterances, fail to infer column mentions from cell values, and fail to compose complex SQL queries. To mitigate these issues, we present a model pre-training framework, Generation-Augmented Pre-training (GAP), that jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-training data. GAP MODEL is trained on 2M utterance-schema pairs and 30K utterance-schema-SQL triples, whose utterances are produced by generative models. Based on experimental results, neural semantic parsers that leverage GAP MODEL as a representation encoder obtain new state-of-the-art results on both SPIDER and CRITERIA-TO-SQL benchmarks.

AAMAS Conference 2021 Conference Paper

Learning Correlated Communication Topology in Multi-Agent Reinforcement learning

  • Yali Du
  • Bo Liu
  • Vincent Moens
  • Ziqi Liu
  • Zhicheng Ren
  • Jun Wang
  • Xu Chen
  • Haifeng Zhang

Communication improves the efficiency and convergence of multiagent learning. Existing studies of agent communication have been limited to predefined fixed connections. While an attention mechanism exists and is useful for scheduling the communication between agents, it largely ignores the dynamical nature of communication and thus the correlation between agents’ connections. In this work, we adopt a normalizing flow to encode correlation between agents’ interactions. The dynamical communication topology is directly learned by maximizing the agent rewards. In our end-to-end formulation, the communication structure is learned by considering it as a hidden dynamical variable. We realize centralized training of critics and graph reasoning policy, and decentralized execution from local observation and message that are received through the learned dynamical communication topology. Experiments on cooperative navigation in the particle world and adaptive traffic control tasks demonstrate the effectiveness of our method.

JBHI Journal 2021 Journal Article

Multi-Source Transfer Learning Via Multi-Kernel Support Vector Machine Plus for B-Mode Ultrasound-Based Computer-Aided Diagnosis of Liver Cancers

  • Huili Zhang
  • Lehang Guo
  • Dan Wang
  • Jun Wang
  • Lili Bao
  • Shihui Ying
  • Huixiong Xu
  • Jun Shi

B-mode ultrasound (BUS) imaging is a routine tool for diagnosis of liver cancers, while contrast-enhanced ultrasound (CEUS) provides additional information to BUS on the local tissue vascularization and perfusion to promote diagnostic accuracy. In this work, we propose to improve the BUS-based computer-aided diagnosis for liver cancers by transferring knowledge from the multi-view CEUS images, including the arterial phase, portal venous phase, and delayed phase, respectively. To make full use of the shared labels of paired BUS and CEUS images to guide knowledge transfer, support vector machine plus (SVM+), a specifically designed transfer learning (TL) classifier for paired data with shared labels, is adopted for this supervised TL. A nonparallel hyperplane based SVM+ (NHSVM+) is first proposed to improve the TL performance by transferring the per-class knowledge from source domain to the corresponding target domain. Moreover, to handle the issue of multi-source TL, a multi-kernel learning based NHSVM+ (MKL-NHSVM+) algorithm is further developed to effectively transfer multi-source knowledge from multi-view CEUS images. The experimental results indicate that the proposed MKL-NHSVM+ outperforms all the compared algorithms for diagnosis of liver cancers, whose mean classification accuracy, sensitivity, and specificity are 88.18 ± 3.16%, 86.98 ± 4.77%, and 89.42 ± 3.77%, respectively.

JBHI Journal 2021 Journal Article

Multiscale Attention Guided Network for COVID-19 Diagnosis Using Chest X-Ray Images

  • Jingxiong Li
  • Yaqi Wang
  • Shuai Wang
  • Jun Wang
  • Jun Liu
  • Qun Jin
  • Lingling Sun

Coronavirus disease 2019 (COVID-19) is one of the most destructive pandemics of the millennium, forcing the world to tackle a health crisis. Automated lung infections classification using chest X-ray (CXR) images could strengthen diagnostic capability when handling COVID-19. However, classifying COVID-19 from pneumonia cases using CXR image is a difficult task because of shared spatial characteristics, high feature variation and contrast diversity between cases. Moreover, massive data collection is impractical for a newly emerged disease, which limits the performance of data-thirsty deep learning models. To address these challenges, Multiscale Attention Guided deep network with Soft Distance regularization (MAG-SD) is proposed to automatically classify COVID-19 from pneumonia CXR images. In MAG-SD, MA-Net is used to produce prediction vector and attention from multiscale feature maps. To improve the robustness of trained model and relieve the shortage of training data, attention guided augmentations along with a soft distance regularization are posed, which aims at generating meaningful augmentations and reduce noise. Our multiscale attention model achieves better classification performance on our pneumonia CXR image dataset. Plentiful experiments are conducted for MAG-SD, which demonstrate its unique advantage in pneumonia classification over cutting-edge models. The code is available at https://github.com/JasonLeeGHub/MAG-SD.

NeurIPS Conference 2021 Conference Paper

Neural Auto-Curricula in Two-Player Zero-Sum Games

  • Xidong Feng
  • Oliver Slumbers
  • Ziyu Wan
  • Bo Liu
  • Stephen McAleer
  • Ying Wen
  • Jun Wang
  • Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game theoretical principles such as fictitious play and Double Oracle. In this paper, we introduce a novel framework—Neural Auto-Curricula (NAC)—that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the best-response module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance compared with the state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction to discover general MARL algorithms solely from data.

AAAI Conference 2021 Conference Paper

News Content Completion with Location-Aware Image Selection

  • Zhengkun Zhang
  • Jun Wang
  • Adam Jatowt
  • Zhe Sun
  • Shao-Ping Lu
  • Zhenglu Yang

News, as one of the fundamental social media types, typically contains both texts and images. Image selection, which involves choosing appropriate images according to some specified contexts, is crucial for formulating good news. However, it presents two challenges: where to place images and which images to use. The difficulties associated with this where-which problem lie in the fact that news typically contains linguistically rich text that delivers complex information and more than one image. In this paper, we propose a novel end-to-end two-stage framework to address these issues comprehensively. In the first stage, we identify key information in news by using location embeddings, which represent the local contextual information of each candidate location for image insertion. Then, in the second stage, we thoroughly examine the candidate images and select the most context-related ones to insert into each location identified in the first stage. We also introduce three insertion strategies to formulate different scenarios influencing the image selection procedure. Extensive experiments demonstrate the consistent superiority of the proposed framework in image selection.

IJCAI Conference 2021 Conference Paper

Ordering-Based Causal Discovery with Reinforcement Learning

  • Xiaoqiang Wang
  • Yali Du
  • Shengyu Zhu
  • Liangjun Ke
  • Zhitang Chen
  • Jianye Hao
  • Jun Wang

It is a long-standing question to discover causal relations among a set of variables in many empirical sciences. Recently, Reinforcement Learning (RL) has achieved promising results in causal discovery from observational data. However, searching the space of directed graphs and enforcing acyclicity by implicit penalties tend to be inefficient and restrict the existing RL-based method to small scale problems. In this work, we propose a novel RL-based approach for causal discovery, by incorporating RL into the ordering-based paradigm. Specifically, we formulate the ordering search problem as a multi-step Markov decision process, implement the ordering generating process with an encoder-decoder architecture, and finally use RL to optimize the proposed model based on the reward mechanisms designed for each ordering. A generated ordering would then be processed using variable selection to obtain the final causal graph. We analyze the consistency and computational complexity of the proposed method, and empirically show that a pretrained model can be exploited to accelerate training. Experimental results on both synthetic and real data sets show that the proposed method achieves much improved performance over the existing RL-based method.

IJCAI Conference 2021 Conference Paper

Pairwise Half-graph Discrimination: A Simple Graph-level Self-supervised Strategy for Pre-training Graph Neural Networks

  • Pengyong Li
  • Jun Wang
  • Ziliang Li
  • Yixuan Qiao
  • Xianggen Liu
  • Fei Ma
  • Peng Gao
  • Sen Song

Self-supervised learning has gradually emerged as a powerful technique for graph representation learning. However, transferable, generalizable, and robust representation learning on graph data still remains a challenge for pre-training graph neural networks. In this paper, we propose a simple and effective self-supervised pre-training strategy, named Pairwise Half-graph Discrimination (PHD), that explicitly pre-trains a graph neural network at graph-level. PHD is designed as a simple binary classification task to discriminate whether two half-graphs come from the same source. Experiments demonstrate that PHD is an effective pre-training strategy that offers comparable or superior performance on 13 graph classification tasks compared with state-of-the-art strategies, and achieves notable improvements when combined with node-level strategies. Moreover, the visualization of learned representations reveals that the PHD strategy indeed empowers the model to learn graph-level knowledge like the molecular scaffold. These results have established PHD as a powerful and effective self-supervised learning strategy in graph-level representation learning.

AAAI Conference 2021 Conference Paper

Predicting Flashover Occurrence using Surrogate Temperature Data

  • Eugene Yujun Fu
  • Wai Cheong Tam
  • Jun Wang
  • Richard Peacock
  • Paul A Reneke
  • Grace Ngai
  • Hong Va Leong
  • Thomas Cleary

Fire fighter fatalities and injuries in the U.S. remain too high and fire fighting too hazardous. Until now, fire fighters have relied only on their experience to avoid life-threatening fire events, such as flashover. In this paper, we describe the development of a flashover prediction model which can be used to warn fire fighters before flashover occurs. Specifically, we consider the use of a fire simulation program to generate a set of synthetic data and an attention-based bidirectional long short-term memory network to learn the complex relationships between temperature signals and flashover conditions. We first validate the fire simulation program with temperature measurements obtained from full-scale fire experiments. Then, we generate a set of synthetic temperature data which account for realistic fire and vent opening conditions in a multi-compartment structure. Results show that our proposed method achieves promising performance for the prediction of flashover even when temperature data is completely lost in the room of fire origin. It is believed that the flashover prediction model can facilitate the transformation of fire fighting tactics from traditional experience-based decision making to data-driven decision making and reduce fire fighter deaths and injuries.

NeurIPS Conference 2021 Conference Paper

Settling the Variance of Multi-Agent Policy Gradients

  • Jakub Grudzien Kuba
  • Muning Wen
  • Linghui Meng
  • Shangding Gu
  • Haifeng Zhang
  • David Mguni
  • Jun Wang
  • Yaodong Yang

Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer a rigorous analysis of MAPG methods by, firstly, quantifying the contributions of the number of agents and agents' explorations to the variance of MAPG estimators. Based on this analysis, we derive the optimal baseline (OB) that achieves the minimal variance. In comparison to the OB, we measure the excess variance of existing MARL algorithms such as vanilla MAPG and COMA. For settings that use deep neural networks, we also propose a surrogate version of OB, which can be seamlessly plugged into any existing PG method in MARL. On benchmarks of Multi-Agent MuJoCo and StarCraft challenges, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin. Code is released at \url{https://github.com/morning9393/Optimal-Baseline-for-Multi-agent-Policy-Gradients}.

IJCAI Conference 2021 Conference Paper

State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits

  • Shuang Wu
  • Jingyu Zhao
  • Guangjian Tian
  • Jun Wang

The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable due to exponentially large state and action spaces with respect to the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty capturing either temporal or spatial factors such as impacts from other arms. We propose considering both factors using the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver utilizes the decoupling structure of RMABs to acquire solutions with significantly reduced computation overheads. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of our proposed method with numerical experiments.

AAAI Conference 2021 Conference Paper

Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect

  • Jun Wang
  • Max W. Y. Lam
  • Dan Su
  • Dong Yu

We study the cocktail party problem and propose a novel attention network called Tune-In, short for training under negative environments with interference. It first learns two separate spaces of speaker-knowledge and speech-stimuli on top of a shared feature space, where a new block structure is designed as the building block for all spaces, and then cooperatively solves different tasks. Between the two spaces, information is cast towards each other via a novel cross- and dual-attention mechanism, mimicking the bottom-up and top-down processes of a human’s cocktail party effect. It turns out that substantially discriminative and generalizable speaker representations can be learnt in severely interfered conditions via our self-supervised training. The experimental results verify this seeming paradox. The learnt speaker embedding has superior discriminative power compared to a standard speaker verification method; meanwhile, Tune-In achieves remarkably better speech separation performance in terms of SI-SNRi and SDRi than state-of-the-art benchmark systems, consistently across all test modes and especially at lower memory and computational consumption.

AAAI Conference 2020 Conference Paper

Bi-Level Actor-Critic for Multi-Agent Coordination

  • Haifeng Zhang
  • Weizhe Chen
  • Zeren Huang
  • Minne Li
  • Yaodong Yang
  • Weinan Zhang
  • Jun Wang

Coordination is one of the essential problems in multi-agent systems. Typically, multi-agent reinforcement learning (MARL) methods treat agents equally and the goal is to solve the Markov game to an arbitrary Nash equilibrium (NE) when multiple equilibria exist, thus lacking a solution for NE selection. In this paper, we treat agents unequally and consider the Stackelberg equilibrium as a potentially better convergence point than the Nash equilibrium in terms of Pareto superiority, especially in cooperative environments. Under Markov games, we formally define the bi-level reinforcement learning problem of finding a Stackelberg equilibrium. We propose a novel bi-level actor-critic learning method that allows agents to have different knowledge bases (and thus different levels of intelligence), while their actions can still be executed simultaneously and in a distributed manner. A convergence proof is given, and the resulting learning algorithm is tested against the state of the art. We find that the proposed bi-level actor-critic algorithm successfully converges to the Stackelberg equilibria in matrix games and finds an asymmetric solution in a highway merge environment.
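For intuition on the solution concept, a pure-strategy Stackelberg equilibrium of a small matrix game can be found by brute-force enumeration: the leader commits to the action whose payoff is best given that the follower best-responds. The payoffs below are invented for illustration; the paper's bi-level actor-critic learns such solutions rather than enumerating them:

```python
def stackelberg(leader_payoff, follower_payoff):
    """Pure-strategy Stackelberg equilibrium of a finite matrix game,
    found by enumeration. payoff[i][j]: leader plays i, follower plays j."""
    best = None
    for i, follower_row in enumerate(follower_payoff):
        # The follower observes the leader's commitment i and best-responds.
        j = max(range(len(follower_row)), key=lambda a: follower_row[a])
        value = leader_payoff[i][j]
        if best is None or value > best[0]:
            best = (value, i, j)
    return best[1], best[2]

# Coordination game with two Nash equilibria, (0, 0) and (1, 1);
# leadership selects the Pareto-superior one.
leader = [[2, 0], [0, 1]]
follower = [[2, 0], [0, 1]]
equilibrium = stackelberg(leader, follower)
```

In this coordination game both (0, 0) and (1, 1) are Nash equilibria, but commitment by the leader singles out the Pareto-superior outcome, which is the NE-selection benefit the abstract describes.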

ICML Conference 2020 Conference Paper

ControlVAE: Controllable Variational Autoencoder

  • Huajie Shao
  • Shuochao Yao
  • Dachun Sun
  • Aston Zhang
  • Shengzhong Liu
  • Dongxin Liu
  • Jun Wang
  • Tarek F. Abdelzaher

Variational Autoencoders (VAE) and their variants have been widely used in a variety of applications, such as dialog generation, image generation and disentangled representation learning. However, the existing VAE models may suffer from KL vanishing in language modeling and low reconstruction quality for disentangling. To address these issues, we propose a novel controllable variational autoencoder framework, ControlVAE, that combines a controller, inspired by automatic control theory, with the basic VAE to improve the performance of the resulting generative models. Specifically, we design a new non-linear PI controller, a variant of proportional-integral-derivative (PID) control, to automatically tune the hyperparameter (weight) added to the VAE objective using the output KL divergence as feedback during model training. The framework is evaluated using three applications, namely language modeling, disentangled representation learning, and image generation. The results show that ControlVAE can achieve much better reconstruction quality than the competing methods at comparable disentanglement performance. For language modeling, it not only averts KL vanishing, but also improves the diversity of generated text. Finally, we also demonstrate that ControlVAE improves the reconstruction quality for image generation compared to the original VAE.
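The feedback loop described in the abstract can be sketched as a plain PI controller that raises the KL weight when the observed KL divergence exceeds its target and lowers it otherwise. This is a simplified linear sketch with invented gains and bounds; ControlVAE's actual controller uses a non-linear variant of the proportional term:

```python
class PIController:
    """Minimal PI controller for the KL weight (beta) in a VAE objective.

    Gains and bounds are illustrative, not the paper's values."""

    def __init__(self, kl_target, kp=0.01, ki=0.0001, beta_min=0.0, beta_max=1.0):
        self.kl_target = kl_target
        self.kp = kp
        self.ki = ki
        self.beta_min = beta_min
        self.beta_max = beta_max
        self.integral = 0.0

    def step(self, kl_observed):
        # Negative error (observed KL above target) pushes beta up,
        # penalizing the KL term harder; positive error lets beta fall.
        error = self.kl_target - kl_observed
        self.integral += error
        beta = self.beta_min - self.kp * error - self.ki * self.integral
        return min(max(beta, self.beta_min), self.beta_max)

ctrl = PIController(kl_target=3.0)
beta_up = ctrl.step(10.0)                            # KL too large -> beta rises
beta_down = PIController(kl_target=3.0).step(0.5)    # KL below target -> beta at floor
```

Calling `step` once per training iteration with the current KL divergence yields the weight to apply to the KL term at that iteration.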

AAAI Conference 2020 Conference Paper

Crowdfunding Dynamics Tracking: A Reinforcement Learning Approach

  • Jun Wang
  • Hefu Zhang
  • Qi Liu
  • Zhen Pan
  • Hanqing Tao

Recent years have witnessed increasing interest in research on crowdfunding mechanisms. In this area, dynamics tracking is a significant issue but is still under-explored. Existing studies either fit the fluctuations of time series or employ regularization terms to constrain learned tendencies. However, few of them take into account the inherent decision-making process between investors and crowdfunding dynamics. To address the problem, in this paper, we propose a Trajectory-based Continuous Control for Crowdfunding (TC3) algorithm to predict the funding progress in crowdfunding. Specifically, actor-critic frameworks are employed to model the relationship between investors and campaigns, where all of the investors are viewed as an agent that interacts with the environment derived from the real dynamics of campaigns. Then, to further explore the in-depth implications of patterns (i.e., typical characteristics) in funding series, we propose to subdivide them into fast-growing and slow-growing ones. Moreover, for the purpose of switching between different kinds of patterns, the actor component of TC3 is extended with a structure of options, which leads to TC3-Options. Finally, extensive experiments on the Indiegogo dataset not only demonstrate the effectiveness of our methods, but also validate our assumption that the entire pattern learned by TC3-Options is indeed the U-shaped one.

IJCAI Conference 2020 Conference Paper

Crowdsourcing with Multiple-Source Knowledge Transfer

  • Guangyang Han
  • Jinzheng Tu
  • Guoxian Yu
  • Jun Wang
  • Carlotta Domeniconi

Crowdsourcing is a new computing paradigm that harnesses human effort to solve computer-hard problems. Budget and quality are two fundamental factors in crowdsourcing, but they are antagonistic and their balance is crucially important. Induction and inference are principled ways for humans to acquire knowledge. Transfer learning can also enable induction and inference processes. When a new task comes, we may not know how to go about approaching it. On the other hand, we may have easy access to relevant knowledge that can help us with the new task. As such, via appropriate knowledge transfer, an improved annotation can be achieved for the task at a small cost. To make this idea concrete, we introduce the Crowdsourcing with Multiple-source Knowledge Transfer (CrowdMKT) approach to transfer knowledge from multiple, similar, but different domains for a new task, and to reduce the negative impact of irrelevant sources. CrowdMKT first learns a set of concentrated high-level feature vectors of tasks using knowledge transfer from multiple sources, and then introduces a probabilistic graphical model to jointly model the tasks with high-level features, workers, and their annotations. Finally, it adopts an EM algorithm to estimate the workers' strengths and consensus. Experimental results on real-world image and text datasets prove the effectiveness of CrowdMKT in improving quality and reducing the budget.

AAAI Conference 2020 Conference Paper

Differentially Private Learning with Small Public Data

  • Jun Wang
  • Zhi-Hua Zhou

Differentially private learning tackles tasks where the data are private and the learning process is subject to differential privacy requirements. In real applications, however, some public data are generally available in addition to private data, and it is interesting to consider how to exploit them. In this paper, we study a common situation where a small amount of public data can be used when solving the Empirical Risk Minimization problem over a private database. Specifically, we propose Private-Public Stochastic Gradient Descent, which utilizes such public information to adjust parameters in differentially private stochastic gradient descent and fine-tunes the final result with model reuse. Our method preserves differential privacy for the private database, and an empirical study validates its superiority compared with existing approaches.
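For context, the differentially private SGD step that such methods build on (per-example gradient clipping plus Gaussian noise) can be sketched as follows. The public-data adjustment and the privacy accounting are omitted, and all constants and names are illustrative:

```python
import math
import random

def dp_sgd_step(params, per_example_grads, clip_norm, noise_mult, lr, rng):
    """One DP-SGD step: clip each per-example gradient to clip_norm in
    L2 norm, sum, add Gaussian noise, average, and descend."""
    dim = len(params)
    summed = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / (norm + 1e-12))   # L2 clipping factor
        for k in range(dim):
            summed[k] += grad[k] * scale
    n = len(per_example_grads)
    sigma = noise_mult * clip_norm                     # Gaussian noise scale
    return [p - lr * ((summed[k] + rng.gauss(0.0, sigma)) / n)
            for k, p in enumerate(params)]

# With noise_mult=0 the step reduces to clipped SGD (no privacy guarantee).
new_params = dp_sgd_step([0.0, 0.0], [[30.0, 40.0]],
                         clip_norm=5.0, noise_mult=0.0, lr=0.1,
                         rng=random.Random(0))
```

Clipping bounds each example's influence on the update, which is what makes the added Gaussian noise sufficient for a differential privacy guarantee in the full algorithm.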

AAAI Conference 2020 Conference Paper

Document Summarization with VHTM: Variational Hierarchical Topic-Aware Mechanism

  • Xiyan Fu
  • Jun Wang
  • Jinghan Zhang
  • Jinmao Wei
  • Zhenglu Yang

Automatic text summarization focuses on distilling summary information from texts. This research field has been considerably explored over the past decades because of its significant role in many natural language processing tasks; however, two challenging issues block its further development: (1) how to yield a summarization model embedding topic inference rather than extending it with a pre-trained one, and (2) how to merge the latent topics into diverse granularity levels. In this study, we propose a variational hierarchical model, dubbed VHTM, to holistically address both issues. Different from previous work assisted by a pre-trained single-grained topic model, VHTM is the first attempt to jointly accomplish summarization with topic inference via a variational encoder-decoder and to merge topics into multi-grained levels through topic embedding and attention. Comprehensive experiments validate the superior performance of VHTM compared with the baselines, along with semantically consistent topics.

AAAI Conference 2020 Conference Paper

Label Enhancement with Sample Correlations via Low-Rank Representation

  • Haoyu Tang
  • Jihua Zhu
  • Qinghai Zheng
  • Jun Wang
  • Shanmin Pang
  • Zhongyu Li

Compared with single-label and multi-label annotations, label distribution describes an instance by multiple labels with different intensities and accommodates more general conditions. Nevertheless, label distribution learning is unavailable in many real-world applications because most existing datasets merely provide logical labels. To handle this problem, a novel label enhancement method, Label Enhancement with Sample Correlations via low-rank representation, is proposed in this paper. Unlike most existing methods, a low-rank representation method is employed to capture the global relationships of samples and predict implicit label correlations to achieve label enhancement. Extensive experiments on 14 datasets demonstrate that the algorithm accomplishes state-of-the-art results compared to previous label enhancement baselines.

AAAI Conference 2020 Conference Paper

Learning to Communicate Implicitly by Actions

  • Zheng Tian
  • Shihao Zou
  • Ian Davies
  • Tim Warr
  • Lisheng Wu
  • Haitham Bou Ammar
  • Jun Wang

In situations where explicit communication is limited, human collaborators act by learning to: (i) infer meaning behind their partner’s actions, and (ii) convey private information about the state to their partner implicitly through actions. The first component of this learning process has been well-studied in multi-agent systems, whereas the second — which is equally crucial for successful collaboration — has not. To mimic both components mentioned above, thereby completing the learning process, we introduce a novel algorithm: Policy Belief Learning (PBL). PBL uses a belief module to model the other agent’s private information and a policy module to form a distribution over actions informed by the belief module. Furthermore, to encourage communication by actions, we propose a novel auxiliary reward which incentivizes one agent to help its partner to make correct inferences about its private information. The auxiliary reward for communication is integrated into the learning of the policy module. We evaluate our approach on a set of environments including a matrix game, a particle environment, and the non-competitive bidding problem from contract bridge. We show empirically that this auxiliary reward is effective and easy to generalize. These results demonstrate that our PBL algorithm can produce strong pairs of agents in collaborative games where explicit communication is disabled.

AAAI Conference 2020 Short Paper

Learning to Model Opponent Learning (Student Abstract)

  • Ian Davies
  • Zheng Tian
  • Jun Wang

Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment. The adaptation and learning of other agents induces non-stationarity in the environment dynamics. This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment. Policy search algorithms also struggle in multi-agent settings as the partial observability resulting from an opponent's actions not being known introduces high variance to policy training. Modelling an agent's opponent(s) is often pursued as a means of resolving the issues arising from the coexistence of learning opponents. An opponent model provides an agent with some ability to reason about other agents to aid its own decision making. Most prior works learn an opponent model by assuming the opponent is employing a stationary policy or switching between a set of stationary policies. Such an approach can reduce the variance of training signals for policy search algorithms. However, in the multi-agent setting, agents have an incentive to continually adapt and learn. This means that the assumptions concerning opponent stationarity are unrealistic. In this work, we develop a novel approach to modelling an opponent's learning dynamics which we term Learning to Model Opponent Learning (LeMOL). We show our structured opponent model is more accurate and stable than naive behaviour cloning baselines. We further show that opponent modelling can improve the performance of algorithmic agents in multi-agent settings.

IJCAI Conference 2020 Conference Paper

Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning

  • Ying Wen
  • Yaodong Yang
  • Jun Wang

Most multi-agent reinforcement learning (MARL) models assume perfectly rational agents -- a property hardly met in real-world decision making due to individuals' cognitive limitations and/or the intractability of the decision problem. In this paper, we introduce generalized recursive reasoning (GR2) as a novel framework to model agents with different \emph{hierarchical} levels of rationality; our framework enables agents to exhibit varying levels of ``thinking'' ability, thereby allowing higher-level agents to best respond to less sophisticated learners. We contribute both theoretically and empirically. On the theory side, we devise the hierarchical framework of GR2 through probabilistic graphical models and prove the existence of a perfect Bayesian equilibrium. Within GR2, we propose a practical actor-critic solver, and demonstrate its convergence to a stationary point in two-player games through Lyapunov analysis. On the empirical side, we validate our findings on a variety of MARL benchmarks. Precisely, we first illustrate the hierarchical thinking process on the Keynes Beauty Contest, and then demonstrate significant improvements compared to state-of-the-art opponent modeling baselines on normal-form games and the cooperative navigation benchmark.

AAAI Conference 2020 Conference Paper

Multi-View Multiple Clusterings Using Deep Matrix Factorization

  • Shaowei Wei
  • Jun Wang
  • Guoxian Yu
  • Carlotta Domeniconi
  • Xiangliang Zhang

Multi-view clustering aims at integrating complementary information from multiple heterogeneous views to improve clustering results. Existing multi-view clustering solutions can only output a single clustering of the data. Due to their multiplicity, multi-view data can have different groupings that are reasonable and interesting from different perspectives. However, how to find multiple, meaningful, and diverse clustering results from multi-view data is still a rarely studied and challenging topic in multi-view clustering and multiple clusterings. In this paper, we introduce a deep matrix factorization based solution (DMClusts) to discover multiple clusterings. DMClusts gradually factorizes multi-view data matrices into representational subspaces layer by layer and generates one clustering in each layer. To enforce diversity between the generated clusterings, it minimizes a new redundancy quantification term derived from the proximity between samples in these subspaces. We further introduce an iterative optimization procedure to simultaneously seek multiple clusterings with quality and diversity. Experimental results on benchmark datasets confirm that DMClusts outperforms state-of-the-art multiple clustering solutions.

AAAI Conference 2020 Conference Paper

Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

  • Hangyu Mao
  • Wulong Liu
  • Jianye Hao
  • Jun Luo
  • Dong Li
  • Zhengchao Zhang
  • Jun Wang
  • Zhen Xiao

Social psychology and real experiences show that cognitive consistency plays an important role in keeping human society in order: if people have a more consistent cognition about their environment, they are more likely to achieve better cooperation. Meanwhile, only cognitive consistency within a neighborhood matters because humans interact directly only with their neighbors. Inspired by these observations, we take the first step to introduce neighborhood cognitive consistency (NCC) into multi-agent reinforcement learning (MARL). Our NCC design is quite general and can be easily combined with existing MARL methods. As examples, we propose neighborhood cognition consistent deep Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperation. Extensive experiments on several challenging tasks (i.e., packet routing, wifi configuration and Google football player control) justify the superior performance of our methods compared with state-of-the-art MARL approaches.

AAAI Conference 2020 Conference Paper

NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations

  • Qiaoyun Wu
  • Dinesh Manocha
  • Jun Wang
  • Kai Xu

We propose improving the cross-target and cross-scene generalization of visual navigation through learning an agent that is guided by conceiving the next observations it expects to see. This is achieved by learning a variational Bayesian model, called NeoNav, which generates the next expected observations (NEO) conditioned on the current observations of the agent and the target view. Our generative model is learned through optimizing a variational objective encompassing two key designs. First, the latent distribution is conditioned on current observations and the target view, leading to a model-based, target-driven navigation. Second, the latent space is modeled with a Mixture of Gaussians conditioned on the current observation and the next best action. Our use of a mixture-of-posteriors prior effectively alleviates the issue of over-regularized latent space, thus significantly boosting the model generalization for new targets and in novel scenes. Moreover, the NEO generation models the forward dynamics of agent-environment interaction, which improves the quality of approximate inference and hence benefits data efficiency. We have conducted extensive evaluations on both real-world and synthetic benchmarks, and show that our model consistently outperforms the state-of-the-art models in terms of success rate, data efficiency, and generalization.

NeurIPS Conference 2020 Conference Paper

Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets

  • Rui Luo
  • Qiang Zhang
  • Yaodong Yang
  • Jun Wang

In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise. This is achieved by simulating a collection of replicas in parallel with different temperatures and periodically swapping them. When evolving the replicas' states, the Nosé-Hoover dynamics is applied, which adaptively neutralizes the mini-batch noise. To perform proper exchanges, a new protocol is developed with a noise-aware test of acceptance, by which detailed balance is preserved in an asymptotic way. While its efficacy on complex multimodal posteriors has been illustrated by tests over synthetic distributions, experiments with deep Bayesian neural networks on large-scale datasets have shown its significant improvements over strong baselines.

AAAI Conference 2020 Conference Paper

To Avoid the Pitfall of Missing Labels in Feature Selection: A Generative Model Gives the Answer

  • Yuanyuan Xu
  • Jun Wang
  • Jinmao Wei

In multi-label learning, instances have a large number of noisy and irrelevant features, and each instance is associated with a set of class labels wherein label information is generally incomplete. Missing labels are like the two sides of a coin: one cannot predict whether the information they provide for feature selection is favorable (relevant) or not (irrelevant). Existing approaches either superficially consider the missing labels as negative or indiscreetly impute them with predicted values, which may either overestimate unobserved labels or introduce new noise into the selection of discriminative features. To avoid the pitfall of missing labels, a novel unified framework for selecting discriminative features and modeling the incomplete label matrix is proposed from a generative point of view in this paper. Concretely, we relax the Smoothness Assumption to infer label observability, which can reveal the positions of unobserved labels, and employ the spike-and-slab prior to perform feature selection by excluding unobserved labels. A data-augmentation strategy leads to full local conjugacy in our model, facilitating a simple and efficient Expectation Maximization (EM) algorithm for inference. Quantitative and qualitative experimental results demonstrate the superiority of the proposed approach under various evaluation metrics.

IJCAI Conference 2020 Conference Paper

Weakly-Supervised Multi-view Multi-instance Multi-label Learning

  • Yuying Xing
  • Guoxian Yu
  • Jun Wang
  • Carlotta Domeniconi
  • Xiangliang Zhang

Multi-view, Multi-instance, and Multi-label Learning (M3L) can model complex objects (bags), which are represented with different feature views, made of diverse instances, and annotated with discrete non-exclusive labels. Existing M3L approaches assume a complete correspondence between bags and views, and also assume a complete annotation for training. However, in practice, neither the correspondence between bags nor the bags' annotations is complete. To tackle such a weakly-supervised M3L task, a solution called WSM3L is introduced. WSM3L adapts multimodal dictionary learning to learn a shared dictionary (representational space) across views and individual encoding vectors of bags for each view. The label similarity and feature similarity of encoded bags are jointly used to match bags across views. In addition, it replenishes the annotations of a bag based on the annotations of its neighborhood bags, and introduces a dispatch and aggregation term to dispatch bag-level annotations to instances and to reversely aggregate instance-level annotations to bags. WSM3L unifies these objectives and processes in a joint objective function to predict the instance-level and bag-level annotations in a coordinated fashion, and it further introduces an alternative solution for the objective function optimization. Extensive experimental results show the effectiveness of WSM3L on benchmark datasets.

IJCAI Conference 2019 Conference Paper

A Regularized Opponent Model with Maximum Entropy Objective

  • Zheng Tian
  • Ying Wen
  • Zhichen Gong
  • Faiz Punakkath
  • Shihao Zou
  • Jun Wang

In a single-agent setting, reinforcement learning (RL) tasks can be cast into an inference problem by introducing a binary random variable o, which stands for "optimality". In this paper, we redefine the binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound of the likelihood of achieving the optimality and name it Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show how it can improve the performance of training agents theoretically and empirically in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate the two algorithms on a challenging iterated matrix game and a differential game, respectively, and show that they can outperform strong MARL baselines.

IJCAI Conference 2019 Conference Paper

ActiveHNE: Active Heterogeneous Network Embedding

  • Xia Chen
  • Guoxian Yu
  • Jun Wang
  • Carlotta Domeniconi
  • Zhao Li
  • Xiangliang Zhang

Heterogeneous network embedding (HNE) is a challenging task due to the diverse node types and/or diverse relationships between nodes. Existing HNE methods are typically unsupervised. To maximize the benefit of the rare and valuable supervised information in HNE, we develop a novel Active Heterogeneous Network Embedding (ActiveHNE) framework, which includes two components: Discriminative Heterogeneous Network Embedding (DHNE) and Active Query in Heterogeneous Networks (AQHN). In DHNE, we introduce a novel semi-supervised heterogeneous network embedding method based on a graph convolutional neural network. In AQHN, we first introduce three active selection strategies based on uncertainty and representativeness, and then derive a batch selection method that assembles these strategies using a multi-armed bandit mechanism. ActiveHNE aims at improving the performance of HNE by feeding the most valuable supervision obtained by AQHN into DHNE. Experiments on public datasets demonstrate the effectiveness of ActiveHNE and its advantage in reducing the query cost.

IJCAI Conference 2019 Conference Paper

Community Detection and Link Prediction via Cluster-driven Low-rank Matrix Completion

  • Junming Shao
  • Zhong Zhang
  • Zhongjing Yu
  • Jun Wang
  • Yi Zhao
  • Qinli Yang

Community detection and link prediction are highly interdependent: knowing the cluster structure a priori helps identify missing links, and in return, clustering on networks with supplemented missing links improves community detection performance. In this paper, we propose Cluster-driven Low-rank Matrix Completion (CLMC) for performing community detection and link prediction simultaneously in a unified framework. To this end, CLMC decomposes the adjacency matrix of a target network into three additive matrices: a clustering matrix, a noise matrix and a supplement matrix. Community-structure and low-rank constraints are imposed on the clustering matrix, such that noisy edges between communities are removed and the resulting matrix is an ideal block-diagonal matrix. Missing edges are further learned via low-rank matrix completion. Extensive experiments show that CLMC achieves state-of-the-art performance.

ICML Conference 2019 Conference Paper

Greedy Orthogonal Pivoting Algorithm for Non-Negative Matrix Factorization

  • Kai Zhang
  • Sheng Zhang
  • Jun Liu
  • Jun Wang
  • Jie Zhang

Non-negative matrix factorization (NMF) is a powerful tool for learning useful representations of data and has been widely applied in many problems such as data mining and signal processing. Orthogonal NMF, which can improve the locality of decomposition, has drawn considerable interest in solving clustering problems in recent years. However, imposing simultaneous non-negative and orthogonal structure can be quite difficult, so existing algorithms can only solve it approximately. To address this challenge, we propose an innovative procedure called the Greedy Orthogonal Pivoting Algorithm (GOPA). The GOPA algorithm fully exploits the sparsity of non-negative orthogonal solutions to break the global problem into a series of local optimizations, in which an adaptive subset of coordinates is updated in a greedy, closed-form manner. The biggest advantage of GOPA is that it promotes exact orthogonality, and it provides solid empirical evidence that stronger orthogonality does contribute favorably to better clustering performance. In addition, we design randomized and parallel versions of GOPA, which can further reduce the computational cost and improve accuracy, making the method suitable for large data.

AAAI Conference 2019 Conference Paper

Learning Adaptive Random Features

  • Yanjun Li
  • Kai Zhang
  • Jun Wang
  • Sanjiv Kumar

Random Fourier features are a powerful framework to approximate shift-invariant kernels with Monte Carlo integration, which has drawn considerable interest in scaling up kernel-based learning, dimensionality reduction, and information retrieval. In the literature, many sampling schemes have been proposed to improve the approximation performance. However, an interesting theoretical and algorithmic challenge still remains, i.e., how to optimize the design of random Fourier features to achieve good kernel approximation on any input data using a low spectral sampling rate? In this paper, we propose to compute more adaptive random Fourier features with optimized spectral samples (w_j's) and feature weights (p_j's). The learning scheme not only significantly reduces the spectral sampling rate needed for accurate kernel approximation, but also allows joint optimization with any supervised learning framework. We establish generalization bounds using Rademacher complexity, and demonstrate advantages over previous methods. Moreover, our experiments show that the empirical kernel approximation provides effective regularization for supervised learning.
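The baseline Monte Carlo construction that this paper optimizes can be sketched as follows: plain (non-adaptive) random Fourier features for an RBF kernel. The sampling rate D and bandwidth gamma below are illustrative choices, not values from the paper.

```python
import numpy as np

def rff(X, D=5000, gamma=0.5, seed=0):
    """Monte Carlo feature map z(x) with E[z(x) . z(y)] = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    # Spectral samples w_j for the RBF kernel are Gaussian with variance 2*gamma.
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
K_approx = rff(X) @ rff(X).T                               # approximate Gram matrix
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq_dists)                          # exact RBF Gram matrix
```

The adaptive scheme in the paper replaces the fixed Gaussian spectral samples with optimized ones, aiming for the same approximation quality at a much lower D.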

AAAI Conference 2019 Conference Paper

Multi-View Multi-Instance Multi-Label Learning Based on Collaborative Matrix Factorization

  • Yuying Xing
  • Guoxian Yu
  • Carlotta Domeniconi
  • Jun Wang
  • Zili Zhang
  • Maozu Guo

Multi-view Multi-instance Multi-label Learning (M3L) deals with complex objects encompassing diverse instances, represented with different feature views, and annotated with multiple labels. Existing M3L solutions only partially explore the inter- or intra-relations between objects (or bags), instances, and labels, which can convey important contextual information for M3L. As such, they may deliver compromised performance. In this paper, we propose a collaborative matrix factorization based solution called M3Lcmf. M3Lcmf first uses a heterogeneous network composed of nodes of bags, instances, and labels to encode different types of relations via multiple relational data matrices. To preserve the intrinsic structure of the data matrices, M3Lcmf collaboratively factorizes them into low-rank matrices, explores the latent relationships between bags, instances, and labels, and selectively merges the data matrices. An aggregation scheme is further introduced to aggregate the instance-level labels into bag-level labels and to guide the factorization. An empirical study on benchmark datasets shows that M3Lcmf outperforms other related competitive solutions in both instance-level and bag-level prediction.

IJCAI Conference 2019 Conference Paper

Multi-View Multiple Clustering

  • Shixin Yao
  • Guoxian Yu
  • Jun Wang
  • Carlotta Domeniconi
  • Xiangliang Zhang

Multiple clustering aims at exploring alternative clusterings to organize the data into meaningful groups from different perspectives. Existing multiple clustering algorithms are designed for single-view data. We assume that the individuality and commonality of multi-view data can be leveraged to generate high-quality and diverse clusterings. To this end, we propose a novel multi-view multiple clustering (MVMC) algorithm. MVMC first adapts multi-view self-representation learning to explore the individuality encoding matrices and the shared commonality matrix of multi-view data. It additionally reduces the redundancy (i.e., enhances the individuality) among the matrices using the Hilbert-Schmidt Independence Criterion (HSIC), and collects shared information by forcing the shared matrix to be smooth across all views. It then uses matrix factorization on the individual matrices, along with the shared matrix, to generate diverse, high-quality clusterings. We further extend multiple co-clustering to multi-view data and propose a solution called multi-view multiple co-clustering (MVMCC). Our empirical study shows that MVMC (MVMCC) can exploit multi-view data to generate multiple high-quality and diverse clusterings (co-clusterings), with superior performance to the state-of-the-art methods.
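The HSIC redundancy measure used above can be illustrated with the standard biased empirical estimator. This is a generic sketch with RBF kernels, not MVMC's full objective; the sample sizes and bandwidth are arbitrary.

```python
import numpy as np

def hsic(X, Y, gamma=1.0):
    """Biased empirical HSIC with RBF kernels: trace(K H L H) / (n - 1)^2."""
    def rbf(A):
        sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(rbf(X) @ H @ rbf(Y) @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
dep = hsic(X, X ** 2)                        # strongly dependent pair
indep = hsic(X, rng.normal(size=(100, 1)))   # independent pair
```

Larger HSIC indicates stronger statistical dependence, so penalizing it pushes the individuality matrices toward non-redundancy.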

NeurIPS Conference 2019 Conference Paper

Multi-View Reinforcement Learning

  • Minne Li
  • Lisheng Wu
  • Jun Wang
  • Haitham Bou Ammar

This paper is concerned with multi-view reinforcement learning (MVRL), which allows for decision making when agents share common dynamics but adhere to different observation models. We define the MVRL framework by extending partially observable Markov decision processes (POMDPs) to support more than one observation model and propose two solution methods through observation augmentation and cross-view policy transfer. We empirically evaluate our method and demonstrate its effectiveness in a variety of environments. Specifically, we show reductions in sample complexities and computational time for acquiring policies that handle multi-view environments.

AAAI Conference 2019 Conference Paper

Multiple Independent Subspace Clusterings

  • Xing Wang
  • Jun Wang
  • Carlotta Domeniconi
  • Guoxian Yu
  • Guoqiang Xiao
  • Maozu Guo

Multiple clustering aims at discovering diverse ways of organizing data into clusters. Despite the progress made, it is still a challenge for users to analyze and understand the distinctive structure of each output clustering. To ease this process, we consider diverse clusterings embedded in different subspaces, and analyze the embedding subspaces to shed light on the structure of each clustering. To this end, we provide a two-stage approach called MISC (Multiple Independent Subspace Clusterings). In the first stage, MISC uses independent subspace analysis to seek multiple statistically independent (i.e., non-redundant) subspaces, and determines the number of subspaces via the minimum description length principle. In the second stage, to account for the intrinsic geometric structure of samples embedded in each subspace, MISC performs graph-regularized semi-nonnegative matrix factorization to explore clusters. It additionally integrates the kernel trick into matrix factorization to handle non-linearly separable clusters. Experimental results on synthetic datasets show that MISC can find different interesting clusterings from the sought independent subspaces, and it also outperforms other related and competitive approaches on real-world datasets.

IJCAI Conference 2019 Conference Paper

Novel Collaborative Filtering Recommender Friendly to Privacy Protection

  • Jun Wang
  • Qiang Tang
  • Afonso Arriaga
  • Peter Y. A. Ryan

Nowadays, recommender systems are indispensable tools in many information services, and a large number of algorithms have been designed and implemented. However, fed with very large datasets, state-of-the-art recommendation algorithms often face an efficiency bottleneck, i.e., it takes a huge amount of computing resources to train a recommendation model. In order to satisfy the needs of privacy-savvy users who do not want to disclose their information to the service provider, the complexity of most existing solutions becomes prohibitive. As such, it is an interesting research question to design simple and efficient recommendation algorithms that achieve reasonable accuracy and facilitate privacy protection at the same time. In this paper, we propose an efficient recommendation algorithm, named CryptoRec, which has two nice properties: (1) it can estimate a new user's preferences by directly using a model pre-learned from an expert dataset, and the new user's data is not required to train the model; (2) it can compute recommendations with only addition and multiplication operations. For evaluation, we first test the recommendation accuracy on three real-world datasets and show that CryptoRec is competitive with state-of-the-art recommenders. Then, we evaluate the performance of the privacy-preserving variants of CryptoRec and show that predictions can be computed in seconds on a PC. In contrast, existing solutions need tens or hundreds of hours on more powerful computers.

AAAI Conference 2019 Conference Paper

Practical Algorithms for Multi-Stage Voting Rules with Parallel Universes Tiebreaking

  • Jun Wang
  • Sujoy Sikdar
  • Tyler Shepherd
  • Zhibing Zhao
  • Chunheng Jiang
  • Lirong Xia

STV and ranked pairs (RP) are two well-studied voting rules for group decision-making. They proceed in multiple rounds, and are affected by how ties are broken in each round. However, the literature is surprisingly vague about how ties should be broken. We propose the first algorithms for computing the set of alternatives that are winners under some tiebreaking mechanism under STV and RP, which is also known as parallel-universes tiebreaking (PUT). Unfortunately, PUT-winners are NP-complete to compute under STV and RP, and standard search algorithms from AI do not apply. We propose multiple DFS-based algorithms along with pruning strategies, heuristics, sampling and machine learning to prioritize the search direction and significantly improve performance. We also propose novel ILP formulations for PUT-winners under STV and RP, respectively. Experiments on synthetic and real-world data show that our algorithms are overall faster than ILP.

AAAI Conference 2019 Conference Paper

Ranking-Based Deep Cross-Modal Hashing

  • Xuanwu Liu
  • Guoxian Yu
  • Carlotta Domeniconi
  • Jun Wang
  • Yazhou Ren
  • Maozu Guo

Cross-modal hashing has been receiving increasing interests for its low storage cost and fast query speed in multi-modal data retrievals. However, most existing hashing methods are based on hand-crafted or raw level features of objects, which may not be optimally compatible with the coding process. Besides, these hashing methods are mainly designed to handle simple pairwise similarity. The complex multilevel ranking semantic structure of instances associated with multiple labels has not been well explored yet. In this paper, we propose a ranking-based deep cross-modal hashing approach (RDCMH). RDCMH firstly uses the feature and label information of data to derive a semi-supervised semantic ranking list. Next, to expand the semantic representation power of hand-crafted features, RDCMH integrates the semantic ranking information into deep cross-modal hashing and jointly optimizes the compatible parameters of deep feature representations and of hashing functions. Experiments on real multi-modal datasets show that RDCMH outperforms other competitive baselines and achieves the state-of-the-art performance in cross-modal retrieval applications.

AAAI Conference 2019 Conference Paper

The Kelly Growth Optimal Portfolio with Ensemble Learning

  • Weiwei Shen
  • Bin Wang
  • Jian Pu
  • Jun Wang

As a competitive alternative to the Markowitz mean-variance portfolio, the Kelly growth optimal portfolio has drawn considerable attention in investment science. While the growth optimal portfolio is theoretically guaranteed to dominate any other portfolio with probability 1 in the long run, it tends in practice to be highly risky in the short term. Moreover, empirical analysis and performance enhancement studies under practical settings are surprisingly scarce. In particular, how to handle the challenging but realistic condition of insufficient training data has barely been investigated. To fill this void, and especially to grapple with the difficulty posed by small samples, in this paper we propose a growth optimal portfolio strategy equipped with ensemble learning. We synergistically leverage the bootstrap aggregating algorithm and the random subspace method in portfolio construction to mitigate estimation error. We analyze the behavior and hyperparameter selection of the proposed strategy by simulation, and then corroborate its effectiveness by comparing its out-of-sample performance with those of 10 competing strategies on four datasets. Experimental results confirm that the new strategy is superior across extensive evaluation criteria.
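The underlying Kelly criterion, and the bootstrap-aggregating idea applied to its estimation, can be sketched for a simple binary bet. This is illustrative only: the paper works with multi-asset portfolios and a random subspace method, whereas the toy below just averages Kelly fractions over resampled win/loss histories.

```python
import numpy as np

def kelly_fraction(p, b):
    """Optimal bet fraction for a wager paying b:1 with win probability p."""
    return p - (1.0 - p) / b

def bagged_kelly(outcomes, b, n_boot=500, seed=0):
    """Bootstrap-aggregated Kelly fraction: re-estimate the win rate on
    resampled histories and average the resulting fractions."""
    rng = np.random.default_rng(seed)
    n = len(outcomes)
    fracs = [kelly_fraction(rng.choice(outcomes, size=n).mean(), b)
             for _ in range(n_boot)]
    return float(np.clip(np.mean(fracs), 0.0, 1.0))

rng = np.random.default_rng(8)
wins = (rng.random(60) < 0.6).astype(float)   # a small win/loss history, true p = 0.6
f = bagged_kelly(wins, b=1.0)
```

Averaging over resamples dampens the sensitivity of the bet size to estimation noise in small samples, which is the spirit of the ensemble approach above.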

AAAI Conference 2018 Conference Paper

A Neural Stochastic Volatility Model

  • Rui Luo
  • Weinan Zhang
  • Xiaojun Xu
  • Jun Wang

In this paper, we show that the recent integration of statistical models with deep recurrent neural networks provides a new way of formulating volatility models (volatility being the degree of variation of a time series), which have been widely used in time series analysis and prediction in finance. The model comprises a pair of complementary stochastic recurrent neural networks: the generative network models the joint distribution of the stochastic volatility process; the inference network approximates the conditional distribution of the latent variables given the observables. Our focus here is on the formulation of the temporal dynamics of volatility under a stochastic recurrent neural network framework. Experiments on real-world stock price datasets demonstrate that the proposed model generates better volatility estimation and prediction, outperforming mainstream methods, e.g., deterministic models such as GARCH and its variants, and stochastic models, namely the MCMC-based stochvol and the Gaussian-process-based model, in terms of average negative log-likelihood.
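For reference, the deterministic GARCH(1,1) baseline mentioned above follows a simple conditional-variance recursion, sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}. The parameter values in this sketch are illustrative, not fitted.

```python
import numpy as np

def garch11_variance(returns, omega=0.05, alpha=0.1, beta=0.85):
    """Conditional variance recursion of GARCH(1,1), seeded with the sample variance."""
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns.var()
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(2)
r = rng.normal(scale=1.0, size=500)   # toy return series
s2 = garch11_variance(r)
```

The neural stochastic volatility model replaces this fixed recursion with learned stochastic recurrent dynamics, which is where its reported likelihood gains come from.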

AAMAS Conference 2018 Conference Paper

A Study of AI Population Dynamics with Million-agent Reinforcement Learning

  • Yaodong Yang
  • Lantao Yu
  • Yiwei Bai
  • Ying Wen
  • Weinan Zhang
  • Jun Wang

We conduct an empirical study on discovering the ordered collective dynamics of a population of intelligent agents, driven by million-agent reinforcement learning. Our intention is to put intelligent agents into a simulated natural context and verify whether the principles developed in the real world can also be used to understand an artificially created intelligent population. To achieve this, we simulate a large-scale predator-prey world, where the laws of the world are designed using only findings, or their logical equivalents, that have been discovered in nature. We endow the agents with intelligence based on deep reinforcement learning (DRL). To scale the population size up to millions of agents, we propose a large-scale DRL training platform with a redesigned experience buffer. Our results show that the population dynamics of AI agents, driven only by each agent's individual self-interest, reveal an ordered pattern similar to the Lotka-Volterra model studied in population biology. We further discover emergent behaviors of collective adaptation by studying how the agents' grouping behaviors change with the environmental resources. Both findings can be explained by the self-organization theory in nature.

JBHI Journal 2018 Journal Article

An Unobtrusive Computerized Assessment Framework for Unilateral Peripheral Facial Paralysis

  • Zhexiao Guo
  • Guo Dan
  • Jianghuai Xiang
  • Jun Wang
  • Wanzhang Yang
  • Huijun Ding
  • Oliver Deussen
  • Yongjin Zhou

Unilateral peripheral facial paralysis (UPFP) is a form of facial nerve paralysis, clinically classified according to conditions of facial symmetry. Prompt and precise assessment is crucial to neural rehabilitation of UPFP. The prevalent House-Brackmann (HB) grading system relies on subjective judgments with significant inter-observer variation. Therefore, to explore an objective method for UPFP assessment, clinical image sequences are captured using a web camera setup while 5 healthy and 27 UPFP subjects perform a group of predefined actions, including keeping expressionless, raising the brows, closing the eyes, bulging the cheek, and showing the teeth in turn. First, the facial region is detected using a Haar cascade classifier, and landmark points are then acquired by a supervised descent method. Second, these landmark points are used to generate a group of features reflecting the structural parameters of the eyebrow, eye, nose, and mouth regions, respectively. Third, correlation coefficients are computed between the raw features and HB scores. To reduce feature dimensions, only those with correlation coefficients larger than an empirically selected value, 0.35, are input into a support vector machine to generate a classifier. With the classifier, an exact match rate (discrepancy = 0 between the proposed method's result and HB scores) of 49.9% and a loose match rate (discrepancy = 1) of 87.97% are achieved on the experimental data. After sample augmentation, the final rate increases to 90.01%, outperforming previous reports. In conclusion, this exploratory study demonstrates that encouraging results can be generated with the proposed framework using an unobtrusive web camera setup.

AAAI Conference 2018 Conference Paper

Efficient Architecture Search by Network Transformation

  • Han Cai
  • Tianyao Chen
  • Weinan Zhang
  • Yong Yu
  • Jun Wang

Techniques for automatically designing deep neural network architectures, such as reinforcement learning based approaches, have recently shown promising results. However, their success relies on vast computational resources (e.g., hundreds of GPUs), making them difficult to use widely. A noticeable limitation is that they still design and train each network from scratch during the exploration of the architecture space, which is highly inefficient. In this paper, we propose a new framework for efficient architecture search that explores the architecture space based on the current network and reuses its weights. We employ a reinforcement learning agent as the meta-controller, whose action is to grow the network depth or layer width with function-preserving transformations. As such, previously validated networks can be reused for further exploration, saving a large amount of computational cost. We apply our method to explore the architecture space of plain convolutional neural networks (no skip-connections, branching, etc.) on image benchmark datasets (CIFAR-10, SVHN) with restricted computational resources (5 GPUs). Our method can design highly competitive networks that outperform existing networks using the same design scheme. On CIFAR-10, our model without skip-connections achieves a 4.23% test error rate, exceeding a vast majority of modern architectures and approaching DenseNet. Furthermore, by applying our method to explore the DenseNet architecture space, we are able to achieve more accurate networks with fewer parameters.
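The function-preserving "widen" transformation can be demonstrated on a one-hidden-layer ReLU network: duplicate a hidden unit and halve its outgoing weights, and the widened network computes exactly the same function. This Net2WiderNet-style sketch is illustrative; shapes and values are arbitrary.

```python
import numpy as np

def widen(W1, b1, W2, idx):
    """Duplicate hidden unit `idx` and halve its outgoing weights,
    so the widened network's output is unchanged."""
    W1n = np.vstack([W1, W1[idx:idx + 1]])   # copy the unit's incoming weights
    b1n = np.append(b1, b1[idx])
    W2n = np.vstack([W2, W2[idx:idx + 1]])   # copy its outgoing weights...
    W2n[idx] *= 0.5                          # ...and split the contribution
    W2n[-1] *= 0.5
    return W1n, b1n, W2n

def forward(x, W1, b1, W2):
    return np.maximum(x @ W1.T + b1, 0.0) @ W2   # one ReLU hidden layer

rng = np.random.default_rng(3)
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(4, 2))
x = rng.normal(size=(5, 3))
W1n, b1n, W2n = widen(W1, b1, W2, idx=1)
```

Because the transformation preserves the function, training can continue from the widened network without losing what the smaller network had already learned.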

IJCAI Conference 2018 Conference Paper

Incomplete Multi-View Weak-Label Learning

  • Qiaoyu Tan
  • Guoxian Yu
  • Carlotta Domeniconi
  • Jun Wang
  • Zili Zhang

Learning from multi-view multi-label data has wide applications. There are two main challenges in this learning task: incomplete views and missing (weak) labels. The former means that views may not include all data objects. The weak-label setting implies that only a subset of relevant labels is provided for training objects while other labels are missing. Both incomplete views and weak labels can lead to significant performance degradation. In this paper, we propose a novel model (iMVWL) to jointly address the two challenges. iMVWL simultaneously learns a shared subspace from incomplete views with weak labels, together with the local label structure and the predictor in this subspace, which can capture not only cross-view relationships but also the weak-label information of training samples. We further develop an alternative solution to optimize our model; this solution avoids suboptimal results and reinforces the reciprocal effects of the learned components, thus further improving performance. Extensive experimental results on several real-world datasets validate the effectiveness of our model against other competitive algorithms.

IJCAI Conference 2018 Conference Paper

Learning Sequential Correlation for User Generated Textual Content Popularity Prediction

  • Wen Wang
  • Wei Zhang
  • Jun Wang
  • Junchi Yan
  • Hongyuan Zha

Popularity prediction of user generated textual content is critical for prioritizing information on the web, which alleviates heavy information overload for ordinary readers. Most previous studies model each content instance separately for prediction and thus overlook the sequential correlations between instances from a specific user. In this paper, we go deeper into this problem based on two observations for each user, i.e., sequential content correlation and sequential popularity correlation. We propose a novel deep sequential model called User Memory-augmented recurrent Attention Network (UMAN). This model encodes the two correlations by updating external user memories, which are further leveraged for target text representation learning and popularity prediction. The experimental results on several real-world datasets validate the benefits of considering these correlations and demonstrate that UMAN achieves the best performance among several strong competitors.

IJCAI Conference 2018 Conference Paper

Learning to Design Games: Strategic Environments in Reinforcement Learning

  • Haifeng Zhang
  • Jun Wang
  • Zhiming Zhou
  • Weinan Zhang
  • Yin Wen
  • Yong Yu
  • Wenxin Li

In typical reinforcement learning (RL), the environment is assumed given, and the goal of learning is to identify an optimal policy for the agent taking actions through its interactions with the environment. In this paper, we extend this setting by considering an environment that is not given but is instead controllable and learnable through its interaction with the agent. This extension is motivated by environment design scenarios in the real world, including game design, shopping space design and traffic signal design. Theoretically, we find a dual Markov decision process (MDP) w.r.t. the environment to the MDP w.r.t. the agent, and derive a policy gradient solution to optimizing the parametrized environment. Furthermore, discontinuous environments are addressed by a proposed general generative framework. Our experiments on a Maze game design task show the effectiveness of the proposed algorithms in generating diverse and challenging Mazes against various agent settings.

AAAI Conference 2018 Conference Paper

Long Text Generation via Adversarial Training with Leaked Information

  • Jiaxian Guo
  • Sidi Lu
  • Han Cai
  • Weinan Zhang
  • Yong Yu
  • Jun Wang

Automatically generating coherent and semantically meaningful text has many applications in machine translation, dialogue systems, image captioning, etc. Recently, by combining with policy gradient, Generative Adversarial Nets (GANs), which use a discriminative model to guide the training of the generative model as a reinforcement learning policy, have shown promising results in text generation. However, the scalar guiding signal is only available after the entire text has been generated and lacks intermediate information about text structure during the generative process. As such, it limits the success of such approaches when the length of the generated text samples is long (more than 20 words). In this paper, we propose a new framework, called LeakGAN, to address the problem of long text generation. We allow the discriminative net to leak its own high-level extracted features to the generative net to further help the guidance. The generator incorporates such informative signals into all generation steps through an additional MANAGER module, which takes the extracted features of the currently generated words and outputs a latent vector to guide the WORKER module for next-word generation. Our extensive experiments on synthetic data and various real-world tasks with a Turing test demonstrate that LeakGAN is highly effective in long text generation and also improves performance in short text generation scenarios. More importantly, without any supervision, LeakGAN is able to implicitly learn sentence structures purely through the interaction between MANAGER and WORKER.

AAAI Conference 2018 System Paper

MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence

  • Lianmin Zheng
  • Jiacheng Yang
  • Han Cai
  • Ming Zhou
  • Weinan Zhang
  • Jun Wang
  • Yong Yu

We introduce MAgent, a platform to support research and development of many-agent reinforcement learning. Unlike previous research platforms for single- or multi-agent reinforcement learning, MAgent focuses on supporting tasks and applications that require hundreds to millions of agents. Within the interactions among a population of agents, it enables not only the study of learning algorithms for agents' optimal policies, but more importantly, the observation and understanding of individual agents' behaviors and the social phenomena emerging from the AI society, including communication languages, leadership, and altruism. MAgent is highly scalable and can host up to one million agents on a single GPU server. MAgent also provides flexible configurations for AI researchers to design their customized environments and agents. In this demo, we present three environments designed on MAgent and show the collective intelligence that emerges from learning from scratch.

IJCAI Conference 2018 Conference Paper

Multi-Label Co-Training

  • Yuying Xing
  • Guoxian Yu
  • Carlotta Domeniconi
  • Jun Wang
  • Zili Zhang

Multi-label learning aims at assigning a set of appropriate labels to multi-label samples. Although it has been successfully applied in various domains in recent years, most multi-label learning methods require sufficient labeled training samples, because of the large number of possible label sets. Co-training, as an important branch of semi-supervised learning, can leverage unlabeled samples, along with scarce labeled ones, and can potentially help with the large labeled data requirement. However, it is a difficult challenge to combine multi-label learning with co-training. Two distinct issues are associated with the challenge: (i) how to solve the widely-witnessed class-imbalance problem in multi-label learning; and (ii) how to select samples with confidence, and communicate their predicted labels among classifiers for model refinement. To address these issues, we introduce an approach called Multi-Label Co-Training (MLCT). MLCT leverages information concerning the co-occurrence of pairwise labels to address the class-imbalance challenge; it introduces a predictive reliability measure to select samples, and applies label-wise filtering to confidently communicate labels of selected samples among co-training classifiers. MLCT performs favorably against related competitive multi-label learning methods on benchmark datasets and it is also robust to the input parameters.

AAAI Conference 2018 Conference Paper

Tau-FPL: Tolerance-Constrained Learning in Linear Time

  • Ao Zhang
  • Nan Li
  • Jian Pu
  • Jun Wang
  • Junchi Yan
  • Hongyuan Zha

In many real-world applications, learning a classifier with a false-positive rate under a specified tolerance is appealing. Existing approaches either introduce prior-knowledge-dependent label costs or tune parameters based on traditional classifiers; both are methodologically limited since they do not directly incorporate the false-positive rate tolerance. In this paper, we propose a novel scoring-thresholding approach, τ-False Positive Learning (τ-FPL), to address this problem. We show that the scoring problem, which takes the false-positive rate tolerance into account, can be efficiently solved in linear time, and that an out-of-bootstrap thresholding method can transform the learned ranking function into a low false-positive classifier. Both theoretical analysis and experimental results show the superior performance of the proposed τ-FPL over existing approaches.
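The thresholding idea, choosing a cut-off from held-out negative scores so that the false-positive rate stays under tau, can be sketched as below. This is a simplified quantile illustration, not the paper's out-of-bootstrap procedure; the Gaussian score distributions are hypothetical.

```python
import numpy as np

def fpr_threshold(neg_scores, tau):
    """Score threshold such that at most a tau fraction of negatives exceed it."""
    return np.quantile(neg_scores, 1.0 - tau)

rng = np.random.default_rng(4)
neg = rng.normal(0.0, 1.0, size=10000)   # scores of negative examples
pos = rng.normal(2.0, 1.0, size=10000)   # scores of positive examples
thr = fpr_threshold(neg, tau=0.05)
fpr = (neg > thr).mean()                 # achieved false-positive rate
tpr = (pos > thr).mean()                 # achieved true-positive rate
```

The better the learned ranking separates the two score distributions, the higher the true-positive rate attainable at the fixed tolerance.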

NeurIPS Conference 2018 Conference Paper

Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning

  • Rui Luo
  • Jianhong Wang
  • Yaodong Yang
  • Jun Wang
  • Zhanxing Zhu

In this paper, we propose a novel sampling method, thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for multimodal Bayesian learning. It simulates a noisy dynamical system by incorporating both a continuously-varying tempering variable and Nosé-Hoover thermostats. A significant benefit is that it is not only able to efficiently generate i.i.d. samples when the underlying posterior distributions are multimodal, but is also capable of adaptively neutralising the noise arising from the use of mini-batches. While the properties of the approach have been studied using synthetic datasets, our experiments on three real datasets have also shown its performance gains over several strong baselines for Bayesian learning with various types of neural networks plugged in.

AAAI Conference 2017 Conference Paper

Portfolio Selection via Subset Resampling

  • Weiwei Shen
  • Jun Wang

As the cornerstone of modern portfolio theory, Markowitz's mean-variance optimization is a major model adopted in portfolio management. However, the estimation errors in its input parameters substantially deteriorate its performance in practice. Specifically, losses can be huge when the number of assets for investment is not much smaller than the sample size of the historical data. To extend the applicability of Markowitz's portfolio optimization to large portfolios, in this paper we propose a new portfolio strategy via subset resampling. By resampling subsets of the original large universe of assets, we construct the associated subset portfolios with more accurately estimated parameters without requiring additional data. By aggregating a number of constructed subset portfolios, we attain a well-diversified portfolio over all assets. To investigate its performance, we first analyze its efficient frontiers by simulation and provide an analysis of hyperparameter selection, and then empirically compare its out-of-sample performance with those of various competing strategies on diversified datasets. Experimental results corroborate that the proposed portfolio strategy has marked superiority across extensive evaluation criteria.
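The subset-resampling idea can be sketched by averaging minimum-variance weights estimated on random asset subsets. This is an illustrative simplification under assumed settings (i.i.d. toy returns, minimum-variance subset portfolios, arbitrary subset size and count), not the paper's exact construction.

```python
import numpy as np

def subset_resampled_weights(returns, k, n_subsets=200, seed=0):
    """Average minimum-variance weights estimated on random subsets of k assets."""
    rng = np.random.default_rng(seed)
    T, N = returns.shape
    w = np.zeros(N)
    for _ in range(n_subsets):
        idx = rng.choice(N, size=k, replace=False)
        cov = np.cov(returns[:, idx], rowvar=False)
        sub = np.linalg.solve(cov, np.ones(k))   # unnormalized min-variance weights
        w[idx] += sub / sub.sum()
    return w / w.sum()

rng = np.random.default_rng(5)
R = rng.normal(0.001, 0.02, size=(120, 10))   # 120 periods, 10 assets (toy data)
w = subset_resampled_weights(R, k=5)
```

Estimating a 5x5 covariance from 120 observations is far better conditioned than estimating the full 10x10 one, which is exactly the estimation-error trade-off the strategy exploits.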

AAAI Conference 2017 Conference Paper

Random Features for Shift-Invariant Kernels with Moment Matching

  • Weiwei Shen
  • Zhihui Yang
  • Jun Wang

To grapple with the scalability conundrum of kernel-based learning algorithms, the method of approximating nonlinear kernels via random feature maps has attracted wide attention in large-scale learning systems. Specifically, the associated sampling procedure is one critical component that dictates the quality of the random feature maps. However, for high-dimensional features, the standard Monte Carlo sampling method has been shown to be less effective at producing low-variance random samples. In consequence, it demands constructing a large number of features to attain the desired accuracy for downstream use. In this paper, we present a novel sampling algorithm powered by moment matching techniques to reduce the variance of random features. Our extensive empirical studies and comparisons with several highly competitive peer methods verify the superiority of the proposed algorithm in Gram matrix approximation and generalization errors in regression. Our rigorous theoretical proofs justify that the proposed algorithm is guaranteed to achieve lower variance than the standard Monte Carlo method in high-dimensional settings.

AAAI Conference 2017 Conference Paper

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

  • Lantao Yu
  • Weinan Zhang
  • Jun Wang
  • Yong Yu

As a new way of training generative models, the Generative Adversarial Net (GAN), which uses a discriminative model to guide the training of the generative model, has enjoyed considerable success in generating real-valued data. However, it has limitations when the goal is to generate sequences of discrete tokens. A major reason is that the discrete outputs from the generative model make it difficult to pass the gradient update from the discriminative model to the generative model. Also, the discriminative model can only assess a complete sequence, while for a partially generated sequence it is non-trivial to balance its current score against the score it will receive once the entire sequence is generated. In this paper, we propose a sequence generation framework, called SeqGAN, to solve these problems. Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing policy gradient updates. The RL reward signal comes from the GAN discriminator judging a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines.
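The Monte Carlo search step above — scoring a partial sequence by completing it several times and averaging the discriminator's verdicts — can be sketched as follows. The policy and discriminator here are placeholder callables, not the paper's neural models.

```python
import numpy as np

def mc_rollout_reward(partial, rollout_policy, discriminator,
                      seq_len, n_rollouts, rng):
    """Estimate the reward of a partial token sequence, SeqGAN-style:
    complete the sequence n_rollouts times with the rollout policy and
    average the discriminator's scores on the completed sequences."""
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(partial)
        while len(seq) < seq_len:
            seq.append(rollout_policy(seq, rng))  # sample next token
        total += discriminator(seq)               # score full sequence
    return total / n_rollouts
```

This averaged score is what stands in for the reward of an intermediate state-action step in the policy gradient update.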

IJCAI Conference 2016 Conference Paper

Portfolio Blending via Thompson Sampling

  • Weiwei Shen
  • Jun Wang

As a definitive investment guideline for institutions and individuals, Markowitz's modern portfolio theory is ubiquitous in the financial industry. However, its noticeably poor out-of-sample performance, due to inaccurate estimation of parameters, has prompted unremitting efforts to investigate effective remedies. One common retrofit, blending portfolios from disparate investment perspectives, has received growing attention. While even a naive portfolio blending strategy can be empirically successful, how to effectively and robustly blend portfolios to generate stable performance improvement remains less explored. In this paper, we present a novel online algorithm that leverages Thompson sampling in the sequential decision-making process for portfolio blending. By modeling blending coefficients as probabilities of choosing basis portfolios and utilizing Bayes decision rules to update the corresponding distribution functions, our algorithm sequentially determines the optimal coefficients to blend multiple portfolios that embody different investment criteria and market views. Compared with competitive trading strategies across various benchmarks, our method shows its superiority on standard evaluation metrics.
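A toy sketch of the sequential blending loop follows, assuming a simple win/loss Bernoulli update (positive vs. non-positive return of the sampled-best basis portfolio). The paper's actual reward model and Bayes update are not specified here; every name and the update rule are assumptions for illustration.

```python
import numpy as np

def thompson_blend(basis_returns, seed=0):
    """Blend basis portfolios via Thompson sampling: treat blending
    coefficients as probabilities of choosing each basis portfolio,
    sample per-portfolio merit from Beta posteriors, blend with the
    normalized samples, and update the dominant arm's posterior."""
    rng = np.random.default_rng(seed)
    n_steps, n_basis = basis_returns.shape
    alpha = np.ones(n_basis)   # Beta posterior "successes"
    beta = np.ones(n_basis)    # Beta posterior "failures"
    wealth = 1.0
    for t in range(n_steps):
        theta = rng.beta(alpha, beta)      # sampled merit per portfolio
        coeffs = theta / theta.sum()       # blending coefficients
        r = coeffs @ basis_returns[t]      # blended one-period return
        wealth *= 1.0 + r
        k = int(np.argmax(theta))          # update the sampled-best arm
        if basis_returns[t, k] > 0:
            alpha[k] += 1
        else:
            beta[k] += 1
    return wealth
```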

TIST Journal 2015 Journal Article

Multi-Keyword Multi-Click Advertisement Option Contracts for Sponsored Search

  • Bowei Chen
  • Jun Wang
  • Ingemar J. Cox
  • Mohan S. Kankanhalli

In sponsored search, advertisement (abbreviated ad) slots are usually sold by a search engine to an advertiser through an auction mechanism in which advertisers bid on keywords. In theory, auction mechanisms have many desirable economic properties. However, keyword auctions have a number of limitations including: the uncertainty in payment prices for advertisers; the volatility in the search engine’s revenue; and the weak loyalty between advertiser and search engine. In this article, we propose a special ad option that alleviates these problems. In our proposal, an advertiser can purchase an option from a search engine in advance by paying an upfront fee, known as the option price. The advertiser then has the right, but no obligation, to purchase among the prespecified set of keywords at the fixed cost-per-clicks (CPCs) for a specified number of clicks in a specified period of time. The proposed option is closely related to a special exotic option in finance that contains multiple underlying assets (multi-keyword) and is also multi-exercisable (multi-click). This novel structure has many benefits: advertisers can have reduced uncertainty in advertising; the search engine can improve the advertisers’ loyalty as well as obtain a stable and increased expected revenue over time. Since the proposed ad option can be implemented in conjunction with the existing keyword auctions, the option price and corresponding fixed CPCs must be set such that there is no arbitrage between the two markets. Option pricing methods are discussed and our experimental results validate the development. Compared to keyword auctions, a search engine can have an increased expected revenue by selling an ad option.

AAAI Conference 2015 Conference Paper

Multi-View Point Registration via Alternating Optimization

  • Junchi Yan
  • Jun Wang
  • Hongyuan Zha
  • Xiaokang Yang
  • Stephen Chu

Multi-view point registration is a relatively less studied problem compared with two-view point registration. Directly applying pairwise registration often leads to matching discrepancies, as the mapping between two point sets can be determined either by direct correspondences or through any intermediate point set. Moreover, local two-view registration tends to be sensitive to noise. We propose a novel multi-view registration method in which the optimal registration is achieved via an efficient and effective alternating concave minimization process. We further extend our solution to the general practical case of registration among point sets with different cardinalities. Extensive empirical evaluations against peer methods on both synthetic data and real images suggest that our method is robust to large disturbances. In particular, it is shown that our method outperforms peer point matching methods and performs competitively against graph matching approaches. The latter utilize additional second-order information at the cost of exponentially increased run-time, and are thus usually less efficient.

IJCAI Conference 2015 Conference Paper

Optimal Bayesian Hashing for Efficient Face Recognition

  • Qi Dai
  • Jianguo Li
  • Jun Wang
  • Yurong Chen
  • Yu-Gang Jiang

In practical applications, it is often observed that high-dimensional features can yield good performance, while being more costly in both computation and storage. In this paper, we propose a novel method called Bayesian Hashing to learn an optimal Hamming embedding of high-dimensional features, with a focus on the challenging application of face recognition. In particular, a boosted random FERNs classification model is designed to perform efficient face recognition, in which bit correlations are elaborately approximated with a random permutation technique. Without incurring additional storage cost, multiple random permutations are then employed to train a series of classifiers for achieving better discrimination power. In addition, we introduce a sequential forward floating search (SFFS) algorithm to perform model selection, resulting in further performance improvement. Extensive experimental evaluations and comparative studies clearly demonstrate that the proposed Bayesian Hashing approach outperforms other peer methods in both accuracy and speed. We achieve state-of-the-art results on well-known face recognition benchmarks using compact binary codes with significantly reduced computational overhead and storage cost.

IJCAI Conference 2015 Conference Paper

Portfolio Choices with Orthogonal Bandit Learning

  • Weiwei Shen
  • Jun Wang
  • Yu-Gang Jiang
  • Hongyuan Zha

The investigation and development of new methods from diverse perspectives to shed light on portfolio choice problems has never stagnated in financial research. Recently, multi-armed bandits have drawn intensive attention in various machine learning applications in online settings. The tradeoff between exploration and exploitation to maximize rewards in bandit algorithms naturally establishes a connection to portfolio choice problems. In this paper, we present a bandit algorithm for conducting online portfolio choices by effectually exploiting correlations among multiple arms. Through constructing orthogonal portfolios from multiple assets and integrating with the upper confidence bound bandit framework, we derive the optimal portfolio strategy that represents the combination of passive and active investments according to a risk-adjusted reward function. Compared with oft-quoted trading strategies in finance and machine learning fields across representative real-world market datasets, the proposed algorithm demonstrates superiority in both risk-adjusted return and cumulative wealth.
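The upper confidence bound machinery this abstract builds on is classic UCB1. A minimal sketch follows; the paper's orthogonal-portfolio construction and risk-adjusted reward function are not reproduced, and the reward callable is a placeholder.

```python
import numpy as np

def ucb1(reward_fn, n_arms, n_rounds, seed=0):
    """Classic UCB1: play each arm once, then pick the arm maximizing
    empirical mean + sqrt(2 ln t / pulls), trading off exploration
    and exploitation.  Returns per-arm pull counts."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1                    # initialization: play each arm once
        else:
            arm = int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))
        r = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running mean
    return counts
```

Over many rounds the arm with the highest mean reward accumulates the vast majority of the pulls.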

AAAI Conference 2015 Conference Paper

Probabilistic Attributed Hashing

  • Mingdong Ou
  • Peng Cui
  • Jun Wang
  • Fei Wang
  • Wenwu Zhu

Due to their simplicity and efficiency, many hashing methods have recently been developed for large-scale similarity search. Most existing hashing methods focus on mapping low-level features to binary codes, but neglect the attributes that are commonly associated with data samples. Attribute data, such as image tags, product brands, and user profiles, can represent human perception better than low-level features. However, attributes have specific characteristics, including high-dimensional, sparse, and categorical properties, which are hardly leveraged in existing hashing learning frameworks. In this paper, we propose a hashing learning framework, Probabilistic Attributed Hashing (PAH), to integrate attributes with low-level features. The connections between attributes and low-level features are built through a shared common set of latent binary variables, i.e., hash codes, through which attributes and features can complement each other. Finally, we develop an efficient iterative learning algorithm, which is generally feasible for large-scale applications. Extensive experiments and comparative studies are conducted on two public datasets, i.e., DBLP and NUS-WIDE. The results clearly demonstrate that the proposed PAH method substantially outperforms the peer methods.

NeurIPS Conference 2015 Conference Paper

Space-Time Local Embeddings

  • Ke Sun
  • Jun Wang
  • Alexandros Kalousis
  • Stephane Marchand-Maillet

Space-time is a profound concept in physics. This concept was shown to be useful for dimensionality reduction. We present basic definitions with interesting counter-intuitions. We give theoretical propositions to show that space-time is a more powerful representation than Euclidean space. We apply this concept to manifold learning for preserving local information. Empirical results on non-metric datasets show that more information can be preserved in space-time.

AAAI Conference 2015 Conference Paper

Transaction Costs-Aware Portfolio Optimization via Fast Lowner-John Ellipsoid Approximation

  • Weiwei Shen
  • Jun Wang

Merton’s portfolio optimization problem in the presence of transaction costs for multiple assets has been an important and challenging problem in both theory and practice. Most existing work suffers from the curse of dimensionality and difficulties in generalization. In this paper, we develop an approximate dynamic programming method that synergistically combines the Löwner-John ellipsoid approximation with conventional value function iteration to quantify the associated optimal trading policy. By constructing Löwner-John ellipsoids to parameterize the optimal policy and taking Euclidean projections onto the constructed ellipsoids to implement the trading policy, the proposed algorithm cuts computational costs by up to a factor of five hundred while achieving near-optimal risk-adjusted returns across both synthetic and real-world market datasets.

AAAI Conference 2014 Conference Paper

Doubly Regularized Portfolio with Risk Minimization

  • Weiwei Shen
  • Jun Wang
  • Shiqian Ma

Due to recent empirical successes, machine learning algorithms have drawn considerable attention and are becoming important analysis tools in the financial industry. In particular, as the core engine of many financial services such as private wealth and pension fund management, portfolio management calls for the application of such novel algorithms. Most portfolio allocation strategies do not account for costs from market frictions such as transaction costs and capital gains taxes, as the complexity of sensible cost models often renders the induced problem intractable. In this paper, we propose a doubly regularized portfolio that provides a modest but effective solution to this difficulty. Specifically, as all kinds of trading costs primarily root in large transaction volumes, to reduce volumes we synergistically combine two penalty terms with classic risk minimization models to ensure that: (1) only a small set of assets is selected for investment in each period; and (2) portfolios in consecutive trading periods are similar. To assess the new portfolio, we apply standard evaluation criteria and conduct extensive experiments on well-known benchmarks and market datasets. Compared with various state-of-the-art portfolios, the proposed portfolio demonstrates superior performance, with both higher risk-adjusted returns and dramatically decreased transaction volumes.

AAAI Conference 2014 Conference Paper

Privacy and Regression Model Preserved Learning

  • Jinfeng Yi
  • Jun Wang
  • Rong Jin

Sensitive data such as medical records and business reports usually contains valuable information that can be used to build prediction models. However, designing learning models by directly using sensitive data might result in severe privacy and copyright issues. In this paper, we propose a novel matrix completion based framework that aims to tackle two challenging issues simultaneously: i) handling missing and noisy sensitive data, and ii) preserving the privacy of the sensitive data during the learning process. In particular, the proposed framework is able to mask the sensitive data while ensuring that the transformed data are still usable for training regression models. We show that two key properties, namely model preserving and privacy preserving, are satisfied by the transformed data obtained from the proposed framework. In model preserving, we guarantee that the linear regression model built from the masked data closely approximates the regression model learned from the original data. In privacy preserving, we ensure that the original sensitive data cannot be recovered since the transformation procedure is irreversible. Given these two characteristics, the transformed data can be safely released to any learners for designing prediction models without revealing any private content. Our empirical studies with a synthesized dataset and multiple sensitive benchmark datasets verify our theoretical claim as well as the effectiveness of the proposed framework.
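One way to see the model-preserving property is that any orthogonal transformation of the rows of (X, y) leaves the least-squares solution unchanged while making X unrecoverable without the transform. The sketch below is a toy illustration of that principle only, not the paper's matrix-completion framework (which additionally handles missing and noisy entries).

```python
import numpy as np

def mask_data(X, y, seed=0):
    """Mask (X, y) with a random orthogonal transformation Q.
    Since (QX)ᵀ(QX) = XᵀX and (QX)ᵀ(Qy) = Xᵀy, the least-squares
    solution is preserved, yet X cannot be recovered without Q."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # random orthogonal matrix
    return Q @ X, Q @ y
```

Discarding Q after masking makes the transformation irreversible in the same spirit as the abstract's privacy-preserving property.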

IJCAI Conference 2013 Conference Paper

Multiple Task Learning Using Iteratively Reweighted Least Square

  • Jian Pu
  • Yu-Gang Jiang
  • Jun Wang
  • Xiangyang Xue

Multiple task learning (MTL) is becoming popular due to its theoretical advances and empirical successes. The key idea of MTL is to explore the hidden relationships among multiple tasks to enhance learning performance. Recently, many MTL algorithms have been developed and applied to various problems such as feature selection and kernel learning. However, most existing methods rely heavily on specific assumptions about the task relationships. For instance, several works assume that there is a major task group plus several outlier tasks, and use a decomposition approach to identify the group structure and outlier tasks simultaneously. In this paper, we adopt a more general formulation for MTL without making specific structural assumptions. Instead of performing model decomposition, we directly impose an elastic-net regularization with a mixture of the structure and outlier penalties and formulate the objective as an unconstrained convex problem. To derive the optimal solution efficiently, we propose to use an Iteratively Reweighted Least Square (IRLS) method with a preconditioned conjugate gradient, which is computationally affordable for high-dimensional data. Extensive experiments are conducted on both synthetic and real data, and comparisons with several state-of-the-art algorithms clearly show the superior performance of the proposed method.
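The IRLS principle itself is easy to illustrate on least-absolute-deviation regression: each iteration solves a weighted least-squares problem with weights inversely proportional to the current residuals. A minimal sketch follows; the paper's elastic-net MTL objective and preconditioned conjugate gradient solver are omitted, and the function name is an assumption.

```python
import numpy as np

def irls_lad(X, y, n_iter=50, eps=1e-6):
    """IRLS for least-absolute-deviation regression.  Each pass solves
    (Xᵀ W X) w = Xᵀ W y with W = diag(1 / (|residual| + eps)), so the
    quadratic surrogate approximates the L1 loss near the current fit."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary LS warm start
    for _ in range(n_iter):
        r = np.abs(y - X @ w) + eps            # smoothed absolute residuals
        Wt = X.T / r                           # Xᵀ diag(1/r)
        w = np.linalg.solve(Wt @ X, Wt @ y)
    return w
```

Because large residuals get small weights, the fit is far less sensitive to outliers than ordinary least squares.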

JMLR Journal 2013 Journal Article

Semi-Supervised Learning Using Greedy Max-Cut

  • Jun Wang
  • Tony Jebara
  • Shih-Fu Chang

Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied on both artificial and standard benchmark data sets where it obtains superior classification accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes.
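For contrast, the univariate quadratic-cost diffusion that the article identifies as sensitive to initial labels looks like the following iterative propagation (in the style of Zhou et al.); GGMC's bivariate Max-Cut formulation is not reproduced here.

```python
import numpy as np

def propagate_labels(W, y, alpha=0.99, n_iter=200):
    """Iterative graph label propagation with a symmetrically normalized
    weight matrix.  W: symmetric nonnegative weights; y: one-hot label
    matrix with zero rows for unlabeled vertices.  Each step diffuses
    current scores along edges while re-injecting the seed labels."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))        # D^{-1/2} W D^{-1/2}
    F = y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * y
    return F.argmax(axis=1)
```

Since the seed labels y drive every iteration, flipping even one seed changes the whole diffusion — exactly the fragility the bivariate formulation is designed to alleviate.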

UAI Conference 2012 Conference Paper

Fast Graph Construction Using Auction Algorithm

  • Jun Wang
  • Yinglong Xia

In practical machine learning systems, graph-based data representation has been widely used in various learning paradigms, ranging from unsupervised clustering to supervised classification. Besides applications with natural graph or network structure, such as social network analysis and relational learning, many other applications involve a critical step of converting data vectors into an adjacency graph. In particular, a sparse subgraph extracted from the original graph is often required for both theoretical and practical reasons. Previous study clearly shows that the performance of different learning algorithms, e.g., clustering and classification, benefits from such sparse subgraphs with balanced node connectivity. However, existing graph construction methods are either computationally expensive or deliver unsatisfactory performance. In this paper, we utilize a scalable method called the auction algorithm and its parallel extension to recover a sparse yet nearly balanced subgraph with significantly reduced computational cost. Empirical study and comparison with state-of-the-art approaches clearly demonstrate the superiority of the proposed method in both efficiency and accuracy.
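The auction primitive at the heart of the method is Bertsekas' algorithm for the assignment problem; a serial sketch follows. The parallel extension and the balanced-subgraph (b-matching) wrapper are not shown, and all names are illustrative.

```python
import numpy as np

def auction_assignment(benefit, eps=1e-3):
    """Bertsekas-style auction for the assignment problem.  Each
    unassigned person bids for its most valuable object, raising that
    object's price by the value gap over the second-best option plus
    eps; the previous owner, if any, is evicted and re-bids later.
    benefit[i, j]: value of assigning person i to object j.
    Returns the person -> object assignment array."""
    n = benefit.shape[0]
    prices = np.zeros(n)
    owner = -np.ones(n, dtype=int)       # object -> person
    assigned = -np.ones(n, dtype=int)    # person -> object
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        values = benefit[i] - prices
        j = int(np.argmax(values))
        best = values[j]
        values[j] = -np.inf
        second = values.max()
        prices[j] += best - second + eps  # bid raises the winning price
        if owner[j] != -1:                # evict previous owner
            assigned[owner[j]] = -1
            unassigned.append(owner[j])
        owner[j] = i
        assigned[i] = j
    return assigned
```

Because bids only ever raise prices by at least eps, the process terminates with an assignment within n*eps of optimal.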

NeurIPS Conference 2012 Conference Paper

Parametric Local Metric Learning for Nearest Neighbor Classification

  • Jun Wang
  • Alexandros Kalousis
  • Adam Woznica

We study the problem of learning local metrics for nearest neighbor classification. Most previous work on local metric learning learns a number of unrelated local metrics. While this "independence" approach delivers increased flexibility, its downside is a considerable risk of overfitting. We present a new parametric local metric learning method in which we learn a smooth metric matrix function over the data manifold. Using an approximation error bound of the metric matrix function, we learn local metrics as linear combinations of basis metrics defined on anchor points over different regions of the instance space. We constrain the metric matrix function by imposing manifold regularization on the linear combinations, which makes the learned metric matrix function vary smoothly along the geodesics of the data manifold. Our metric learning method has excellent performance in terms of both predictive power and scalability. We experimented with several large-scale classification problems of tens of thousands of instances, and compared it with several state-of-the-art metric learning methods, both global and local, as well as with SVM with automatic kernel selection, all of which it outperforms significantly.

NeurIPS Conference 2011 Conference Paper

Metric Learning with Multiple Kernels

  • Jun Wang
  • Huyen T.
  • Adam Woznica
  • Alexandros Kalousis

Metric learning has become a very active research field. The most popular representative--Mahalanobis metric learning--can be seen as learning a linear transformation and then computing the Euclidean metric in the transformed space. Since a linear transformation might not always be appropriate for a given learning problem, kernelized versions of various metric learning algorithms exist. However, the problem then becomes finding the appropriate kernel function. Multiple kernel learning addresses this limitation by learning a linear combination of a number of predefined kernels; this approach can also be readily used in the context of multiple-source learning to fuse different data sources. Surprisingly, and despite the extensive work on multiple kernel learning for SVMs, there has been no work in the area of metric learning with multiple kernel learning. In this paper we fill this gap and present a general approach for metric learning with multiple kernel learning. Our approach can be instantiated with different metric learning algorithms provided that they satisfy some constraints. Experimental evidence suggests that our approach outperforms metric learning with an unweighted kernel combination and metric learning with cross-validation based kernel selection.

AAAI Conference 2008 Conference Paper

Generating Application-Specific Benchmark Models for Complex Systems

  • Jun Wang

Automated generators for synthetic models and data can play a crucial role in designing new algorithms/model frameworks, given the sparsity of benchmark models for empirical analysis and the cost of generating models by hand. We describe an automated generator for benchmark models that is based on a compositional modeling framework and employs random-graph models for the system topology. We choose the system topology that best matches the topology of the real-world system using a domain-analysis algorithm. To show the range of models for which this approach is applicable, we demonstrate our model-generation process using two examples of model generation optimized for a specific domain: (1) model-based diagnosis for discrete Boolean circuits, and (2) E. coli TRN networks for simulating gene expression.

IJCAI Conference 2007 Conference Paper

  • Gregory Provan
  • Jun Wang

The task of model-based diagnosis is NP-complete, but it is not known whether it is computationally difficult for the "average" real-world system. There has been no systematic study of the complexity of diagnosing real-world problems, and few good benchmarks exist to test this. Real-world-graphs, a mathematical framework that has been proposed as a model for complex systems, have empirically been shown to capture several topological properties of real-world systems. We describe the adequacy with which a real-world-graph can characterise the complexity of model-based diagnostic inference on real-world systems. We empirically compare the inference complexity of diagnosing models automatically generated using the real-world-graph framework with comparable models from well-known ISCAS circuit benchmarks. We identify parameters necessary for the real-world-graph framework to generate benchmark diagnosis circuit models with realistic properties.