Author name cluster

Jun Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

80 papers

2 author rows

AAAI Conference 2026 Conference Paper

Auto-PRE: An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation

Junjie Chen
Weihang Su
Zhumin Chu
Haitao Li
Yujia Zhou
Dingbo Yuan
Xudong Wang
Jun Zhou

The rapid development of large language models (LLMs) has highlighted the need for efficient and reliable methods to evaluate their performance. Traditional evaluation methods often face challenges like high costs, limited task formats, dependence on human references, and systematic biases. To address these limitations, we propose Auto-PRE, an automatic LLM evaluation framework inspired by the peer review process. Unlike previous approaches that rely on human annotations, Auto-PRE automatically selects evaluator LLMs based on three core traits: consistency, pertinence, and self-confidence, which correspond to the instruction, content, and response stages, respectively, and collectively cover the entire evaluation process. Experiments on three representative tasks, including summarization, non-factoid QA, and dialogue generation, demonstrate that Auto-PRE achieves state-of-the-art performance while significantly reducing evaluation costs. Furthermore, the structured and scalable design of our automatic qualification exam framework provides valuable insights into automating the evaluation of LLMs-as-judges, paving the way for more advanced LLM-based evaluation frameworks.


JBHI Journal 2026 Journal Article

Distance Learning-Based Prototypical Network With Multi-Domain Adaptation for Few-Shot Hyperspectral Medical Image Classification

Favour Ekong
Jun Zhou
Jing Wang
Mohammad Aminul Islam
Yongsheng Gao

Hyperspectral imaging (HSI) holds immense potential for medical diagnostics by capturing tissue-specific spectral signatures that facilitate precise disease detection. However, effective HSI classification in clinical settings is hindered by two main challenges: (i) the severe lack of labelled medical HSI samples constrains model training. Prototypical networks, as a few-shot learning paradigm, have been adopted to address label scarcity. However, current Euclidean-based prototypical methods typically assume equal feature variance and spherical distributions, while ignoring intraclass covariance and spectral correlations; (ii) significant domain shifts across heterogeneous medical HSI datasets undermine model generalisation, impair multi-domain interpretability, and force expensive per-dataset retraining. To overcome these limitations, we propose a novel distance-learning-based prototypical network with multi-domain adaptation for few-shot hyperspectral medical image classification. First, by embedding a class-covariance-aware Mahalanobis metric within the prototypical block, our module adapts similarity measures to each class's intrinsic spectral–spatial covariance and scale variations, thereby enhancing prototype robustness under severe label scarcity and significantly reducing misclassification compared with existing few-shot networks. Secondly, we introduce the domain-aware adapter block designed to address domain shift and multi-domain variability by dynamically fusing shared spectral–spatial representations with domain-specific characteristics via spectral integration and switchable adapters. We undertook extensive experiments on three publicly available hyperspectral medical datasets: skin dermoscopy, multidimensional choledochal, and in-vivo brain dataset. Compared to state-of-the-art classifiers, the proposed method achieved excellent performance on all three datasets, paving the way for generalisable HSI solutions in clinical workflows and biomedical research.
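The contrast this abstract draws between Euclidean prototypes and a class-covariance-aware Mahalanobis metric can be illustrated with a small sketch. It is not the paper's network: it assumes a diagonal covariance per class (a simplification of a full Mahalanobis metric), works on hand-made 2D points rather than learned embeddings, and the names `class_stats`, `classify`, and the toy `support` set are hypothetical.

```python
from math import sqrt

def class_stats(samples, eps=1e-6):
    """Mean and per-feature variance of one class's support samples.
    ASSUMPTION: diagonal covariance only; eps avoids division by zero."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    var = [sum((s[d] - mean[d]) ** 2 for s in samples) / n + eps
           for d in range(dim)]
    return mean, var

def euclidean(x, mean):
    return sqrt(sum((xi - mi) ** 2 for xi, mi in zip(x, mean)))

def mahalanobis_diag(x, mean, var):
    # Each squared difference is scaled by that class's variance in the feature.
    return sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

def classify(query, support, use_covariance=True):
    """Return the label of the nearest class prototype (class mean)."""
    best_label, best_dist = None, float("inf")
    for label, samples in support.items():
        mean, var = class_stats(samples)
        dist = (mahalanobis_diag(query, mean, var) if use_covariance
                else euclidean(query, mean))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy support set: class "a" is very tight along feature 0,
# class "b" is moderately spread in both features.
support = {
    "a": [(0.0, -4.0), (0.0, 4.0), (0.1, 0.0), (-0.1, 0.0)],
    "b": [(2.0, -0.5), (2.0, 0.5), (2.5, 0.0), (1.5, 0.0)],
}
query = (0.9, 0.0)
euclid_label = classify(query, support, use_covariance=False)
maha_label = classify(query, support, use_covariance=True)
```

In this toy case the query is slightly closer to the mean of "a" in Euclidean terms, but it lies many standard deviations from "a" along feature 0, so the variance-scaled distance assigns it to "b".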


EAAI Journal 2026 Journal Article

Global frequency-aware multi-scale feature learning for point cloud normal estimation

Wei Jin
Jun Zhou
Nannan Li
Xiuping Liu


AAAI Conference 2026 Conference Paper

Note2Chat: Improving LLMs for Multi-Turn Clinical History Taking Using Medical Notes

Yang Zhou
Zhenting Sheng
Mingrui Tan
Yuting Song
Jun Zhou
Yu Heng Kwan
Lian Leng Low
Yang Bai

Effective clinical history taking is a foundational yet underexplored component of clinical reasoning. While large language models (LLMs) have shown promise on static benchmarks, they often fall short in dynamic, multi-turn diagnostic settings that require iterative questioning and hypothesis refinement. To address this gap, we propose Note2Chat, a note-driven framework that trains LLMs to conduct structured history taking and diagnosis by learning from widely available medical notes. Instead of relying on scarce and sensitive dialogue data, we convert real-world medical notes into high-quality doctor-patient dialogues using a decision tree-guided generation and refinement pipeline. We then propose a three-stage fine-tuning strategy combining supervised learning, simulated data augmentation, and preference learning. Furthermore, we propose a novel single-turn reasoning paradigm that reframes history taking as a sequence of single-turn reasoning problems. This design enhances interpretability and enables local supervision, dynamic adaptation, and greater sample efficiency. Experimental results show that our method substantially improves clinical reasoning, achieving gains of +16.9 F1 and +21.0 Top-1 diagnostic accuracy over GPT-4o.


AAAI Conference 2026 Conference Paper

Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction

Jun Xu
Xinkai Du
Yu Ao
Peilong Zhao
Yang Li
Ling Zhong
Lin Yuan
Zhongpu Bo

Efficient retrieval of external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous works on training LLMs to leverage external retrievers for solving complex problems have predominantly employed end-to-end reinforcement learning. However, these approaches neglect supervision over the reasoning process, making it difficult to guarantee logical coherence and rigor. To address these limitations, we propose Thinker, a hierarchical thinking model for deep search through multi-turn interaction, making the reasoning process supervisable and verifiable. It decomposes complex problems into independently solvable sub-problems, each dually represented in both natural language and an equivalent logical function to support knowledge base and web searches. Concurrently, dependencies between sub-problems are passed as parameters via these logical functions, enhancing the logical coherence of the problem-solving process. To avoid unnecessary external searches, we perform knowledge boundary determination to check if a sub-problem is within the LLM's intrinsic knowledge, allowing it to answer directly. Experimental results indicate that with as few as several hundred training samples, the performance of Thinker is competitive with established baselines. Furthermore, when scaled to the full training set, Thinker significantly outperforms these methods across various datasets and model sizes.


AAAI Conference 2026 Conference Paper

VPHO: Joint Visual-Physical Cue Learning and Aggregation for Hand-Object Pose Estimation

Jun Zhou
Chi Xu
Kaifeng Tang
Yuting Ge
Tingrui Guo
Li Cheng

Estimating the 3D poses of hands and objects from a single RGB image is a fundamental yet challenging problem, with broad applications in augmented reality and human-computer interaction. Existing methods largely rely on visual cues alone, often producing results that violate physical constraints such as interpenetration or non-contact. Recent efforts to incorporate physics reasoning typically depend on post-optimization or non-differentiable physics engines, which compromise visual consistency and end-to-end trainability. To overcome these limitations, we propose a novel framework that jointly integrates visual and physical cues for hand-object pose estimation. This integration is achieved through two key ideas: 1) joint visual-physical cue learning: The model is trained to extract 2D visual cues and 3D physical cues, thereby enabling more comprehensive representation learning for hand-object interactions; 2) candidate pose aggregation: A novel refinement process that aggregates multiple diffusion-generated candidate poses by leveraging both visual and physical predictions, yielding a final estimate that is visually consistent and physically plausible. Extensive experiments demonstrate that our method significantly outperforms existing state-of-the-art approaches in both pose accuracy and physical plausibility.


AAAI Conference 2026 Conference Paper

Zo3T: Zero-Shot 3D-Aware Trajectory-Guided Image-to-Video Generation via Test-Time Training

Ruicheng Zhang
Jun Zhou
Zunnan Xu
Zihao Liu
Jiehui Huang
Mingyang Zhang
Yu Sun
Xiu Li

Trajectory-guided image-to-video (I2V) generation aims to synthesize videos that adhere to user-specified motion instructions. Existing methods typically rely on computationally expensive fine-tuning on scarce annotated datasets. Although some zero-shot methods attempt trajectory control in the latent space, they may yield unrealistic motion by neglecting 3D perspective and creating a misalignment between the manipulated latents and the network's noise predictions. To address these challenges, we introduce Zo3T, a novel zero-shot test-time-training framework for trajectory-guided generation with three core innovations: First, we incorporate a 3D-Aware Kinematic Projection, leveraging inferred scene depth to derive perspective-correct affine transformations for target regions. Second, we introduce Trajectory-Guided Test-Time LoRA, a mechanism that dynamically injects and optimizes ephemeral LoRA adapters into the denoising network alongside the latent state. Driven by a regional feature consistency loss, this co-adaptation effectively enforces motion constraints while allowing the pre-trained model to locally adapt its internal representations to the manipulated latent, thereby ensuring generative fidelity and on-manifold adherence. Finally, we develop Guidance Field Rectification, which refines the denoising evolutionary path by optimizing the conditional guidance field through a one-step lookahead strategy, ensuring efficient generative progression towards the target trajectory. Zo3T significantly enhances 3D realism and motion accuracy in trajectory-controlled I2V generation, demonstrating superior performance over existing training-based and zero-shot approaches.


JMLR Journal 2025 Journal Article

Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback

Boxin Zhao
Lingxiao Wang
Ziqi Liu
Zhiqiang Zhang
Jun Zhou
Chaochao Chen
Mladen Kolar

Due to the high cost of communication, federated learning (FL) systems need to sample a subset of clients that are involved in each round of training. As a result, client sampling plays an important role in FL systems as it affects the convergence rate of optimization algorithms used to train machine learning models. Despite its importance, there is limited work on how to sample clients effectively. In this paper, we cast client sampling as an online learning task with bandit feedback, which we solve with an online stochastic mirror descent (OSMD) algorithm designed to minimize the sampling variance. We then theoretically show how our sampling method can improve the convergence speed of federated optimization algorithms over the widely used uniform sampling. Through both simulated and real data experiments, we empirically illustrate the advantages of the proposed client sampling algorithm over uniform sampling and existing online learning-based sampling strategies. The proposed adaptive sampling procedure is applicable beyond the FL problem studied here and can be used to improve the performance of stochastic optimization procedures such as stochastic gradient descent and stochastic coordinate descent.
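Why the choice of sampling distribution affects variance can be seen in a small worked example. The sketch below is not the OSMD algorithm from the abstract, which learns the probabilities online from bandit feedback. It assumes the scalar client updates are fully known and compares two fixed distributions for a single-client importance-sampled estimate of their sum. The function name `estimator_variance` and the toy update values are hypothetical.

```python
def estimator_variance(updates, probs):
    """Variance of the one-sample estimator u_i / p_i of sum(updates),
    where client i is drawn with probability p_i (all p_i must be > 0).
    The estimator is unbiased: E[u_i / p_i] = sum_i p_i * (u_i / p_i) = sum(u).
    """
    total = sum(updates)
    second_moment = sum(u * u / p for u, p in zip(updates, probs))
    return second_moment - total * total

# ASSUMPTION: toy scalar updates standing in for per-client update magnitudes.
updates = [1.0, 2.0, 7.0]

uniform = [1.0 / len(updates)] * len(updates)
proportional = [abs(u) / sum(abs(v) for v in updates) for u in updates]

var_uniform = estimator_variance(updates, uniform)
var_proportional = estimator_variance(updates, proportional)
```

For these values, uniform sampling gives a variance of 62, while sampling proportionally to the update magnitude drives the variance to zero (up to floating-point error), because every draw then returns the same estimate. In practice the magnitudes are not known in advance, which is the gap an adaptive, feedback-driven sampler is meant to address.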