Arrow Research search

Author name cluster

Qi Han

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
1 author row

Possible papers

EAAI Journal 2025 Journal Article

A hierarchical deep reinforcement learning method for coupled transportation and power distribution system dispatching

  • Qi Han
  • Xueping Li
  • Liangce He

The randomness and growing dimensionality of variables in coupled transportation and power distribution systems (CTPS) pose challenges for effectively solving CTPS dispatching tasks. This paper presents a hierarchical deep reinforcement learning (HDRL) method, which distributes the action and state space of CTPS across a decision-making layer and an autonomous optimization layer. The Cloud DRL model in the decision-making layer is responsible for the load assignment task of charging stations. The distribution network (DN) and transportation network (TN) DRL models in the autonomous optimization layer are responsible for optimizing the DN and TN, respectively. A layer-wise training method is adopted to alleviate the asynchronous convergence problem of HDRL. First, the Gurobi solver supports efficient training of the Cloud DRL model by ensuring the reward effectiveness of the autonomous optimization layer. Meanwhile, the differential evolution (DE) algorithm helps optimize the diversity and focalization of the transitions by controlling the distribution patterns of species initialization during the pre-sampling and training stages. Then, the trained Cloud DRL model is frozen to train the DN and TN DRL models. The method is tested on two CTPS of different sizes. Simulation analysis shows that this method improves the training performance of the HDRL model.

JBHI Journal 2025 Journal Article

CAISeg: A Clustering-Aided Interactive Network for Lesion Segmentation in 3D Medical Imaging

  • Yukang Sun
  • Shujun Zhang
  • Jinsong Li
  • Qi Han
  • Yuhua Qin

Accurate lesion segmentation in medical imaging is critical for medical diagnosis and treatment. Lesions' diverse and heterogeneous characteristics often present a distinct long-tail distribution, posing difficulties for automatic methods. Interactive segmentation approaches have shown promise in improving accuracy, but still struggle to deal with tail features. This creates a demand for strategies that use user interaction effectively. To this end, we propose a novel point-based interactive segmentation model for 3D medical imaging called the Clustering-Aided Interactive Segmentation Network (CAISeg). A customized Interaction-Guided Module (IGM) adopts the concept of clustering to capture features that are semantically similar to interaction points. These clustered features are then mapped to the head regions of the prompted category to facilitate more precise classification. In addition, we put forward a Focus Guided Loss function that gives the network an inductive bias towards user interaction by assigning higher weights to voxels closer to the prompted points, thereby improving responsiveness to user guidance. Evaluations across brain tumor, colon cancer, lung cancer, and pancreas cancer segmentation tasks show CAISeg's superiority over state-of-the-art methods. It outperforms fully automated segmentation models in accuracy, and achieves results comparable to or better than those of the leading point-based interactive methods while requiring fewer prompt points. Furthermore, we find that CAISeg offers good interpretability at various stages, which gives it potential clinical application value.

AAMAS Conference 2025 Conference Paper

Hitchhiker's Guide to Patrolling: Path-Finding for Energy-Sharing Drone-UGV Teams

  • Jonathan Diller
  • Qi Han
  • Robert Byers
  • James Dotterweich
  • James Humann

Teams of Unmanned Ground Vehicles (UGVs) and drones are often proposed for various patrolling applications, where drones quickly move from one point of interest to the next while UGVs act as moving base stations that can both recharge and ferry around the drones. In this paper, we look at how to plan collaborative actions between drones and UGVs for a patrolling mission over an indefinite time horizon. We demonstrate how to form a second-order cone (SOC) program that finds optimal solutions, in polynomial time, to a variant of the larger problem where the order of drone and UGV actions is fixed. We propose two algorithms that use our SOC program to find locally optimal solutions while considering the limited energy of both UGVs and drones. Our numerical simulation results show that both of our algorithms yield a greater than 50% improvement in solution quality when compared to a baseline method from the literature. Additionally, we demonstrate the authenticity of our problem setup through a proof-of-concept experiment on a physical UGV and drone testbed.

NeurIPS Conference 2025 Conference Paper

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

  • Yana Wei
  • Liang Zhao
  • Jianjian Sun
  • Kangheng Lin
  • Jisheng Yin
  • Jingcheng Hu
  • Yinmin Zhang
  • En Yu

The remarkable reasoning capability of large language models (LLMs) stems from cognitive behaviors that emerge through reinforcement with verifiable rewards. This work investigates how to transfer this principle to Multimodal LLMs (MLLMs) to unlock advanced visual reasoning. We introduce a two-stage paradigm built on Qwen2.5-VL-7B: a massive linguistic cold-start fine-tuning, followed by multimodal reinforcement learning (RL) spanning nearly 1,000 steps, surpassing all previous open-source efforts in scale. This pioneering work reveals three fundamental insights: 1) Behavior transfer emerges surprisingly early in cold start due to linguistic mental imagery. 2) Cold start broadly memorizes visual behaviors, while RL critically discerns and scales up effective patterns. 3) Transfer strategically favors high-utility behaviors such as visual reflection. Our resulting model, Open-Vision-Reasoner (OVR), achieves state-of-the-art performance on a suite of reasoning benchmarks, including 95.3% on MATH500, 51.8% on MathVision and 54.6% on MathVerse. We release our model, data, and training dynamics to catalyze the development of more capable, behavior-aligned multimodal reasoners.

NeurIPS Conference 2025 Conference Paper

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

  • Jingcheng Hu
  • Yinmin Zhang
  • Qi Han
  • Daxin Jiang
  • Xiangyu Zhang
  • Heung-Yeung Shum

We introduce Open-Reasoner-Zero, the first open source implementation of large-scale reasoning-oriented RL training on the base model focusing on scalability, simplicity and accessibility. Through extensive experiments, we demonstrate that a minimalist approach, vanilla PPO with GAE ($\lambda=1$, $\gamma=1$) and straightforward rule-based rewards, without any KL regularization, is sufficient to scale up both benchmark performance and response length, replicating the scaling phenomenon observed in DeepSeek-R1-Zero. Using the same base model as DeepSeek-R1-Zero-Qwen-32B, our implementation achieves superior performance across AIME2024, MATH500, and GPQA Diamond, while demonstrating remarkable efficiency, requiring only 1/10 of the training steps compared to the DeepSeek-R1-Zero pipeline. We validate that this recipe generalizes well across diverse training domains and different model families without algorithmic modifications. Moreover, our analysis not only covers training dynamics and ablations for critical design choices, but also quantitatively shows how the learned critic in Reasoner-Zero training effectively identifies and devalues repetitive response patterns, yielding more robust advantage estimations and enhancing training stability. Embracing the principles of open source, we release our source code, parameter settings, training data, and model weights across various sizes, fostering reproducibility and encouraging further exploration of the properties of related models.

AAAI Conference 2024 Conference Paper

Forced Exploration in Bandit Problems

  • Qi Han
  • Li Zhu
  • Fei Guo

The multi-armed bandit (MAB) is a classical sequential decision problem. Most work requires assumptions about the reward distribution (e.g., bounded), while practitioners may have difficulty obtaining information about these distributions to design models for their problems, especially in non-stationary MAB problems. This paper aims to design a multi-armed bandit algorithm that can be implemented without using information about the reward distribution while still achieving substantial regret upper bounds. To this end, we propose a novel algorithm that alternates between a greedy rule and forced exploration. Our method can be applied to Gaussian, Bernoulli and other subgaussian distributions, and its implementation does not require additional information. We employ a unified analysis method for different forced exploration strategies and provide problem-dependent regret upper bounds for stationary and piecewise-stationary settings. Furthermore, we compare our algorithm with popular bandit algorithms on different reward distributions.

IJCAI Conference 2024 Conference Paper

InfoMatch: Entropy Neural Estimation for Semi-Supervised Image Classification

  • Qi Han
  • Zhibo Tian
  • Chengwei Xia
  • Kun Zhan

Semi-supervised image classification, leveraging pseudo supervision and consistency regularization, has demonstrated remarkable success. However, the ongoing challenge lies in fully exploiting the potential of unlabeled data. To address this, we employ information entropy neural estimation to utilize the potential of unlabeled samples. Inspired by contrastive learning, the entropy is estimated by maximizing a lower bound on mutual information across different augmented views. Moreover, we theoretically analyze that the information entropy of the posterior of an image classifier is approximated by maximizing the likelihood function of the softmax predictions. Guided by these insights, we optimize our model from both perspectives to ensure that the predicted probability distribution closely aligns with the ground-truth distribution. Given the theoretical connection to information entropy, we name our method InfoMatch. Through extensive experiments, we show its superior performance. The source code is available at https://github.com/kunzhan/InfoMatch.

AAMAS Conference 2023 Conference Paper

Energy-aware UAV Path Planning with Adaptive Speed

  • Jonathan Diller
  • Qi Han

Unmanned Aerial Vehicles (UAVs) are a versatile platform that can be used for many data collection applications including emergency response, environmental monitoring, surveillance and many others. In this work, we investigate how to plan efficient paths that minimize mission completion time for UAV data collection where the UAV must rendezvous with a moving ground vehicle that cannot stop and wait for the UAV. We also address the limited onboard energy storage issue by adapting UAV speed. We propose a mixed-integer nonlinear program solution to solve the underlying path planning problem to optimality and provide a more tractable alternative approach. We evaluate our two approaches in extensive simulations using real UAV characteristics and prototype our solution on a physical drone testbed. We show that our two approaches can reduce completion time by up to 23.8% and 14.5%, respectively, when compared against other baseline approaches and demonstrate the importance of UAV speed adaptation in route planning for UAVs.

NeurIPS Conference 2023 Conference Paper

RevColV2: Exploring Disentangled Representations in Masked Image Modeling

  • Qi Han
  • Yuxuan Cai
  • Xiangyu Zhang

Masked image modeling (MIM) has become a prevalent pre-training setup for vision foundation models and attains promising performance. Despite its success, existing MIM methods discard the decoder network during downstream applications, which results in inconsistent representations between pre-training and fine-tuning and can hamper downstream task performance. In this paper, we propose a new architecture, RevColV2, which tackles this issue by keeping the entire autoencoder architecture during both pre-training and fine-tuning. The main body of RevColV2 contains bottom-up columns and top-down columns, between which information is reversibly propagated and gradually disentangled. This design gives our architecture a useful property: it maintains disentangled low-level and semantic information at the end of the network in MIM pre-training. Our experimental results suggest that a foundation model with decoupled features can achieve competitive performance across multiple downstream vision tasks such as image classification, semantic segmentation and object detection. For example, after intermediate fine-tuning on the ImageNet-22K dataset, RevColV2-L attains 88.4% top-1 accuracy on ImageNet-1K classification and 58.6 mIoU on ADE20K semantic segmentation. With an extra teacher and a large-scale dataset, RevColV2-L achieves 62.1 APbox on COCO detection and 60.4 mIoU on ADE20K semantic segmentation.

YNICL Journal 2019 Journal Article

Asymmetry in cortical thickness and subcortical volume in treatment-naïve major depressive disorder

  • Zhiwei Zuo
  • Shuhua Ran
  • Yao Wang
  • Chang Li
  • Qi Han
  • Qianying Tang
  • Wei Qu
  • Haitao Li

BACKGROUND: Numerous cognitive and emotional functions are executed asymmetrically between the left and right hemispheres. Right hemisphere hyperactivity/left hemisphere hypoactivity often appears to be a feature in neuroimaging studies of depression. However, few studies have evaluated abnormalities in structural asymmetry in untreated patients with major depressive disorder (MDD). METHODS: In this study, 3-dimensional high-resolution structural magnetic resonance images were acquired from 35 treatment-naïve patients with MDD (mean age = 28.9 years, 22 females) and 35 normal controls. Asymmetry indices for cortical thickness and subcortical volume were calculated based on an automated surface-based technique. RESULTS: Abnormalities in structural asymmetry in patients with MDD were mainly located in the cortical-striatal-pallidal-thalamic circuit, including the superior frontal cortex, rostral middle frontal cortex, caudal middle frontal cortex, nucleus accumbens, pallidum and thalamus. No significant correlation was observed between symptom severity and asymmetric measurements. CONCLUSION: These findings provide further evidence for altered morphological interhemispheric imbalances in depression, and these alterations were independent of depressive symptom severity, suggesting that cerebral asymmetry could be an appropriate indicator of morphological variations in mental disease.