Arrow Research search

Author name cluster

Zihan Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers

16

AAAI Conference 2026 Conference Paper

Self-Enhanced Image Clustering with Cross-Modal Semantic Consistency

  • Zihan Li
  • Wei Sun
  • Jing Hu
  • Jianhua Yin
  • Xing Wang
  • Erwei Yin
  • Jianlong Wu

While large language-image pre-trained models like CLIP offer powerful generic features for image clustering, existing methods typically freeze the encoder. This creates a fundamental mismatch between the model's task-agnostic representations and the demands of a specific clustering task, imposing a ceiling on performance. To break this ceiling, we propose a self-enhanced framework based on cross-modal semantic consistency for efficient image clustering. Our framework first builds a strong foundation via Cross-Modal Semantic Consistency and then specializes the encoder through Self-Enhancement. In the first stage, by mining consistency between generated image-text pairs at the instance, cluster assignment, and cluster center levels, we train lightweight clustering heads to align with the rich semantics of the pre-trained model. This alignment process is bolstered by a novel method for generating higher-quality cluster centers and a dynamic balancing regularizer that ensures well-distributed assignments. In the second stage, we introduce a Self-Enhanced fine-tuning strategy: the well-aligned model from the first stage acts as a reliable pseudo-label generator, and these self-generated supervisory signals then drive the efficient joint optimization of the vision encoder and clustering heads, unlocking their full potential. Extensive experiments on six mainstream datasets show that our method outperforms existing deep clustering methods by significant margins. Notably, our ViT-B/32 model already matches or even surpasses the accuracy of state-of-the-art methods built upon the far larger ViT-L/14.
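
A toy sketch of the second-stage idea as described above (the names and the confidence threshold are our assumptions, not the paper's API): a well-aligned clustering head emits soft assignments, and only confident ones become pseudo-labels for jointly fine-tuning the encoder and heads.

```python
# Hypothetical sketch of confidence-filtered pseudo-labeling for self-enhanced
# fine-tuning; threshold and helper names are illustrative assumptions.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def select_pseudo_labels(logits, threshold=0.9):
    """Keep samples whose top cluster probability exceeds `threshold`."""
    probs = softmax(logits)
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    mask = conf >= threshold
    return labels[mask], mask

logits = np.array([[4.0, 0.1, 0.2],    # confident sample -> pseudo-labeled
                   [1.0, 1.1, 0.9]])   # ambiguous sample -> dropped
labels, mask = select_pseudo_labels(logits)
```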

IROS Conference 2025 Conference Paper

Complex Robotic Manipulation via Hindsight Goal Diffusion and Graph-based Experience Replay

  • Zihao Sun
  • Zihan Li
  • Jinrui He
  • Yong Song 0005
  • Pingping Liu
  • Qingyang Xu
  • Xianfeng Yuan
  • Rui Song 0002

Goal-conditioned reinforcement learning (GCRL) is an effective method for multi-goal robotic manipulation tasks. Many studies based on hindsight experience replay (HER) and hindsight goal generation (HGG) have achieved the autonomous acquisition of robotic manipulation skills in reward-sparse environments and have greatly improved the learning efficiency of GCRL. However, these methods perform poorly in environments with obstacles and distant goals. In this paper, we propose hindsight goal diffusion and graph-based experience replay (HGD-GER) for complex robotic manipulation. First, obstacle-avoiding graphs are constructed for environments with obstacles, and a graph-based distance metric between different goals is established. Second, the proposed HGD approach utilizes the inherent denoising mechanism of diffusion models together with the obstacle-avoiding graph-based distance to generate exploration goals, thereby promoting the exploration of obstacle-bypassing areas. Then, the GER module modifies the reward values in experience replay using the graph-based distance, thereby avoiding the bias introduced by HER and improving the learning performance of the RL algorithm under sparse reward conditions. Finally, we conducted experiments on three robotic manipulation tasks with obstacles and distant goals, and the results show that the proposed HGD-GER achieves excellent learning performance. Additionally, the proposed method is deployed on a physical robot.
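
A hedged sketch of the graph-based distance idea (the grid world, BFS metric, and reward rule are our simplifications, not the paper's construction): shortest-path length on an obstacle-avoiding graph replaces Euclidean distance when relabeling sparse rewards, so goals behind an obstacle are correctly treated as far away.

```python
# Toy obstacle-avoiding graph distance via BFS on a 2D grid (1 = obstacle,
# 0 = free), used to assign a sparse relabeled reward; all details assumed.
from collections import deque

def graph_distance(grid, start, goal):
    """BFS shortest-path length between two free cells; inf if unreachable."""
    rows, cols = len(grid), len(grid[0])
    q = deque([(start, 0)])
    seen = {start}
    while q:
        (r, c), d = q.popleft()
        if (r, c) == goal:
            return d
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                q.append(((nr, nc), d + 1))
    return float("inf")

def sparse_reward(grid, achieved, goal, tol=0):
    """0 if the achieved state reaches the goal along the graph, else -1."""
    return 0.0 if graph_distance(grid, achieved, goal) <= tol else -1.0

grid = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
# (0,0) -> (0,2) must go around the wall: 6 graph steps, though only 2 cells apart
```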

ICML Conference 2025 Conference Paper

FIC-TSC: Learning Time Series Classification with Fisher Information Constraint

  • Xiwen Chen
  • Wenhui Zhu
  • Peijie Qiu
  • Hao Wang 0176
  • Huayu Li
  • Zihan Li
  • Yalin Wang 0001
  • Aristeidis Sotiras

Analyzing time series data is crucial to a wide spectrum of applications, including economics, online marketplaces, and human healthcare. In particular, time series classification plays an indispensable role in segmenting different phases in stock markets, predicting customer behavior, and classifying worker actions and engagement levels. These aspects contribute significantly to the advancement of automated decision-making and system optimization in real-world applications. However, there is broad consensus that time series data often suffers from domain shifts between training and test sets, which dramatically degrades classification performance. Despite the success of (reversible) instance normalization in handling domain shifts for time series regression tasks, its performance in classification is unsatisfactory. In this paper, we propose FIC-TSC, a training framework for time series classification that leverages Fisher information as a constraint. We show, theoretically and empirically, that this is an efficient and effective way to guide the model to converge toward flatter minima, which enhances its generalizability to distribution shifts. We rigorously evaluate our method on 30 UEA multivariate and 85 UCR univariate datasets. Our empirical results demonstrate the superiority of the proposed method over 14 recent state-of-the-art methods.
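
An illustrative sketch only, not the paper's implementation: one common way to constrain training with Fisher information is to approximate the empirical Fisher by squared per-sample log-likelihood gradients and add its trace as a penalty. The toy logistic model, data, and penalty form below are all assumptions.

```python
# Hedged sketch: empirical-Fisher-trace penalty on a toy logistic model,
# biasing training toward flatter minima; all specifics are assumed.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_with_fisher_penalty(w, X, y, lam=0.1):
    p = sigmoid(X @ w)
    nll = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))   # cross-entropy
    per_sample_grads = (p - y)[:, None] * X    # d(-log lik)/dw, one row per sample
    fisher_trace = np.mean(np.sum(per_sample_grads ** 2, axis=1))
    return nll + lam * fisher_trace

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])
w = np.array([0.5, -0.5])
```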

TIST Journal 2025 Journal Article

Horizon Forcing: Improving the Recurrent Forecasting of Chaotic Systems

  • Yong Zhuang
  • Matthew Almeida
  • Wei Ding
  • Shafiqul Islam
  • Zihan Li
  • Ping Chen

Chaotic dynamics are ubiquitous in many real-world systems, ranging from biological and industrial processes to climate dynamics and the spread of viruses. These systems are characterized by high sensitivity to initial conditions, making it challenging to predict their future behavior confidently. In this study, we propose a novel deep-learning framework that addresses this challenge by directly exploiting the long-term compounding of local prediction errors during model training, aiming to extend the time horizon for reliable predictions of chaotic systems. Our approach observes the future trajectories of initial errors at a time horizon, modeling the evolution of the loss to that point through the use of two major components: (1) a recurrent architecture (Error Trajectory Tracing) designed to trace the trajectories of predictive errors through phase space, and (2) a training regime, Horizon Forcing, that pushes the model’s focus out to a predetermined time horizon. We validate our method on three classic chaotic systems and six real-world time series prediction tasks with chaotic characteristics. The results show that our approach outperforms the state-of-the-art methods.
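
A minimal sketch of the horizon-focused training idea as we read the abstract (not the authors' code): roll a one-step model forward on its own predictions for H steps and score only the horizon point, so training sees how local errors compound, unlike a teacher-forced one-step loss.

```python
# Toy horizon loss: feed predictions back recurrently and evaluate at the
# horizon; the doubling system and biased model are illustrative assumptions.
import numpy as np

def horizon_loss(step_fn, x0, target_traj, horizon):
    """Recurrently apply step_fn to its own output; compare at step `horizon`."""
    x = x0
    for _ in range(horizon):
        x = step_fn(x)                      # errors compound: no teacher forcing
    return float(np.mean((x - target_traj[horizon - 1]) ** 2))

true_step = lambda x: 2.0 * x               # toy system: state doubles each step
x0 = np.array([1.0])
traj = [true_step(x0)]
for _ in range(3):
    traj.append(true_step(traj[-1]))        # traj = [2, 4, 8, 16]

model_step = lambda x: 1.9 * x              # slightly biased learned model
```

Under the biased model, the four-step horizon loss is far larger than the one-step loss, which is exactly the compounding the training regime targets.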

AAAI Conference 2025 Conference Paper

LLM-RG4: Flexible and Factual Radiology Report Generation Across Diverse Input Contexts

  • Zhuhao Wang
  • Yihua Sun
  • Zihan Li
  • Xuan Yang
  • Fang Chen
  • Hongen Liao

Drafting radiology reports is a complex task requiring flexibility, as radiologists tailor content to the available information and particular clinical demands. However, most current radiology report generation (RRG) models are constrained to a fixed task paradigm, such as predicting the full "finding" section from a single image, inherently involving a mismatch between inputs and outputs. The trained models lack the flexibility for diverse inputs and can generate harmful, input-agnostic hallucinations. To bridge the gap between current RRG models and clinical demands in practice, we first develop a data generation pipeline to create a new MIMIC-RG4 dataset, which considers four common radiology report drafting scenarios and has perfectly corresponding inputs and outputs. Second, we propose a novel large language model (LLM) based RRG framework, namely LLM-RG4, which utilizes the LLM's flexible instruction-following capabilities and extensive general knowledge. We further develop an adaptive token fusion module that offers the flexibility to handle diverse scenarios with different input combinations, while minimizing the additional computational burden associated with increased input volumes. In addition, we propose a token-level loss weighting strategy to direct the model's attention toward positive and uncertain descriptions. Experimental results demonstrate that LLM-RG4 achieves state-of-the-art performance in both clinical efficiency and natural language generation on the MIMIC-RG4 and MIMIC-CXR datasets. We quantitatively demonstrate that our model has minimal input-agnostic hallucinations, whereas current open-source models commonly suffer from this problem.
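
A hypothetical illustration of a token-level loss weighting (the weighting scheme and the source of the flags are assumptions, not the paper's method): the per-token cross-entropy is scaled up for tokens flagged as describing positive or uncertain findings.

```python
# Toy weighted token loss: up-weight flagged tokens in the average;
# the weight value and flag convention are illustrative assumptions.
import numpy as np

def weighted_token_loss(token_nll, flags, up_weight=2.0):
    """token_nll: per-token negative log-likelihood; flags: 1 = emphasized."""
    weights = np.where(np.asarray(flags) == 1, up_weight, 1.0)
    return float(np.sum(weights * np.asarray(token_nll)) / np.sum(weights))

nll = np.array([0.5, 2.0, 0.1])
flags = [0, 1, 0]        # the middle token describes a positive finding
```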

ICML Conference 2025 Conference Paper

MDDM: Practical Message-Driven Generative Image Steganography Based on Diffusion Models

  • Zihao Xu
  • Dawei Xu
  • Zihan Li
  • Chuan Zhang 0003

Generative image steganography (GIS) is an emerging technique that conceals secret messages in the generation of images. Compared to GAN-based or flow-based GIS schemes, diffusion model-based solutions can provide high-quality and more diverse images, and have thus received considerable attention recently. However, previous GIS schemes still face challenges in terms of extraction accuracy, controllability, and practicality. To address the above issues, this paper proposes a practical message-driven GIS framework based on diffusion models, called MDDM. Specifically, by utilizing the Cardan grille, we encode messages into Gaussian noise, which serves as the initial input for image generation, enabling users to generate diverse images via controllable prompts without additional training. During the information extraction process, receivers only need the pre-shared Cardan grille to perform diffusion inversion and recover the messages, without requiring the image generation seeds or prompts. Experimental results demonstrate that MDDM offers notable advantages in terms of accuracy, controllability, practicality, and security. With flexible strategies, MDDM can achieve accuracy close to 100% under appropriate settings. Additionally, MDDM demonstrates a degree of robustness and shows potential for application in watermarking tasks.
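
A hedged toy of the Cardan-grille idea (the position/sign coding below is our simplification, not MDDM's exact scheme): a pre-shared mask selects entries of the initial Gaussian noise, each message bit fixes the sign of the sample at its position, and the receiver reads the signs back at the same positions.

```python
# Toy sign-based message embedding in Gaussian noise under a shared mask;
# the coding rule is an illustrative assumption, not the paper's scheme.
import numpy as np

def encode(bits, mask, rng):
    noise = rng.standard_normal(mask.size)
    positions = np.flatnonzero(mask)
    signs = np.where(np.asarray(bits) == 1, 1.0, -1.0)
    noise[positions] = np.abs(noise[positions]) * signs   # keep Gaussian magnitude
    return noise

def decode(noise, mask):
    return [1 if noise[i] > 0 else 0 for i in np.flatnonzero(mask)]

rng = np.random.default_rng(0)
mask = np.array([1, 0, 0, 1, 0, 1, 0, 0], dtype=bool)    # the pre-shared grille
bits = [1, 0, 1]
stego_noise = encode(bits, mask, rng)
```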

NeurIPS Conference 2025 Conference Paper

Spik-NeRF: Spiking Neural Networks for Neural Radiance Fields

  • Gang Wan
  • Qinlong Lan
  • Zihan Li
  • Huimin Wang
  • Wu Yitian
  • wang zhen
  • Wanhua Li
  • Yufei Guo

Spiking Neural Networks (SNNs), as a biologically inspired neural network architecture, have garnered significant attention due to their exceptional energy efficiency and increasing potential for various applications. In this work, we extend the use of SNNs to neural rendering tasks and introduce Spik-NeRF (Spiking Neural Radiance Fields). We observe that the binary spike activation map of traditional SNNs lacks sufficient information capacity, leading to information loss and a subsequent decline in the performance of spiking neural rendering models. To address this limitation, we propose the use of ternary spike neurons, which enhance the information-carrying capacity of the spiking neural rendering model. With ternary spike neurons, Spik-NeRF achieves performance that is on par with, or nearly identical to, that of traditional ANN-based rendering models. Additionally, we present a re-parameterization technique for inference that allows Spik-NeRF with ternary spike neurons to retain the event-driven, multiplication-free advantages typical of binary spike neurons. To further boost the performance of Spik-NeRF, we employ a distillation method, using an ANN-based NeRF to guide the training of our Spik-NeRF model, which is more compatible with the ternary neurons than with standard binary neurons. We evaluate Spik-NeRF on both realistic and synthetic scenes, and the experimental results demonstrate that Spik-NeRF achieves rendering performance comparable to ANN-based NeRF models.
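
A sketch of a ternary spike activation, our reading of the mechanism rather than the paper's exact neuron model: the membrane potential is quantized to {-1, 0, +1}, carrying more information per spike than a binary {0, 1} spike while staying multiplication-free.

```python
# Toy ternary spike function: fire +1 above threshold, -1 below -threshold,
# otherwise stay silent; threshold value is an illustrative assumption.
import numpy as np

def ternary_spike(potential, threshold=1.0):
    """Quantize a membrane potential to {-1, 0, +1}."""
    return np.where(potential >= threshold, 1.0,
                    np.where(potential <= -threshold, -1.0, 0.0))

v = np.array([1.5, 0.3, -2.0])
```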

TMLR Journal 2024 Journal Article

Regret Bounds for Noise-Free Cascaded Kernelized Bandits

  • Zihan Li
  • Jonathan Scarlett

We consider optimizing a function network in the noise-free grey-box setting with RKHS function classes, where the exact intermediate results are observable. We assume that the structure of the network is known (but not the underlying functions comprising it), and we study three types of structures: (1) chain: a cascade of scalar-valued functions, (2) multi-output chain: a cascade of vector-valued functions, and (3) feed-forward network: a fully connected feed-forward network of scalar-valued functions. We propose a sequential upper confidence bound based algorithm GPN-UCB along with a general theoretical upper bound on the cumulative regret. In addition, we propose a non-adaptive sampling based method along with its theoretical upper bound on the simple regret for the Matérn kernel. We also provide algorithm-independent lower bounds on the simple regret and cumulative regret. Our regret bounds for GPN-UCB have the same dependence on the time horizon as the best known in the vanilla black-box setting, as well as near-optimal dependencies on other parameters (e.g., RKHS norm and network length).

AAAI Conference 2024 Conference Paper

Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation

  • Zhouhong Gu
  • Xiaoxuan Zhu
  • Haoning Ye
  • Lin Zhang
  • Jianchen Wang
  • Yixin Zhu
  • Sihang Jiang
  • Zhuozhi Xiong

New Natural Language Processing (NLP) benchmarks are urgently needed to keep pace with the rapid development of large language models (LLMs). We present Xiezhi, the most comprehensive evaluation suite designed to assess holistic domain knowledge. Xiezhi comprises 249,587 multiple-choice questions across 516 diverse disciplines spanning 13 different subjects, accompanied by Xiezhi-Specialty with 14,041 questions and Xiezhi-Interdiscipline with 10,746 questions. We evaluate 47 cutting-edge LLMs on Xiezhi. Results indicate that LLMs exceed the average performance of humans in science, engineering, agronomy, medicine, and art, but fall short in economics, jurisprudence, pedagogy, literature, history, and management. All the evaluation code and data are open-sourced at https://github.com/MikeGu721/XiezhiBenchmark

AAAI Conference 2023 Conference Paper

CDTA: A Cross-Domain Transfer-Based Attack with Contrastive Learning

  • Zihan Li
  • Weibin Wu
  • Yuxin Su
  • Zibin Zheng
  • Michael R. Lyu

Despite their excellent performance, deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples. Moreover, these examples are often transferable among different models. In other words, the same adversarial example can fool multiple models with different architectures at the same time. Based on this property, many black-box transfer-based attack techniques have been developed. However, current transfer-based attacks generally focus on the cross-architecture setting, where the attacker has access to the training data of the target model, which is not guaranteed in realistic situations. In this paper, we design a Cross-Domain Transfer-Based Attack (CDTA), which works in the cross-domain scenario. In this setting, attackers have no information about the target model, such as its architecture and training data. Specifically, we propose a contrastive spectral training method to train a feature extractor on a source domain (e.g., ImageNet) and use it to craft adversarial examples on target domains (e.g., Oxford 102 Flower). Our method corrupts the semantic information of the benign image by scrambling the outputs of both the intermediate feature layers and the final layer of the feature extractor. We evaluate CDTA with 16 target deep models on four datasets with widely varying styles. The results confirm that, in terms of the attack success rate, our approach can consistently outperform the state-of-the-art baselines by an average of 11.45% across all target models. Our code is available at https://github.com/LiulietLee/CDTA.
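
A loose sketch of the attack objective only: push a feature extractor's output away from the benign features under an L-infinity budget, here via sign-gradient ascent on a toy linear extractor W. The contrastive spectral training of the real CDTA extractor is not reproduced; every name and parameter below is an assumption.

```python
# Toy feature-scrambling attack: maximize ||W(x+d) - Wx||^2 s.t. ||d||_inf <= eps
# by projected sign-gradient ascent; W, x, and all settings are illustrative.
import numpy as np

def cdta_like_attack(W, x, eps=0.1, steps=10, lr=0.05, seed=0):
    """Return x + delta with delta chosen to distort the features W @ x."""
    rng = np.random.default_rng(seed)
    f_clean = W @ x
    delta = rng.uniform(-eps / 10, eps / 10, size=x.shape)  # small random start
    for _ in range(steps):
        diff = W @ (x + delta) - f_clean
        grad = 2.0 * W.T @ diff            # gradient of the squared feature distance
        delta = np.clip(delta + lr * np.sign(grad), -eps, eps)
    return x + delta

W = np.array([[1.0, 0.5], [0.0, 2.0]])
x = np.array([0.2, -0.1])
adv = cdta_like_attack(W, x)
```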

NeurIPS Conference 2022 Conference Paper

A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits

  • Ilija Bogunovic
  • Zihan Li
  • Andreas Krause
  • Jonathan Scarlett

We consider the sequential optimization of an unknown, continuous, and expensive-to-evaluate reward function from noisy and adversarially corrupted observed rewards. When the corruption attacks are subject to a suitable budget $C$ and the function lives in a Reproducing Kernel Hilbert Space (RKHS), the problem can be posed as corrupted Gaussian process (GP) bandit optimization. We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants. Our algorithm, Robust GP Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation, such that its performance degrades minimally in the presence (or absence) of adversarial corruptions. When $T$ is the number of samples and $\gamma_T$ is the maximal information gain, the corruption-dependent term in our regret bound is $O(C \gamma_T^{3/2})$, which is significantly tighter than the existing $O(C \sqrt{T \gamma_T})$ for several commonly considered kernels. We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
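
A toy phased-elimination step in a finite-armed analogue, not the kernelized RGP-PE algorithm: after an epoch of repeated plays, any action whose upper confidence bound falls below the best lower confidence bound is eliminated.

```python
# Toy elimination rule: keep arm i iff mean[i] + width >= max_j (mean[j] - width);
# the shared confidence width is an illustrative simplification.
import numpy as np

def eliminate(means, width):
    """Return indices of arms that survive one elimination round."""
    means = np.asarray(means, dtype=float)
    best_lcb = np.max(means - width)
    return [i for i, m in enumerate(means) if m + width >= best_lcb]

# Playing each surviving action many times per epoch shrinks `width`, and is
# also what limits how far a budget-C adversary can shift the empirical means.
survivors = eliminate([1.0, 0.5, 0.2], width=0.2)
```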

ICRA Conference 2021 Conference Paper

Real-to-Sim Registration of Deformable Soft Tissue with Position-Based Dynamics for Surgical Robot Autonomy

  • Fei Liu 0033
  • Zihan Li
  • Yunhai Han
  • Jingpei Lu
  • Florian Richter 0002
  • Michael C. Yip

Autonomy in robotic surgery is very challenging in unstructured environments, especially when interacting with deformable soft tissues. The main difficulty is to generate model-based control methods that account for deformation dynamics during tissue manipulation. Previous works in vision-based perception can capture the geometric changes within the scene; however, model-based controllers integrated with dynamic properties, a more accurate and safer approach, have not been studied before. Considering the mechanical coupling between the robot and the environment, it is crucial to develop a registered, simulated dynamical model. In this work, we propose an online, continuous, real-to-sim registration method to bridge 3D visual perception with position-based dynamics (PBD) modeling of tissues. The PBD method is employed to simulate soft tissue dynamics as well as rigid tool interactions for model-based control. Meanwhile, a vision-based strategy is used to generate 3D reconstructed point cloud surfaces based on real-world manipulation, so as to register and update the simulation. To verify this real-to-sim approach, tissue experiments have been conducted on the da Vinci Research Kit. Our real-to-sim approach successfully reduces registration error online, which is especially important for safety during autonomous control. Moreover, it achieves higher accuracy in occluded areas than fusion-based reconstruction.
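
A minimal position-based dynamics (PBD) fragment: projecting two particles onto a distance constraint, the core solver step behind PBD soft-tissue models. This is a textbook illustration of the technique, not the paper's registration pipeline.

```python
# Standard PBD distance-constraint projection: move both particles along their
# separation, split by inverse masses, until |p1 - p2| equals the rest length.
import numpy as np

def project_distance_constraint(p1, p2, rest_length, w1=1.0, w2=1.0):
    """One PBD projection step for a single distance constraint."""
    d = p2 - p1
    dist = np.linalg.norm(d)
    if dist == 0.0:
        return p1, p2                        # degenerate: direction undefined
    correction = (dist - rest_length) * d / dist
    p1 = p1 + (w1 / (w1 + w2)) * correction  # inverse-mass-weighted split
    p2 = p2 - (w2 / (w1 + w2)) * correction
    return p1, p2

a = np.array([0.0, 0.0])
b = np.array([2.0, 0.0])
a2, b2 = project_distance_constraint(a, b, rest_length=1.0)
```

Iterating this projection over all constraints each frame is what lets PBD track large tissue deformations stably, which is why it pairs well with online registration.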

NeurIPS Conference 2019 Conference Paper

Learning Erdős-Rényi Random Graphs via Edge Detecting Queries

  • Zihan Li
  • Matthias Fresacher
  • Jonathan Scarlett

In this paper, we consider the problem of learning an unknown graph via queries on groups of nodes, with the result indicating whether or not at least one edge is present among those nodes. While learning arbitrary graphs with $n$ nodes and $k$ edges is known to be hard in the sense of requiring $\Omega(\min\{k^2 \log n, n^2\})$ tests (even when a small probability of error is allowed), we show that learning an Erdős-Rényi random graph with an average of $\bar{k}$ edges is much easier; namely, one can attain asymptotically vanishing error probability with only $O(\bar{k} \log n)$ tests. We establish such bounds for a variety of algorithms inspired by the group testing problem, with explicit constant factors indicating a near-optimal number of tests, and in some cases asymptotic optimality including constant factors. In addition, we present an alternative design that permits a near-optimal sublinear decoding time of $O(\bar{k} \log^2 \bar{k} + \bar{k} \log n)$.
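
A group-testing-flavored sketch of the query model: a query on a node set returns whether the set contains at least one edge, and a COMP-style decoder rules out every pair lying entirely inside a negative test. The decoder here only illustrates the query model; it is not one of the paper's near-optimal designs.

```python
# Toy edge-detecting queries plus a COMP-style eliminator: any pair of nodes
# fully contained in a negative test cannot be an edge.
from itertools import combinations

def edge_query(edges, nodes):
    """True iff at least one known edge lies entirely within `nodes`."""
    s = set(nodes)
    return any(u in s and v in s for u, v in edges)

def comp_decode(n, tests, outcomes):
    """Start from all pairs; remove every pair inside a negative test."""
    candidates = set(frozenset(p) for p in combinations(range(n), 2))
    for nodes, positive in zip(tests, outcomes):
        if not positive:
            s = set(nodes)
            candidates -= {p for p in candidates if p <= s}
    return candidates

edges = [(0, 1)]
tests = [[0, 1, 2], [1, 2, 3], [2, 3], [0, 2, 3]]
outcomes = [edge_query(edges, t) for t in tests]
```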