Arrow Research search

Author name cluster

Ke Zhu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

AIIM Journal 2026 Journal Article

A Character-level Convolutional Recurrent Interaction Network for joint traditional Chinese medicine clinical named entity recognition and relation extraction

  • Qiang Xu
  • Zhi-hui Zhao
  • Wei-wei Liu
  • Yu Fang
  • Wen-jun Tang
  • Yi Zhou
  • Ke Zhu
  • Hai Xiang

The electronic medical record (EMR) of traditional Chinese medicine (TCM) is a crucial document for recording patients’ clinical data, structured around four main dimensions: inspection, listening and smelling, inquiry, and palpation. Analyzing these records with natural language processing holds promise for further structuring and modeling TCM medical data. Deep learning-based named entity recognition is currently the prevailing method for processing TCM EMRs. However, state-of-the-art models fail to consider the four diagnostic dimensions of TCM clinical data and their impact on entity type extraction, and do not fully capture the semantic features of ancient Chinese usage in TCM. To address these issues, we introduce a joint clinical named entity recognition and relation extraction method designed to recognize and classify clinical entities – such as locations and symptom attributes – along with their associative relationships (the four diagnostic dimensions). In this study, we propose a Character-level Convolutional Recurrent Interaction Network (CCRIN), which treats the four diagnostic dimensions as relationships, locations as head entities, and symptom attributes as tail entities. CCRIN integrates Chinese character embeddings with inter-character contextual convolutional feature vectors to capture the semantics of the ancient Chinese language, while combining entity and relation extraction with a self-attention mechanism to generate rich feature representations through multi-task dynamic interaction. This approach enables efficient extraction of TCM entities and relations related to the four diagnostic dimensions. Empirical studies on the NYT and TCM-cases datasets demonstrate the superiority of the proposed model.
Highlights: a novel multi-task joint extraction method for entities and relations, grounded in the four diagnostic methods of traditional Chinese medicine; Chinese character embeddings integrated with inter-character contextual feature vectors; effectiveness validated on both publicly available and self-constructed datasets.
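The input representation described in the abstract can be sketched in a toy form: each character's embedding is concatenated with a convolutional feature summarizing its inter-character context. The window size, dimensions, and random weights below are placeholders, not CCRIN's actual configuration, and the real model adds recurrent layers, self-attention, and joint entity/relation heads.

```python
import numpy as np

def context_conv(embeddings, kernel):
    """1-D convolution over a character sequence (same-length output,
    zero padding at both ends)."""
    n, d = embeddings.shape
    w = kernel.shape[0] // 2
    padded = np.pad(embeddings, ((w, w), (0, 0)))
    out = np.zeros((n, kernel.shape[-1]))
    for i in range(n):
        window = padded[i:i + kernel.shape[0]]          # (win, d)
        out[i] = np.einsum('wd,wdo->o', window, kernel)  # contract to features
    return out

rng = np.random.default_rng(0)
chars = rng.normal(size=(12, 8))            # 12 characters, embedding dim 8
kernel = rng.normal(size=(3, 8, 16)) * 0.1  # window 3 -> 16 context features
# Character embedding + contextual convolutional feature, concatenated.
feats = np.concatenate([chars, context_conv(chars, kernel)], axis=1)
```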

AAAI Conference 2025 Conference Paper

All You Need in Knowledge Distillation Is a Tailored Coordinate System

  • Junjie Zhou
  • Ke Zhu
  • Jianxin Wu

Knowledge Distillation (KD) is essential for transferring dark knowledge from a large teacher to a small student network, such that the student can be much more efficient than the teacher while achieving comparable accuracy. Existing KD methods, however, rely on a large teacher trained specifically for the target task, which is both inflexible and inefficient. In this paper, we argue that an SSL-pretrained model can effectively act as the teacher, and that its dark knowledge can be captured by the coordinate system, or linear subspace, in which its features lie. We then need only one forward pass of the teacher to tailor the coordinate system (TCS) for the student network. Our TCS method is teacher-free, applies to diverse architectures, works well for KD and practical few-shot learning, and allows cross-architecture distillation with a large capacity gap. Experiments show that TCS achieves significantly higher accuracy than state-of-the-art KD methods while requiring only roughly half of their training time and GPU memory.
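The coordinate-system idea can be sketched as follows: take the principal directions of teacher features from a single forward pass as the "coordinate system", then penalize the energy of student features falling outside that subspace. The SVD-based basis, dimensions, and loss below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def coordinate_system(teacher_feats, k):
    """Top-k principal directions of teacher features (one forward pass)."""
    centered = teacher_feats - teacher_feats.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]                                # (k, d) orthonormal basis

def subspace_alignment_loss(student_feats, basis):
    """Mean squared energy of student features outside the teacher subspace."""
    proj = student_feats @ basis.T @ basis       # project onto teacher basis
    residual = student_feats - proj
    return float((residual ** 2).mean())

rng = np.random.default_rng(0)
teacher = rng.normal(size=(256, 64))             # cached teacher features
basis = coordinate_system(teacher, k=16)
student = rng.normal(size=(32, 64))              # a batch of student features
loss = subspace_alignment_loss(student, basis)
```

Features already lying in the teacher's subspace incur zero loss, which is the sense in which the subspace, rather than the teacher network itself, carries the distilled knowledge.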

ICML Conference 2025 Conference Paper

AlphaQCM: Alpha Discovery in Finance with Distributional Reinforcement Learning

  • Zhoufan Zhu
  • Ke Zhu

For researchers and practitioners in finance, finding synergistic formulaic alphas is very important but challenging. In this paper, we reconsider the discovery of synergistic formulaic alphas from the viewpoint of sequential decision-making, and conceptualize the entire alpha discovery process as a non-stationary and reward-sparse Markov decision process. To overcome the challenges of non-stationarity and reward-sparsity, we propose the AlphaQCM method, a novel distributional reinforcement learning method designed to search for synergistic formulaic alphas efficiently. The AlphaQCM method first learns the Q function and quantiles via a Q network and a quantile network, respectively. Then, the AlphaQCM method applies the quantiled conditional moment method to learn unbiased variance from the potentially biased quantiles. Guided by the learned Q function and variance, the AlphaQCM method navigates the non-stationarity and reward-sparsity to explore the vast search space of formulaic alphas with high efficacy. Empirical applications to real-world datasets demonstrate that our AlphaQCM method significantly outperforms its competitors, particularly when dealing with large datasets comprising numerous stocks.
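The variance-from-quantiles step can be illustrated in miniature: given a grid of learned quantiles and a separately learned mean (the Q value), the second central moment is recovered by centring the quantiles on that mean. The paper's quantiled conditional moment estimator additionally corrects for quantile bias; the sketch below shows only the centring idea, and all names are assumptions.

```python
import numpy as np

def variance_from_quantiles(quantiles, q_value):
    """Second central moment implied by a grid of quantile estimates,
    centred on an independently learned mean (the Q value) rather than
    on the quantiles' own average."""
    return float(np.mean((np.asarray(quantiles) - q_value) ** 2))

# Sanity check on U(0, 1), whose tau-quantile is simply tau and whose
# variance is 1/12.
taus = (np.arange(100) + 0.5) / 100
v = variance_from_quantiles(taus, q_value=0.5)
```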

ICML Conference 2025 Conference Paper

Doubly Robust Fusion of Many Treatments for Policy Learning

  • Ke Zhu
  • Jianing Chu
  • Ilya Lipkovich
  • Wenyu Ye
  • Shu Yang

Individualized treatment rules/recommendations (ITRs) aim to improve patient outcomes by tailoring treatments to the characteristics of each individual. However, in high-dimensional treatment settings, existing methods face significant challenges due to data sparsity within treatment groups and highly unbalanced covariate distributions across groups. To address these challenges, we propose a novel calibration-weighted treatment fusion procedure that robustly balances covariates across treatment groups and fuses similar treatments using a penalized working model. The fusion procedure ensures the recovery of latent treatment group structures when either the calibration model or the outcome model is correctly specified. In the fused treatment space, practitioners can seamlessly apply state-of-the-art ITR learning methods with the flexibility to utilize a subset of covariates, thereby achieving robustness while addressing practical concerns such as fairness. We establish theoretical guarantees, including consistency, the oracle property of treatment fusion, and regret bounds when integrated with multi-armed ITR learning methods such as policy trees. Simulation studies show superior group recovery and policy value compared to existing approaches. We illustrate the practical utility of our method using EHR-derived data from patients with Chronic Lymphocytic Leukemia and Small Lymphocytic Lymphoma.
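The fusion step can be caricatured as grouping treatments whose estimated effects are indistinguishable up to a tolerance. The greedy merge below is a crude stand-in for the penalized working model described in the abstract; the function name, tolerance rule, and inputs are illustrative assumptions.

```python
import numpy as np

def fuse_treatments(effects, tol):
    """Greedily group treatments whose sorted effect estimates differ by
    less than tol, a simple proxy for penalized treatment fusion."""
    effects = np.asarray(effects, dtype=float)
    order = np.argsort(effects)
    groups, current = [], [int(order[0])]
    for i in order[1:]:
        i = int(i)
        if effects[i] - effects[current[-1]] < tol:
            current.append(i)     # close to previous effect: same group
        else:
            groups.append(current)
            current = [i]         # gap exceeds tol: start a new group
    groups.append(current)
    return groups
```

Downstream ITR learning would then operate on the fused groups rather than the original high-dimensional treatment space.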

ICML Conference 2025 Conference Paper

Enhancing Statistical Validity and Power in Hybrid Controlled Trials: A Randomization Inference Approach with Conformal Selective Borrowing

  • Ke Zhu
  • Shu Yang
  • Xiaofei Wang

External controls from historical trials or observational data can augment randomized controlled trials when large-scale randomization is impractical or unethical, such as in drug evaluation for rare diseases. However, non-randomized external controls can introduce biases, and existing Bayesian and frequentist methods may inflate the type I error rate, particularly in small-sample trials where external data borrowing is most critical. To address these challenges, we propose a randomization inference framework that ensures finite-sample exact and model-free type I error rate control, adhering to the “analyze as you randomize” principle to safeguard against hidden biases. Recognizing that biased external controls reduce the power of randomization tests, we leverage conformal inference to develop an individualized test-then-pool procedure that selectively borrows comparable external controls to improve power. Our approach incorporates selection uncertainty into randomization tests, providing valid post-selection inference. Additionally, we propose an adaptive procedure to optimize the selection threshold by minimizing the mean squared error across a class of estimators encompassing both no-borrowing and full-borrowing approaches. The proposed methods are supported by non-asymptotic theoretical analysis, validated through simulations, and applied to a randomized lung cancer trial that integrates external controls from the National Cancer Database.
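The individualized test-then-pool idea can be sketched with conformal p-values: score each external control against the trial's concurrent controls and borrow only those that look comparable. The absolute-deviation nonconformity score and the threshold below are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def conformal_p_value(calib_scores, new_score):
    """Fraction of calibration scores at least as extreme as the new one
    (with the usual +1 finite-sample correction)."""
    calib_scores = np.asarray(calib_scores)
    return (np.sum(calib_scores >= new_score) + 1) / (len(calib_scores) + 1)

def select_external_controls(trial_outcomes, external_outcomes, alpha=0.1):
    """Borrow an external control only if its conformal p-value, relative
    to the trial's concurrent controls, exceeds alpha."""
    center = np.median(trial_outcomes)
    calib = np.abs(np.asarray(trial_outcomes) - center)   # nonconformity scores
    keep = []
    for i, y in enumerate(external_outcomes):
        if conformal_p_value(calib, abs(y - center)) > alpha:
            keep.append(i)
    return keep
```

An external control far outside the concurrent-control distribution gets a small p-value and is left out, which is how biased controls are prevented from diluting the randomization test's power.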

NeurIPS Conference 2024 Conference Paper

DiffuLT: Diffusion for Long-tail Recognition Without External Knowledge

  • Jie Shao
  • Ke Zhu
  • Hanxiao Zhang
  • Jianxin Wu

This paper introduces a novel pipeline for long-tail (LT) recognition that diverges from conventional strategies. Instead, it leverages the long-tailed dataset itself to generate a balanced proxy dataset, without utilizing external data or models. We deploy a diffusion model trained from scratch on only the long-tailed dataset to create this proxy and verify the effectiveness of the data produced. Our analysis identifies approximately-in-distribution (AID) samples, which slightly deviate from the real data distribution and incorporate a blend of class information, as the crucial samples for enhancing the generative model's performance in long-tail classification. We promote the generation of AID samples during the training of the generative model by using a feature extractor to guide the process, and filter out detrimental samples during generation. Our approach, termed Diffusion model for Long-Tail recognition (DiffuLT), represents a pioneering application of generative models in long-tail recognition. DiffuLT achieves state-of-the-art results on CIFAR10-LT, CIFAR100-LT, and ImageNet-LT, surpassing leading competitors by significant margins. Comprehensive ablations enhance the interpretability of our pipeline. Notably, the entire generative process relies on no external data or pre-trained model weights, which makes it generalizable to real-world long-tailed scenarios.
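One way to picture the feature-guided filtering of generated samples is a distance band: keep samples close enough to a class centroid to carry class information, but not so close that they add nothing. This band rule is only an illustrative proxy for the paper's "approximately in distribution" criterion; the function name and thresholds are assumptions.

```python
import numpy as np

def filter_generated_samples(feats, centroids, lo, hi):
    """Keep generated samples whose distance to the nearest class centroid
    lies in [lo, hi]: near enough to be class-informative, far enough to
    add diversity beyond the real data."""
    # Pairwise distances: (n_samples, n_classes)
    d = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    nearest = d.min(axis=1)
    return np.where((nearest >= lo) & (nearest <= hi))[0]
```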

AAAI Conference 2024 Conference Paper

DTL: Disentangled Transfer Learning for Visual Recognition

  • Minghao Fu
  • Ke Zhu
  • Jianxin Wu

As pre-trained models rapidly grow larger, the cost of fine-tuning them on downstream tasks steadily increases, too. To fine-tune these models economically, parameter-efficient transfer learning (PETL) has been proposed, which tunes only a tiny subset of trainable parameters to learn quality representations efficiently. However, current PETL methods face a dilemma: during training, the GPU memory footprint is not reduced as effectively as the number of trainable parameters. PETL will also likely fail if full fine-tuning encounters an out-of-GPU-memory issue. This happens because the trainable parameters in these methods are generally entangled with the backbone, so many intermediate states must be stored in GPU memory for gradient propagation. To alleviate this problem, we introduce Disentangled Transfer Learning (DTL), which disentangles the trainable parameters from the backbone using a lightweight Compact Side Network (CSN). By progressively extracting task-specific information with a few low-rank linear mappings and appropriately adding the information back to the backbone, CSN effectively realizes knowledge transfer in various downstream tasks. We conducted extensive experiments to validate the effectiveness of our method. The proposed method not only reduces GPU memory usage and trainable parameters by a large amount, but also outperforms existing PETL methods by a significant margin in accuracy, achieving new state-of-the-art results on several standard benchmarks.
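The disentanglement idea can be sketched as a side branch that reads each backbone block's output through a low-rank mapping, accumulates task-specific information, and merges it back only at the end; because the side branch is separate, the backbone's intermediate activations need not be retained for its gradients. The shapes, rank, and additive merge below are illustrative assumptions, not DTL's exact architecture.

```python
import numpy as np

def low_rank_map(x, down, up):
    """d -> r -> d low-rank linear mapping (r << d)."""
    return x @ down @ up

def forward_with_side_network(x, backbone_blocks, side_params):
    """Frozen backbone blocks plus a compact side branch that gathers
    task-specific information and adds it back after the last block."""
    side = np.zeros_like(x)
    for block, (down, up) in zip(backbone_blocks, side_params):
        x = block(x)                              # frozen backbone block
        side = side + low_rank_map(x, down, up)   # progressively gather info
    return x + side                               # merge side info back

rng = np.random.default_rng(0)
blocks = [lambda z: z * 2.0, lambda z: z + 1.0]   # stand-in backbone blocks
params = [(rng.normal(size=(4, 2)) * 0.1, rng.normal(size=(2, 4)) * 0.1)
          for _ in blocks]                        # rank-2 side mappings
out = forward_with_side_network(np.ones((2, 4)), blocks, params)
```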

AAAI Conference 2023 Conference Paper

Quantized Feature Distillation for Network Quantization

  • Ke Zhu
  • Yin-Yin He
  • Jianxin Wu

Neural network quantization aims to accelerate and trim full-precision neural network models by using low-bit approximations. Methods adopting the quantization-aware training (QAT) paradigm have recently seen rapid growth, but are often conceptually complicated. This paper proposes a novel and highly effective QAT method, quantized feature distillation (QFD). QFD first trains a quantized (or binarized) representation as the teacher, then quantizes the network using knowledge distillation (KD). Quantitative results show that QFD is more flexible and effective (i.e., quantization friendly) than previous quantization methods. QFD surpasses existing methods by a noticeable margin on not only image classification but also object detection, despite being much simpler. Furthermore, QFD quantizes ViT and Swin-Transformer on MS-COCO detection and segmentation, which verifies its potential in real-world deployment. To the best of our knowledge, this is the first time that vision transformers have been quantized in object detection and image segmentation tasks.
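The two-step recipe can be sketched in miniature: (1) quantize a feature map to a low bit-width, (2) use the quantized features as the distillation target for the student. Uniform min-max quantization and an L2 feature loss are simplifying assumptions here, not QFD's exact components.

```python
import numpy as np

def uniform_quantize(x, bits):
    """Uniform min-max quantization to 2**bits discrete levels."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2 ** bits - 1)
    q = np.round((x - lo) / scale)      # integer level in [0, 2**bits - 1]
    return q * scale + lo               # map back to the original range

def feature_distillation_loss(student_feat, teacher_feat, bits=4):
    """L2 distance between student features and the quantized teacher
    features used as the distillation target."""
    target = uniform_quantize(teacher_feat, bits)
    return float(np.mean((student_feat - target) ** 2))
```

A student whose features already match the quantized teacher incurs zero loss, so the quantized representation itself, rather than the full-precision one, defines the distillation target.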