Arrow Research search

Author name cluster

Zhe Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

31 papers
2 author rows

Possible papers

31

AAAI Conference 2026 Conference Paper

Subgraph Encoding with Bicentric Sphere Node Labeling and Pooling for Link Prediction

  • Zhihong Fang
  • Shaolin Tan
  • Qiu Fang
  • Zhe Li
  • Qing Gao

Learning representations of the enclosing subgraph of node pairs is recognized as an efficient approach for link-oriented prediction tasks in network applications. The core challenge within this subgraph encoding approach is how to effectively distinguish and then properly aggregate the contributions of nodes in the subgraph into a single vector that indicates the relation between the target node pair. In this work, we propose a novel sphere-based subgraph encoding architecture, namely BS-SubGNN, to address this challenge. In detail, we design two key building blocks, Bicentric Sphere Node Labeling (BSNL) and Bicentric Sphere Subgraph Pooling (BSSP), to assist message passing in BS-SubGNN. BSNL endows each node with a label according to the sphere it belongs to in the subgraph to distinguish the contributions of nodes, while BSSP adopts an attention mechanism to aggregate the contributions of nodes in each sphere. Theoretically, we prove that BS-SubGNN can unify existing node distance labeling methods and yield discriminative node features with lower time complexity. We evaluate the performance of BS-SubGNN on link prediction tasks over a variety of network types, including undirected networks, attributed networks, directed networks, and signed directed networks. Our experimental results demonstrate that BS-SubGNN consistently achieves significant performance improvements across these diverse types of networks. In particular, compared to methods that require multi-hop neighborhood information, BS-SubGNN obtains better performance even when only one-hop neighborhood information of the node pair is utilized.

EAAI Journal 2025 Journal Article

A Federated Fairness-Aware Incentive Mechanism for medical image classification

  • Chunling Chen
  • Haiwei Pan
  • Kejia Zhang
  • Zhe Li
  • Fengming Yu

Despite the promising progress in federated learning (FL), two major challenges persist when applying FL in real-world medical scenarios: data heterogeneity and class imbalance. Previous efforts to tackle these challenges have primarily focused on local training or global aggregation, often assuming that all clients are willing participants. However, this assumption may not hold in real medical scenarios, where clients act based on rational self-interest. In this paper, we propose a Federated Fairness-Aware Incentive Mechanism (FedFIM) to tackle these problems. FedFIM consists of two components: Server-driven Marginal Incentive (SMI) and Prototype-based Shapley Estimation (PSE). Specifically, SMI ensures the data quality of participants to mitigate the performance degradation caused by data heterogeneity, while PSE leverages prototypes for more fine-grained contribution evaluations. Furthermore, based on these contribution evaluations, a prototype-level aggregation scheme is designed to alleviate the impact of class imbalance. Extensive experiments on five medical image classification tasks demonstrate that FedFIM achieves superior accuracy compared to seven popular benchmark methods, particularly in complex multi-classification scenarios.

ICLR Conference 2025 Conference Paper

BodyGen: Advancing Towards Efficient Embodiment Co-Design

  • Haofei Lu
  • Zhe Wu
  • Junliang Xing
  • Jianshu Li
  • Ruoyu Li
  • Zhe Li
  • Yuanchun Shi

Embodiment co-design aims to optimize a robot's morphology and control policy simultaneously. While prior work has demonstrated its potential for generating environment-adaptive robots, this field still faces persistent challenges in optimization efficiency due to the (i) combinatorial nature of morphological search spaces and (ii) intricate dependencies between morphology and control. We find that ineffective morphology representation and unbalanced reward signals between the design and control stages are key obstacles to efficiency. To advance towards efficient embodiment co-design, we propose **BodyGen**, which utilizes (1) topology-aware self-attention for both design and control, enabling efficient morphology representation with lightweight model sizes; (2) a temporal credit assignment mechanism that ensures balanced reward signals for optimization. With our findings, BodyGen achieves an average **60.03%** performance improvement against state-of-the-art baselines. We provide codes and more results on the website: https://genesisorigin.github.io.

AAAI Conference 2025 Conference Paper

Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

  • Zheng Hu
  • Zhe Li
  • Ziyun Jiao
  • Satoshi Nakagawa
  • Jiawen Deng
  • Shimin Cai
  • Tao Zhou
  • Fuji Ren

In recent years, knowledge graphs have been integrated into recommender systems as item-side auxiliary information, enhancing recommendation accuracy. However, constructing and integrating structural user-side knowledge remains a significant challenge due to the improper granularity and inherent scarcity of user-side features. Recent advancements in Large Language Models (LLMs) offer the potential to bridge this gap by leveraging their human behavior understanding and extensive real-world knowledge. Nevertheless, integrating LLM-generated information into recommender systems presents challenges, including the risk of noisy information and the need for additional knowledge transfer. In this paper, we propose an LLM-based user-side knowledge inference method alongside a carefully designed recommendation framework to address these challenges. Our approach employs LLMs to infer user interests based on historical behaviors, integrating this user-side information with item-side and collaborative data to construct a hybrid structure: the Collaborative Interest Knowledge Graph (CIKG). Furthermore, we propose a CIKG-based recommendation framework that includes a user interest reconstruction module and a cross-domain contrastive learning module to mitigate potential noise and facilitate knowledge transfer. We conduct extensive experiments on three real-world datasets to validate the effectiveness of our method. Our approach achieves state-of-the-art performance compared to competitive baselines, particularly for users with sparse interactions.

NeurIPS Conference 2025 Conference Paper

Dual-Flow: Transferable Multi-Target, Instance-Agnostic Attacks via *In-the-wild* Cascading Flow Optimization

  • Yixiao Chen
  • Shikun Sun
  • Jianshu Li
  • Ruoyu Li
  • Zhe Li
  • Junliang Xing

Adversarial attacks are widely used to evaluate model robustness, and in black-box scenarios, the transferability of these attacks becomes crucial. Existing generator-based attacks have excellent generalization and transferability due to their instance-agnostic nature. However, when training generators for multi-target tasks, the success rate of transfer attacks is relatively low due to the limitations of the model's capacity. To address these challenges, we propose a novel Dual-Flow framework for multi-target instance-agnostic adversarial attacks, utilizing Cascading Distribution Shift Training to develop an adversarial velocity function. Extensive experiments demonstrate that Dual-Flow significantly improves transferability over previous multi-target generative attacks. For example, it increases the success rate from Inception-v3 to ResNet-152 by 34.58%. Furthermore, our attack method shows substantially stronger robustness against defense mechanisms, such as adversarially trained models.

NeurIPS Conference 2025 Conference Paper

Exact and Linear Convergence for Federated Learning under Arbitrary Client Participation is Attainable

  • Bicheng Ying
  • Zhe Li
  • Haibo Yang

This work tackles the fundamental challenges in Federated Learning (FL) posed by arbitrary client participation and data heterogeneity, prevalent characteristics in practical FL settings. It is well-established that popular FedAvg-style algorithms struggle with exact convergence and can suffer from slow convergence rates, since a decaying learning rate is required to mitigate these effects. To address these issues, we introduce the concept of stochastic matrices and the corresponding time-varying graphs as a novel modeling tool to accurately capture the dynamics of arbitrary client participation and the local update procedure. Leveraging this approach, we offer a fresh perspective on designing FL algorithms, provide a rigorous quantitative analysis of the limitations inherent in the FedAvg algorithm, and present FOCUS, Federated Optimization with Exact Convergence via Push-pull Strategy, a provably convergent algorithm designed to effectively overcome the previously mentioned two challenges. More specifically, we provide a rigorous proof demonstrating that FOCUS achieves exact convergence with a linear rate regardless of the arbitrary client participation, establishing it as the first work to demonstrate this significant result.

IJCAI Conference 2025 Conference Paper

FAST: A Lightweight Mechanism Unleashing Arbitrary Client Participation in Federated Learning

  • Zhe Li
  • Seyedsina Nabavirazavi
  • Bicheng Ying
  • Sitharama Iyengar
  • Haibo Yang

Federated Learning (FL) provides a flexible distributed platform where numerous clients with high data and system heterogeneity can collaborate to learn a model. While previous research has shown that FL can handle diverse data, it often assumes idealized conditions. In practice, real-world factors make it hard to predict or design individual client participation. This complexity results in an unknown participation pattern: arbitrary client participation (ACP). Hence, the key open problem is to understand the impact of client participation and develop a lightweight mechanism to support ACP in FL. In this paper, we first empirically investigate the influence of client participation in FL, revealing that FL algorithms are adversely impacted by ACP. To alleviate the impact, we propose a lightweight solution, Federated Average with Snapshot (FAST), that supports almost arbitrary client participation in FL and can seamlessly integrate with other classic FL algorithms. Specifically, FAST requires clients to take a snapshot once in a while and facilitates ACP for the majority of the training process. We prove that the convergence rates of FAST in non-convex and strongly-convex cases match those under ideal client participation. Furthermore, we empirically introduce an adaptive strategy to dynamically configure the snapshot frequency, tailored to accommodate diverse FL systems. Extensive experiments show that FAST significantly improves performance under ACP and high data heterogeneity.
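The snapshot mechanism described in the abstract can be pictured with a minimal participation schedule. This is an illustrative sketch of the general idea only, not the authors' FAST implementation; the function name and the `snapshot_every` parameter are assumptions for the example:

```python
# Hypothetical sketch: most rounds tolerate arbitrary client participation,
# while every `snapshot_every`-th round all clients report in (the "snapshot").
import random

def fedavg_with_snapshots(clients, rounds, snapshot_every=10, seed=0):
    """Return, for each round, the list of participating clients."""
    rng = random.Random(seed)
    schedule = []
    for t in range(rounds):
        if t % snapshot_every == 0:
            participants = list(clients)      # snapshot round: everyone joins
        else:
            k = rng.randint(1, len(clients))  # arbitrary participation
            participants = rng.sample(clients, k)
        schedule.append(participants)
    return schedule

sched = fedavg_with_snapshots(list(range(8)), rounds=20, snapshot_every=5)
print(len(sched[0]))  # 8: rounds 0, 5, 10, 15 include all clients
```

The appeal of such a schedule is that the occasional full-participation round bounds how stale any client's contribution can become, while the remaining rounds impose no participation requirements at all.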

ICML Conference 2025 Conference Paper

HiRemate: Hierarchical Approach for Efficient Re-materialization of Neural Networks

  • Julia Gusak
  • Xunyi Zhao
  • Théotime Le Hellard
  • Zhe Li
  • Lionel Eyraud-Dubois
  • Olivier Beaumont

Training deep neural networks (DNNs) on memory-limited GPUs is challenging, as storing intermediate activations often exceeds available memory. Re-materialization, a technique that preserves exact computations, addresses this by selectively recomputing activations instead of storing them. However, existing methods either fail to scale, lack generality, or introduce excessive execution overhead. We introduce HiRemate, a *hierarchical* re-materialization framework that recursively partitions large computation graphs, applies optimized solvers at multiple levels, and merges solutions into a globally efficient training schedule. This enables scalability to significantly larger graphs than prior ILP-based methods while keeping runtime overhead low. Designed for single-GPU models and activation re-materialization, HiRemate extends the feasibility of training networks with thousands of graph nodes, surpassing prior methods in both efficiency and scalability. Experiments on various types of networks yield up to 50-70% memory reduction with only 10-15% overhead, closely matching optimal solutions while significantly reducing solver time. Seamlessly integrating with PyTorch Autograd, HiRemate requires almost no code change to use, enabling broad adoption in memory-constrained deep learning.

JBHI Journal 2025 Journal Article

PathBot: A Foundation Model for Pathological Image Analysis

  • Mengkang Lu
  • Tianyi Wang
  • Qingjie Zeng
  • Zilin Lu
  • Zhe Li
  • Yong Xia

Computational pathology has emerged as a transformative paradigm by leveraging artificial intelligence to automate and enhance diagnostic procedures. However, existing models often target narrow tasks or specific tumor types, missing opportunities to unify diverse datasets and tasks through joint learning. In this work, we introduce PathBot, a foundation model tailored for comprehensive pathological image analysis. Central to PathBot is a ViT-Giant encoder with one billion parameters, the largest model to date trained on publicly available pathological data. We pre-train this encoder using a novel Masked Distillation Network (MDN) and an integrated learning strategy that combines contrastive and generative objectives. The pre-training leverages over 30 million image patches derived from 11,765 whole slide images (WSIs) across 32 cancer types in the Cancer Genome Atlas (TCGA). To evaluate its versatility, we pair the encoder with task-specific decoders for segmentation, detection, classification, and regression. Extensive experiments across 20 downstream tasks demonstrate that PathBot achieves state-of-the-art performance in most cases, showcasing its robustness and generalizability.

ICRA Conference 2025 Conference Paper

Plug-and-Play Multi-Domain Fusion Adaptation for Cross-Subject EEG-Based Motor Imagery Classification

  • Kecheng Shi
  • Rui Huang 0008
  • Zhe Li
  • Jianzhi Lyu
  • Yang Zhao 0024
  • Guangkui Song
  • Hong Cheng 0002
  • Jianwei Zhang 0001

Motor imagery (MI) classification in rehabilitation brain-computer interfaces (RBCIs) faces significant challenges due to the variability of electroencephalography (EEG) signals across subjects. Existing methods typically require extensive EEG data collection from each new subject, which is time-consuming and results in poor user experience. To address this issue, this paper decomposes MI-EEG into subject-specific private components and shared components common across all subjects, and proposes a plug-and-play multi-domain fusion adaptation method (PPMDFA) to handle variability between subjects. In the training phase, PPMDFA introduces a Multi-Domain Fusion Graph Convolutional Network (MDFGCN) module to extract shared and private features from the MI processes of source domain subjects. In the calibration phase, the method constructs private classifiers for the target new subject using the extracted shared features combined with a small amount of labeled data. During testing, PPMDFA leverages the similarity of private components to utilize knowledge from source subjects, thereby enhancing classification accuracy for target subjects' MI. We validated the proposed method on the PhysioNet and LLMBCImotion datasets. Experimental results show that PPMDFA achieves state-of-the-art classification accuracy on both datasets, with rapid adaptation to new subjects using only 20% of the data, reaching accuracies of 73.33% and 61.62%, respectively, demonstrating strong generalization ability and robustness.

NeurIPS Conference 2025 Conference Paper

Towards Straggler-Resilient Split Federated Learning: An Unbalanced Update Approach

  • Dandan Liang
  • Jianing Zhang
  • Evan Chen
  • Zhe Li
  • Rui Li
  • Haibo Yang

Split Federated Learning (SFL) enables scalable training on edge devices by combining the parallelism of Federated Learning (FL) with the computational offloading of Split Learning (SL). Despite its great success, SFL suffers significantly from the well-known straggler issue in distributed learning systems. This problem is exacerbated by the dependency between the Split Server and clients: the Split Server's model update relies on receiving activations from clients. This synchronization requirement introduces significant time latency, making stragglers a critical bottleneck to the scalability and efficiency of the system. To mitigate this problem, we propose *MU-SplitFed*, a straggler-resilient SFL algorithm that decouples training progress from straggler delays via a simple yet effective unbalanced update mechanism. By enabling the server to perform $\tau$ local updates per client round, *MU-SplitFed* achieves convergence rate $\mathcal{O}(\sqrt{d/(\tau T)})$, showing a linear reduction in communication rounds by a factor of $\tau$. Experiments demonstrate that *MU-SplitFed* consistently outperforms baseline methods in the presence of stragglers and effectively mitigates their impact through adaptive tuning of $\tau$.

NeurIPS Conference 2025 Conference Paper

URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model

  • Zhe Li
  • Xiang Bai
  • Jieyu Zhang
  • Zhuangzhe Wu
  • Che Xu
  • Ying Li
  • Chengkai Hou
  • Shanghang Zhang

Constructing accurate digital twins of articulated objects is essential for robotic simulation training and embodied AI world model building, yet historically requires painstaking manual modeling or multi-stage pipelines. In this work, we propose **URDF-Anything**, an end-to-end automatic reconstruction framework based on a 3D multimodal large language model (MLLM). URDF-Anything utilizes an autoregressive prediction framework based on point-cloud and text multimodal input to jointly optimize geometric segmentation and kinematic parameter prediction. It implements a specialized [SEG] token mechanism that interacts directly with point cloud features, enabling fine-grained part-level segmentation while maintaining consistency with the kinematic parameter predictions. Experiments on both simulated and real-world datasets demonstrate that our method significantly outperforms existing approaches regarding geometric segmentation (mIoU 17% improvement), kinematic parameter prediction (average error reduction of 29%), and physical executability (surpassing baselines by 50%). Notably, our method exhibits excellent generalization ability, performing well even on objects outside the training set. This work provides an efficient solution for constructing digital twins for robotic simulation, significantly enhancing the sim-to-real transfer capability.

EAAI Journal 2024 Journal Article

A multilevel interleaved group attention-based convolutional network for gas detection via an electronic nose system

  • Shichao Zhai
  • Zhe Li
  • Huisheng Zhang
  • Lidan Wang
  • Shukai Duan
  • Jia Yan

In this paper, an E-nose system for industrial exhaust detection is established and a novel deep learning model called the multilevel interleaved group attention-based convolutional network (MIGACN) is proposed for processing sensor array signals to identify 10 kinds of industrial pollution gases. First, a homemade E-nose system consisting of 15 gas sensors was constructed to acquire industrial pollution gas samples. Second, for sensor array signal processing, two novel feature learning modules are proposed at both the temporal level and the sensor level based on the actual physical significance of the sensor signals to enable the network to automatically extract intrinsic features of the sensor response signals. Third, we introduce a data augmentation module to avoid the problem of insufficient model training due to limited data volume and achieve dynamic gas detection through sliding windows. The proposed MIGACN directly uses the original response of the sensors as the input to automatically extract the intrinsic signal features without tedious manual empirical feature extraction. The experimental results show that the MIGACN achieves better classification performance and stability than other advanced deep learning methods. MIGACN obtains 90.76% classification accuracy without the data augmentation module. With the data augmentation module, the MIGACN obtains a testing accuracy of 98.06% from the early sensor response stage. In addition, the MIGACN achieves the highest average dynamic detection accuracy of 98.19% with the data augmentation module, which highlights its advantages in practical applications.

EAAI Journal 2024 Journal Article

A survey of deep learning-driven architecture for predictive maintenance

  • Zhe Li
  • Qian He
  • Jingyue Li

Over the past decades, deep learning techniques have attracted increasing attention from various research and industrial domains, aligned with the development of the Industrial Internet of Things (IIoT). Specifically, with the advantage of data-driven methods, industrial organizations are seeking novel proactive strategies supported by analytic models to guarantee the quality of their production by observing degradation or predicting failure before a component or asset breaks down. Predictive strategies are expected to minimize the impact of unnecessary maintenance interruptions and mitigate their consequences, hence extending the remaining useful life of products. This paper surveys the utilization of deep learning technologies in engineering applications where they provide satisfactory solutions with respect to specific data types or input signals. We review 106 primary papers on deep learning-driven approaches, which mainly explore five of the most popular architectures in the application of predictive maintenance. The main content of this paper summarizes the common advantages of each architecture, points out their limitations, and describes the application scopes of fully connected deep neural networks, convolutional neural networks, stacked autoencoders, deep belief networks, and deep recurrent neural networks. Based on the discussion of each technique, we intend to provide a comprehensive understanding and guidance on the appropriate use of deep learning architectures to devise an effective predictive maintenance strategy for scientific and industrial developers whose expertise lies in prior domain knowledge of multi-source heterogeneous data.
Moreover, we summarize the decisive factors by which the incremental stages of these approaches are determined, fundamentally including dataset specification, feature extraction, and the integration of deep learning approaches.

IJCAI Conference 2024 Conference Paper

Efficiency Calibration of Implicit Regularization in Deep Networks via Self-paced Curriculum-Driven Singular Value Selection

  • Zhe Li
  • Shuo Chen
  • Jian Yang
  • Lei Luo

The generalization of neural networks has been a major focus of research in deep learning. It is often interpreted as an implicit bias towards solutions with specific properties. In particular, it has been observed in practical applications that linear neural networks (LNN) tend to favor low-rank solutions for matrix completion tasks. However, most existing methods rely on increasing the depth of the neural network to enhance the low rank of solutions, resulting in higher complexity. In this paper, we propose a new explicit regularization method that calibrates the implicit bias towards low-rank trends in matrix completion tasks. Our approach automatically incorporates smaller singular values into the training process using a self-paced learning strategy, gradually restoring matrix information. By jointly using both implicit and explicit regularization, we effectively capture the low-rank structure of LNN and accelerate its convergence. We also analyze how our proposed penalty term interacts with implicit regularization and provide theoretical guarantees for our new model. To evaluate the effectiveness of our method, we conduct a series of experiments on both simulated and real-world data. Our experimental results clearly demonstrate that our method has better robustness and generalization ability compared with other methods.

IJCAI Conference 2023 Conference Paper

A Bitwise GAC Algorithm for Alldifferent Constraints

  • Zhe Li
  • Yaohua Wang
  • Zhanshan Li

The generalized arc consistency (GAC) algorithm is the prevailing solution for alldifferent constraint problems. The core part of GAC for alldifferent constraints is excavating and enumerating all the strongly connected components (SCCs) of the graph model. This causes a large amount of complex data structures to maintain the node information, leading to a large overhead both in time and memory space. More critically, the complexity of the data structures further precludes the coordination of different optimization schemes for GAC. To solve this problem, the key observation of this paper is that the GAC algorithm only cares whether a node of the graph model is in an SCC or not, rather than which SCC it belongs to. Based on this observation, we propose AllDiffbit, which employs bitwise data structures and operations to efficiently determine if a node is in an SCC. This greatly reduces the corresponding overhead, and enhances the ability to incorporate existing optimizations to work in a synergistic way. Our experiments show that AllDiffbit outperforms state-of-the-art GAC algorithms by over 60%.
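The bitwise membership idea in the abstract can be loosely illustrated as follows. This is a hypothetical sketch, not the AllDiffbit implementation: a single integer encodes which nodes currently lie in some SCC, so the "is this node in an SCC?" query becomes one shift-and-mask operation with no per-node data structures:

```python
# Illustrative sketch of bitwise SCC membership (names are assumptions).
# Nodes are indexed 0..n-1; the membership set is one integer bitmask.

def make_mask(nodes):
    """Pack a collection of node indices into a single integer bitmask."""
    mask = 0
    for v in nodes:
        mask |= 1 << v
    return mask

def in_scc(scc_mask, v):
    """Bitwise membership test: one shift, one AND, one comparison."""
    return (scc_mask >> v) & 1 == 1

# Suppose nodes 0, 2, and 3 currently belong to some SCC.
scc_mask = make_mask([0, 2, 3])
print(in_scc(scc_mask, 2))  # True
print(in_scc(scc_mask, 1))  # False
```

Because a whole membership set lives in one machine word (or a short array of words), set updates and queries also combine naturally with other bit-level optimizations, which is the synergy the abstract alludes to.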

IJCAI Conference 2023 Conference Paper

SMARTformer: Semi-Autoregressive Transformer with Efficient Integrated Window Attention for Long Time Series Forecasting

  • Yiduo Li
  • Shiyi Qi
  • Zhe Li
  • Zhongwen Rao
  • Lujia Pan
  • Zenglin Xu

The success of Transformers in long time series forecasting (LTSF) can be attributed to their attention mechanisms and non-autoregressive (NAR) decoder structures, which capture long-range dependencies. However, time series data also contain abundant local temporal dependencies, which are often overlooked in the literature and significantly hinder forecasting performance. To address this issue, we introduce SMARTformer, which stands for SeMi-AutoRegressive Transformer. SMARTformer utilizes the Integrated Window Attention (IWA) and Semi-AutoRegressive (SAR) Decoder to capture global and local dependencies from both encoder and decoder perspectives. IWA conducts local self-attention in multi-scale windows and global attention across windows with linear complexity to achieve complementary clues in local and enlarged receptive fields. SAR generates subsequences iteratively, similar to autoregressive (AR) decoding, but refines the entire sequence in a NAR manner. This way, SAR benefits from both the global horizon of NAR and the local detail capturing of AR. We also introduce the Time-Independent Embedding (TIE), which better captures local dependencies by avoiding entanglements of various periods that can occur when directly adding positional embedding to value embedding. Our extensive experiments on five benchmark datasets demonstrate the effectiveness of SMARTformer against state-of-the-art models, achieving an improvement of 10.2% and 18.4% in multivariate and univariate long-term forecasting, respectively.

TIST Journal 2022 Journal Article

CrimeTensor: Fine-Scale Crime Prediction via Tensor Learning with Spatiotemporal Consistency

  • Weichao Liang
  • Zhiang Wu
  • Zhe Li
  • Yong Ge

Crime poses a major threat to human life and property, which has been recognized as one of the most crucial problems in our society. Predicting the number of crime incidents in each region of a city before they happen is of great importance to fight against crime. There has been a great deal of research focused on crime prediction, ranging from introducing diversified data sources to exploring various prediction models. However, most of the existing approaches fail to offer fine-scale prediction results and take little notice of the intricate spatial-temporal-categorical correlations contained in crime incidents. In this article, we propose a tailor-made framework called CrimeTensor to predict the number of crime incidents belonging to different categories within each target region via tensor learning with spatiotemporal consistency. In particular, we model the crime data as a tensor and present an objective function which tries to take full advantage of the spatial, temporal, and categorical correlations contained in crime incidents. Moreover, a well-designed optimization algorithm which transforms the objective into a compact form and then applies CP decomposition to find the optimal solution is elaborated to solve the objective function. Furthermore, we develop an enhanced framework which takes a set of pre-selected regions to conduct prediction so as to further improve the computational efficiency of the optimization algorithm. Finally, extensive experiments are performed on both proprietary and public datasets and our framework significantly outperforms all the baselines in terms of each evaluation metric.

YNICL Journal 2022 Journal Article

Predicting prognosis of primary pontine hemorrhage using CT image and deep learning

  • Shuo Wang
  • Feng Chen
  • Mingyu Zhang
  • Xiaolin Zhao
  • Linghua Wen
  • Wenyuan Wu
  • Shina Wu
  • Zhe Li

Prognosis of primary pontine hemorrhage (PPH) is important for treatment planning and patient management. However, only a few clinical factors have been reported to have prognostic value for PPH. Here, we propose a deep learning (DL) model that mines high-dimensional prognostic information from computed tomography (CT) images and combines clinical factors for predicting individualized prognosis of PPH. We proposed a multi-task DL model to learn high-dimensional CT features of hematoma and perihematomal areas for predicting the risk of 30-day mortality, 90-day mortality, and 90-day functional outcome of PPH simultaneously. We further explored the combination of the DL model and clinical factors by building a combined model. All the models were trained in a training cohort (n = 219) and tested in an independent testing cohort (n = 35). The DL model achieved areas under the curve (AUC) of 0.886, 0.886, and 0.759 in predicting 30-day mortality, 90-day mortality, and 90-day functional outcome of PPH in the independent testing cohort, improving over the previously reported new PPH score and the clinical model. When combining the DL model and clinical factors, the combined model achieved improved performance (AUC = 0.920, 0.941, and 0.894), indicating that the DL model mines CT information that complements clinical factors. Through a DL visualization technique, we found that the internal structure of hematoma and its expansion to perihematomal regions are important for predicting the prognosis of PPH. This DL model provides an easy-to-use way for predicting individualized prognosis of PPH by mining high-dimensional information from CT images, and shows improvement over clinical factors and existing methods.

JBHI Journal 2021 Journal Article

Deep Reinforcement Learning for Weakly-Supervised Lymph Node Segmentation in CT Images

  • Zhe Li
  • Yong Xia

Accurate and automated lymph node segmentation is pivotal for quantitatively assessing disease progression and potential therapeutics. The complex variation of lymph node morphology and the difficulty of acquiring voxel-wise manual annotations make lymph node segmentation a challenging task. Since the Response Evaluation Criteria in Solid Tumors (RECIST) annotation, which indicates the location, length, and width of a lymph node, is commonly available in hospital data archives, we advocate using RECIST annotations as the supervision, and thus formulate this segmentation task into a weakly-supervised learning problem. In this paper, we propose a deep reinforcement learning-based lymph node segmentation (DRL-LNS) model. Based on RECIST annotations, we segment RECIST-slices in an unsupervised way to produce pseudo ground truths, which are then used to train U-Net as a segmentation network. Next, we train a DRL model, in which the segmentation network interacts with the policy network to optimize the lymph node bounding boxes and segmentation results simultaneously. The proposed DRL-LNS model was evaluated against three widely used image segmentation networks on a public thoracoabdominal Computed Tomography (CT) dataset that contains 984 3D lymph nodes, and achieves a mean Dice similarity coefficient (DSC) of 77.17% and a mean Intersection over Union (IoU) of 64.78% in the four-fold cross-validation. Our results suggest that the DRL-based bounding box prediction strategy outperforms the label propagation strategy and the proposed DRL-LNS model is able to achieve state-of-the-art performance on this weakly-supervised lymph node segmentation task.

AAAI Conference 2020 Conference Paper

Long Short-Term Sample Distillation

  • Liang Jiang
  • Zujie Wen
  • Zhongping Liang
  • Yafang Wang
  • Gerard de Melo
  • Zhe Li
  • Liangzhuang Ma
  • Jiaxing Zhang

In the past decade, there has been substantial progress at training increasingly deep neural networks. Recent advances within the teacher–student training paradigm have established that information about past training updates shows promise as a source of guidance during subsequent training steps. Based on this notion, in this paper, we propose Long Short-Term Sample Distillation, a novel training policy that simultaneously leverages multiple phases of the previous training process to guide the later training updates of a neural network, while efficiently proceeding in a single generation. With Long Short-Term Sample Distillation, the supervision signal for each sample is decomposed into two parts: a long-term signal and a short-term one. The long-term teacher draws on snapshots from several epochs ago in order to provide steadfast guidance and to guarantee teacher–student differences, while the short-term one yields more up-to-date cues with the goal of enabling higher-quality updates. Moreover, the teachers for each sample are unique, so that, overall, the model learns from a very diverse set of teachers. Comprehensive experimental results across a range of vision and NLP tasks demonstrate the effectiveness of this new training method.
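The two-part supervision signal described in the abstract can be illustrated as a loss combining the usual cross-entropy with divergence terms toward a long-term and a short-term teacher snapshot. The KL form and the weights `alpha`, `beta` below are illustrative assumptions for a sketch, not the paper's exact objective:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def lst_distillation_loss(logits, labels, long_logits, short_logits,
                          alpha=0.3, beta=0.3):
    """Cross-entropy on the true labels, plus KL terms pulling the student
    toward a long-term teacher (an old snapshot, for steadfast guidance)
    and a short-term teacher (a recent snapshot, for up-to-date cues)."""
    p = softmax(logits)
    ce = -np.log(p[np.arange(len(labels)), labels]).mean()

    def kl(teacher_logits):
        q = softmax(teacher_logits)
        return (q * (np.log(q) - np.log(p))).sum(axis=-1).mean()

    return ce + alpha * kl(long_logits) + beta * kl(short_logits)
```

When both teacher snapshots agree with the current logits, the KL terms vanish and the loss reduces to plain cross-entropy; per-sample teacher snapshots (rather than one global teacher) are what make the teacher set diverse.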

NeurIPS Conference 2019 Conference Paper

Learning from brains how to regularize machines

  • Zhe Li
  • Wieland Brendel
  • Edgar Walker
  • Erick Cobos
  • Taliah Muhammad
  • Jacob Reimer
  • Matthias Bethge
  • Fabian Sinz

Despite impressive performance on numerous visual tasks, Convolutional Neural Networks (CNNs), unlike brains, are often highly sensitive to small perturbations of their input, e.g., adversarial noise leading to erroneous decisions. We propose to regularize CNNs using large-scale neuroscience data to learn more robust neural features in terms of representational similarity. We presented natural images to mice and measured the responses of thousands of neurons from cortical visual areas. Next, we denoised the notoriously variable neural activity using strong predictive models trained on this large corpus of responses from the mouse visual system, and calculated the representational similarity for millions of pairs of images from the model's predictions. We then used the neural representational similarity to regularize CNNs trained on image classification by penalizing intermediate representations that deviated from neural ones. This preserved the performance of baseline models on standard benchmarks, while maintaining substantially higher performance than baseline or control models when classifying noisy images. Moreover, the models regularized with cortical representations were also more robust to adversarial attacks. This demonstrates that regularizing with neural data can be an effective tool to create an inductive bias towards more robust inference.
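The regularization target, representational similarity, reduces to comparing pairwise similarity matrices of the two systems on the same images. A minimal numpy sketch of such a penalty follows; the actual work derives the neural similarities from learned predictive models of mouse visual cortex, and the cosine-similarity and squared-error choices here are illustrative assumptions:

```python
import numpy as np

def similarity_matrix(feats):
    """Pairwise cosine similarity of feature vectors (one row per image)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

def neural_regularizer(cnn_feats, neural_sim, weight=1.0):
    """Penalize deviation of the CNN's representational similarity from a
    precomputed neural similarity matrix for the same set of images."""
    model_sim = similarity_matrix(cnn_feats)
    mask = ~np.eye(len(neural_sim), dtype=bool)  # off-diagonal entries only
    return weight * np.mean((model_sim[mask] - neural_sim[mask]) ** 2)
```

The penalty is zero exactly when the CNN's intermediate features induce the same pairwise similarity structure as the neural data, which is the inductive bias the abstract describes.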

IJCAI Conference 2019 Conference Paper

Reading selectively via Binary Input Gated Recurrent Unit

  • Zhe Li
  • Peisong Wang
  • Hanqing Lu
  • Jian Cheng

Recurrent Neural Networks (RNNs) have shown great promise in sequence modeling tasks. The Gated Recurrent Unit (GRU) is one of the most widely used recurrent structures, making a good trade-off between performance and computational cost. However, its practical implementation based on soft gates only partially achieves the goal of controlling information flow, and we can hardly explain what the network has learnt internally. Inspired by human reading, we introduce the binary input gated recurrent unit (BIGRU), a GRU-based model using a binary input gate instead of the reset gate in the GRU. By doing so, our model can read selectively during inference. In our experiments, we show that BIGRU mainly ignores the conjunctions, adverbs, and articles that do not make a big difference to document understanding, which helps us further understand how the network works. In addition, due to reduced interference from redundant information, our model achieves better performance than the baseline GRU on all testing tasks.
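The binary input gate can be sketched as a hard 0/1 gate applied to the input inside an otherwise standard GRU cell. The gate placement, weight shapes, and thresholding below are assumptions for illustration only; the real model would train the binary gate with a technique such as a straight-through estimator, which is omitted here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bigru_step(x, h, params):
    """One BIGRU step (sketch): a hard 0/1 input gate decides whether each
    input component is read at all; the update gate z then interpolates
    between the old state and the candidate state as in a standard GRU."""
    Wg, Ug, Wz, Uz, Wh, Uh = params
    g = (sigmoid(x @ Wg + h @ Ug) > 0.5).astype(x.dtype)  # binary input gate
    xg = g * x                                  # gated (possibly skipped) input
    z = sigmoid(xg @ Wz + h @ Uz)               # update gate
    h_tilde = np.tanh(xg @ Wh + h @ Uh)         # candidate hidden state
    return (1 - z) * h + z * h_tilde
```

With the gate fully closed (g = 0), the cell ignores the token entirely and only decays its state, which is the "reading selectively" behavior the abstract attributes to skipped conjunctions and articles.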

IJCAI Conference 2018 Conference Paper

A Unified Analysis of Stochastic Momentum Methods for Deep Learning

  • Yan Yan
  • Tianbao Yang
  • Zhe Li
  • Qihang Lin
  • Yi Yang

Stochastic momentum methods have been widely adopted in training deep neural networks. However, theoretical analysis of their convergence on the training objective and of the generalization error for prediction remains under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method and the stochastic momentum methods, including two famous variants, i.e., the stochastic heavy-ball (SHB) method and the stochastic variant of Nesterov's accelerated gradient (SNAG) method. We propose a framework that unifies the three variants. We then derive the convergence rates of the norm of the gradient for the non-convex optimization problem, and analyze the generalization performance through the uniform stability approach. In particular, the convergence analysis of the training objective shows that SHB and SNAG have no advantage over SG. However, the stability analysis shows that the momentum term can improve the stability of the learned model and hence improve the generalization performance. These theoretical insights verify the common wisdom and are also corroborated by our empirical analysis on deep learning.
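The unification the abstract refers to can be written as a single update with an interpolation parameter. In one common presentation (a sketch; the paper's exact notation may differ), `s = 0` recovers the heavy-ball method (SHB), `s = 1` recovers Nesterov's accelerated gradient (SNAG), and `beta = 0` recovers plain SG:

```python
import numpy as np

def unified_momentum(grad, x0, alpha=0.1, beta=0.9, s=0.0, steps=100):
    """Unified stochastic momentum scheme:
        y_{t+1}   = x_t - alpha * g_t            (gradient step)
        y^s_{t+1} = x_t - s * alpha * g_t        (interpolated point)
        x_{t+1}   = y_{t+1} + beta * (y^s_{t+1} - y^s_t)
    s = 0 gives SHB, s = 1 gives SNAG, beta = 0 gives plain SG."""
    x = x0.copy()
    ys_prev = x0.copy()            # y^s_0 = x_0
    for _ in range(steps):
        g = grad(x)
        y = x - alpha * g
        ys = x - s * alpha * g
        x = y + beta * (ys - ys_prev)
        ys_prev = ys
    return x
```

On a simple quadratic, all three settings drive the iterate to the minimizer; the paper's point is that the momentum variants match SG in convergence rate but can be more stable, and hence generalize better.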

NeurIPS Conference 2018 Conference Paper

Adaptive Negative Curvature Descent with Applications in Non-convex Optimization

  • Mingrui Liu
  • Zhe Li
  • Xiaoyu Wang
  • Jinfeng Yi
  • Tianbao Yang

The negative curvature descent (NCD) method has been utilized to design deterministic or stochastic algorithms for non-convex optimization aiming at finding second-order stationary points or local minima. In existing studies, NCD needs to approximate the smallest eigenvalue of the Hessian matrix with a sufficient precision (e.g., $\epsilon_2 \ll 1$) in order to achieve a sufficiently accurate second-order stationary solution (i.e., $\lambda_{\min}(\nabla^2 f(x)) \geq -\epsilon_2$). One issue with this approach is that the target precision $\epsilon_2$ is usually set to be very small in order to find a high-quality solution, which increases the complexity of computing a negative curvature. To address this issue, we propose an adaptive NCD that allows for an adaptive error, dependent on the current gradient's magnitude, in approximating the smallest eigenvalue of the Hessian, and that encourages competition between a noisy NCD step and a gradient descent step. We consider applications of the proposed adaptive NCD to both deterministic and stochastic non-convex optimization, and demonstrate that it can help reduce the overall complexity of computing negative curvatures during the course of optimization without sacrificing the iteration complexity.
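The "competition" between a negative-curvature step and a gradient step can be sketched in a toy setting. The sketch below uses an exact eigendecomposition, whereas the paper's contribution is precisely to tolerate an adaptive, gradient-dependent error in the eigenvalue approximation, so treat this only as an illustration of the step-selection mechanism (names and step sizes are illustrative):

```python
import numpy as np

def adaptive_step(f, grad, hess, x, eta=0.1):
    """One step letting a gradient step and a negative-curvature step
    compete: whichever candidate attains the lower objective is taken."""
    g, H = grad(x), hess(x)
    lam, V = np.linalg.eigh(H)                  # eigenvalues in ascending order
    candidates = [x - eta * g]                  # plain gradient-descent step
    if lam[0] < 0:                              # negative curvature is present
        v = V[:, 0]                             # most-negative-curvature direction
        if np.dot(v, g) > 0:                    # orient it downhill
            v = -v
        candidates.append(x + abs(lam[0]) * v)  # curvature-exploiting step
    return min(candidates, key=f)
```

At a strict saddle point the gradient step makes no progress while the curvature step escapes, which is why NCD-style steps are needed to reach second-order stationary points at all.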

AAAI Conference 2018 Conference Paper

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

  • Yanzhi Wang
  • Caiwen Ding
  • Zhe Li
  • Geng Yuan
  • Siyu Liao
  • Xiaolong Ma
  • Bo Yuan
  • Xuehai Qian

Hardware acceleration of deep learning systems has been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural networks (DNNs). An algorithm-hardware co-optimization framework is developed, which is applicable to different DNN types, sizes, and application scenarios. The algorithm part adopts general block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. It applies to both fully-connected and convolutional layers and contains a mathematically rigorous proof of the effectiveness of the method. The proposed algorithm reduces the computational complexity per layer from O(n^2) to O(n log n) and the storage complexity from O(n^2) to O(n), for both training and inference. The hardware part consists of highly efficient Field Programmable Gate Array (FPGA)-based implementations using effective reconfiguration, batch processing, deep pipelining, resource re-using, and hierarchical control. Experimental results demonstrate that the proposed framework achieves at least 152X speedup and 71X energy efficiency gain compared with the IBM TrueNorth processor under the same test accuracy. It achieves at least 31X energy efficiency gain compared with the reference FPGA-based work.
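The complexity reduction rests on a standard fact: a circulant matrix-vector product is a circular convolution, computable with the FFT in O(n log n) time from O(n) stored parameters. A minimal numpy sketch of that equivalence (this illustrates only the block-circulant building block, not the paper's FPGA pipeline; names are illustrative):

```python
import numpy as np

def circulant(c):
    """Dense circulant matrix whose first column is c: C[i, j] = c[(i - j) % n].
    Materializing it costs O(n^2) storage, which is what the method avoids."""
    n = len(c)
    return np.stack([np.roll(c, j) for j in range(n)], axis=1)

def circulant_matvec_fft(c, x):
    """The same product C @ x as a circular convolution via the FFT:
    O(n log n) time, O(n) storage (only the defining vector c is kept)."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))
```

A weight matrix built from blocks of this form is therefore stored as one vector per block, which yields both the compression ratio and the per-layer complexity reduction the abstract claims.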

AAAI Conference 2017 Conference Paper

A Two-Stage Approach for Learning a Sparse Model with Sharp Excess Risk Analysis

  • Zhe Li
  • Tianbao Yang
  • Lijun Zhang
  • Rong Jin

This paper aims to provide a sharp excess risk guarantee for learning a sparse linear model without any assumptions about the strong convexity of the expected loss or the sparsity of the optimal solution in hindsight. Given a target level ε for the excess risk, an interesting question to ask is how many examples and how large a support set of the solution are enough for learning a good model with the target excess risk. To answer these questions, we present a two-stage algorithm in which (i) in the first stage, an epoch-based stochastic optimization algorithm is exploited with an established O(1/ε) bound on the sample complexity; and (ii) in the second stage, a distribution-dependent randomized sparsification is presented with an O(1/ε) bound on the sparsity (referred to as support complexity) of the resulting model. Compared to previous works, our contributions lie in (i) reducing the order of the sample complexity from O(1/ε^2) to O(1/ε) without the strong convexity assumption; and (ii) reducing the constant in the O(1/ε) sparsity bound by exploiting distribution-dependent sampling.
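The second stage's randomized sparsification can be illustrated with a generic unbiased sampling scheme: keep coordinate i with a probability p_i proportional to its magnitude and rescale survivors by 1/p_i, so the sparse vector equals the dense one in expectation while having about k nonzeros. This is a hedged sketch of the mechanism only; the paper's distribution-dependent sampling probabilities differ:

```python
import numpy as np

def randomized_sparsify(w, k, rng):
    """Keep roughly k coordinates, sampled with probability proportional to
    |w_i|, and rescale survivors so that E[w_hat] = w (unbiasedness)."""
    p = np.minimum(1.0, k * np.abs(w) / np.abs(w).sum())  # keep probabilities
    keep = rng.random(len(w)) < p
    w_hat = np.zeros_like(w)
    w_hat[keep] = w[keep] / p[keep]                        # inverse-probability rescale
    return w_hat
```

Unbiasedness is what lets the sparse model inherit the excess-risk guarantee of the dense first-stage solution up to a controllable variance term.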

IJCAI Conference 2017 Conference Paper

SVD-free Convex-Concave Approaches for Nuclear Norm Regularization

  • Yichi Xiao
  • Zhe Li
  • Tianbao Yang
  • Lijun Zhang

Minimizing a convex function of matrices regularized by the nuclear norm arises in many applications such as collaborative filtering and multi-task learning. In this paper, we study the general setting where the convex function may be non-smooth. When the size of the data matrix, denoted m × n, is very large, existing optimization methods are inefficient because in each iteration they need to perform a singular value decomposition (SVD), which takes O(m^2 n) time. To reduce the computation cost, we exploit the dual characterization of the nuclear norm to introduce a convex-concave optimization problem and design a subgradient-based algorithm without performing SVD. In each iteration, the proposed algorithm only computes the largest singular vector, reducing the time complexity from O(m^2 n) to O(mn). To the best of our knowledge, this is the first SVD-free convex optimization approach for nuclear-norm regularized problems that does not rely on the smoothness assumption. Theoretical analysis shows that the proposed algorithm converges at an optimal O(1/\sqrt{T}) rate, where T is the number of iterations. We also extend our algorithm to the stochastic case, where only stochastic subgradients of the convex function are available, and to a special case that contains an additional non-smooth regularizer (e.g., an L1-norm regularizer). We conduct experiments on robust low-rank matrix approximation and link prediction to demonstrate the efficiency of our algorithms.
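The per-iteration primitive the abstract relies on, computing only the largest singular vector rather than a full SVD, can be done with power iteration on A^T A at O(mn) cost per pass. A small numpy sketch of that primitive (function name and iteration count are illustrative; this is not the paper's full convex-concave algorithm):

```python
import numpy as np

def top_singular_triple(A, iters=100, rng=None):
    """Power iteration on A^T A: returns an estimate of the leading singular
    triple (sigma, u, v) using only matrix-vector products, i.e. O(mn) work
    per iteration and no full SVD."""
    rng = rng or np.random.default_rng(0)
    v = rng.standard_normal(A.shape[1])
    for _ in range(iters):
        v = A.T @ (A @ v)          # one power-iteration step on A^T A
        v /= np.linalg.norm(v)
    sigma = np.linalg.norm(A @ v)  # leading singular value
    u = A @ v / sigma              # leading left singular vector
    return sigma, u, v
```

Replacing a full O(m^2 n) SVD with this O(mn)-per-iteration routine is exactly the source of the speedup claimed in the abstract.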

NeurIPS Conference 2016 Conference Paper

Improved Dropout for Shallow and Deep Learning

  • Zhe Li
  • Boqing Gong
  • Tianbao Yang

Dropout has seen great success in training deep neural networks by independently zeroing out the outputs of neurons at random. It has also received a surge of interest for shallow learning, e.g., logistic regression. However, the independent sampling for dropout could be suboptimal for the sake of convergence. In this paper, we propose to use multinomial sampling for dropout, i.e., sampling features or neurons according to a multinomial distribution with different probabilities for different features/neurons. To derive the optimal dropout probabilities, we analyze shallow learning with multinomial dropout and establish the risk bound for stochastic optimization. By minimizing a sampling-dependent factor in the risk bound, we obtain a distribution-dependent dropout with sampling probabilities dependent on the second-order statistics of the data distribution. To tackle the issue of the evolving distribution of neurons in deep learning, we propose an efficient adaptive dropout (named evolutional dropout) that computes the sampling probabilities on-the-fly from a mini-batch of examples. Empirical studies on several benchmark datasets demonstrate that the proposed dropouts achieve not only much faster convergence but also a smaller test error than the standard dropout. For example, on the CIFAR-100 data, evolutional dropout achieves relative improvements of over 10% on prediction performance and over 50% on convergence speed compared to the standard dropout.
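The on-the-fly probability computation can be sketched as follows, assuming (as the abstract indicates) keep probabilities derived from the mini-batch's second-order statistics. The Bernoulli-per-feature sampling and the inverse-probability rescaling below are simplifications for illustration, not the paper's exact multinomial scheme:

```python
import numpy as np

def evolutional_dropout(X, keep_frac=0.5, rng=None):
    """Data-dependent dropout sketch: per-feature keep probabilities are
    computed on-the-fly from the mini-batch second-order statistics
    (p_i proportional to sqrt(E[x_i^2])), and kept features are rescaled
    by 1/p_i so the layer output is unbiased in expectation."""
    rng = rng or np.random.default_rng(0)
    second = np.sqrt((X ** 2).mean(axis=0))               # sqrt of second moments
    p = np.minimum(1.0, keep_frac * len(second) * second / second.sum())
    mask = rng.random(X.shape) < p                        # per-feature keep draws
    return np.where(mask, X / p, 0.0)
```

Features with larger second moments are kept more often, which is the sense in which the sampling distribution adapts to the (evolving) distribution of the data or neurons.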

IROS Conference 2016 Conference Paper

Magnetically-guided in-situ microrobot fabrication

  • Zhe Li
  • Omid Youssefi
  • Eric D. Diller

Mobile microrobots are typically fabricated in a multi-step microfabrication process and then transported into an enclosed workspace for operation. This paper presents a new, 3D-printing-inspired method for in-situ fabrication of mobile magnetic microrobots with complex topology from a polymer filament, on demand, directly inside an enclosed operational environment. Using a tip magnet on the filament, the target shape is formed by magnetic guidance: external electromagnetic coils wirelessly project fields into the workspace as the filament is fed through a hot needle inserted into the workspace. A bending model and a shape planner are developed for predicting and controlling the fabrication process. Magnetically-active millimeter-scale robotic devices of different shapes and sizes are fabricated using polylactic acid (PLA) filament with diameter as small as 50 µm. As a demonstration of the in-situ formation of a functional microrobotic device, a force-sensing microrobot with an integrated sensing spring is fabricated inside an enclosed space, and then used to measure the manipulation force during a pushing experiment via optical deformation measurement. We thus show the utility of the fabrication method for creating complex microrobot shapes remotely in enclosed environments for advanced microrobotic applications, with the potential for scaled-down applications in healthcare and microfluidics.