Author name cluster

Zhuo Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers

2 author rows

AAAI Conference 2026 Conference Paper

AgentMental: An Interactive Multi-Agent Framework for Explainable and Adaptive Mental Health Assessment

Jinpeng Hu
Ao Wang
Qianqian Xie
Zhuo Li
Hui Ma
Dan Guo

Mental health assessment is crucial for early intervention and effective treatment, yet traditional clinician-based approaches are limited by the shortage of qualified professionals. Recent advances in artificial intelligence have sparked growing interest in automated psychological assessment, yet most existing approaches are constrained by their reliance on static text analysis, limiting their ability to capture deeper and more informative insights that emerge through dynamic interaction and iterative questioning. Therefore, in this paper, we propose a multi-agent framework for mental health evaluation that simulates clinical doctor-patient dialogues, with specialized agents assigned to questioning, adequacy evaluation, scoring, and updating. In detail, we introduce an adaptive questioning mechanism in which an evaluation agent assesses the adequacy of user responses to determine the necessity of generating targeted follow-up queries to address ambiguity and missing information. Additionally, we employ a tree-structured memory in which the root node encodes the user's basic information, while child nodes (e.g., topic and statement) organize key information according to distinct symptom categories and interaction turns. This memory is dynamically updated throughout the interaction to reduce redundant questioning and enhance the information extraction and contextual tracking capabilities. Experimental results on the DAIC-WOZ dataset illustrate the effectiveness of our proposed method, which achieves better performance than existing approaches. Our code is released at https://github.com/MindIntLab-HFUT/AgentMental.
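The tree-structured memory described in this abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the class `MemoryNode` and the helpers `record_statement` and `covered_topics` are hypothetical names, and the layout (root, then topic, then one statement per turn) is one plausible reading of the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryNode:
    """One node of a hypothetical tree-structured memory."""
    label: str
    content: str = ""
    children: List["MemoryNode"] = field(default_factory=list)

    def child(self, label: str) -> "MemoryNode":
        """Return the child with this label, creating it if it is missing."""
        for node in self.children:
            if node.label == label:
                return node
        node = MemoryNode(label)
        self.children.append(node)
        return node


def record_statement(root: MemoryNode, topic: str, turn: int, statement: str) -> None:
    """Store a user statement under its symptom topic and interaction turn."""
    root.child(topic).child(f"turn-{turn}").content = statement


def covered_topics(root: MemoryNode) -> set:
    """Topics that already hold a statement, so they need not be asked again."""
    return {t.label for t in root.children if any(s.content for s in t.children)}
```

A questioning agent could consult `covered_topics` before generating a follow-up, which is one simple way to reduce redundant questions.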

PDF Details DOI

AAAI Conference 2026 Conference Paper

CCAHCL: Multi-Level Hypergraph Contrastive Learning for Connected Component Awareness

Zhuo Li
Gengyu Lyu
Yuena Lin
Ziang Chen
Zhiyuan Ma
Zhen Yang
Zun Li

Hypergraph contrastive learning has emerged as a powerful unsupervised paradigm for hypergraph representation learning. Traditional hypergraph contrastive learning methods typically leverage neighbor aggregation strategy to obtain entity (node and hyperedge) representations within each connected component, and then utilize contrastive losses (e.g., node- or hyperedge-level) to update the encoders. However, since entities are usually focused equally on their respective losses, large connected components with numerous entities tend to provide a dominant contribution to the whole learning process, which inevitably hinders the effective learning of entity representations within small connected components. To address this issue, we propose a novel Connected-Component-Aware Hypergraph Contrastive Learning method (CCAHCL). Different from previous methods that only construct node or hyperedge representations, our method additionally constructs the connected component representations, and accordingly designs a hierarchical contrastive loss to balance the model's focus on different scales of connected components. Specifically, we first use the traditional neighbor aggregation strategy to aggregate and update entity (node and hyperedge) representations. Then, these entity representations are further aggregated to generate the connected component representations, where entity features are incorporated into connected components and their structural information is propagated back to enrich their corresponding entities. Afterwards, we employ node-level and hyperedge-level losses to learn the enriched entity representations, and further propose a novel connected-component-level contrastive loss to balance the model's focus on all different connected components, naturally avoiding the learning bias on large connected components. Extensive experiments on various datasets demonstrate that our proposed model achieves superior performance against other state-of-the-art methods.

PDF Details DOI

TMLR Journal 2026 Journal Article

RLHF in an SFT Way: From Optimal Solution to Reward-Weighted Alignment

Yuhao Du
Zhuo Li
Pengyu Cheng
Zhihong Chen
Yuejiao XIE
Xiang Wan
Anningzhe Gao

Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning Large Language Models (LLMs) with human values. However, RLHF has been continuously challenged by its high complexity in implementation and computation consumption, specifically for online sampling-based methods like Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO). Even with recent simplifications, such as Direct Preference Optimization (DPO), which designs an offline implicit reward learning objective relying on pre-collected preference datasets, the problems of over-fitting and training instability still keep the alignment process from reaching the expected optimal performance. To address the existing challenges, we propose a novel simplification of RLHF from the perspective of variational inference, called **V**ariational **A**lignment with **R**e-weighting (**VAR**). Specifically, by directly minimizing the distribution gap between the learning LLM policy and the optimal solution of RLHF, we transform the alignment objective into an offline reward-driven re-weighted supervised fine-tuning (SFT) form, which only requires minor adjustment on the SFT loss to obtain noticeable improvement on training stability and effectiveness. In comprehensive evaluation benchmarks, our objective empowers LLMs to outperform offline alignments, demonstrating superior performance in both helpfulness and harmlessness metrics (avg. $\uparrow7.16\%$ than DPO). Meanwhile, when compared to online sampling methods, our method is comparable or even better while significantly reducing computational overhead and accelerating convergence speed (over $5\times$ faster than GRPO), suggesting our approach as an efficient and effective solution in bridging the gap between efficiency and performance in LLM alignment.
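As a rough illustration of what a reward-driven re-weighted SFT loss can look like, the sketch below weights each response's negative log-likelihood by a softmax of its reward. It relies on the standard closed-form optimum of KL-regularized RLHF, in which the optimal policy is proportional to the reference policy times exp(r / beta). The function names, the batch-level softmax normalization, and the default `beta` are assumptions made for illustration; this is not the VAR objective from the paper.

```python
import math


def reward_weights(rewards, beta=1.0):
    """Softmax of rewards / beta, i.e. weights proportional to exp(r / beta)."""
    top = max(rewards)  # subtract the maximum for numerical stability
    exps = [math.exp((r - top) / beta) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def reward_weighted_nll(log_probs, rewards, beta=1.0):
    """Weighted negative log-likelihood over pre-collected responses.

    log_probs[i] is the policy's log-probability of response i;
    rewards[i] is the reward assigned to that response.
    """
    weights = reward_weights(rewards, beta)
    return -sum(w * lp for w, lp in zip(weights, log_probs))
```

With equal rewards the weights are uniform and the loss reduces to the ordinary mean SFT loss; a smaller `beta` concentrates the weight on the highest-reward responses.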

PDF Details

EAAI Journal 2026 Journal Article

Toward large-scale lithium-ion battery energy storage systems: State of health estimation of battery clusters based on deep learning

Yihang Shen
Xin Lai
Linglong Qian
Dongxu Guo
Zhuo Li
Tonghui Li
Shuaiwei Liu
Kunyuan Sun

Details DOI

IJCAI Conference 2025 Conference Paper

Critical Node-aware Augmentation for Hypergraph Contrastive Learning

Zhuo Li
Yuena Lin
Yipeng Wang
Wenmao Liu
Mingliang Yu
Zhen Yang
Gengyu Lyu

Hypergraph contrastive learning enables effective representation learning for hypergraphs without requiring labels. However, existing methods typically rely on randomly deleting or replacing nodes during hypergraph augmentation, which may lead to the absence of critical nodes and further disrupt the higher-order structural relationships within augmented hypergraphs. To address this issue, we propose a Critical Node-aware hypergraph contrastive learning method, which is the first attempt to leverage hyperedge prediction to retain critical nodes and accordingly maintain reliable higher-order structural relationships within augmented hypergraphs. Specifically, we first employ contrastive learning to align the augmented hypergraphs, and then generate hyperedge embeddings to characterize node representations and their structural correlations. During the hyperedge embedding encoding process, we introduce a hyperedge prediction discriminator to score these embeddings, which quantifies the nodes' contributions to identify the critical nodes and maintain the higher-order structural relationships within augmented hypergraphs. Compared with previous studies, our proposed method can effectively alleviate the erroneous deletion or replacement of critical nodes and steadily maintain the inherent structural relationships between the original hypergraph and the augmented hypergraphs, naturally guiding better hypergraph representations for downstream tasks. Extensive experiments on various tasks demonstrate that our method is significantly superior to state-of-the-art methods.

PDF Details DOI

TMLR Journal 2025 Journal Article

Diverse Condensed Data Generation via Class Preserving Distribution Matching

DanDan Guo
Zhuo Li
He Zhao
Mingyuan Zhou
Hongyuan Zha

Large-scale datasets for training many real-world machine learning models pose significant computational resource challenges. One approach to mitigate this is data condensation, which aims to learn a small dataset that still sufficiently captures the rich information in the original one. Most existing approaches learn the condensed dataset and task-related model parameters (e.g., a classifier) in a bi-level meta-learning way. The recently proposed distribution matching (DM), however, avoids the expensive bi-level optimization but ignores task-related models. This work proposes a novel class preserving DM framework consisting of two key components. The first one is responsible for capturing the original data distribution of each class based on energy distance, which can encourage diversity in the generated synthetic data. The other is a classifier-critic constraint, which forces the learned synthetic samples to fit pre-trained task-related models, such as an off-the-shelf classifier. Designing the optimization loss in this way, we can generate more diverse and class preserving distilled data without the bi-level optimization. Extensive experiments reveal that our method can produce more effective condensed data for downstream tasks with less training cost and can also be successfully applied to de-biased dataset condensation.
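For readers unfamiliar with the energy distance mentioned in this abstract, a minimal sample-based estimate is sketched below. It uses the standard definition 2 E|X-Y| - E|X-X'| - E|Y-Y'| with Euclidean distance. The function names are illustrative, and how the paper applies the distance (for example, in which feature space) is not specified here.

```python
import math


def mean_pairwise_distance(xs, ys):
    """Average Euclidean distance over all pairs (x, y) with x in xs, y in ys."""
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Sample estimate of 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return (2 * mean_pairwise_distance(xs, ys)
            - mean_pairwise_distance(xs, xs)
            - mean_pairwise_distance(ys, ys))
```

The two within-sample terms reward spread inside each sample, which is one intuition for why matching under this distance can encourage diverse synthetic data rather than copies of a single point.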

PDF Details

ICRA Conference 2025 Conference Paper

E2B: A Single Modality Point-Based Tracker with Event Cameras

Hongwei Ren
Zhuo Li
Aiersi Tuerhong
Haobo Liu
Fei Liang
Yongxiang Feng
Wenhui Wang 0001
Yaoyuan Wang

High-speed object tracking holds significant relevance across robotic domains, such as drones and autonomous driving. Compared to conventional cameras, event cameras can capture object motion information at exceptionally high temporal resolution with relatively low power consumption and remain immune to motion-blurring effects. Regrettably, many existing methods adopt a frame-based approach by stacking events into Event Frames, which overlooks the sparsity and high temporal resolution of events. This approach also relies on huge pre-trained backbones and reaches a performance plateau while demanding unrealistically large networks and high power consumption, rendering it impractical for real-time applications in battery-constrained robotic scenarios. In this paper, we propose an efficient and effective single-modality tracker using Point Cloud representation named E2B (Event to Box). By directly handling the raw output of event cameras without data-format transformation, E2B leverages events' coordinate guidance to accurately map Event Cloud features to 2D bounding boxes. Moreover, E2B incorporates the pyramid structure into the multi-stage feature extraction architecture to effectively track objects across diverse scales. In the experiments, E2B performs outstandingly on two large-scale and one synthetic event-based tracking datasets, covering both indoor and outdoor environments, as well as rigid and non-rigid objects.

Details

JBHI Journal 2025 Journal Article

High-Fidelity Functional Ultrasound Reconstruction via a Visual Auto-Regressive Framework

Xuhang Chen
Zhuo Li
Yanyan Shen
Mufti Mahmud
Hieu Pham
Michael Kwok-Po Ng
Chi-Man Pun
Shuqiang Wang

Functional ultrasound (fUS) imaging provides exceptional spatiotemporal resolution for neurovascular mapping, yet its practical application is significantly hampered by critical challenges. Foremost among these is data scarcity, arising from ethical considerations and signal degradation through the cranium, which collectively limit dataset diversity and compromise the fairness of downstream machine learning models. To address these limitations, we introduce UltraVAR (Ultrasound Visual Auto-Regressive model), the first data augmentation framework designed for fUS imaging that leverages a pre-trained visual auto-regressive generative model. UltraVAR is designed not only to mitigate data scarcity but also to enhance model fairness through the reconstruction of diverse and physiologically plausible fUS samples. The generated samples preserve essential neurovascular coupling features—specifically, the dynamic interplay between neural activity and microvascular hemodynamics. This capability distinguishes UltraVAR from conventional augmentation techniques, which often disrupt these vital physiological correlations and consequently fail to improve, or even degrade, downstream task performance. The proposed UltraVAR employs a scale-by-scale reconstruction mechanism that meticulously preserves the spatial topological relationships within vascular networks. The framework's fidelity is further enhanced by two integrated modules: the Smooth Scaling Layer, which ensures the preservation of critical image information during multi-scale feature propagation, and the Perception Enhancement Module, which actively suppresses artifact generation via a dynamic residual compensation mechanism. Comprehensive experimental validation demonstrates that datasets augmented with UltraVAR yield statistically significant improvements in downstream classification accuracy. 
This work establishes a robust foundation for advancing ultrasound-based neuromodulation techniques and brain-computer interface technologies by enabling the reconstruction of high-fidelity, diverse fUS data.

Details DOI

NeurIPS Conference 2025 Conference Paper

HyperMixup: Hypergraph-Augmented with Higher-order Information Mixup

Kaixuan Yao
Zhuo Li
Jianqing Liang
Jiye Liang
Ming Li
Feilong Cao

Hypergraphs offer a natural paradigm for modeling complex systems with multi-way interactions. Hypergraph neural networks (HGNNs) have demonstrated remarkable success in learning from such higher-order relational data. While such higher-order modeling enhances relational reasoning, the effectiveness of hypergraph learning remains bottlenecked by two persistent challenges: the scarcity of labeled data inherent to complex systems, and the vulnerability to structural noise in real-world interaction patterns. Traditional data augmentation methods, though successful in Euclidean and graph-structured domains, struggle to preserve the intricate balance between node features and hyperedge semantics, often disrupting the very group-wise interactions that define hypergraph value. To bridge this gap, we present HyperMixup, a hypergraph-aware augmentation framework that preserves higher-order interaction patterns through structure-guided feature mixing. Specifically, HyperMixup contains three critical components: 1) Structure-aware node pairing guided by joint feature-hyperedge similarity metrics, 2) Context-enhanced hierarchical mixing that preserves hyperedge semantics through dual-level feature fusion, and 3) Adaptive topology reconstruction mechanisms that maintain hypergraph consistency while enabling controlled diversity expansion. Theoretically, we establish that our method induces hypergraph-specific regularization effects through gradient alignment with hyperedge covariance structures, while providing robustness guarantees against combined node-hyperedge perturbations. Comprehensive experiments across diverse hypergraph learning tasks demonstrate consistent performance improvements over state-of-the-art baselines, with particular effectiveness in low-label regimes. 
The proposed framework advances hypergraph representation learning by unifying data augmentation with higher-order topological constraints, offering both practical utility and theoretical insights for relational machine learning.

PDF Details

TMLR Journal 2025 Journal Article

Improving Adversarial Training for Two-player Competitive Games via Episodic Reward Engineering

Siyuan Chen
Fuyuan Zhang
Zhuo Li
Xiongfei Wu
Jianlang Chen
Pengzhan Zhao
Lei Ma
Jianjun Zhao

In recent years, training adversarial agents has become an effective and practical approach for attacking neural network policies. However, we observe that existing methods can be further enhanced by distinguishing between states leading to a win or a loss and by using reward engineering to make policy training prioritize winning states. In this paper, we introduce a novel adversarial training method with reward engineering for two-player competitive games. Our method extracts historical evaluations of states from past experiences with an episodic memory, and then incorporates these evaluations into the rewards with our proposed reward revision method to improve the adversarial policy optimization. We evaluate our approach using two-player competitive games in MuJoCo simulation environments, demonstrating that our method establishes the most promising attack performance and defense difficulty against the victims among the existing adversarial policy training techniques.

PDF Details

NeurIPS Conference 2025 Conference Paper

Intermediate Domain Alignment and Morphology Analogy for Patent-Product Image Retrieval

Haifan Gong
Xuanye Zhang
Ruifei Zhang
Yun Su
Zhuo Li
Yuhao Du
Anningzhe Gao
Xiang Wan

Recent advances in artificial intelligence have significantly impacted image retrieval tasks, yet Patent-Product Image Retrieval (PPIR) has received limited attention. PPIR, which retrieves patent images based on product images to identify potential infringements, presents unique challenges: (1) both product and patent images often contain numerous categories of artificial objects, but models pre-trained on standard datasets exhibit limited discriminative power to recognize some of those unseen objects; and (2) the significant domain gap between binary patent line drawings and colorful RGB product images further complicates similarity comparisons for product-patent pairs. To address these challenges, we formulate it as an open-set image retrieval task and introduce a comprehensive Patent-Product Image Retrieval Dataset (PPIRD) including a test set with 439 product-patent pairs, a retrieval pool of 727,921 patents, and an unlabeled pre-training set of 3,799,695 images. We further propose a novel Intermediate Domain Alignment and Morphology Analogy (IDAMA) strategy. IDAMA maps both image types to an intermediate sketch domain using edge detection to minimize the domain discrepancy, and employs a Morphology Analogy Filter to select discriminative patent images based on visual features via analogical reasoning. Extensive experiments on PPIRD demonstrate that IDAMA significantly outperforms baseline methods (+7.58 mAR) and offers valuable insights into domain mapping and representation learning for PPIR. (The PPIRD dataset is available at: https://loslorien.github.io/idama-project/)

PDF Details

EAAI Journal 2025 Journal Article

Multi-scale target detection of metal surface defects in additive manufacturing based on reinforcement learning

Yunteng Niu
Yilin Zheng
Shujing Shi
Zhuo Li
Zhigong Song

Details DOI

IROS Conference 2025 Conference Paper

Open-World Task Planning for Humanoid Bimanual Dexterous Manipulation via Vision-Language Models

Zixin Tang
Zhihao Li
Junjia Liu
Zhuo Li
Fei Chen

Open-world task planning, characterized by handling unstructured and dynamic environments, has been increasingly explored to integrate with long-horizon robotic manipulation tasks. However, existing evaluations of the capabilities of these planners primarily focus on single-arm systems in structured scenarios with limited skill primitives, which is insufficient for numerous bimanual dexterous manipulation scenarios prevalent in the real world. To this end, we introduce OBiMan-Bench, a large-scale benchmark designed to rigorously evaluate open-world planning capabilities in bimanual dexterous manipulation, including task-scenario grounding, workspace constraint handling, and long-horizon cooperative reasoning. In addition, we propose OBiMan-Planner, a vision-language model-based zero-shot planning framework tailored for bimanual dexterous manipulation. OBiMan-Planner comprises two key components, the scenario grounding module for grounding open-world task instructions with specific scenarios and the task planning module for generating sequential stages. Extensive experiments on OBiMan-Bench demonstrate the effectiveness of our method in addressing complex bimanual dexterous manipulation tasks in open-world scenarios. The code, benchmark, and supplementary material are released at https://github.com/Zixin-Tang/OBiMan.

Details

TMLR Journal 2025 Journal Article

Synthesizing Minority Samples for Long-tailed Classification via Distribution Matching

Zhuo Li
He Zhao
Jinke Ren
Anningzhe Gao
DanDan Guo
Xiang Wan
Hongyuan Zha

In many real-world applications, deep neural networks (DNNs) often perform poorly on datasets with long-tailed distributions. To address this issue, a promising approach is to propose an optimization objective to transform real majority samples into synthetic minority samples. However, this objective is designed only from the classification perspective. To this end, we propose a novel framework that synthesizes minority samples from the majority by considering both classification and distribution matching. Specifically, our method adjusts the distribution of synthetic minority samples to closely align with that of the true minority class, while enforcing the synthetic samples to learn more generalizable and discriminative features of the minority class. Experimental results on several standard benchmark datasets demonstrate the effectiveness of our method in both long-tailed classification and synthesizing high-quality synthetic minority samples.

PDF Details

NeurIPS Conference 2024 Conference Paper

M$^3$GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation

Mingshuang Luo
RuiBing Hou
Zhuo Li
Hong Chang
Zimo Liu
Yaowei Wang
Shiguang Shan

This paper presents M$^3$GPT, an advanced $\textbf{M}$ultimodal, $\textbf{M}$ultitask framework for $\textbf{M}$otion comprehension and generation. M$^3$GPT operates on three fundamental principles. The first focuses on creating a unified representation space for various motion-relevant modalities. We employ discrete vector quantization for multimodal conditional signals, such as text, music and motion/dance, enabling seamless integration into a large language model (LLM) with a single vocabulary. The second involves modeling motion generation directly in the raw motion space. This strategy circumvents the information loss associated with a discrete tokenizer, resulting in more detailed and comprehensive motion generation. Third, M$^3$GPT learns to model the connections and synergies among various motion-relevant tasks. Text, the most familiar and well-understood modality for LLMs, is utilized as a bridge to establish connections between different motion tasks, facilitating mutual reinforcement. To our knowledge, M$^3$GPT is the first model capable of comprehending and generating motions based on multiple signals. Extensive experiments highlight M$^3$GPT's superior performance across various motion-relevant tasks and its powerful zero-shot generalization capabilities for extremely challenging tasks. Project page: https://github.com/luomingshuang/M3GPT.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

PrivAuditor: Benchmarking Data Protection Vulnerabilities in LLM Adaptation Techniques

Derui Zhu
Dingfan Chen
Xiongfei Wu
Jiahui Geng
Zhuo Li
Jens Grossklags
Lei Ma

Large Language Models (LLMs) are recognized for their potential to be an important building block toward achieving artificial general intelligence due to their unprecedented capability for solving diverse tasks. Despite these achievements, LLMs often underperform in domain-specific tasks without training on relevant domain data. This phenomenon, which is often attributed to distribution shifts, makes adapting pre-trained LLMs with domain-specific data crucial. However, this adaptation raises significant privacy concerns, especially when the data involved come from sensitive domains. In this work, we extensively investigate the privacy vulnerabilities of adapted (fine-tuned) LLMs and benchmark privacy leakage across a wide range of data modalities, state-of-the-art privacy attack methods, adaptation techniques, and model architectures. We systematically evaluate and pinpoint critical factors related to privacy leakage. With our organized codebase and actionable insights, we aim to provide a standardized auditing tool for practitioners seeking to deploy customized LLM applications with faithful privacy assessments.

PDF Details DOI

AAAI Conference 2023 Conference Paper

A Simple Yet Effective Subsequence-Enhanced Approach for Cross-Domain NER

Jinpeng Hu
DanDan Guo
Yang Liu
Zhuo Li
Zhihong Chen
Xiang Wan
Tsung-Hui Chang

Cross-domain named entity recognition (NER), aiming to address the limitation of labeled resources in the target domain, is a challenging yet important task. Most existing studies alleviate the data discrepancy across different domains at the coarse level via combining NER with language modeling or introducing domain-adaptive pre-training (DAPT). Notably, source and target domains tend to share more fine-grained local information within denser subsequences than global information within the whole sequence, such that subsequence features are easier to transfer, which has not been explored well. Besides, compared to token-level representation, subsequence-level information can help the model distinguish different meanings of the same word in different domains. In this paper, we propose to incorporate subsequence-level features for promoting cross-domain NER. In detail, we first utilize a pre-trained encoder to extract the global information. Then, we re-express each sentence as a group of subsequences and propose a novel bidirectional memory recurrent unit (BMRU) to capture features from the subsequences. Finally, an adaptive coupling unit (ACU) is proposed to combine global information and subsequence features for predicting entity labels. Experimental results on several benchmark datasets illustrate the effectiveness of our model, which achieves considerable improvements.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

Enhancing Minority Classes by Mixing: An Adaptative Optimal Transport Approach for Long-tailed Classification

Jintong Gao
He Zhao
Zhuo Li
DanDan Guo

Real-world data usually confronts severe class-imbalance problems, where several majority classes have a significantly larger presence in the training set than minority classes. One effective solution is using mixup-based methods to generate synthetic samples to enhance the presence of minority classes. Previous approaches mix the background images from the majority classes and foreground images from the minority classes in a random manner, which ignores the sample-level semantic similarity, possibly resulting in less reasonable or less useful images. In this work, we propose an adaptive image-mixing method based on optimal transport (OT) to incorporate both class-level and sample-level information, which is able to generate semantically reasonable and meaningful mixed images for minority classes. Due to its flexibility, our method can be combined with existing long-tailed classification methods to enhance their performance and it can also serve as a general data augmentation method for balanced datasets. Extensive experiments indicate that our method achieves effective performance for long-tailed classification tasks. The code is available at https://github.com/JintongGao/Enhancing-Minority-Classes-by-Mixing.

PDF Details

NeurIPS Conference 2022 Conference Paper

Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification

DanDan Guo
Zhuo Li
Meixi Zheng
He Zhao
Mingyuan Zhou
Hongyuan Zha

Imbalanced data pose challenges for deep learning based classification models. One of the most widely-used approaches for tackling imbalanced data is re-weighting, where training samples are associated with different weights in the loss function. Most existing re-weighting approaches treat the example weights as learnable parameters and optimize the weights on the meta set, entailing expensive bilevel optimization. In this paper, we propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view. Specifically, we view the training set as an imbalanced distribution over its samples, which is transported by OT to a balanced distribution obtained from the meta set. The weights of the training samples are the probability mass of the imbalanced distribution and are learned by minimizing the OT distance between the two distributions. Compared with existing methods, our proposed method removes the dependence of the weight learning on the concerned classifier at each iteration. Experiments on image, text and point cloud datasets demonstrate that our proposed re-weighting method has excellent performance, achieving state-of-the-art results in many cases and providing a promising tool for addressing the imbalanced classification issue. The code has been made available at https://github.com/DandanGuo1993/reweight-imbalance-classification-with-OT.

PDF Details

AAAI Conference 2022 Conference Paper

Rethinking the Optimization of Average Precision: Only Penalizing Negative Instances before Positive Ones Is Enough

Zhuo Li
Weiqing Min
Jiajun Song
Yaohui Zhu
Liping Kang
Xiaoming Wei
Xiaolin Wei
Shuqiang Jiang

Optimising the approximation of Average Precision (AP) has been widely studied for image retrieval. Limited by the definition of AP, such methods consider both negative and positive instances ranking before each positive instance. However, we claim that only penalizing negative instances before positive ones is enough, because the loss only comes from these negative instances. To this end, we propose a novel loss, namely Penalizing Negative instances before Positive ones (PNP), which can directly minimize the number of negative instances before each positive one. In addition, AP-based methods adopt a fixed and sub-optimal gradient assignment strategy. Therefore, we systematically investigate different gradient assignment solutions via constructing derivative functions of the loss, resulting in PNP-I with increasing derivative functions and PNP-D with decreasing ones. PNP-I focuses more on the hard positive instances by assigning larger gradients to them and tries to make all relevant instances closer. In contrast, PNP-D pays less attention to such instances and slowly corrects them. For most real-world data, one class usually contains several local clusters. PNP-I blindly gathers these clusters while PNP-D keeps them as they were. Therefore, PNP-D is superior. Experiments on three standard retrieval datasets show consistent results with the above analysis. Extensive evaluations demonstrate that PNP-D achieves state-of-the-art performance. Code is available at https://github.com/interestingzhuo/PNPloss
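The quantity this abstract proposes to minimize, the number of negative instances ranked before each positive one, can be illustrated for a single query as follows. This is a simplified sketch under assumptions: the function names, the sigmoid relaxation, and the `temperature` value are illustrative, and the PNP-I and PNP-D variants (which reshape the gradient through derivative functions) are not reproduced.

```python
import math


def negatives_before_positives(scores, labels):
    """Hard count: for each positive, the negatives scored strictly higher."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return sum(1 for p in positives for n in negatives if n > p)


def smooth_negatives_before_positives(scores, labels, temperature=0.05):
    """Differentiable surrogate: a sigmoid of (n - p) replaces the test n > p."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for p in positives:
        for n in negatives:
            z = (n - p) / temperature
            total += 0.5 * (1.0 + math.tanh(z / 2.0))  # numerically stable sigmoid
    return total
```

For scores `[0.9, 0.8, 0.3]` with labels `[1, 0, 1]`, only the positive scored 0.3 has a negative ranked before it, so the hard count is 1.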

PDF Details

JBHI Journal 2020 Journal Article

CycleGAN With an Improved Loss Function for Cell Detection Using Partly Labeled Images

Jin He
Cong Wang
Dan Jiang
Zhuo Li
Yangyi Liu
Tao Zhang

Object detection, which is already widely applied in the biomedical field, is of real significance but technically challenging. In practice, object detection accuracy is vulnerable to labeling quality, which is usually not a major problem for simple algorithm or model verification, since many ideal publicly available datasets have well-marked classes and tags. However, in real scenarios, image data is often partially or even incorrectly labeled. In cell detection in particular, this becomes a thorny issue because the labelling of the dataset is incomplete and inaccurate. To address this issue, we propose a data-augmentation algorithm that can generate fully labeled cell image data from incompletely labeled ones. First, we randomly extract the labeled objects from raw cell images while keeping their corresponding position information. Next, we employ the framework of the cycle-consistent adversarial network, significantly modified from the original one, to generate fully labeled data including both objects and backgrounds. We conduct extensive experiments on a blood cell classification dataset called BCCD to evaluate our model, and experimental results show that our proposed method can successfully address the weak annotation problem and improve the performance of object detection.

Details DOI

AAAI Conference 2020 Conference Paper

Generating Adversarial Examples for Holding Robustness of Source Code Processing Models

Huangzhao Zhang
Zhuo Li
Ge Li
Lei Ma
Yang Liu
Zhi Jin

Automated processing, analysis, and generation of source code are among the key activities in software and system lifecycle. To this end, while deep learning (DL) exhibits a certain level of capability in handling these tasks, the current state-of-the-art DL models still suffer from non-robust issues and can be easily fooled by adversarial attacks. Different from adversarial attacks for image, audio, and natural languages, the structured nature of programming languages brings new challenges. In this paper, we propose a Metropolis-Hastings sampling-based identifier renaming technique, named Metropolis-Hastings Modifier (MHM), which generates adversarial examples for DL models specialized for source code processing. Our in-depth evaluation on a functionality classification benchmark demonstrates the effectiveness of MHM in generating adversarial examples of source code. The higher robustness and performance enhanced through our adversarial training with MHM further confirms the usefulness of DL models-based method for future fully automated source code processing.

PDF Details

AAAI Conference 2018 Conference Paper

Deep Representation-Decoupling Neural Networks for Monaural Music Mixture Separation

Zhuo Li
Hongwei Wang
Miao Zhao
Wenjie Li
Minyi Guo

Monaural source separation (MSS) aims to extract and reconstruct different sources from a single-channel mixture, which could facilitate a variety of applications such as chord recognition, pitch estimation and automatic transcription. In this paper, we study the problem of separating vocals and instruments from a monaural music mixture. Existing works for monaural source separation either utilize linear and shallow models (e.g., non-negative matrix factorization), or do not explicitly address the coupling and tangling of multiple sources in the original input signals; hence they do not perform satisfactorily in real-world scenarios. To overcome the above limitations, we propose a novel end-to-end framework for monaural music mixture separation called Deep Representation-Decoupling Neural Networks (DRDNN). DRDNN takes advantage of both traditional signal processing methods and popular deep learning models. For each input music mixture, DRDNN converts it to a two-dimensional time-frequency spectrogram using the short-time Fourier transform (STFT), followed by stacked convolutional neural network (CNN) layers and long short-term memory (LSTM) layers to extract more condensed features. Afterwards, DRDNN utilizes a decoupling component, which consists of a group of multi-layer perceptrons (MLP), to decouple the features further into different separated sources. The design of the decoupling component in DRDNN produces purified single-source signals for subsequent full-size restoration, and can significantly improve the performance of the final separation. Through extensive experiments on a real-world dataset, we show that DRDNN outperforms state-of-the-art baselines in the task of monaural music mixture separation and reconstruction.
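The first step named in this abstract, converting a single-channel signal into a two-dimensional time-frequency spectrogram with the STFT, can be sketched as follows. This is a generic NumPy illustration, not the DRDNN front end: the function name, the Hann window, and the `frame_length` and `hop_length` values are assumptions, and the CNN, LSTM, and MLP stages are not shown.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_length=256, hop_length=128):
    """Return a (frequency bins x time frames) magnitude spectrogram."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_length:
        raise ValueError("signal is shorter than one frame")
    # Number of full frames that fit; trailing samples are dropped.
    n_frames = 1 + (signal.size - frame_length) // hop_length
    window = np.hanning(frame_length)
    starts = hop_length * np.arange(n_frames)
    frames = np.stack([signal[s:s + frame_length] for s in starts])
    # Real FFT of each windowed frame gives frame_length // 2 + 1 bins.
    spectrum = np.fft.rfft(frames * window, axis=1)
    return np.abs(spectrum).T


# One second of a 1 kHz tone sampled at 8 kHz (toy input, not music).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
spec = magnitude_spectrogram(tone)
```

With these assumed settings each frequency bin spans 8000 / 256 = 31.25 Hz, so the 1 kHz tone falls in bin 32 of every frame.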

PDF Details