Author name cluster

Ya Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

2 author rows

AAAI Conference 2026 Conference Paper

HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios

Bingsong Bai
Yizhong Geng
Fengping Wang
Cong Wang
Puyuan Guo
Yingming Gao
Ya Li

Zero-shot singing voice conversion (SVC) transforms a source singer's timbre to an unseen target speaker's voice while preserving melodic content without fine-tuning. Existing methods model speaker timbre and vocal content separately, losing essential acoustic information that degrades output quality while requiring significant computational resources. To overcome these limitations, we propose HQ-SVC, an efficient framework for high-quality zero-shot SVC. HQ-SVC first jointly extracts content and speaker features using a decoupled codec. It then enhances fidelity through pitch and volume modeling, preserving critical acoustic information typically lost in separate modeling approaches, and progressively refines outputs via differentiable signal processing and diffusion techniques. Evaluations confirm HQ-SVC significantly outperforms state-of-the-art zero-shot SVC methods in conversion quality and efficiency. Beyond voice conversion, HQ-SVC achieves superior voice naturalness compared to specialized audio super-resolution methods while natively supporting voice super-resolution tasks.


AAAI Conference 2025 Conference Paper

Controllable 3D Dance Generation Using Diffusion-Based Transformer U-Net

Puyuan Guo
Tuo Hao
Wenxin Fu
Yingming Gao
Ya Li

Recently, dance generation has attracted increasing interest. In particular, the success of diffusion models in image generation has led to the emergence of dance generation systems based on the diffusion framework. However, these systems lack controllability, which limits their practical applications. In this paper, we propose a controllable dance generation method based on the diffusion model, which can generate 3D dance motions controlled by 2D keypoint sequences. Specifically, we design a transformer-based U-Net model to predict actual motions. Then, we fix the parameters of the U-Net model and train an additional control network, enabling the generated motions to be controlled by 2D keypoints. We conduct extensive experiments and compare our method with existing works on the widely used AIST++ dataset, demonstrating that our approach has certain advantages and controllability. Moreover, we also test our model on in-the-wild videos and find that it is capable of generating dance movements similar to the motions in the videos as well.


JBHI Journal 2025 Journal Article

Interpretable Staging Prediction of Liver Cancer Based on Joint-Knowledge Network

Xuecong Zheng
Ya Li
Zhiqi Wu
Yiyang Tang
Pei-Yuan Lai
Man-Sheng Chen
Hong-Yi Chen
Chang-Dong Wang

Clinical staging is crucial for treatment strategies and improving 5-year survival rates in hepatocellular carcinoma (HCC) patients. However, existing methods struggle to distinguish stages with highly similar textual features. Additionally, their lack of interpretability hampers their practical application in medical scenarios. Here, we introduce KnowST, a joint-knowledge network designed to leverage task relevance to explore implicit knowledge for interpretable staging prediction of liver cancer. First, the relevance of auxiliary tasks and the main task is established from two perspectives to guide the model's focus on staging-related implicit knowledge in radiology reports. Stages-to-stages: KnowST learns the inter-stage distinctions between different stages and the similarities within the same stages, using these as important references for staging differentiation. Factors-to-stages: Clinically, staging is determined by multiple tumor factors. These factors can serve as effective clues to assist KnowST in predicting the correct stage, especially in the case of confusing stages. Second, domain-specific word embeddings are introduced to bridge the gap between pre-trained language models and Chinese radiology reports. Lastly, tumor factor prediction enhances the credibility of the deep model in staging prediction, and its visualized results effectively demonstrate the model's interpretability. Overall, KnowST leverages the joint-knowledge from these two perspectives, effectively utilizing implicit information in radiology reports to achieve interpretable clinical staging. Compared to the optimal baselines, KnowST improves AUC by 7.69% and achieves 90.52% accuracy on 573 real-world radiology reports, while also demonstrating superior stage identification and stable performance across various metrics.


JBHI Journal 2025 Journal Article

LKAN: LLM-Based Knowledge-Aware Attention Network for Clinical Staging of Liver Cancer

Ya Li
Xuecong Zheng
Jiaping Li
Qingyun Dai
Chang-Dong Wang
Min Chen

Clinical staging of liver cancer (CSoLC), an important indicator for evaluating primary liver cancer (PLC), is key in the diagnosis, treatment, and rehabilitation of liver cancer. In China, the current CSoLC adopts the China liver cancer (CNLC) staging, which is usually evaluated by clinicians based on radiology reports. Therefore, inferring clinical information from unstructured radiology reports can provide auxiliary decision support for clinicians. The key to solving the challenging task is to guide the model to pay attention to the staging-related words or sentences, and the following issues may occur: 1) Imbalanced categories: Early- and mid-stage liver cancer symptoms are subtle, resulting in more data in the end-stage. 2) Domain sensitivity of liver cancer data: The liver cancer dataset contains substantial domain knowledge, leading to out-of-vocabulary issues and reduced classification accuracy. 3) Free-text and lengthy report: Radiology reports sparsely describe various lesions using domain-specific terms, making it hard to mine staging-related information. To address these, this article proposes a large language model (LLM)-based Knowledge-aware Attention Network (LKAN) for CSoLC. First, for maintaining semantic consistency, LLM and a rule-based algorithm are integrated to generate more diverse and reasonable data. Second, an unlabeled radiology corpus is pre-trained to introduce domain knowledge for subsequent representation learning. Third, attention is improved by incorporating both global and local features to guide the model's focus on staging-relevant information. Compared with the baseline models, LKAN has achieved the best results with 90.3% Accuracy, 90.0% Macro_F1 score, and 90.0% Macro_Recall.


JBHI Journal 2024 Journal Article

ER-GET: Emotion Recognition Based on Global ECG Trajectory

Ya Li
Runxi Tan
Tianxin Lin
Qing Liu
Chang-Dong Wang
Min Chen

In recent years, the recognition of human emotions based on electrocardiogram (ECG) signals has been considered a novel area of study among researchers. Despite the challenge of extracting latent emotion information from ECG signals, existing methods are able to recognize emotions by calculating the heart rate variability (HRV) features. However, such local features have drawbacks, as they do not provide a comprehensive description of ECG signals, leading to suboptimal recognition performance. For the first time, we propose a new strategy to extract hidden emotional information from the global ECG trajectory for emotion recognition. Specifically, a period of ECG signals is decomposed into sub-signals of different frequency bands through ensemble empirical mode decomposition (EEMD), and a series of multi-sequence trajectory graphs is constructed by orthogonally combining these sub-signals to extract latent emotional information. Additionally, to better utilize these graph features, a network has been designed that includes self-supervised graph representation learning and ensemble learning for classification. This approach surpasses recent notable works, achieving outstanding results, with an accuracy of 95.08% in arousal and 95.90% in valence detection. Additionally, this global feature is compared and discussed in relation to HRV features, with the intention of providing inspiration for subsequent research.


YNIMG Journal 2024 Journal Article

Two fundamentally different mechanisms by which unconscious information impairs behavioral performance: Evidence from fMRI and computational modeling

Yongchun Wang
Meilin Di
Ya Li
Peng Liu
Jingjing Zhao
Yonghui Wang

It is increasingly clear that unconscious information impairs the performance of the corresponding action when the instruction to act is delayed. However, whether this impairment occurs at the response level or at the perceptual level remains controversial. This study used fMRI and a computational model with a pre-post design to address this elusive issue. The fMRI results showed that when the unconscious information containing strong stimulus-response associations was irrelevant to subsequent stimuli, the precuneus in the parietal lobe, which is thought to be involved in sensorimotor processing, was activated. In contrast, when the unconscious information was relevant to subsequent stimuli, regardless of the strength of the stimulus-response associations, some regions in the occipital and temporal cortices, which are thought to be involved in visual perceptual processing, were activated. In addition, the percent signal change in the regions of interest associated with motor inhibition was modulated by compatibility in the irrelevant but not in the relevant stimuli conditions. Modeling of behavioral data further supported that the irrelevant and relevant stimuli conditions involved fundamentally different mechanisms. Our finding reconciles the debate about the mechanism by which unconscious information impairs action performance and has important implications for understanding of unconscious cognition.


NeurIPS Conference 2022 Conference Paper

Towards Lightweight Black-Box Attack Against Deep Neural Networks

Chenghao Sun
Yonggang Zhang
Wan Chaoqun
Qizhou Wang
Ya Li
Tongliang Liu
Bo Han
Xinmei Tian

Black-box attacks can generate adversarial examples without accessing the parameters of the target model, largely exacerbating the threats of deployed deep neural networks (DNNs). However, previous works state that black-box attacks fail to mislead target models when their training data and outputs are inaccessible. In this work, we argue that black-box attacks can pose practical attacks in this extremely restrictive scenario where only several test samples are available. Specifically, we find that attacking the shallow layers of DNNs trained on a few test samples can generate powerful adversarial examples. As only a few samples are required, we refer to these attacks as lightweight black-box attacks. The main challenge to promoting lightweight attacks is to mitigate the adverse impact caused by the approximation error of shallow layers. As it is hard to mitigate the approximation error with few available samples, we propose Error TransFormer (ETF) for lightweight attacks. Namely, ETF transforms the approximation error in the parameter space into a perturbation in the feature space and alleviates the error by disturbing features. In experiments, lightweight black-box attacks with the proposed ETF achieve surprising results. For example, even if only 1 sample per category is available, the attack success rate in lightweight black-box attacks is only about 3% lower than that of the black-box attacks with complete training data.


ICML Conference 2020 Conference Paper

Dual-Path Distillation: A Unified Framework to Improve Black-Box Attacks

Yonggang Zhang 0003
Ya Li
Tongliang Liu
Xinmei Tian 0001

We study the problem of constructing black-box adversarial attacks, where no model information is revealed except for the feedback knowledge of the given inputs. To obtain sufficient knowledge for crafting adversarial examples, previous methods query the target model with inputs that are perturbed with different searching directions. However, these methods suffer from poor query efficiency since the employed searching directions are sampled randomly. To mitigate this issue, we formulate the goal of mounting efficient attacks as an optimization problem in which the adversary tries to fool the target model with a limited number of queries. Under such settings, the adversary has to select appropriate searching directions to reduce the number of model queries. By solving the efficient-attack problem, we find that we need to distill the knowledge in both the path of the adversarial examples and the path of the searching directions. Therefore, we propose a novel framework, dual-path distillation, that utilizes the feedback knowledge not only to craft adversarial examples but also to alter the searching directions to achieve efficient attacks. Experimental results suggest that our framework can significantly increase the query efficiency.


ICRA Conference 2019 Conference Paper

Lightweight Contrast Modeling for Attention-Aware Visual Localization

Lili Huang 0004
Guanbin Li
Ya Li
Liang Lin

Salient object detection, which aims at localizing the attention-aware visual objects, is an indispensable technology for intelligent robots to understand and interact with complicated environments. Existing salient object detection approaches mainly focus on the optimization of detection performance, while ignoring the considerations for computational resource consumption and algorithm efficiency. In contrast, we build a superior lightweight network architecture to simultaneously improve performance on both accuracy and efficiency for salient object detection. Specifically, our proposed approach adopts the lightweight bottleneck as its primary building block to significantly reduce the number of parameters and to speed up the process of training and inference. In practice, the visual contrast is insufficiently discovered with the limitation of the small empirical receptive field of CNN. To alleviate this issue, we design a multi-scale convolution module to rapidly discover high-level visual contrast. Moreover, a lightweight refinement module is utilized to restore object saliency details with negligible extra cost. Extensive experiments on efficiency and accuracy trade-offs show that our model is more competitive than the state-of-the-art works on the salient object detection task and has prominent potential for real-time robot applications.


AAAI Conference 2018 Conference Paper

Domain Generalization via Conditional Invariant Representations

Ya Li
Mingming Gong
Xinmei Tian
Tongliang Liu
Dacheng Tao

Domain generalization aims to apply knowledge gained from multiple labeled source domains to unseen target domains. The main difficulty comes from the dataset bias: training data and test data have different distributions, and the training set contains heterogeneous samples from different distributions. Let X denote the features, and Y be the class labels. Existing domain generalization methods address the dataset bias problem by learning a domain-invariant representation h(X) that has the same marginal distribution P(h(X)) across multiple source domains. The functional relationship encoded in P(Y|X) is usually assumed to be stable across domains such that P(Y|h(X)) is also invariant. However, it is unclear whether this assumption holds in practical problems. In this paper, we consider the general situation where both P(X) and P(Y|X) can change across all domains. We propose to learn a feature representation which has domain-invariant class conditional distributions P(h(X)|Y). With the conditional invariant representation, the invariance of the joint distribution P(h(X), Y) can be guaranteed if the class prior P(Y) does not change across training and test domains. Extensive experiments on both synthetic and real data demonstrate the effectiveness of the proposed method.
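
The last claim of this abstract follows from factoring the joint distribution. A brief sketch of that step, where the superscripts s (any source domain) and t (the target domain) are notation assumed here for illustration and do not appear in the abstract:

```latex
\begin{align}
P^{s}(h(X), Y) &= P^{s}(h(X) \mid Y)\, P^{s}(Y) \\
               &= P^{t}(h(X) \mid Y)\, P^{t}(Y) \\
               &= P^{t}(h(X), Y)
\end{align}
```

The second line uses both conditions stated in the abstract: the class-conditional distribution of the representation is invariant across domains, and the class prior does not change between training and test domains.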


IJCAI Conference 2017 Conference Paper

Classification and Representation Joint Learning via Deep Networks

Ya Li
Xinmei Tian
Xu Shen
Dacheng Tao

Deep learning has been proven to be effective for classification problems. However, the majority of previous works trained classifiers by considering only class label information and ignoring the local information from the spatial distribution of training samples. In this paper, we propose a deep learning framework that considers both class label information and local spatial distribution information between training samples. A two-channel network with shared weights is used to measure the local distribution. The classification performance can be improved with more detailed information provided by the local distribution, particularly when the training samples are insufficient. Additionally, the class label information can help to learn better feature representations compared with other feature learning methods that use only local distribution information between samples. The local distribution constraint between sample pairs can also be viewed as a regularization of the network, which can efficiently prevent the overfitting problem. Extensive experiments are conducted on several benchmark image classification datasets, and the results demonstrate the effectiveness of our proposed method.


AAAI Conference 2016 Conference Paper

DARI: Distance Metric and Representation Integration for Person Verification

Guangrun Wang
Liang Lin
Shengyong Ding
Ya Li
Qing Wang

The past decade has witnessed the rapid development of feature representation learning and distance metric learning, whereas the two steps are often discussed separately. To explore their interaction, this work proposes an end-to-end learning framework called DARI, i.e., Distance metric And Representation Integration, and validates the effectiveness of DARI in the challenging task of person verification. Given the training images annotated with the labels, we first produce a large number of triplet units, and each one contains three images, i.e., one person and the matched/mismatched references. For each triplet unit, the distance disparity between the matched pair and the mismatched pair tends to be maximized. We solve this objective by building a deep architecture of convolutional neural networks. In particular, the Mahalanobis distance matrix is naturally factorized as one top fully-connected layer that is seamlessly integrated with other bottom layers representing the image feature. The image feature and the distance metric can thus be simultaneously optimized via the one-shot backward propagation. On several public datasets, DARI shows very promising performance on re-identifying individuals across cameras against various challenges, and outperforms other state-of-the-art approaches.
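
The factorization mentioned in this abstract can be illustrated with the standard identity for a positive semidefinite Mahalanobis matrix. This is a sketch only: the symbols M, W, f, and the triplet (x, x^+, x^-) are notation assumed here, and the exact objective used in the paper is not given in the abstract.

```latex
\begin{align}
M &= W^{\top} W, \\
d_{M}(x, y) &= \bigl(f(x) - f(y)\bigr)^{\top} M \,\bigl(f(x) - f(y)\bigr)
             = \bigl\lVert W f(x) - W f(y) \bigr\rVert_{2}^{2}, \\
\Delta(x, x^{+}, x^{-}) &= d_{M}(x, x^{-}) - d_{M}(x, x^{+})
\end{align}
```

Here f denotes the image feature produced by the lower layers, so applying W as a final fully-connected layer and then taking a squared Euclidean distance is equivalent to the Mahalanobis distance under M. The disparity term is the quantity the abstract describes as being maximized for each triplet unit, with x^+ the matched and x^- the mismatched reference.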


IJCAI Conference 2015 Conference Paper

Multi-Task Model and Feature Joint Learning

Ya Li
Xinmei Tian
Tongliang Liu
Dacheng Tao

Given several tasks, multi-task learning (MTL) learns multiple tasks jointly by exploring the interdependence between them. The basic assumption in MTL is that those tasks are indeed related. Existing MTL methods model the task relatedness/interdependence in two different ways, either common parameter-sharing or common feature-sharing across tasks. In this paper, we propose a novel multi-task learning method to jointly learn shared parameters and shared feature representation. Our objective is to learn a set of common features with which the tasks are related as closely as possible, so that common parameters shared across tasks can be optimally learned. We present a detailed derivation of our multi-task learning method and propose an alternating algorithm to solve the non-convex optimization problem. We further present a theoretical bound which directly demonstrates that the proposed multi-task learning method can successfully model the relatedness via joint common parameter- and common feature-learning. Extensive experiments are conducted on several real-world multi-task learning datasets. All results demonstrate the effectiveness of our multi-task model and feature joint learning method.


EAAI Journal 2014 Journal Article

Protein secondary structure optimization using an improved artificial bee colony algorithm based on AB off-lattice model

Bai Li
Ya Li
Ligang Gong

Predicting the secondary structure of protein has been the focus of scientific research for decades, but it remains a challenge in bioinformatics due to the increasing computation complexity. In this paper, the AB off-lattice model is introduced to transform the prediction task into a numerical optimization problem. The Artificial Bee Colony algorithm (ABC) is an effective swarm intelligence algorithm, which works well in exploration but poorly in exploitation. To improve the convergence performance of ABC, a novel internal feedback strategy based ABC (IF-ABC) is proposed. In this strategy, internal states are fully used in each of the iterations to guide the subsequent searching process, and to balance local exploration with global exploitation. We provide the mechanism together with the convergence proof of the modified algorithm. Simulations are conducted on artificial Fibonacci sequences and real sequences in the database of Protein Data Bank (PDB). The analysis implies that IF-ABC is more effective at improving the convergence rate than ABC, and can be employed for this specific protein structure prediction issue.
