Arrow Research search

Author name cluster

Hui Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

61 papers
2 author rows

Possible papers

61

TIST Journal 2026 Journal Article

Adversarial Face Database against Deep Learning-Enabled Reconstruction Attacks

  • Hui Liu
  • Ling Ding
  • Jiageng Chen
  • Jinghua Wang
  • Xu Du
  • Jiabao Guo

Face recognition systems offer a range of applications that enhance security, efficiency, and personalization, e.g., access control, identity verification, and personalized services. Mainstream facial recognition systems employ the Edge-Cloud architecture to protect user privacy by storing facial feature data instead of original facial images. However, recently emerging reconstruction attacks based on deep learning can recover the visual information of original facial images from facial features, resulting in face privacy disclosure. Existing anti-reconstruction approaches either compromise facial recognition accuracy or fail to meet real-time requirements. In this article, we propose a practical privacy-preserving approach based on adversarial perturbations against reconstruction attacks. By incorporating subtle adversarial interference into facial features, the mapping relationship from facial features to original facial images is disrupted, so that baseline reconstruction networks cannot recover the original face image. We conducted experiments on two facial recognition models, FaceNet and ArcFace, both widely deployed in practical scenarios. The results show that sacrificing less than 1% of face recognition accuracy significantly reduces the quality of the reconstructed image. In terms of efficiency, the average time to generate an adversarial facial feature is less than 10 ms, meeting the real-time requirements of facial recognition.
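
The core idea, a small perturbation of the stored feature that breaks reconstruction while barely moving the feature itself, can be sketched as follows. This is not the paper's method: the "reconstructor" here is a toy linear map, and the perturbation direction (top right singular vector of `W`), `epsilon`, and all variable names are illustrative assumptions.

```python
# Hedged sketch: perturb a stored facial feature so a toy linear
# "reconstructor" degrades, while the feature itself barely changes.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))   # toy reconstruction network: image = W @ z
z = rng.standard_normal(8)         # facial feature stored server-side
x = W @ z                          # image an attacker could reconstruct

# Perturb along the direction the reconstructor amplifies most
# (top right singular vector of W), scaled to a small budget epsilon.
epsilon = 0.3
v = np.linalg.svd(W)[2][0]         # unit-norm top right singular vector
z_adv = z + epsilon * v

recon_error = np.linalg.norm(W @ z_adv - x)   # reconstruction now off-target
cos_sim = z @ z_adv / (np.linalg.norm(z) * np.linalg.norm(z_adv))
```

With a small `epsilon`, the cosine similarity between `z` and `z_adv` stays close to 1 (so recognition is barely affected) while the reconstruction error grows with the largest singular value of `W`.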

AAAI Conference 2026 Conference Paper

Chinese Two-part Allegorical Sayings Reading Comprehension: Exploration from Reasoning to Metaphor

  • Dongyu Su
  • Yimin Xiao
  • Tongguan Wang
  • Feiyue Xue
  • Junkai Li
  • Hui Liu
  • Ying Sha

The Two-Part Allegorical Saying (TPAS) is a Chinese linguistic phenomenon with a riddle-explanation structure, and it is an important component of Chinese metaphors. Existing research has primarily used TPAS to assist other semantic tasks, but lacks in-depth exploration of its intrinsic mechanisms: semantic rhetoric, logical reasoning, and metaphorical expression. To address this gap, we construct the first Chinese TPAS Reading Comprehension dataset (CTRC), which contains 18,103 TPASs and 75,296 passages. We frame it as a cloze test where the model selects the most suitable TPAS from candidates to fill passage blanks. To tackle the challenges of this CTRC task, we propose a Multi-view TPAS Contrastive Learning Network (MTCLN). Firstly, the joint vector cross-projection module extracts the rhetorical features of TPAS, such as homophonic puns, through vector space mapping to mitigate the semantic deviations caused by rhetoric. Then, the softened contrastive learning module strengthens the modeling of TPAS logical reasoning through feature association. Finally, the multi-view feature fusion module integrates contextual semantics with diverse TPAS features to facilitate the understanding of metaphorical expressions. Experiments on the CTRC dataset demonstrate that MTCLN achieves an average accuracy of 67.47%, outperforming large language models by 25.48%.

AAAI Conference 2026 Conference Paper

DiCaP: Distribution-Calibrated Pseudo-labeling for Semi-Supervised Multi-Label Learning

  • Bo Han
  • Zhuoming Li
  • Xiaoyu Wang
  • Yaxin Hou
  • Hui Liu
  • Junhui Hou
  • Yuheng Jia

Semi-supervised multi-label learning (SSMLL) aims to address the challenge of limited labeled data in multi-label learning (MLL) by leveraging unlabeled data to improve the model’s performance. While pseudo-labeling has become a dominant strategy in SSMLL, most existing methods assign equal weights to all pseudo-labels regardless of their quality, which can amplify the impact of noisy or uncertain predictions and degrade the overall performance. In this paper, we theoretically verify that the optimal weight for a pseudo-label should reflect its correctness likelihood. Empirically, we observe that on the same dataset, the correctness likelihood distribution of unlabeled data remains stable, even as the number of labeled training samples varies. Building on this insight, we propose Distribution-Calibrated Pseudo-labeling (DiCaP), a correctness-aware framework that estimates posterior precision to calibrate pseudo-label weights. We further introduce a dual-thresholding mechanism to separate confident and ambiguous regions: confident samples are pseudo-labeled and weighted accordingly, while ambiguous ones are explored by unsupervised contrastive learning. Experiments conducted on multiple benchmark datasets verify that our method achieves consistent improvements, surpassing state-of-the-art methods by up to 4.27%.
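
The two mechanisms the abstract describes, correctness-likelihood weighting and dual thresholding, can be sketched in a few lines. This is a hedged illustration, not DiCaP itself: the thresholds `tau_pos`/`tau_neg`, the weight choice, and the function name are assumptions, and the ambiguous region is simply skipped here rather than routed to contrastive learning.

```python
# Hedged sketch of correctness-weighted pseudo-labeling with dual thresholds.
import numpy as np

def weighted_pseudo_loss(probs, tau_pos=0.9, tau_neg=0.1):
    """probs: model confidences for unlabeled samples, shape (n, num_labels).
    Confident entries get a pseudo-label and a weight reflecting the
    estimated correctness likelihood; ambiguous entries are skipped."""
    losses = []
    for p in probs.ravel():
        if p >= tau_pos:          # confident positive region
            y, w = 1.0, p         # weight ~ likelihood the label is correct
        elif p <= tau_neg:        # confident negative region
            y, w = 0.0, 1.0 - p
        else:                     # ambiguous region: no pseudo-label here
            continue
        bce = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        losses.append(w * bce)
    return sum(losses) / max(len(losses), 1)
```

Entirely ambiguous predictions contribute no pseudo-label loss, while confident ones are down-weighted exactly when their estimated correctness is lower.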

TMLR Journal 2026 Journal Article

DiffKGW: Stealthy and Robust Diffusion Model Watermarking

  • Tianxin Wei
  • Ruizhong Qiu
  • Yifan Chen
  • Yunzhe Qi
  • Jiacheng Lin
  • Wenxuan Bao
  • Wenju Xu
  • Sreyashi Nag

Diffusion models are known for their remarkable capability to generate realistic images. However, ethical concerns, such as copyright protection and the generation of inappropriate content, pose significant challenges for the practical deployment of diffusion models. Recent work has proposed a flurry of watermarking techniques that inject artificial patterns into the initial latent representations of diffusion models, offering a promising solution to these issues. However, enforcing a specific pattern on selected elements can disrupt the Gaussian distribution of the initial latent representation. Inspired by watermarks for large language models (LLMs), we generalize the LLM KGW watermark to image diffusion models and propose DiffKGW, a stealthy probability adjustment approach that preserves the Gaussian distribution of the initial latent representation. In addition, we dissect the design principles of state-of-the-art watermarking techniques and introduce a unified framework. We identify a set of dimensions that explain the manipulation enforced by watermarking methods, including the distribution of individual elements, the specification of watermark shapes within each channel, and the choice of channels for watermark embedding. Through empirical studies on regular text-to-image applications and the first systematic attempt at watermarking image-to-image diffusion models, we thoroughly verify the effectiveness of the proposed framework. On all the diffusion models tested, including Stable Diffusion, the approach derived from our framework not only preserves image quality but also outperforms existing methods in robustness against a wide range of attacks.

AAAI Conference 2026 Conference Paper

ESMC: MLLM-Based Embedding Selection for Explainable Multiple Clustering

  • Xinyue Wang
  • Yuheng Jia
  • Hui Liu
  • Junhui Hou

Typical deep clustering methods, while achieving notable progress, can only provide one clustering result per dataset. This limitation arises from their assumption of a fixed underlying data distribution, which may fail to meet user needs and yield unsatisfactory clustering outcomes. Our work investigates how multi-modal large language models (MLLMs) can be leveraged to achieve user-driven clustering, emphasizing their adaptability to user-specified semantic requirements. However, directly using MLLM output for clustering risks producing unstructured and generic image descriptions instead of feature-specific and concrete ones. To address this issue, our method first establishes that MLLMs' hidden states of text tokens are strongly related to the corresponding features, and leverages these embeddings to perform clustering under any user-defined criterion. We also employ a lightweight clustering head augmented with pseudo-label learning, significantly enhancing clustering accuracy. Extensive experiments demonstrate its competitive performance on diverse datasets and metrics.

AAAI Conference 2026 Conference Paper

Harnessing the Unseen: The Hidden Influence of Intrinsic Knowledge in Long-Context Language Models

  • Yu Fu
  • Haz Sameen Shahgir
  • Hui Liu
  • Xianfeng Tang
  • Qi He
  • Yue Dong

Recent advances in long-context language models (LCLMs), designed to handle extremely long contexts, primarily focus on utilizing external contextual information, often leaving the influence of language models' parametric knowledge underexplored. In this work, we first investigate how this parametric knowledge affects content generation and demonstrate that its impact becomes increasingly pronounced as context length extends. Furthermore, we show that the model’s ability to utilize parametric knowledge, which we call parametric recall ability, does not improve in tandem with its ability to leverage contextual knowledge through extrinsic retrieval ability. Moreover, better extrinsic retrieval ability can interfere with the model’s parametric recall ability, limiting its full potential. To bridge this gap, we design a simple yet effective Hybrid Needle-in-a-Haystack test that evaluates models based on both abilities, rather than solely emphasizing extrinsic retrieval ability. Our experimental results reveal that Qwen-2.5 models significantly outperform Llama-3.1 models, demonstrating superior potential to combine the two abilities. Moreover, even the more powerful Llama-3.1-70B-Instruct model fails to exhibit better performance, highlighting the importance of evaluating models from a dual-ability perspective.

AAAI Conference 2026 Conference Paper

Learning the Latent Structure: A Feature-Centric Approach to Graph Data Augmentation

  • Yu Song
  • Zhigang Hua
  • Yan Xie
  • Bingheng Li
  • Jingzhe Liu
  • Bo Long
  • Jiliang Tang
  • Hui Liu

Graph-structured data plays a pivotal role in modeling complex relationships. However, real-world graphs are often incomplete due to data collection and observational constraints, severely limiting the effectiveness of modern graph learning pipelines. While existing Graph Data Augmentation (GDA) methods attempt to refine graph structures for improved downstream performance, they are typically label-dependent, computationally expensive, and inherently transductive, limiting their applicability in practical scenarios. In this work, we present a novel feature-centric graph data augmentation framework that bypasses explicit structure modeling by operating directly in the embedding space. Through a self-supervised inverse masking process, our method captures latent ties between observed and complete graphs, enabling recovery of unobserved structural signals through refined node representations. To enhance robustness under noisy and sparse supervision, we introduce a message regularizer and a bootstrap strategy for effective training and generalization. Evaluated on ten graph datasets spanning multiple domains, our approach, SelfAug, consistently outperforms state-of-the-art methods in both accuracy and efficiency across inductive and cold-start settings, highlighting its potential as a scalable and generalizable solution for real-world graph learning scenarios.

JBHI Journal 2026 Journal Article

M$^{2}$SegMamba: Mamba-Based Incomplete Multimodal Learning for Brain Tumor Segmentation With Few Samples

  • Xinyue Zhang
  • Ali Bahri
  • Christian Desrosiers
  • Hui Liu
  • Fangxun Bao

The accurate segmentation of brain tumors plays an important role in clinical diagnosis and treatment. Multimodal magnetic resonance imaging (MRI) can provide rich and complementary information for accurate brain tumor segmentation. However, the common problems of incomplete modalities and small samples in clinical practice seriously affect the performance of multimodal segmentation. In this work, we design a new framework, named M$^{2}$SegMamba, using Mamba and Masked Autoencoder networks for both supervised and self-supervised learning, aimed at handling small-sample brain tumor segmentation under various incomplete multimodality settings. We construct a masking strategy suitable for multimodal brain tumors to precisely extract image features, which serves as the foundation for image segmentation. By fully leveraging the capabilities of the Mamba network, we design a multi-traversal method to facilitate the interaction between inter-modal and cross-modal image features. Meanwhile, the introduction of TSmamba in skip connections efficiently integrates multimodal features. Auxiliary regularizers are introduced in both the encoder and decoder to further enhance the model’s robustness to incomplete modalities. We conducted experiments on the BraTS 2018 and BraTS 2020 datasets, and the results demonstrate that our method outperforms state-of-the-art brain tumor segmentation methods on most subsets of missing modalities.

AAAI Conference 2026 Conference Paper

Towards Better IncomLDL: We Are Unaware of Hidden Labels in Advance

  • Jiecheng Jiang
  • Jiawei Tang
  • Jiahao Jiang
  • Hui Liu
  • Junhui Hou
  • Yuheng Jia

Label distribution learning (LDL) is a novel paradigm that describes a sample by its label distribution. However, acquiring LDL datasets is costly and time-consuming, which led to the birth of incomplete label distribution learning (IncomLDL). All previous IncomLDL methods set the description degrees of "missing" labels in an instance to 0 but leave those of the other labels unchanged. This setting is unrealistic, because when certain labels are missing, the degrees of the remaining labels increase accordingly. We fix this unrealistic setting in IncomLDL and raise a new problem: LDL with hidden labels (HidLDL), which aims to recover a complete label distribution from a real-world incomplete label distribution in which certain labels of an instance are omitted during annotation. To solve this challenging problem, we identify the significance of the proportional information of the observed labels and capture it with an innovative constraint during the optimization process. We simultaneously use local feature similarity and the global low-rank structure to unveil the hidden labels. Moreover, we theoretically give the recovery bound of our method, proving the feasibility of learning from hidden labels. Extensive recovery and predictive experiments on various datasets prove the superiority of our method over state-of-the-art LDL and IncomLDL methods.
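
The difference between the old IncomLDL setting and the hidden-label setting described here is easy to make concrete: hiding a label renormalizes the remaining degrees, preserving their proportions, instead of merely zeroing an entry. The helper below is an illustrative assumption, not code from the paper.

```python
# Hedged illustration of the HidLDL setting: hidden labels are omitted and
# the observed degrees are rescaled to sum to 1, preserving proportions.
import numpy as np

def hide_labels(dist, hidden_idx):
    """dist: full label distribution summing to 1.
    Returns the distribution an annotator would report when the labels at
    hidden_idx are omitted during annotation."""
    observed = dist.copy()
    observed[hidden_idx] = 0.0
    s = observed.sum()
    return observed / s if s > 0 else observed

full = np.array([0.5, 0.3, 0.2])
obs = hide_labels(full, [2])   # hide the last label
# the 0.5 : 0.3 proportion of the observed labels is preserved
```

Under the older IncomLDL assumption the observed vector would be [0.5, 0.3, 0], which no longer sums to 1; here it becomes [0.625, 0.375, 0], which is what real annotations would look like.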

NeurIPS Conference 2025 Conference Paper

AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

  • Fali Wang
  • Hui Liu
  • Zhenwei Dai
  • Jingying Zeng
  • Zhiwei Zhang
  • Zongyu Wu
  • Chen Luo
  • Zhen Li

Test-time scaling (TTS) enhances the performance of large language models (LLMs) by allocating additional compute resources during inference. However, existing research primarily investigates TTS in single-stage tasks, while many real-world problems are multi-stage complex tasks, composed of a sequence of heterogeneous subtasks, each requiring an LLM with a specific capability. Therefore, we study a novel problem: test-time compute-optimal scaling in multi-stage complex tasks, aiming to select suitable models and allocate budgets per subtask to maximize overall performance. TTS in multi-stage tasks introduces two fundamental challenges: (i) the combinatorial search space of model and budget allocations, combined with the high cost of inference, makes brute-force search impractical; (ii) the optimal model and budget allocations across subtasks are interdependent, increasing the complexity of the compute-optimal search. To address this gap, we conduct extensive pilot experiments on four tasks across six datasets, deriving three empirical insights characterizing the behavior of LLMs in multi-stage complex tasks. Informed by these insights, we propose AgentTTS, an LLM-agent-based framework that autonomously searches for compute-optimal allocations through iterative feedback-driven interactions with the execution environment. Experimental results demonstrate that AgentTTS significantly outperforms traditional and other LLM-based baselines in search efficiency, and shows improved robustness to varying training set sizes and enhanced interpretability.

JBHI Journal 2025 Journal Article

FDDSeg: Unleashing the Power of Scribble Annotation for Cardiac MRI Images Through Feature Decomposition Distillation

  • Lin Zhang
  • Wenzong Li
  • Kaiyue Bi
  • Pei Li
  • Liang Zhang
  • Hui Liu

Cardiovascular diseases can be diagnosed with computer assistance using magnetic resonance imaging (MRI). Deep learning-based scribble-supervised MRI image segmentation has demonstrated impressive results recently. However, the majority of current approaches have an excessive number of model parameters and do not fully utilize scribble annotations. We develop a feature decomposition distillation deep learning method, named FDDSeg, for scribble-supervised cardiac MRI image segmentation. The public ACDC and MSCMR cardiac MRI datasets were used to evaluate the segmentation performance of FDDSeg. FDDSeg adopts a scribble annotation reuse policy to help provide accurate boundaries, and the intermediate features are split into class regions and class-free regions using pseudo-labels to further improve feature learning. Effective distillation knowledge is then captured by feature decomposition. FDDSeg was compared with 7 state-of-the-art methods, MAAG, ShapePU, CycleMix, Dual-Branch, ZscribbleSeg, Perturbation Dual-Branch, and ScribbleVC, on both the ACDC and MSCMR datasets. FDDSeg performs the best on the DSC (89.05% and 88.75%), JC (80.30% and 79.78%), and HD95 (5.76 and 4.44) metrics with only 2.01M parameters. FDDSeg can segment cardiac MRI images more precisely with only scribble annotations at lower computational cost, which may help increase the efficiency of quantitative cardiac analysis.
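
The DSC metric reported above is the standard Dice similarity coefficient between a predicted and a ground-truth binary mask; a minimal reference implementation (names and the small smoothing term are assumptions, not the paper's code) looks like this:

```python
# Dice similarity coefficient (DSC) between two binary segmentation masks:
# 2 * |pred AND gt| / (|pred| + |gt|). A value of 1 means perfect overlap.
import numpy as np

def dice(pred, gt, eps=1e-8):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)

p = np.array([[1, 1, 0, 0]])   # predicted mask
g = np.array([[1, 0, 0, 0]])   # ground-truth mask
# dice(p, g): one overlapping pixel out of 2+1 foreground pixels -> 2/3
```

The JC (Jaccard) and HD95 (95th-percentile Hausdorff distance) metrics in the abstract measure the same overlap as a ratio of the union and as a boundary distance, respectively.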

TIST Journal 2025 Journal Article

Graph Machine Learning in the Era of Large Language Models (LLMs)

  • Shijie Wang
  • Jiani Huang
  • Zhikai Chen
  • Yu Song
  • Wenzhuo Tang
  • Haitao Mao
  • Wenqi Fan
  • Hui Liu

Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graphs. Recently, LLMs have demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications, such as computer vision and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts have been made to explore the potential of LLMs in advancing Graph ML’s generalization, transferability, and few-shot learning ability. Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning capabilities of LLMs and potentially alleviate their limitations, such as hallucinations and the lack of explainability. Given the rapid progress of this research direction, a systematic review summarizing the latest advancements for Graph ML in the era of LLMs is necessary to provide an in-depth understanding to researchers and practitioners. Therefore, in this survey, we first review the recent developments in Graph ML. We then explore how LLMs can be utilized to enhance the quality of graph features, alleviate the reliance on labeled data, and address challenges such as graph heterophily and Out-of-Distribution (OOD) generalization. Afterward, we delve into how graphs can enhance LLMs, highlighting their abilities to enhance LLM pre-training and inference. Furthermore, we investigate various applications and discuss the potential future directions in this promising field.

NeurIPS Conference 2025 Conference Paper

Keep It on a Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning

  • Yaxin Hou
  • Bo Han
  • Yuheng Jia
  • Hui Liu
  • Junhui Hou

Current long-tailed semi-supervised learning methods assume that labeled data exhibit a long-tailed distribution, and unlabeled data adhere to a typical predefined distribution (i.e., long-tailed, uniform, or inverse long-tailed). However, the distribution of the unlabeled data is generally unknown and may follow an arbitrary distribution. To tackle this challenge, we propose a Controllable Pseudo-label Generation (CPG) framework, expanding the labeled dataset with the progressively identified reliable pseudo-labels from the unlabeled dataset and training the model on the updated labeled dataset with a known distribution, making it unaffected by the unlabeled data distribution. Specifically, CPG operates through a controllable self-reinforcing optimization cycle: (i) at each training step, our dynamic controllable filtering mechanism selectively incorporates reliable pseudo-labels from the unlabeled dataset into the labeled dataset, ensuring that the updated labeled dataset follows a known distribution; (ii) we then construct a Bayes-optimal classifier using logit adjustment based on the updated labeled data distribution; (iii) this improved classifier subsequently helps identify more reliable pseudo-labels in the next training step. We further theoretically prove that this optimization cycle can significantly reduce the generalization error under some conditions. Additionally, we propose a class-aware adaptive augmentation module to further improve the representation of minority classes, and an auxiliary branch to maximize data utilization by leveraging all labeled and unlabeled samples. Comprehensive evaluations on various commonly used benchmark datasets show that CPG achieves consistent improvements, surpassing state-of-the-art methods by up to 15.97% in accuracy. The code is available at https://github.com/yaxinhou/CPG.
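
Step (ii) above relies on standard logit adjustment: when the label distribution of the (updated) labeled set is known, a Bayes-optimal prediction under class imbalance subtracts the log class priors from the raw logits. The sketch below shows only that step, with illustrative names and numbers; it is not the CPG cycle itself.

```python
# Hedged sketch of the logit-adjustment step: boost tail classes by
# subtracting log class priors from the raw logits before taking argmax.
import numpy as np

def logit_adjusted_pred(logits, class_counts):
    priors = np.asarray(class_counts, dtype=float)
    priors /= priors.sum()
    return np.argmax(logits - np.log(priors), axis=-1)

logits = np.array([[2.0, 1.9]])   # head class barely wins on raw logits
counts = [900, 100]               # long-tailed labeled set: 90% vs 10%
# raw argmax picks class 0; after adjustment the tail class 1 wins,
# because its small prior contributes a large -log(prior) boost
```

This is why the cycle needs the labeled distribution to be known: the adjustment term log(prior) comes directly from it.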

NeurIPS Conference 2025 Conference Paper

Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy

  • Jie Ren
  • Zhenwei Dai
  • Xianfeng Tang
  • Yue Xing
  • Shenglai Zeng
  • Jingying Zeng
  • Qiankun Peng
  • Samarth Varshney

Although Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of tasks, growing concerns have emerged over the misuse of sensitive, copyrighted, or harmful data during training. To address these concerns, unlearning techniques have been developed to remove the influence of specific data without retraining from scratch. However, this paper reveals a critical vulnerability in fine-tuning-based unlearning: a malicious user can craft a manipulated forgetting request that stealthily degrades the model’s utility for benign users. We demonstrate this risk through a red-teaming Stealthy Attack (SA), which is inspired by two key limitations of existing unlearning: the inability to constrain the scope of the unlearning effect, and the failure to distinguish benign tokens from unlearning signals. Prior work has shown that unlearned models tend to memorize forgetting data as unlearning signals, and respond with hallucinations or feigned ignorance when unlearning signals appear in the input. By subtly increasing the presence of common benign tokens in the forgetting data, SA strengthens the connection between benign tokens and unlearning signals. As a result, when normal users include such tokens in their prompts, the model exhibits unlearning behaviors, leading to unintended utility degradation. To address this vulnerability, we propose Scope-aware Unlearning (SU), a lightweight enhancement that introduces a scope term into the unlearning objective, encouraging the model to localize the forgetting effect. Our method requires no additional data processing, integrates seamlessly with existing fine-tuning frameworks, and significantly improves robustness against SA. Extensive experiments validate the effectiveness of both SA and SU.

NeurIPS Conference 2025 Conference Paper

Large Language Models for Lossless Image Compression: Next-Pixel Prediction in Language Space is All You Need

  • Kecheng Chen
  • Pingping Zhang
  • Hui Liu
  • Jie Liu
  • Yibing Liu
  • Jiaxin Huang
  • Shiqi Wang
  • Hong Yan

We have recently witnessed that "Intelligence" and "Compression" are two sides of the same coin, where the large language model (LLM) with unprecedented intelligence is a general-purpose lossless compressor for various data modalities. This attribute is particularly appealing to the lossless image compression community, given the increasing need to compress high-resolution images in the current streaming media era. Consequently, a natural question emerges: Can the compression performance of the LLM elevate lossless image compression to new heights? However, our findings indicate that the naive application of LLM-based lossless image compressors suffers from a considerable performance gap compared with existing state-of-the-art (SOTA) codecs on common benchmark datasets. In light of this, we are dedicated to fulfilling the unprecedented intelligence (compression) capacity of the LLM for lossless image compression tasks, thereby bridging the gap between theoretical and practical compression performance. Specifically, we propose P-LLM, a next-pixel prediction-based LLM, which integrates various elaborated insights and methodologies, e.g., pixel-level priors, the in-context ability of the LLM, and a pixel-level semantic preservation strategy, to enhance the understanding of pixel sequences for better next-pixel predictions. Extensive experiments on benchmark datasets demonstrate that P-LLM can beat SOTA classical and learned codecs.
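
The "prediction is compression" principle behind this line of work is worth making concrete: a lossless coder driven by a next-pixel predictor spends about -log2 p(pixel) bits per pixel (the Shannon code length), so sharper predictions mean smaller files. The toy pixel sequence and the flat 1/2-probability predictor below are assumptions for illustration, not the paper's model.

```python
# Hedged illustration: ideal code length under a next-pixel predictor.
import math

pixels = [128, 129, 130, 131]                 # a smooth toy pixel sequence

# With no model, each 8-bit pixel costs log2(256) = 8 bits.
uniform_bits = len(pixels) * math.log2(256)

# A predictor that assigns probability 1/2 to the true next pixel
# costs -log2(1/2) = 1 bit per pixel under ideal entropy coding.
predictive_bits = sum(-math.log2(0.5) for _ in pixels)
```

Here the modeled sequence needs 4 bits instead of 32, an 8x reduction; in practice an arithmetic coder turns the predictor's probabilities into an actual bitstream at close to this ideal rate.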

AAAI Conference 2025 Conference Paper

Learning Cross-Domain Representations for Transferable Drug Perturbations on Single-Cell Transcriptional Responses

  • Hui Liu
  • Shikai Jin

Phenotypic drug discovery has attracted widespread attention because of its potential to identify bioactive molecules. Transcriptomic profiling provides a comprehensive reflection of phenotypic changes in cellular responses to external perturbations. In this paper, we propose XTransferCDR, a novel generative framework designed for feature decoupling and transferable representation learning across domains. Given a pair of perturbed expression profiles, our approach decouples the perturbation representations from basal states through domain separation encoders and then cross-transfers them in the latent space. The transferred representations are then used to reconstruct the corresponding perturbed expression profiles via a shared decoder. This cross-transfer constraint effectively promotes the learning of transferable drug perturbation representations. We conducted extensive evaluations of our model on multiple datasets, including single-cell transcriptional responses to drugs and single- and combinatorial genetic perturbations. The experimental results show that XTransferCDR achieved better performance than current state-of-the-art methods, showcasing its potential to advance phenotypic drug discovery.

NeurIPS Conference 2025 Conference Paper

Memory Injection Attacks on LLM Agents via Query-Only Interaction

  • Shen Dong
  • Shaochen Xu
  • Pengfei He
  • Yige Li
  • Jiliang Tang
  • Tianming Liu
  • Hui Liu
  • Zhen Xiang

Agents powered by large language models (LLMs) have demonstrated strong capabilities in a wide range of complex, real-world applications. However, LLM agents with a compromised memory bank may easily produce harmful outputs when the past records retrieved for demonstration are malicious. In this paper, we propose a novel Memory INJection Attack, MINJA, without assuming that the attacker can directly modify the memory bank of the agent. The attacker injects malicious records into the memory bank by only interacting with the agent via queries and output observations. These malicious records are designed to elicit a sequence of malicious reasoning steps corresponding to a different target query during the agent's execution of the victim user's query. Specifically, we introduce a sequence of bridging steps to link victim queries to the malicious reasoning steps. During the memory injection, we propose an indication prompt that guides the agent to autonomously generate similar bridging steps, with a progressive shortening strategy that gradually removes the indication prompt, such that the malicious record will be easily retrieved when processing later victim queries. Our extensive experiments across diverse agents demonstrate the effectiveness of MINJA in compromising agent memory. With minimal requirements for execution, MINJA enables any user to influence agent memory, highlighting the risk.

IJCAI Conference 2025 Conference Paper

Phenotypic Profile-Informed Generation of Drug-Like Molecules via Dual-Channel Variational Autoencoders

  • Hui Liu
  • Shiye Tian
  • Xuejun Liu

The de novo generation of drug-like molecules capable of inducing desirable phenotypic changes is receiving increasing attention. However, previous methods predominantly rely on expression profiles to guide molecule generation, but overlook the perturbative effect of the molecules on cellular contexts. To overcome this limitation, we propose SmilesGEN, a novel generative model based on the variational autoencoder (VAE) architecture to generate molecules with potential therapeutic effects. SmilesGEN integrates a pre-trained drug VAE (SmilesNet) with an expression profile VAE (ProfileNet), jointly modeling the interplay between drug perturbations and transcriptional responses in a common latent space. Specifically, ProfileNet is required to reconstruct pre-treatment expression profiles when drug-induced perturbations are eliminated in the latent space, while SmilesNet is informed by desired expression profiles to generate drug-like molecules. Our empirical experiments demonstrate that SmilesGEN outperforms current state-of-the-art models in generating molecules with a higher degree of validity, uniqueness, and novelty, as well as higher Tanimoto similarity to known ligands targeting the relevant proteins. Moreover, we evaluate SmilesGEN for scaffold-based molecule optimization and the generation of therapeutic agents, and confirm its superior performance in generating molecules with higher similarity to approved drugs. SmilesGEN establishes a robust framework that leverages gene signatures to generate drug-like molecules that hold promising potential to induce desirable cellular phenotypic changes. The source code and datasets are available at: https://github.com/hliulab/SmilesGEN.

JBHI Journal 2025 Journal Article

Predicting Single-Cell Drug Sensitivity Utilizing Adaptive Weighted Features for Multi-Source Domain Adaptation

  • Hui Liu
  • Wei Duan
  • Judong Luo

The advancement of single-cell sequencing technology has promoted the generation of a large number of single-cell transcriptional profiles, providing unprecedented opportunities to identify drug-resistant cell subpopulations within a tumor. However, few studies have focused on drug response prediction at the single-cell level, and their performance remains suboptimal. This paper proposes scAdaDrug, a novel multi-source domain adaptation model powered by adaptive importance-aware representation learning to predict the drug response of individual cells. We use a shared encoder to extract domain-invariant features related to drug response from multiple source domains via adversarial domain adaptation. In particular, we introduce a plug-and-play module to generate importance-aware and mutually independent weights, which adaptively modulate the latent representation of each sample in an element-wise manner between source and target domains. Extensive experimental results show that our model achieves state-of-the-art performance in predicting drug response on multiple independent datasets, including single-cell datasets derived from both cell lines and patient-derived xenograft (PDX) models, as well as clinical tumor patient cohorts. Moreover, ablation experiments demonstrate that our model effectively captures the underlying patterns determining drug response from multiple source domains.

ECAI Conference 2025 Conference Paper

Refining Dataset Distillation via Critical Region Selection and Multiview Teacher Guidance

  • Wenqing Ye
  • Xinyu Li
  • Hui Liu
  • Di Wu
  • Xiaoyan Sun 0001
  • Ronald X. Xu
  • Mingzhai Sun

Dataset distillation (DD) aims to improve training efficiency by condensing large datasets into compact yet informative subsets. Existing DD methods primarily use optimization-based approaches for image synthesis, which are computationally intensive. While some recent studies have explored optimization-free alternatives, their simplistic region selection strategies result in poor representations of the original dataset and inefficient utilization of synthetic data during downstream training. To address these limitations, we propose a method called Selection-and-Guidance for Dataset Distillation (SGDD). This approach refines the distillation process through two key stages: region selection and guidance enhancement. Specifically, we first obtain a candidate set of regions from various locations within the images. Then, we utilize the Representative Region Selector and Diverse Region Selector to identify the critical regions for image classification. Furthermore, we generate multiview guidance information for the synthetic data to enhance the distillation process further. By selecting representative and diverse regions while incorporating multiview guidance, our method unleashes the potential of optimization-free DD. Experimental results substantiate the superiority of our approach across various datasets and network architectures.

IS Journal 2025 Journal Article

Semantic Map Construction Under Complex Weather Scenarios

  • Hui Liu
  • Minhang Liu
  • Faliang Chang
  • Chunsheng Liu
  • Yansha Lu

Semantic maps provide information on road elements, which is crucial for ensuring driving safety. However, previous methods mostly focus on normal weather conditions, neglecting the challenges of scene feature extraction caused by image degradation in complex weather scenarios. To address these challenges, we propose a normal weather scene-guided complex weather scene map construction network (NCMC-Net). First, we propose a scene feature sharing module based on the complement learning mechanism. This module alternately updates the shared feature pool with features from normal and complex weather scenes during two training stages. Second, we design a feature refinement module based on the channel attention mechanism to adjust the intensity of multichannel features selectively. The two modules facilitate the extraction of optimal complex weather scene features guided by features from normal weather scenes. Experimental results on the complex weather scene dataset nuScenes-W validate our model’s performance.

TMLR Journal 2025 Journal Article

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

  • Hardy Chen
  • Haoqin Tu
  • Fali Wang
  • Hui Liu
  • Xianfeng Tang
  • Xinya Du
  • Yuyin Zhou
  • Cihang Xie

This work explores two distinct approaches for enhancing reasoning abilities in Large Vision-Language Models (LVLMs): supervised fine-tuning (SFT) and reinforcement learning (RL). To support the SFT approach, we curate a multimodal reasoning dataset with complete reasoning traces guided by DeepSeek-R1. For the RL approach, we focus on GRPO and develop a training framework tailored to vision-language tasks with a composite reward system comprising four signals that address both visual perception and reasoning challenges. Our extensive experiments reveal that RL is a significantly more effective strategy than SFT for training reasoning VLMs. While SFT can assist models that initially struggle with following reasoning instructions, it often induces "pseudo aha moments" that degrade overall reasoning performance, implying that only a minimal amount of SFT data is necessary. In contrast, RL leads to substantial improvements, outperforming recent baseline models on a range of math reasoning tasks by at least 2% on average. We also present several intriguing findings, e.g., that combining SFT and GRPO also hurts model performance, and that stronger instruction-aligned LVLMs consistently lead to better results in RL. We hope these findings provide valuable insights into the development of reasoning-capable VLMs and guide future research in this area.

NeurIPS Conference 2025 Conference Paper

You Can Trust Your Clustering Model: A Parameter-free Self-Boosting Plug-in for Deep Clustering

  • Hanyang Li
  • Yuheng Jia
  • Hui Liu
  • Junhui Hou

Recent deep clustering models have produced impressive clustering performance. However, a common issue with existing methods is the disparity between global and local feature structures. While local structures typically show strong consistency and compactness within class samples, global features often present intertwined boundaries and poorly separated clusters. Motivated by this observation, we propose DCBoost, a parameter-free plug-in designed to enhance the global feature structures of current deep clustering models. By harnessing reliable local structural cues, our method aims to elevate clustering performance effectively. Specifically, we first identify high-confidence samples through adaptive k-nearest-neighbors-based consistency filtering, aiming to select a sufficient number of samples with high label reliability to serve as trustworthy anchors for self-supervision. Subsequently, these samples are utilized to compute a discriminative loss, which promotes both intra-class compactness and inter-class separability, to guide network optimization. Extensive experiments across various benchmark datasets show that DCBoost significantly improves the clustering performance of diverse existing deep clustering models. Notably, our method improves current state-of-the-art baselines (e.g., ProPos) by more than 3% and amplifies the silhouette coefficient by over 7×. Code is available at https://github.com/l-h-y168/DCBoost.

JBHI Journal 2024 Journal Article

Cross-Domain Feature Disentanglement for Interpretable Modeling of Tumor Microenvironment Impact on Drug Response

  • Jia Zhai
  • Hui Liu

High-throughput screening technology has enabled the generation of large-scale drug responses across hundreds of cancer cell lines. However, a significant gap remains between in vitro cell lines and actual tumors in vivo in terms of their response to drug treatments. This is because tumors consist of a complex cellular composition and histopathology structure, known as the tumor microenvironment (TME), which greatly impacts the drug cytotoxicity against tumor cells. To date, no study has focused on modeling the impact of the TME on clinical drug response. In this study, we postulated that the intricate complexity of an actual tumor can be conceptually simplified into two separable components: cancerous cells and the tumor microenvironment. This assumption allowed us to model the influence of these two constituent parts on drug response through feature disentanglement. We employed a domain adaptation network to decouple and extract features from tumor transcriptional profiles. Specifically, two denoising autoencoders were separately used to extract features from cell lines (source domain) and tumors (target domain) for partial domain alignment and feature decoupling. The private encoder was enforced to extract information only about the TME. Moreover, to ensure generalizability to novel drugs, we employed a graph attention network to learn the latent representation of drugs, enabling us to linearly model the drug perturbation on cellular state in latent space. We validated our model on a benchmark dataset and demonstrated its superior performance in predicting clinical drug response and dissecting the influence of the TME on drug efficacy.

NeurIPS Conference 2024 Conference Paper

Mixture of Link Predictors on Graphs

  • Li Ma
  • Haoyu Han
  • Juanhui Li
  • Harry Shomer
  • Hui Liu
  • Xiaofeng Gao
  • Jiliang Tang

Link prediction, which aims to forecast unseen connections in graphs, is a fundamental task in graph machine learning. Heuristic methods, leveraging a range of different pairwise measures such as common neighbors and shortest paths, often rival the performance of vanilla Graph Neural Networks (GNNs). Therefore, recent advancements in GNNs for link prediction (GNN4LP) have primarily focused on integrating one or a few types of pairwise information. In this work, we reveal that different node pairs within the same dataset necessitate varied pairwise information for accurate prediction, and that models which apply the same pairwise information uniformly can achieve only suboptimal performance. As a result, we propose Link-MoE, a simple mixture-of-experts model for link prediction. Link-MoE utilizes various GNNs as experts and strategically selects the appropriate expert for each node pair based on various types of pairwise information. Experimental results across diverse real-world datasets demonstrate substantial performance improvements from Link-MoE. Notably, Link-MoE achieves a relative improvement of 18.71% on the MRR metric for the Pubmed dataset and 9.59% on the Hits@100 metric for the ogbl-ppa dataset, compared to the best baselines. The code is available at https://github.com/ml-ml/Link-MoE/.
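The per-pair gating described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the Link-MoE code: the linear gate, random stand-in inputs, and function names are hypothetical. A gate computed from pairwise heuristics produces a convex combination of frozen expert scores for each candidate link:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax along the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_link_score(expert_scores, pair_features, gate_weights):
    """Combine per-expert link scores with a feature-conditioned gate.

    expert_scores: (n_pairs, n_experts) scores from frozen GNN experts.
    pair_features: (n_pairs, d) pairwise heuristics (common neighbors, etc.).
    gate_weights:  (d, n_experts) linear gating parameters (a stand-in for
                   a trained gating network).
    """
    gate = softmax(pair_features @ gate_weights)  # per-pair expert weights, rows sum to 1
    return (gate * expert_scores).sum(axis=1)     # weighted sum of expert scores

rng = np.random.default_rng(0)
scores = moe_link_score(rng.random((5, 3)),  # 5 node pairs, 3 experts
                        rng.random((5, 4)),  # 4 pairwise features per pair
                        rng.random((4, 3)))
```

Because the gate is a softmax, each pair's final score stays inside the range spanned by its expert scores, so the gate only redistributes trust among experts.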

IJCAI Conference 2024 Conference Paper

Modeling Selective Feature Attention for Lightweight Text Matching

  • Jianxiang Zang
  • Hui Liu

Representation-based Siamese networks have risen to popularity in lightweight text matching due to their low deployment and inference costs. While word-level attention mechanisms have been implemented within Siamese networks to improve performance, we propose Feature Attention (FA), a novel downstream block designed to enrich the modeling of dependencies among embedding features. Employing "squeeze-and-excitation" techniques, the FA block dynamically adjusts the emphasis on individual features, enabling the network to concentrate more on features that significantly contribute to the final classification. Building upon FA, we introduce a dynamic "selection" mechanism called Selective Feature Attention (SFA), which leverages a stacked BiGRU Inception structure. The SFA block facilitates multi-scale semantic extraction by traversing different stacked BiGRU layers, encouraging the network to selectively concentrate on semantic information and embedding features across varying levels of abstraction. Both the FA and SFA blocks offer a seamless integration capability with various Siamese networks, showcasing a plug-and-play characteristic. Experimental evaluations conducted across diverse text matching baselines and benchmarks underscore the indispensability of modeling feature attention and the superiority of the "selection" mechanism.

NeurIPS Conference 2024 Conference Paper

PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference

  • Kendong Liu
  • Zhiyu Zhu
  • Chuanhao Li
  • Hui Liu
  • Huanqiang Zeng
  • Junhui Hou

In this paper, we make the first attempt to align diffusion models for image inpainting with human aesthetic standards via a reinforcement learning framework, significantly improving the quality and visual appeal of inpainted images. Specifically, instead of directly measuring the divergence with paired images, we train a reward model with the dataset we construct, consisting of nearly 51,000 images annotated with human preferences. Then, we adopt a reinforcement learning process to fine-tune the distribution of a pre-trained diffusion model for image inpainting in the direction of higher reward. Moreover, we theoretically deduce the upper bound on the error of the reward model, which illustrates the potential confidence of reward estimation throughout the reinforcement alignment process, thereby facilitating accurate regularization. Extensive experiments on inpainting comparison and downstream tasks, such as image extension and 3D reconstruction, demonstrate the effectiveness of our approach, showing significant improvements in the alignment of inpainted images with human preference compared with state-of-the-art methods. This research not only advances the field of image inpainting but also provides a framework for incorporating human preference into the iterative refinement of generative models based on modeling reward accuracy, with broad implications for the design of visually driven AI applications. Our code and dataset are publicly available at https://prefpaint.github.io.

AAAI Conference 2024 Conference Paper

T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering

  • Lei Wang
  • Yi Hu
  • Jiabang He
  • Xing Xu
  • Ning Liu
  • Hui Liu
  • Heng Tao Shen

Large Language Models (LLMs) have recently demonstrated exceptional performance in various Natural Language Processing (NLP) tasks. They have also shown the ability to perform chain-of-thought (CoT) reasoning to solve complex problems. Recent studies have explored CoT reasoning in complex multimodal scenarios, such as the science question answering task, by fine-tuning multimodal models with high-quality human-annotated CoT rationales. However, collecting high-quality CoT rationales is usually time-consuming and costly. Besides, the annotated rationales are often inaccurate because essential external information is missing. To address these issues, we propose a novel method termed T-SciQ that aims at teaching science question answering with LLM signals. The T-SciQ approach generates high-quality CoT rationales as teaching signals and is advanced to train much smaller models to perform CoT reasoning in complex modalities. Additionally, we introduce a novel data mixing strategy to produce more effective teaching data samples for simple and complex science question answering problems. Extensive experimental results show that our T-SciQ method achieves a new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18%. Moreover, our approach outperforms the most powerful fine-tuned baseline by 4.5%. The code is publicly available at https://github.com/T-SciQ/T-SciQ.

NeurIPS Conference 2024 Conference Paper

Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights

  • Zhikai Chen
  • Haitao Mao
  • Jingzhe Liu
  • Yu Song
  • Bingheng Li
  • Wei Jin
  • Bahare Fatemi
  • Anton Tsitsulin

Given the ubiquity of graph data and its applications in diverse domains, building a Graph Foundation Model (GFM) that can work well across different graphs and tasks with a unified backbone has recently garnered significant interest. A major obstacle to achieving this goal stems from the fact that graphs from different domains often exhibit diverse node features. Inspired by multi-modal models that align different modalities with natural language, text has recently been adopted to provide a unified feature space for diverse graphs. Despite the great potential of these text-space GFMs, current research in this field is hampered by two problems. First, the absence of a comprehensive benchmark with unified problem settings hinders a clear understanding of the comparative effectiveness and practical value of different text-space GFMs. Second, there is a lack of sufficient datasets to thoroughly explore the methods' full potential and verify their effectiveness across diverse settings. To address these issues, we conduct a comprehensive benchmark providing novel text-space datasets and comprehensive evaluation under unified problem settings. Empirical results provide new insights and inspire future research directions. Our code and data are publicly available from https://github.com/CurryTang/TSGFM.

ICRA Conference 2024 Conference Paper

Towards a Novel Soft Magnetic Laparoscope for Single Incision Laparoscopic Surgery

  • Hui Liu
  • Ning Li
  • Shuai Li 0018
  • Gregory J. Mancini
  • Jindong Tan

In single-incision laparoscopic surgery (SILS), the magnetic anchoring and guidance system (MAGS) is a promising technique to prevent clutter in the surgical workspace and provide a larger field of vision. Existing camera designs mainly rely on rigid structures, resulting in risks of losing magnetic coupling and impacting tissue during the insertion and coupling procedure. In this paper, we propose a wireless MAGS based on soft materials and structure design. The camera can bend at the exit of the trocar and maintain strong coupling with the external actuator. The operation principle and modeling were established to investigate the parameter design. An easier insertion procedure was introduced and demonstrated in the experiment. Bendability tests showed the camera could reach a 20° bending angle and 16.4 mm of displacement. The insertion and deployment took less than 2 minutes on average.

AAAI Conference 2023 Conference Paper

Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models

  • Lei Wang
  • Jiabang He
  • Xing Xu
  • Ning Liu
  • Hui Liu

Alignment between image and text has shown promising improvements on patch-level pre-trained document image models. However, investigating more effective or finer-grained alignment techniques during pre-training requires a large amount of computation cost and time. Thus, a question naturally arises: Could we fine-tune the pre-trained models adaptive to downstream tasks with alignment objectives and achieve comparable or better performance? In this paper, we propose a new model architecture with alignment-enriched tuning (dubbed AETNet) upon pre-trained document image models, to adapt downstream tasks with the joint task-specific supervised and alignment-aware contrastive objective. Specifically, we introduce an extra visual transformer as the alignment-aware image encoder and an extra text transformer as the alignment-aware text encoder before multimodal fusion. We consider alignment in the following three aspects: 1) document-level alignment by leveraging the cross-modal and intra-modal contrastive loss; 2) global-local alignment for modeling localized and structural information in document images; and 3) local-level alignment for more accurate patch-level information. Experiments on various downstream tasks show that AETNet can achieve state-of-the-art performance. Notably, AETNet consistently outperforms state-of-the-art pre-trained models, such as LayoutLMv3 with fine-tuning techniques, on three different downstream tasks. Code is available at https://github.com/MAEHCM/AET.

JBHI Journal 2023 Journal Article

FGSQA-Net: A Weakly Supervised Approach to Fine-Grained Electrocardiogram Signal Quality Assessment

  • Hui Liu
  • Tianlei Gao
  • Zhaoyang Liu
  • Minglei Shu

Objective: Due to the lack of fine-grained labels, current research can only evaluate signal quality at a coarse scale. This article proposes a weakly supervised fine-grained electrocardiogram (ECG) signal quality assessment method, which can produce continuous segment-level quality scores with only coarse labels. Methods: A novel network architecture, i.e., FGSQA-Net, is developed for signal quality assessment, consisting of a feature shrinking module and a feature aggregation module. Multiple feature shrinking blocks, each combining a residual CNN block and a max-pooling layer, are stacked to produce a feature map corresponding to continuous segments along the spatial dimension. Segment-level quality scores are obtained by feature aggregation along the channel dimension. Results: The proposed method was evaluated on two real-world ECG databases and one synthetic dataset. Our method produced an average AUC value of 0.975, outperforming the state-of-the-art beat-by-beat quality assessment method. The results are visualized for 12-lead and single-lead signals at granularities from 0.64 to 1.7 seconds, demonstrating that high-quality and low-quality segments can be effectively distinguished at a fine scale. Conclusion: FGSQA-Net is flexible and effective for fine-grained quality assessment of various ECG recordings and is suitable for ECG monitoring using wearable devices. Significance: This is the first study on fine-grained ECG quality assessment using weak labels, and it can be generalized to similar tasks for other physiological signals.

IJCAI Conference 2023 Conference Paper

Generative Diffusion Models on Graphs: Methods and Applications

  • Chengyi Liu
  • Wenqi Fan
  • Yunqing Liu
  • Jiatong Li
  • Hang Li
  • Hui Liu
  • Jiliang Tang
  • Qing Li

Diffusion models, as a novel generative paradigm, have achieved remarkable success in various image generation tasks such as image inpainting, image-to-text translation, and video generation. Graph generation is a crucial computational task on graphs with numerous real-world applications. It aims to learn the distribution of given graphs and then generate new graphs. Given the great success of diffusion models in image generation, increasing efforts have been made to leverage these techniques to advance graph generation in recent years. In this paper, we first provide a comprehensive overview of generative diffusion models on graphs. In particular, we review representative algorithms for three variants of graph diffusion models, i.e., Score Matching with Langevin Dynamics (SMLD), Denoising Diffusion Probabilistic Models (DDPM), and Score-based Generative Models (SGM). Then, we summarize the major applications of generative diffusion models on graphs with a specific focus on molecule and protein modeling. Finally, we discuss promising directions in generative diffusion models on graph-structured data.

NeurIPS Conference 2023 Conference Paper

Global Structure-Aware Diffusion Process for Low-light Image Enhancement

  • Jinhui HOU
  • Zhiyu Zhu
  • Junhui Hou
  • Hui Liu
  • Huanqiang Zeng
  • Hui Yuan

This paper studies a diffusion-based framework to address the low-light image enhancement problem. To harness the capabilities of diffusion models, we delve into this intricate process and advocate for the regularization of its inherent ODE-trajectory. To be specific, inspired by the recent research that low curvature ODE-trajectory results in a stable and effective diffusion process, we formulate a curvature regularization term anchored in the intrinsic non-local structures of image data, i.e., global structure-aware regularization, which gradually facilitates the preservation of complicated details and the augmentation of contrast during the diffusion process. This incorporation mitigates the adverse effects of noise and artifacts resulting from the diffusion process, leading to a more precise and flexible enhancement. To additionally promote learning in challenging regions, we introduce an uncertainty-guided regularization technique, which wisely relaxes constraints on the most extreme regions of the image. Experimental evaluations reveal that the proposed diffusion-based framework, complemented by rank-informed regularization, attains distinguished performance in low-light enhancement. The outcomes indicate substantial advancements in image quality, noise suppression, and contrast amplification in comparison with state-of-the-art methods. We believe this innovative approach will stimulate further exploration and advancement in low-light image processing, with potential implications for other applications of diffusion models. The code is publicly available at https://github.com/jinnh/GSAD.

JBHI Journal 2022 Journal Article

An ECG Signal Denoising Method Using Conditional Generative Adversarial Net

  • Xiaoyu Wang
  • Bingchu Chen
  • Ming Zeng
  • Yuli Wang
  • Hui Liu
  • Ruixia Liu
  • Lan Tian
  • Xiaoshan Lu

In this paper, a novel denoising method for electrocardiogram (ECG) signals is proposed to improve performance and availability under multiple noise conditions. The method is based on the framework of the conditional generative adversarial network (CGAN), which we adapt for ECG denoising. The proposed framework consists of two networks: a generator composed of an optimized convolutional auto-encoder (CAE) and a discriminator composed of four convolution layers and one fully connected layer. As the convolutional layers of the CAE preserve spatial locality and the neighborhood relations in the latent higher-level feature representations of the ECG signal, and the skip connections facilitate gradient propagation during denoising training, the trained denoising model has good performance and generalization ability. The extensive experimental results on the MIT-BIH databases show that for both single and mixed noises, the average signal-to-noise ratio (SNR) of the denoised ECG signal is above 39 dB, better than that of the state-of-the-art methods. Furthermore, the denoised classification results on four cardiac diseases show that the average accuracy increased by more than 32% under multiple noises at SNR = 0 dB. Thus, the proposed method removes noise effectively while preserving the detailed features of ECG signals.

AAAI Conference 2022 Conference Paper

Interpretable Low-Resource Legal Decision Making

  • Rohan Bhambhoria
  • Hui Liu
  • Samuel Dahan
  • Xiaodan Zhu

Over the past several years, legal applications of deep learning have been on the rise. However, as with other high-stakes decision making areas, the requirement for interpretability is of crucial importance. Current models utilized by legal practitioners are more of the conventional machine learning type, wherein they are inherently interpretable, yet unable to harness the performance capabilities of data-driven deep learning models. In this work, we utilize deep learning models in the area of trademark law to shed light on the issue of likelihood of confusion between trademarks. Specifically, we introduce a model-agnostic interpretable intermediate layer, a technique which proves to be effective for legal documents. Furthermore, we utilize weakly supervised learning by means of a curriculum learning strategy, effectively demonstrating the improved performance of a deep learning model. This is in contrast to the conventional models, which are only able to utilize the limited number of expensive samples manually annotated by legal experts. Although the methods presented in this work tackle the task of risk of confusion for trademarks, it is straightforward to extend them to other fields of law or, more generally, to other similar high-stakes application scenarios.

AAAI Conference 2022 Conference Paper

Multi-Type Urban Crime Prediction

  • Xiangyu Zhao
  • Wenqi Fan
  • Hui Liu
  • Jiliang Tang

Crime prediction plays an impactful role in enhancing public security and sustainable urban development. With recent advances in data collection and integration technologies, a large amount of urban data with rich crime-related information and fine-grained spatio-temporal logs has been recorded. Such helpful information can boost our understanding of the temporal evolution and spatial factors of urban crimes and can enhance accurate crime prediction. However, the vast majority of existing crime prediction algorithms either do not distinguish different types of crime or treat each crime type separately, which fails to capture the intrinsic correlations among different types of crime. In this paper, we perform crime prediction exploiting the cross-type and spatio-temporal correlations of urban crimes. In particular, we verify the existence of correlations among different types of crime from temporal and spatial perspectives, and propose a coherent framework to mathematically model these correlations for crime prediction. Extensive experiments on real-world datasets validate the effectiveness of our framework.

AAAI Conference 2021 Conference Paper

Clustering Ensemble Meets Low-rank Tensor Approximation

  • Yuheng Jia
  • Hui Liu
  • Junhui Hou
  • Qingfu Zhang

This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual one. The existing clustering ensemble methods generally construct a co-association matrix, which indicates the pairwise similarity between samples, as the weighted linear combination of the connective matrices from different base clusterings, and the resulting co-association matrix is then adopted as the input of an off-the-shelf clustering algorithm, e.g., spectral clustering. However, the co-association matrix may be dominated by poor base clusterings, resulting in inferior performance. In this paper, we propose a novel low-rank tensor approximation based method to solve the problem from a global perspective. Specifically, by inspecting whether two samples are clustered to an identical cluster under different base clusterings, we derive a coherent-link matrix, which contains limited but highly reliable relationships between samples. We then stack the coherent-link matrix and the co-association matrix to form a three-dimensional tensor, the low-rankness property of which is further explored to propagate the information of the coherent-link matrix to the co-association matrix, producing a refined co-association matrix. We formulate the proposed method as a convex constrained optimization problem and solve it efficiently. Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 11 state-of-the-art methods. To the best of our knowledge, this is the first work to explore the potential of low-rank tensor on clustering ensemble, which is fundamentally different from previous approaches. Last but not least, our method only contains one parameter, which can be easily tuned.
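The co-association matrix that these ensemble methods start from can be sketched as follows. This is an illustrative stand-in with equal weights per base clustering, not the paper's refined tensor model; the function name and toy labelings are assumptions. Entry (i, j) counts the fraction of base clusterings that put samples i and j in the same cluster:

```python
import numpy as np

def co_association(base_labelings):
    """Build a co-association matrix from multiple base clusterings.

    base_labelings: list of 1-D integer label arrays, one per base
    clustering. Returns an (n, n) matrix whose (i, j) entry is the
    fraction of base clusterings placing samples i and j together.
    """
    base_labelings = [np.asarray(labels) for labels in base_labelings]
    n = base_labelings[0].shape[0]
    acc = np.zeros((n, n))
    for labels in base_labelings:
        # connective matrix: 1 where the pair shares a cluster label
        acc += (labels[:, None] == labels[None, :]).astype(float)
    return acc / len(base_labelings)

# Two toy base clusterings of four samples
C = co_association([[0, 0, 1, 1], [0, 0, 0, 1]])
```

Samples 0 and 1 are co-clustered in both base clusterings (entry 1.0), while samples 0 and 2 agree in only one (entry 0.5); spectral clustering on this matrix then yields the consensus partition.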

AAAI Conference 2021 Conference Paper

DEAR: Deep Reinforcement Learning for Online Advertising Impression in Recommender Systems

  • Xiangyu Zhao
  • Changsheng Gu
  • Haoshenglun Zhang
  • Xiwang Yang
  • Xiaobing Liu
  • Jiliang Tang
  • Hui Liu

With the recent prevalence of Reinforcement Learning (RL), there have been tremendous interests in utilizing RL for online advertising in recommendation platforms (e.g., e-commerce and news feed sites). However, most RL-based advertising algorithms focus on optimizing ads' revenue while ignoring the possible negative influence of ads on the user experience of recommended items (products, articles and videos). Developing an optimal advertising algorithm in recommendations faces immense challenges because interpolating ads improperly or too frequently may decrease user experience, while interpolating fewer ads will reduce the advertising revenue. Thus, in this paper, we propose a novel advertising strategy for the rec/ads trade-off. To be specific, we develop an RL-based framework that can continuously update its advertising strategies and maximize reward in the long run. Given a recommendation list, we design a novel Deep Q-network architecture that can jointly determine three internally related tasks, i.e., (i) whether to interpolate an ad or not in the recommendation list, and if yes, (ii) the optimal ad and (iii) the optimal location to interpolate. The experimental results based on real-world data demonstrate the effectiveness of the proposed framework.

AAAI Conference 2021 Conference Paper

Learning Light-Weight Translation Models from Deep Transformer

  • Bei Li
  • Ziyang Wang
  • Hui Liu
  • Quan Du
  • Tong Xiao
  • Chunliang Zhang
  • Jingbo Zhu

Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive. In this paper, we take a natural step towards learning strong but light-weight NMT systems. We propose a novel group-permutation based knowledge distillation approach to compressing the deep Transformer model into a shallow model. The experimental results on several benchmarks validate the effectiveness of our method. Our compressed model is 8 times shallower than the deep model, with almost no loss in BLEU. To further enhance the teacher model, we present a Skipping Sub-Layer method to randomly omit sub-layers to introduce perturbation into training, which achieves a BLEU score of 30.63 on English-German newstest2014. The code is publicly available at https://github.com/libeineu/GPKD.

IJCAI Conference 2020 Conference Paper

End-to-End Transition-Based Online Dialogue Disentanglement

  • Hui Liu
  • Zhan Shi
  • Jia-Chen Gu
  • Quan Liu
  • Si Wei
  • Xiaodan Zhu

Dialogue disentanglement aims to separate intermingled messages into detached sessions. The existing research focuses on two-step architectures, in which a model first retrieves the relationships between two messages and then divides the message stream into separate clusters. Almost all existing work puts significant effort into selecting features for message-pair classification and clustering, while ignoring the semantic coherence within each session. In this paper, we introduce the first end-to-end transition-based model for online dialogue disentanglement. Our model captures the sequential information of each session as the online algorithm proceeds on processing a dialogue. The coherence in a session is hence modeled when messages are sequentially added into their best-matching sessions. Meanwhile, the research field still lacks data for studying end-to-end dialogue disentanglement, so we construct a large-scale dataset by extracting coherent dialogues from online movie scripts. We evaluate our model on both the dataset we developed and the publicly available Ubuntu IRC dataset [Kummerfeld et al., 2019]. The results show that our model significantly outperforms the existing algorithms. Further experiments demonstrate that our model better captures the sequential semantics and obtains more coherent disentangled sessions.
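The transition-based idea — each arriving message either joins its best-matching session or opens a new one — can be sketched as a greedy online loop. Here `score` stands in for the learned coherence model, and the threshold is an arbitrary illustrative value:

```python
def disentangle(messages, score, threshold=0.5):
    """Greedy online transition sketch: each message joins the
    best-matching existing session, or opens a new one when no session
    scores above the threshold. `score(session, msg)` is a stand-in for
    the learned coherence model."""
    sessions = []
    for msg in messages:
        if sessions:
            best = max(range(len(sessions)),
                       key=lambda i: score(sessions[i], msg))
            if score(sessions[best], msg) >= threshold:
                sessions[best].append(msg)
                continue
        sessions.append([msg])
    return sessions

# Toy coherence model: a message matches a session iff it shares the
# topic tag of the session's most recent message.
same_topic = lambda sess, msg: 1.0 if sess[-1][0] == msg[0] else 0.0
sessions = disentangle([("a", 1), ("b", 2), ("a", 3), ("b", 4)], same_topic)
```

Because sessions grow as the stream is processed, the score function can condition on everything already assigned to a session — the sequential coherence the abstract emphasizes.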

IJCAI Conference 2020 Conference Paper

Gated POS-Level Language Model for Authorship Verification

  • Linshu Ouyang
  • Yongzheng Zhang
  • Hui Liu
  • Yige Chen
  • Yipeng Wang

Authorship verification is an important problem that has many applications. The state-of-the-art deep authorship verification methods typically leverage character-level language models to encode author-specific writing styles. However, they often fail to capture syntactic level patterns, leading to sub-optimal accuracy in cross-topic scenarios. Also, due to imperfect cross-author parameter sharing, it's difficult for them to distinguish author-specific writing style from common patterns, leading to data-inefficient learning. This paper introduces a novel POS-level (Part of Speech) gated RNN based language model to effectively learn the author-specific syntactic styles. The author-agnostic syntactic information obtained from the POS tagger pre-trained on large external datasets greatly reduces the number of effective parameters of our model, enabling the model to learn accurate author-specific syntactic styles with limited training data. We also utilize a gated architecture to learn the common syntactic writing styles with a small set of shared parameters and let the author-specific parameters focus on each author's special syntactic styles. Extensive experimental results show that our method achieves significantly better accuracy than state-of-the-art competing methods, especially in cross-topic scenarios (over 5% in terms of AUC-ROC).
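The gated architecture described above mixes shared syntactic features with author-specific ones. A minimal element-wise sketch of such a gate (an illustrative formulation, not necessarily the paper's exact one):

```python
def gated_combine(shared, specific, gate):
    """Element-wise gate mixing shared syntactic features with
    author-specific ones: out[k] = g[k]*specific[k] + (1-g[k])*shared[k].
    Gate values near 0 defer to the shared parameters; values near 1
    let the author-specific parameters dominate."""
    return [g * sp + (1 - g) * sh
            for g, sh, sp in zip(gate, shared, specific)]
```

The design intent matches the abstract: common syntactic style lives in the small shared component, so the per-author parameters only have to model what is genuinely distinctive.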

AAAI Conference 2020 Conference Paper

Joint Character-Level Word Embedding and Adversarial Stability Training to Defend Adversarial Text

  • Hui Liu
  • Yongzheng Zhang
  • Yipeng Wang
  • Zheng Lin
  • Yige Chen

Text classification is a basic task in natural language processing, but small character perturbations in words can greatly decrease the effectiveness of text classification models, which is known as the character-level adversarial example attack. There are two main challenges in defending against character-level adversarial examples: out-of-vocabulary words in the word embedding model and the distribution difference between training and inference. Both of these challenges make character-level adversarial examples difficult to defend against. In this paper, we propose a framework which jointly uses character embedding and adversarial stability training to overcome these two challenges. Our experimental results on five text classification data sets show that models based on our framework can effectively defend against character-level adversarial examples; our models can defend against 93.19% of gradient-based adversarial examples and 94.83% of natural adversarial examples, which outperforms the state-of-the-art defense models.
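Stability training of this kind pairs each clean input with a character-perturbed copy. A sketch of the kind of character-level edits (swap, delete, substitute) that such attacks and defenses revolve around; the operation names and the `*` substitute character are illustrative choices, not the paper's:

```python
import random

def perturb_word(word, rng):
    """Apply one random character-level edit of the kind character-level
    adversarial attacks use: transpose two adjacent characters, delete a
    character, or substitute one. Words shorter than 2 characters are
    returned unchanged."""
    if len(word) < 2:
        return word
    op = rng.choice(["swap", "delete", "sub"])
    i = rng.randrange(len(word) - 1)
    if op == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "delete":
        return word[:i] + word[i + 1:]
    return word[:i] + "*" + word[i + 1:]
```

In a stability-training loop, both the clean word and `perturb_word(word, rng)` would be fed to the model, with a penalty for divergent predictions.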

AAAI Conference 2020 Conference Paper

Learning Multi-Level Dependencies for Robust Word Recognition

  • Zhiwei Wang
  • Hui Liu
  • Jiliang Tang
  • Songfan Yang
  • Gale Yan Huang
  • Zitao Liu

Robust language processing systems are becoming increasingly important given the recent awareness of dangerous situations where brittle machine learning models can be easily broken in the presence of noise. In this paper, we introduce a robust word recognition framework that captures multi-level sequential dependencies in noised sentences. The proposed framework employs a sequence-to-sequence model over the characters of each word, whose output is given to a word-level bi-directional recurrent neural network. We conduct extensive experiments to verify the effectiveness of the framework. The results show that the proposed framework outperforms state-of-the-art methods by a large margin, and they also suggest that character-level dependencies can play an important role in word recognition. The code of the proposed framework and the major experiments are publicly available.

TIST Journal 2020 Journal Article

Superpixel Region Merging Based on Deep Network for Medical Image Segmentation

  • Hui Liu
  • Haiou Wang
  • Yan Wu
  • Lei Xing

Automatic and accurate semantic segmentation of pathological structures in medical images is challenging because of noisy disturbance, deformable shapes of pathology, and low contrast between soft tissues. Classical superpixel-based classification algorithms suffer from edge leakage due to the complexity and heterogeneity inherent in medical images. Therefore, we propose a deep U-Net with superpixel region merging incorporated for edge enhancement to facilitate and optimize segmentation. Our approach combines three innovations: (1) unlike purely deep learning-based image segmentation, the segmentation evolves from superpixel region merging guided by U-Net training, capturing rich semantic information in addition to gray-level similarity; (2) a bilateral filtering module was adopted at the beginning of the network to eliminate external noise and enhance soft tissue contrast at the edges of pathology; and (3) a normalization layer was inserted after the convolutional layer at each feature scale, to prevent overfitting and increase the sensitivity to model parameters. This model was validated on lung CT, brain MR, and coronary CT datasets, respectively. Different superpixel methods and cross validation show the effectiveness of this architecture. The hyperparameter settings were empirically explored to achieve a good trade-off between performance and efficiency, where a four-layer network achieves the best result in precision, recall, F-measure, and running speed. It was demonstrated that our method outperformed state-of-the-art networks, including FCN-16s, SegNet, PSPNet, DeepLabv3, and traditional U-Net, both quantitatively and qualitatively. Source code for the complete method is available at https://github.com/Leahnawho/Superpixel-network.
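Classical superpixel region merging, which the network above refines, can be sketched as union-find over adjacent regions whose appearance is similar enough. Here a simple mean-intensity test stands in for the learned similarity the paper derives from U-Net training:

```python
def merge_regions(means, adjacency, tau):
    """Greedy region merging sketch: union adjacent superpixels whose
    (root) mean intensities differ by less than tau.

    means:     mean intensity per superpixel id.
    adjacency: iterable of (i, j) pairs of adjacent superpixels.
    Returns a merged-region label per superpixel."""
    parent = list(range(len(means)))

    def find(i):  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in adjacency:
        ri, rj = find(i), find(j)
        if ri != rj and abs(means[ri] - means[rj]) < tau:
            parent[rj] = ri
    return [find(i) for i in range(len(means))]
```

The hand-crafted `tau` test is exactly where a purely gray-level criterion leaks across weak edges; replacing it with a learned, semantically informed merge decision is the paper's first innovation.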

IROS Conference 2019 Conference Paper

Towards A Generic In Vivo In Situ Camera Lens Cleaning Module for Laparoscopic Surgery

  • Xiaolong Liu 0002
  • Hui Liu
  • Jindong Tan

This paper proposes a generic cleaning module to address lens fogging and soiling problems for insertable robotic cameras in laparoscopic surgery. The proposed lens cleaning module features minimal intraoperative interruption for surgeons to maintain a clear visual field. The technical challenges of developing such a compact modular design involve confining the wiping mechanism within a small space, yet delivering sufficient energy for the lens cleaning task. Inspired by the thermo-activated phase transformation of shape memory alloy, we develop an effective mesoscale actuation mechanism to overcome the design challenges. A prototype is designed and manufactured for performance evaluation. The design effectiveness was verified by experiments.

JBHI Journal 2013 Journal Article

Identifying Mammalian MicroRNA Targets Based on Supervised Distance Metric Learning

  • Hui Liu
  • Shuigene Zhou
  • Jihong Guan

MicroRNAs (miRNAs) have emerged as a novel class of endogenous posttranscriptional regulators in a variety of animal and plant species. One challenge facing miRNA research is to accurately identify the target mRNAs, because of the very limited sequence complementarity between miRNAs and their target sites, and the scarcity of experimentally validated targets to guide accurate prediction. In this paper, we propose a new method called SuperMirTar that exploits supervised distance learning to predict miRNA targets. Specifically, we use the experimentally supported miRNA–mRNA pairs as a training set to learn a distance metric function that minimizes the distances between miRNAs and mRNAs with validated interactions, then use the learned function to calculate the distances of test miRNA–mRNA interactions, and those with smaller distances than a predefined threshold are regarded as true interactions. We carry out performance comparison between the proposed approach and seven existing methods on independent datasets; the results show that our method achieves superior performance and can effectively narrow the gap between the number of predicted miRNA targets and the number of experimentally validated ones.
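The prediction rule described above reduces to thresholding a learned distance. A sketch with a diagonal (per-feature weighted) metric standing in for the learned one; the feature vectors, weights, and threshold are all illustrative values, not the paper's:

```python
def weighted_sq_distance(x, y, w):
    """Squared distance under a diagonal metric: sum_k w[k]*(x[k]-y[k])^2.
    The weights w play the role of the metric SuperMirTar would learn
    from experimentally validated miRNA-mRNA pairs."""
    return sum(wk * (xk - yk) ** 2 for wk, xk, yk in zip(w, x, y))

def predict_interaction(mirna_vec, mrna_vec, w, threshold):
    """A pair is predicted as a true miRNA-mRNA interaction when its
    learned distance falls below the predefined threshold."""
    return weighted_sq_distance(mirna_vec, mrna_vec, w) < threshold
```

Training would choose `w` so that validated pairs land below the threshold and random pairs above it; at test time only the distance computation and comparison remain.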

IROS Conference 2005 Conference Paper

A visually guided swimming robot

  • Gregory Dudek
  • Michael Jenkin
  • Chris Prahacs
  • Andrew Hogue
  • Junaed Sattar
  • Philippe Giguère
  • Andrew German
  • Hui Liu

We describe recent results obtained with AQUA, a mobile robot capable of swimming, walking and amphibious operation. Designed to rely primarily on visual sensors, the AQUA robot uses vision to navigate underwater using servo-based guidance, and also to obtain high-resolution range scans of its local environment. This paper describes some of the pragmatic and logistic obstacles encountered, and provides an overview of some of the basic capabilities of the vehicle and its associated sensors. Moreover, this paper presents the first ever amphibious transition from walking to swimming.

IROS Conference 2004 Conference Paper

AQUA: an aquatic walking robot

  • Christina Georgiades
  • Andrew German
  • Andrew Hogue
  • Hui Liu
  • Chris Prahacs
  • Arlene Ripsman
  • Robert Sim
  • Luz Abril Torres-Méndez

This paper describes an underwater walking robotic system being developed under the name AQUA, the goals of the AQUA project, the overall hardware and software design, the basic hardware and sensor packages that have been developed, and some initial experiments. The robot is based on the RHex hexapod robot and uses a suite of sensing technologies, primarily based on computer vision and INS, to allow it to navigate and map clear shallow-water environments. The sensor-based navigation and mapping algorithms are based on the use of both artificial floating visual and acoustic landmarks as well as on naturally occurring underwater landmarks and trinocular stereo.