Arrow Research search

Author name cluster

Yu Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

76 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

AgentSwift: Efficient LLM Agent Design via Value-Guided Hierarchical Search

  • Yu Li
  • Lehui Li
  • Zhihao Wu
  • Qingmin Liao
  • Jianye Hao
  • Kun Shao
  • Fengli Xu

Large language model (LLM) agents have demonstrated strong capabilities across diverse domains, yet automated agent design remains a significant challenge. Current automated agent design approaches are often constrained by limited search spaces that primarily optimize workflows but fail to integrate crucial human-designed components like memory, planning, and tool use. Furthermore, these methods are hampered by high evaluation costs, as evaluating even a single new agent on a benchmark can require tens of dollars. The difficulty of this exploration is further exacerbated by inefficient search strategies that struggle to navigate the large design space effectively, making the discovery of novel agents a slow and resource-intensive process. To address these challenges, we propose AgentSwift, a novel framework for automated agent design. We formalize a hierarchical search space that jointly models agentic workflow and composable functional components. This structure moves beyond optimizing workflows alone by co-optimizing functional components, which enables the discovery of more complex and effective agent architectures. To make exploration within this expansive space feasible, we mitigate high evaluation costs by training a value model on a high-quality dataset, generated via a novel strategy combining combinatorial coverage and balanced Bayesian sampling for low-cost evaluation. Guiding the entire process is a hierarchical Monte Carlo Tree Search (MCTS) strategy, which is informed by uncertainty to efficiently navigate the search space. Evaluated across a comprehensive set of seven benchmarks spanning embodied, math, web, tool, and game domains, AgentSwift discovers agents that achieve an average performance gain of 8.34% over both existing automated agent search methods and manually designed agents. Moreover, our framework exhibits steeper and more stable search trajectories.
By enabling the efficient, automated composition of workflow with functional components, AgentSwift provides a scalable methodology to explore complex agent designs. Our framework serves as a launchpad for researchers to rapidly prototype and discover powerful agent architectures without the impediment of prohibitive evaluation costs.
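The abstract says the MCTS is guided by a learned value model and informed by uncertainty, but it does not give the selection rule. A minimal sketch at a single tree level, assuming a UCB-style score that adds a visit-count exploration term and an uncertainty bonus to the value estimate; `Node`, `predicted_value`, `uncertainty`, and both coefficients are hypothetical names chosen for illustration, not AgentSwift's actual formula.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node for one candidate agent design."""
    name: str
    predicted_value: float  # assumed: value-model estimate of benchmark score
    uncertainty: float      # assumed: value-model uncertainty (e.g. ensemble std)
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def selection_score(child: Node, parent_visits: int,
                    c_explore: float = 1.0, c_uncert: float = 0.5) -> float:
    """Assumed UCB-style rule: exploitation + exploration + uncertainty bonus."""
    if child.visits == 0:
        # Unvisited designs are scored by the value model instead of real rewards.
        exploit = child.predicted_value
    else:
        exploit = child.total_reward / child.visits
    explore = c_explore * math.sqrt(math.log(parent_visits + 1) / (child.visits + 1))
    return exploit + explore + c_uncert * child.uncertainty


def select_child(parent: Node) -> Node:
    """Pick the child with the highest selection score."""
    return max(parent.children, key=lambda ch: selection_score(ch, parent.visits))


root = Node("root", 0.0, 0.0, visits=10)
root.children = [
    Node("workflow+memory", 0.62, 0.05, visits=5, total_reward=3.0),
    Node("workflow+planning", 0.55, 0.30, visits=1, total_reward=0.5),
    Node("workflow+tools", 0.70, 0.10),
]
print(select_child(root).name)
```

Routing unvisited nodes through the value model is what lets a search like this avoid paying for a real benchmark run on every candidate.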

AAAI Conference 2026 Conference Paper

Beyond Semantic Features: Pixel-level Mapping for Generalized AI-Generated Image Detection

  • Chenming Zhou
  • Jiaan Wang
  • Yu Li
  • Lei Li
  • Juan Cao
  • Sheng Tang

The rapid evolution of generative technologies necessitates reliable methods for detecting AI-generated images. A critical limitation of current detectors is their failure to generalize to images from unseen generative models, as they often overfit to source-specific semantic cues rather than learning universal generative artifacts. To overcome this, we introduce a simple yet remarkably effective pixel-level mapping pre-processing step to disrupt the pixel value distribution of images and break the fragile, non-essential semantic patterns that detectors commonly exploit as shortcuts. This forces the detector to focus on more fundamental and generalizable high-frequency traces inherent to the image generation process. Through comprehensive experiments on GAN and diffusion-based generators, we show that our approach significantly boosts the cross-generator performance of state-of-the-art detectors. Extensive analysis further verifies our hypothesis that the disruption of semantic cues is the key to generalization.
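The abstract does not define the pixel-level mapping. A minimal sketch, assuming the pre-processing is a fixed, seeded random bijection over 8-bit intensity values applied through a lookup table: it scrambles the value distribution while keeping every pixel at its position and keeping equal-valued pixels equal. `make_pixel_map` and `apply_pixel_map` are hypothetical helpers, not the authors' code.

```python
import random


def make_pixel_map(seed: int = 0) -> list[int]:
    """Assumed form of the mapping: a fixed random permutation of 0..255."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table


def apply_pixel_map(image: list[list[int]], table: list[int]) -> list[list[int]]:
    """Remap each pixel value through the lookup table; positions are unchanged."""
    return [[table[v] for v in row] for row in image]


table = make_pixel_map(seed=42)
image = [[0, 128, 255], [128, 128, 64]]
mapped = apply_pixel_map(image, table)
print(mapped)
```

Because the table is seeded, the same mapping can be applied consistently to training and test images.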

AAAI Conference 2026 Conference Paper

Calibrating and Rotating: A Unified Framework for Weight Conditioning in PEFT

  • Da Chang
  • Peng Xue
  • Yu Li
  • Yongxiang Liu
  • Pengxiang Xu
  • Shixun Zhang

Parameter-Efficient Fine-Tuning (PEFT) methods are crucial for adapting large pre-trained models. Among these, LoRA is considered a foundational approach. Building on this, the influential DoRA method enhances performance by decomposing weight updates into magnitude and direction. However, its underlying mechanism remains unclear, and it introduces significant computational overhead. In this work, we first identify that DoRA's success stems from its capacity to increase the singular value entropy of the weight update matrix, which promotes a more uniform update distribution akin to full fine-tuning. We then reformulate DoRA into a mathematically equivalent and more efficient matrix form, revealing it as a learnable weight conditioning method. Based on this insight, we propose a unified framework for designing advanced PEFT methods by exploring two orthogonal dimensions: the architectural placement and the transformation type of the conditioning matrix. Within this framework, we introduce two novel methods: (1) Pre-Diag, which applies a diagonal conditioning matrix before the LoRA update to efficiently calibrate the pre-trained weights, thereby enhancing performance while reducing training time; and (2) Skewed Orthogonal Rotation Adaptation (SORA), which employs a parameter-efficient orthogonal rotation to perform a more powerful, norm-preserving transformation of the feature space. Extensive experiments on natural language understanding and generation tasks demonstrate that our proposed methods achieve superior performance and efficiency compared to both LoRA and DoRA.
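The abstract describes SORA as a parameter-efficient, norm-preserving orthogonal rotation but does not say how the rotation is built. A common construction, used here as an assumption, is the Cayley transform Q = (I − A)^(-1)(I + A), which is orthogonal whenever A is skew-symmetric; only the n(n−1)/2 upper-triangular entries of A are free parameters. The helpers below are hypothetical, pure-Python stand-ins for a tensor implementation.

```python
import math


def skew_from_params(params: list[float], n: int) -> list[list[float]]:
    """Fill an n x n skew-symmetric matrix (A^T = -A) from n(n-1)/2 parameters."""
    assert len(params) == n * (n - 1) // 2
    a = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = params[k]
            a[j][i] = -params[k]
            k += 1
    return a


def solve(m: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [m[i][:] + b[i][:] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cayley(a: list[list[float]]) -> list[list[float]]:
    """Q = (I - A)^{-1} (I + A); orthogonal when A is skew-symmetric."""
    n = len(a)
    i_minus = [[(1.0 if i == j else 0.0) - a[i][j] for j in range(n)] for i in range(n)]
    i_plus = [[(1.0 if i == j else 0.0) + a[i][j] for j in range(n)] for i in range(n)]
    return solve(i_minus, i_plus)


def matvec(q: list[list[float]], x: list[float]) -> list[float]:
    return [sum(qij * xj for qij, xj in zip(row, x)) for row in q]


q = cayley(skew_from_params([0.3, -0.2, 0.5], n=3))
x = [1.0, 2.0, -1.0]
y = matvec(q, x)
print(math.hypot(*x), math.hypot(*y))  # the two norms agree up to rounding
```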

AAAI Conference 2026 Conference Paper

C³TG: Conflict-aware, Composite, and Collaborative Controlled Text Generation

  • Yu Li
  • Zhe Yang
  • Yi Huang
  • Xin Liu
  • Guilin Qi

Recent advancements in large language models (LLMs) have demonstrated remarkable text generation capabilities. However, controlling specific attributes of generated text remains challenging without architectural modifications or extensive fine-tuning. Current methods typically toggle a single, basic attribute but struggle with precise multi-attribute control. In scenarios where attribute requirements conflict, existing methods lack coordination mechanisms, causing interference between desired attributes. Furthermore, these methods fail to incorporate iterative optimization processes in the controlled generation pipeline. To address these limitations, we propose Conflict-aware, Composite, and Collaborative Controlled Text Generation (C³TG), a two-phase framework for fine-grained, multi-dimensional text attribute control. During generation, C³TG selectively pairs the LLM with the required attribute classifiers from the 17 available dimensions and employs weighted KL-divergence to adjust token probabilities. The optimization phase then leverages an energy function combining classifier scores and penalty terms to resolve attribute conflicts through iterative feedback, enabling precise control over multiple dimensions simultaneously while preserving natural text flow. Experiments show that C³TG significantly outperforms baselines across multiple metrics including attribute accuracy, linguistic fluency, and output diversity, while simultaneously reducing toxicity. These results establish C³TG as an effective and flexible solution for multi-dimensional text attribute control that requires no costly model modifications.
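The abstract mentions adjusting token probabilities with a weighted KL-divergence but gives no formula. One standard fact a sketch can rely on: the distribution q that minimizes Σᵢ wᵢ·KL(q‖pᵢ), with weights summing to 1, is the normalized weighted geometric mean q ∝ Πᵢ pᵢ^wᵢ. The example below assumes the LLM's next-token distribution and per-attribute distributions are combined this way; the distributions and weights are made-up toy values, and C³TG's actual rule may differ.

```python
import math


def combine_weighted_kl(dists: list[list[float]], weights: list[float]) -> list[float]:
    """argmin_q sum_i w_i * KL(q || p_i): the normalized weighted geometric mean."""
    total_w = sum(weights)
    weights = [w / total_w for w in weights]  # normalize so the weights sum to 1
    vocab = len(dists[0])
    log_q = [sum(w * math.log(p[t]) for w, p in zip(weights, dists)) for t in range(vocab)]
    m = max(log_q)  # subtract the max before exponentiating for numerical stability
    unnorm = [math.exp(v - m) for v in log_q]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def kl(q: list[float], p: list[float]) -> float:
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


# Toy 4-token vocabulary: base LLM plus two hypothetical attribute distributions.
llm = [0.40, 0.30, 0.20, 0.10]
polite = [0.10, 0.60, 0.20, 0.10]
non_toxic = [0.05, 0.45, 0.45, 0.05]
weights = [0.5, 0.3, 0.2]

q = combine_weighted_kl([llm, polite, non_toxic], weights)
print([round(v, 3) for v in q])
```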

AAAI Conference 2026 Conference Paper

DS-ProGen: A Dual-Structure Deep Language Model for Functional Protein Design

  • Yanting Li
  • Zikang Wang
  • Jiyue Jiang
  • Ziqian Lin
  • Dongchen He
  • Yuheng Shan
  • Yanruisheng Shao
  • Jiayi Li

Inverse Protein Folding (IPF) is a critical subtask in the field of protein design, aiming to engineer amino acid sequences capable of folding correctly into a specified three-dimensional (3D) conformation. Although substantial progress has been achieved in recent years, existing methods generally rely on either backbone coordinates or molecular surface features alone, which restricts their ability to fully capture the complex chemical and geometric constraints necessary for precise sequence prediction. To address this limitation, we present DS-ProGen, a dual-structure deep language model for functional protein design, which integrates both backbone geometry and surface-level representations. By incorporating backbone coordinates as well as surface chemical and geometric descriptors into a next-amino-acid prediction paradigm, DS-ProGen is able to generate functionally relevant and structurally stable sequences while satisfying both global and local conformational constraints. On the PRIDE dataset, DS-ProGen attains the current state-of-the-art recovery rate of 61.47%, demonstrating the synergistic advantage of multi-modal structural encoding in protein design. Furthermore, DS-ProGen excels in predicting interactions with a variety of biological partners, including ligands, ions, and RNA, confirming its robust functional retention capabilities.
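Sequence recovery, the metric quoted above, is usually the fraction of positions where the designed residue matches the native one. A minimal sketch, assuming equal-length sequences in one-letter amino-acid code; the example sequences are made up.

```python
def recovery_rate(designed: str, native: str) -> float:
    """Fraction of positions where the designed amino acid equals the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


print(recovery_rate("MKTAYIAK", "MKTVYIAR"))  # 6 of 8 positions match -> 0.75
```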

AAAI Conference 2026 Conference Paper

Investigating Data Pruning for Pretraining Biological Foundation Models at Scale

  • Yifan Wu
  • Jiyue Jiang
  • Xichen Ye
  • Yiqi Wang
  • Chang Zhou
  • Yitao Xu
  • Jiayang Chen
  • He Hu

Biological foundation models (BioFMs), pretrained on large-scale biological sequences, have recently shown strong potential in providing meaningful representations for diverse downstream bioinformatics tasks. However, such models often rely on millions to billions of training sequences and billions of parameters, resulting in prohibitive computational costs and significant barriers to reproducibility and accessibility, particularly for academic labs. To address these challenges, we investigate the feasibility of data pruning for BioFM pretraining and propose a post-hoc influence-guided data pruning framework tailored to biological domains. Our approach first introduces a subset-based self-influence formulation that enables efficient estimation of sample importance at low computational cost. Built upon this, we propose two simple yet effective selection strategies: Top-k Influence (Top I) and Coverage-Centric Influence (CCI). We then empirically validate our method on two representative BioFMs: RNA-FM and ESM-C. For RNA, our framework consistently outperforms random selection baselines under an extreme pruning rate of over 99%, demonstrating its effectiveness. Furthermore, we demonstrate the generalizability of our framework on protein-related tasks using ESM-C. Notably, our coreset even outperforms random subsets ten times its size in both RNA and protein settings, revealing substantial redundancy in biological sequence datasets. These findings underscore the potential of influence-guided data pruning to substantially reduce the computational cost of BioFM pretraining, paving the way for more efficient, accessible, and sustainable biological AI research.
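The two selection rules named above can be sketched over precomputed per-sample influence scores. Top I simply keeps the k highest scores. The coverage-centric variant here is an assumption modeled loosely on stratified coverage-centric coreset selection: split the score range into equal-width bins and spread the budget across bins, smallest bins first. Taking the highest-scoring samples within each bin is a further simplification (random sampling within bins would also be reasonable). Function names and the bin count are hypothetical.

```python
def top_k_influence(scores: list[float], k: int) -> list[int]:
    """Indices of the k samples with the highest influence scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def coverage_centric(scores: list[float], k: int, num_bins: int = 4) -> list[int]:
    """Assumed stratified rule: spread a budget of k samples across score bins."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width when all scores are equal
    bins = [[] for _ in range(num_bins)]
    for i, s in enumerate(scores):
        bins[min(int((s - lo) / width), num_bins - 1)].append(i)
    # Visit non-empty bins from smallest to largest; each gets an even share of
    # the remaining budget, and unused budget flows to the larger bins.
    remaining = sorted((b for b in bins if b), key=len)
    selected, budget = [], k
    while remaining:
        share = budget // len(remaining)
        b = remaining.pop(0)
        take = sorted(b, key=lambda i: scores[i], reverse=True)[:share]
        selected.extend(take)
        budget -= len(take)
    return sorted(selected)


scores = [0.9, 0.85, 0.8, 0.1, 0.15, 0.55, 0.4, 0.3, 1.0, 0.0]
print(top_k_influence(scores, 4))   # [8, 0, 1, 2]
print(coverage_centric(scores, 4))  # [4, 5, 6, 8]
```

Top I concentrates the budget on the highest-influence region, while the stratified rule trades some of that for coverage of the whole score range.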

AAAI Conference 2026 Conference Paper

LinProVSR: Linguistics-Knowledge Guided Progressive Disambiguation Network for Visual Speech Recognition

  • Feng Xue
  • Baochao Zhu
  • Wei Jia
  • Shujie Li
  • Yu Li
  • Jinrui Zhang
  • Shengeng Tang
  • Dan Guo

Visual Speech Recognition (VSR), commonly known as lipreading, recognizes spoken text by analyzing visual features of the lips. Because lip movements are subtle, VSR is much harder than other motion recognition tasks. Existing VSR models struggle with viseme ambiguity when processing phonemes with similar pronunciations: multiple phonemes share similar viseme features, leading to a notable drop in lipreading accuracy. To address this issue, this study proposes the Linguistics-Knowledge Guided Progressive Disambiguation Network for Visual Speech Recognition (LinProVSR) framework. First, an ambiguous sample set is constructed based on linguistic knowledge to provide supervisory signals for the model's training. Then, a Progressive Contrastive Disambiguation Network (PCDN) is designed, which progressively enhances the model's ability to capture the subtle viseme differences corresponding to similar phonemes through viseme-phoneme contrastive disambiguation in the encoding stage and text contrastive disambiguation in the decoding stage. Furthermore, we introduce the Ambiguous Word Error Rate (AWER) metric specifically for evaluating the recognition of phonetically ambiguous text, and we verify the effectiveness of the proposed method on multiple public datasets, with especially large improvements in distinguishing visually similar phonemes.

TCS Journal 2026 Journal Article

Matchmaking encryption for NC1 circuits without obfuscation

  • Ying Gao
  • Xinrui Yang
  • Jie Chen
  • Yijian Zhang
  • Yu Li

Matchmaking encryption (ME) is a new form of encryption proposed by Ateniese et al. (CRYPTO, 2019). Constructing an ME scheme that supports complex functions without relying on obfuscation is an important research goal, but despite significant effort it has seen limited success. Existing ME schemes either support only very restricted policies (i.e., identity matching) or require obfuscation techniques. In this paper, we propose the first ME construction that supports NC1 circuits without using obfuscation. Our results can be summarized as follows. (1) We propose an ME scheme for NC1 circuits from LWE and pairings, with provable security in the generic group model (GGM). (2) We further propose an ME scheme for NC1 circuits in the standard model, by leveraging inner product functional encryption and using the KOALA knowledge assumption. Technically, we follow the blueprint of Francati et al. (Eurocrypt, 2023) but start from the two-input attribute-based encryption by Agrawal et al. (CRYPTO, 2022), which allows for a form of “linking” between two independently generated ciphertexts. In terms of security, our schemes protect the sender’s privacy, guarantee the authenticity of the sender’s data, and ensure that receivers without access privileges learn nothing about the encrypted message.

JBHI Journal 2026 Journal Article

MRGCDDI: Multi-Relation Graph Contrastive Learning Without Data Augmentation for Drug-Drug Interaction Events Prediction

  • Yu Li
  • Lin-Xuan Hou
  • Zhu-Hong You
  • Yang Yuan
  • Cheng-gang Mi
  • Yu-An Huang
  • Hai-Cheng Yi

Predicting drug-drug interactions (DDIs) is a significant concern in the field of deep learning. It can effectively reduce potential adverse consequences and improve therapeutic safety. Graph neural network (GNN)-based models have made satisfactory progress in DDI event prediction. However, most existing models overlook crucial drug structure and interaction information, which is necessary for accurate DDI event prediction. To tackle this issue, we introduce a new method called MRGCDDI. This approach employs contrastive learning, but unlike conventional methods, it does not require data augmentation, thereby avoiding additional noise. MRGCDDI maintains the semantics of the graphical data during encoder perturbation through a simple yet effective contrastive learning approach, without the need for manual trial and error, tedious searching, or expensive domain knowledge to select augmentations. The approach presented in this study effectively integrates drug features extracted from drug molecular graphs and information from multi-relational drug-drug interaction (DDI) networks. Extensive experimental results demonstrate that MRGCDDI outperforms state-of-the-art methods on both datasets. Specifically, on Deng's dataset, MRGCDDI achieves an average increase of 4.33% in accuracy, 11.57% in Macro-F1, 10.97% in Macro-Recall, and 10.64% in Macro-Precision. Similarly, on Ryu's dataset, the model shows improvements with an average increase of 2.42% in accuracy, 3.86% in Macro-F1, 3.49% in Macro-Recall, and 2.75% in Macro-Precision.
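The abstract says MRGCDDI builds its contrastive views by perturbing the encoder rather than augmenting the graph. A minimal sketch of that idea, assuming a toy linear encoder (standing in for a GNN) whose weights receive small Gaussian noise to produce the second view, scored with a standard InfoNCE loss. The encoder, noise scale, temperature, and random "drug features" are illustrative assumptions, not the paper's settings.

```python
import math
import random


def encode(x: list[float], weights: list[list[float]]) -> list[float]:
    """Toy linear encoder: one output per weight row (stand-in for a GNN)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def perturb(weights: list[list[float]], scale: float, rng: random.Random) -> list[list[float]]:
    """Second view: same input, encoder weights plus small Gaussian noise."""
    return [[w + rng.gauss(0.0, scale) for w in row] for row in weights]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(view1: list[list[float]], view2: list[list[float]], tau: float = 0.5) -> float:
    """Average InfoNCE loss: sample i in view1 should match sample i in view2."""
    loss = 0.0
    for i, z in enumerate(view1):
        logits = [cosine(z, z2) / tau for z2 in view2]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(lg - m) for lg in logits))
        loss += log_denom - logits[i]
    return loss / len(view1)


rng = random.Random(0)
weights = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
drugs = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]  # hypothetical features
noisy = perturb(weights, scale=0.05, rng=rng)
z1 = [encode(d, weights) for d in drugs]
z2 = [encode(d, noisy) for d in drugs]
print(round(info_nce(z1, z2), 4))
```

Since both views come from the same unmodified input, no augmentation choice is needed, which matches the motivation stated in the abstract.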

AAAI Conference 2026 Conference Paper

MultiKD: Backdoor Defense in Federated Graph Learning via Attention-Guided Multi-Teacher Distillation

  • Jiale Zhang
  • Yanan Wang
  • Bosen Rao
  • Chengcheng Zhu
  • Xiaobing Sun
  • Yu Li

Backdoor attacks pose a severe threat to federated graph learning (FGL), where malicious clients can inject hidden triggers into the global model without being detected. Defending against such attacks is particularly challenging due to the complex graph structures and the stealthy nature of trigger patterns. In this work, we propose MultiKD, a novel backdoor mitigation method based on attention-guided multi-teacher distillation. Unlike existing defenses that focus on detecting suspicious clients or restricting backdoor activation, MultiKD directly purifies the global model on the server side by exploiting intermediate representations. It integrates knowledge from multiple client models and guides the global model to suppress backdoor behaviors by aligning attention maps and preserving inter-layer relational consistency. Our defensive intuition enables MultiKD to retain task-relevant information while mitigating malicious patterns, even when some teacher models are compromised. Extensive experiments on four real-world datasets demonstrate the effectiveness of our approach in significantly reducing attack success rate (≤ 8%) with minimal impact on utility (≤ 5%).

AAAI Conference 2026 Conference Paper

RMSAGen: Integrating Multiple Sequence Alignment for Function RNA Design

  • Jiyue Jiang
  • Yanyu Chen
  • Qingchuan Zhang
  • Jiayi Li
  • Xiangyu Shi
  • Chang Zhou
  • Ziqian Lin
  • Jiuming Wang

Biological sequences, including RNAs and proteins, share similarities with natural languages, enabling the application of advanced language models to various biological tasks. However, due to its flexibility and lack of experimental data, RNA is a particularly challenging biological “language” compared to other biological sequences like proteins. RNA multiple sequence alignments (MSAs), which align evolutionarily related RNA sequences, can greatly enhance RNA biology modeling, as evidenced by their significant roles in structure prediction and function annotation. This raises the question of whether RNA MSAs can also benefit RNA design, which remains unexplored. This paper introduces RMSAGen, a model comprising RMSA-Encoder and RMSA-Decoder, that leverages MSAs to design functional RNA sequences. RMSA-Encoder effectively extracts MSA features, enhancing performance in functional prediction and solvent accessibility prediction tasks and supporting RMSA-Decoder in accurate RNA generation. RMSAGen can design RNA sequences that effectively bind to target RNA-binding proteins, and the design performance improves with an increasing number of sequences. In addition, the ribozymes designed with structural features by RMSAGen show strong computational metrics and exhibit biological activity during gel electrophoresis. These results highlight the effectiveness of RMSAGen, establishing it as a powerful tool and a new direction for RNA design.

EAAI Journal 2026 Journal Article

Smart indoor occupancy detection based on optimized camera placement, multi-view de-duplication, and large language model semantic understanding

  • Deli Liu
  • Xiaoping Zhou
  • Dongxiao Chen
  • Yu Li

Accurate occupancy detection in indoor environments is essential for optimizing energy use, enhancing occupant comfort, and ensuring safety in smart buildings. This study aims to design and validate an end-to-end framework that not only counts occupants reliably but also generates rich semantic descriptions of their behaviors and spatial interactions. We propose a four-stage methodology: (1) multi-objective optimization of camera placement through field-of-view analysis and grid modeling to maximize coverage and minimize blind spots; (2) on-device human detection using a fine-tuned You Only Look Once version 8 (YOLOv8) model; (3) cross-camera identity tracking using Simple Online and Realtime Tracking with a Deep Association Metric (DeepSORT) to assign unique global identifiers and eliminate duplicate counts; and (4) a multimodal large language model (LLM) that consumes annotated, identity-aware multi-view images to produce coherent natural-language summaries and structured outputs detailing occupant numbers, actions, and locations. Extensive evaluations conducted on a diverse multi-view dataset—including challenging scenarios of heavy occlusion and clothing changes—demonstrate the robustness and real-time applicability of the proposed framework. The key contribution of this work is the first demonstration of integrating identity-aware, multi-camera de-duplication with large language model–driven scene interpretation, enabling automated, actionable insights that extend beyond simple occupancy counts. This novel combination advances intelligent building management by providing both precise occupancy analytics and contextual understanding to support adaptive control and energy-efficient operation.

TMLR Journal 2025 Journal Article

AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures

  • Yihang Gao
  • Chuanyang Zheng
  • Enze Xie
  • Han Shi
  • Tianyang Hu
  • Yu Li
  • Michael Ng
  • Zhenguo Li

Beyond natural language processing, transformers perform remarkably well in broader applications, including scientific computing and computer vision. Previous works explain this from the perspective of expressive power, showing that standard transformers are capable of performing certain algorithms. To empower transformers with algorithmic capabilities, and motivated by the recently proposed looped transformer (Yang et al., 2024; Giannou et al., 2023), we design a novel transformer framework, dubbed Algorithm Transformer (abbreviated as AlgoFormer). Our insight is that efficient transformer architectures can be designed by leveraging prior knowledge of tasks and the underlying structure of potential algorithms. Compared with the standard transformer and the vanilla looped transformer, the proposed AlgoFormer represents algorithms more efficiently in some specific tasks. In particular, inspired by the structure of human-designed learning algorithms, our transformer framework consists of a pre-transformer that is responsible for task preprocessing, a looped transformer for iterative optimization algorithms, and a post-transformer for producing the desired results after post-processing. We provide theoretical evidence of the expressive power of AlgoFormer in solving some challenging problems, mirroring human-designed algorithms. Furthermore, some theoretical and empirical results are presented to show that the designed transformer has the potential to perform algorithm representation and learning. Experimental results demonstrate the empirical superiority of the proposed transformer, which outperforms the standard transformer and the vanilla looped transformer in some specific tasks. An extensive experiment on real language tasks (e.g., neural machine translation of German and English, and text classification) further validates the expressiveness and effectiveness of AlgoFormer.
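As a purely illustrative analogy, not the paper's architecture, the pre/looped/post decomposition can be mirrored with ordinary functions: a preprocessing step that summarizes the task, one shared update applied repeatedly (the role of the looped transformer), and a readout. Here the loop body is a gradient-descent step for 1-D least squares; every name is a hypothetical stand-in for a transformer block.

```python
def pre(xs: list[float], ys: list[float]) -> dict:
    """Preprocessing stage: summarize the task as least-squares statistics."""
    n = len(xs)
    sxx = sum(x * x for x in xs) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    return {"sxx": sxx, "sxy": sxy, "w": 0.0}


def loop_body(state: dict, lr: float = 0.1) -> dict:
    """One shared iteration: a gradient step on (1/2) * mean squared error."""
    grad = state["sxx"] * state["w"] - state["sxy"]
    return {**state, "w": state["w"] - lr * grad}


def post(state: dict, x_query: float) -> float:
    """Post-processing stage: turn the iterated state into a prediction."""
    return state["w"] * x_query


def algo_pipeline(xs: list[float], ys: list[float], x_query: float, num_loops: int = 50) -> float:
    state = pre(xs, ys)
    for _ in range(num_loops):  # the same block is reused, like a looped transformer
        state = loop_body(state)
    return post(state, x_query)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(round(algo_pipeline(xs, ys, 4.0), 6))  # 8.0
```

The step size must satisfy lr * sxx < 2 for the loop to converge; with these toy data lr = 0.1 is well inside that range.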

AAAI Conference 2025 Conference Paper

AnyTalk: Multi-modal Driven Multi-domain Talking Head Generation

  • Yu Wang
  • Yunfei Liu
  • Fa-Ting Hong
  • Meng Cao
  • Lijian Lin
  • Yu Li

Cross-domain talking head generation, such as animating a static cartoon animal photo with real human video, is crucial for personalized content creation. However, prior works typically rely on domain-specific frameworks and paired videos, which limits their utility and complicates their architectures with additional motion alignment modules. To address these shortcomings, we propose AnyTalk, a unified framework that eliminates the need for paired data and learns a shared motion representation across different domains. The motion is represented by canonical 3D keypoints extracted using an unsupervised 3D keypoint detector. Further, we propose an expression consistency loss to improve the accuracy of facial dynamics in video generation. Additionally, we present AniTalk, a comprehensive dataset designed for advanced multi-modal cross-domain generation. Our experiments demonstrate that AnyTalk excels at generating high-quality, multi-modal talking head videos, showcasing remarkable generalization capabilities across diverse domains.

NeurIPS Conference 2025 Conference Paper

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

  • Li Hao
  • He CAO
  • Bin Feng
  • Daniel Shao
  • Robert Tang
  • Zhiyuan Yan
  • Yonghong Tian
  • Li Yuan

While large language models (LLMs) with Chain-of-Thought (CoT) reasoning excel in mathematics and coding, their potential for systematic reasoning in chemistry, a domain demanding rigorous structural analysis for real-world tasks like drug design and reaction engineering, remains untapped. Current benchmarks focus on simple knowledge retrieval, neglecting step-by-step reasoning required for complex tasks such as molecular optimization and reaction prediction. To address this, we introduce ChemCoTBench, a reasoning framework that bridges molecular structure understanding with arithmetic-inspired operations, including addition, deletion, and substitution, to formalize chemical problem-solving into transparent, step-by-step workflows. By treating molecular transformations as modular "chemical operations", the framework enables slow-thinking reasoning, mirroring the logic of mathematical proofs while grounding solutions in real-world chemical constraints. We evaluate models on two high-impact tasks: Molecular Property Optimization and Chemical Reaction Prediction. These tasks mirror real-world challenges while providing structured evaluability. We further provide ChemCoTDataset, a pioneering 22,000-instance chemical reasoning dataset with expert-annotated chains of thought to facilitate LLM fine-tuning. By providing annotated trainable datasets, a reasoning taxonomy, and baseline evaluations, our work bridges the gap between abstract reasoning methods and practical chemical discovery, establishing a foundation for advancing LLMs as tools for AI-driven scientific innovation.

TAAS Journal 2025 Journal Article

Blockchain-Based Efficient Cross-Domain Access Control System for Autonomous Vehicle Data Sharing

  • Youhuizi Li
  • Yuyu Yin
  • Honghao Gao
  • Hao Lei
  • Zhilei Li
  • Yu Li

Autonomous driving requires cooperation between vehicles and between vehicles and infrastructure, and a substantial amount of data is continuously generated and shared to support intelligent transportation. Strict yet flexible access control mechanisms are necessary to protect data privacy in such complex scenarios. However, traditional centralized architectures cannot efficiently handle the usage and authorization of autonomous vehicle data. To support data sharing among multiple companies that do not fully trust each other, we propose a blockchain-based efficient cross-domain access control system named BAAC. It integrates the attribute-based access control model and blockchain for secure and flexible data sharing. First, entity attributes are divided into public attributes and private attributes to provide fine-grained access control. In addition, several smart contracts are designed to efficiently distribute and manage cross-domain attributes, and the data access history is recorded asynchronously on the blockchain. Because real networks are unstable, a dynamic routing mechanism that distributes data access requests across different blockchain nodes is designed to further improve system efficiency. Compared with classical blockchain-based access control systems, the experimental results show that the proposed system resolves the trust issue and effectively supports cross-domain data sharing.

NeurIPS Conference 2025 Conference Paper

Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs

  • Tengyun Ma
  • Jiaqi Yao
  • Daojing He
  • Shihao Peng
  • Yu Li
  • Shaohui Liu
  • Zhuotao Tian

Large Language Models (LLMs) have emerged as powerful tools for diverse applications. However, their uniform token processing paradigm introduces critical vulnerabilities in instruction handling, particularly when exposed to adversarial scenarios. In this work, we identify and propose a novel class of vulnerabilities, termed Tool-Completion Attack (TCA), which exploits function-calling mechanisms to subvert model behavior. To evaluate LLM robustness against such threats, we introduce the Tool-Completion benchmark, a comprehensive security assessment framework, which reveals that even state-of-the-art models remain susceptible to TCA, with surprisingly high attack success rates. To address these vulnerabilities, we introduce Context-Aware Hierarchical Learning (CAHL), a sophisticated mechanism that dynamically equilibrates semantic comprehension with role-specific instruction constraints. CAHL leverages the contextual correlations between different instruction segments to establish a robust, context-aware instruction hierarchy. Extensive experiments demonstrate that CAHL significantly enhances LLM robustness against both conventional attacks and the proposed TCA, exhibiting strong generalization capabilities in zero-shot evaluations while still preserving model performance on generic tasks. Our code is available at https://github.com/S2AILab/CAHL.

AAAI Conference 2025 Conference Paper

DF-MIA: A Distribution-Free Membership Inference Attack on Fine-Tuned Large Language Models

  • Zhiheng Huang
  • Yannan Liu
  • Daojing He
  • Yu Li

Membership Inference Attack (MIA) aims to determine whether a specific sample is present in the training dataset of a target machine learning model. Previous MIAs against fine-tuned Large Language Models (LLMs) either fail to address the unique challenges of the fine-tuned setting or rely on strong assumptions about the training data distribution. This paper proposes a distribution-free MIA framework tailored for fine-tuned LLMs, named DF-MIA. We recognize that the samples awaiting testing can serve as a valuable reference dataset for fine-tuning reference models. By enhancing the signals of non-member samples within this reference dataset, we can achieve a more reliable and practical calibration of probabilities, improving the differentiation between members and non-members. Leveraging these insights, we have developed a two-stage framework that employs specially designed data augmentation and perturbation techniques to prioritize the significance of non-members and mitigate the influence of potential members within the reference dataset. We evaluate our method on three representative LLMs ranging from 1B to 8B parameters on three datasets. The results demonstrate that DF-MIA significantly enhances the performance of MIA.
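The abstract describes calibrating target-model probabilities against reference models. A minimal sketch of the generic reference-calibrated scoring and thresholding step, assuming per-sample losses from the target and reference models are already available: a sample is flagged as a member when the target model fits it much better than the reference does. The losses and threshold are made up, and DF-MIA's two-stage augmentation and perturbation are not reproduced here.

```python
def calibrated_score(target_loss: float, reference_loss: float) -> float:
    """Higher means more likely a member: the target fits the sample unusually well."""
    return reference_loss - target_loss


def predict_members(target_losses: list[float], reference_losses: list[float],
                    threshold: float) -> list[bool]:
    return [calibrated_score(t, r) > threshold
            for t, r in zip(target_losses, reference_losses)]


# Hypothetical per-sample losses (e.g. mean token negative log-likelihood).
target = [1.2, 2.9, 0.8, 3.1]
reference = [2.5, 3.0, 2.0, 3.0]
print(predict_members(target, reference, threshold=0.5))  # [True, False, True, False]
```

Subtracting the reference loss removes the part of the signal explained by a sample simply being easy, which is why the quality of the reference model matters.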

NeurIPS Conference 2025 Conference Paper

Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection

  • Yu Li
  • Xingyu Qiu
  • Yuqian Fu
  • Jie Chen
  • Tianwen Qian
  • Xu Zheng
  • Danda Pani Paudel
  • Yanwei Fu

Cross-Domain Few-Shot Object Detection (CD-FSOD) aims to detect novel objects with only a handful of labeled samples from previously unseen domains. While data augmentation and generative methods have shown promise in few-shot learning, their effectiveness for CD-FSOD remains unclear due to the need for both visual realism and domain alignment. Existing strategies, such as copy-paste augmentation and text-to-image generation, often fail to preserve the correct object category or produce backgrounds coherent with the target domain, making them non-trivial to apply directly to CD-FSOD. To address these challenges, we propose Domain-RAG, a training-free, retrieval-guided compositional image generation framework tailored for CD-FSOD. Domain-RAG consists of three stages: domain-aware background retrieval, domain-guided background generation, and foreground-background composition. Specifically, the input image is first decomposed into foreground and background regions. We then retrieve semantically and stylistically similar images to guide a generative model in synthesizing a new background, conditioned on both the original and retrieved contexts. Finally, the preserved foreground is composed with the newly generated domain-aligned background to form the generated image. Without requiring any additional supervision or training, Domain-RAG produces high-quality, domain-consistent samples across diverse tasks, including CD-FSOD, remote sensing FSOD, and camouflaged FSOD. Extensive experiments show consistent improvements over strong baselines and establish new state-of-the-art results. The source code and instructions are available at https://github.com/LiYu0524/Domain-RAG.
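The final composition stage pastes the preserved foreground onto the generated background. A minimal sketch, assuming a binary (or soft) foreground mask and grayscale images given as nested lists; real pipelines operate on RGB tensors and may blend object edges more carefully.

```python
def compose(foreground: list[list[float]], background: list[list[float]],
            mask: list[list[float]]) -> list[list[float]]:
    """Per-pixel alpha composition: out = m * fg + (1 - m) * bg, with m in [0, 1]."""
    return [
        [m * f + (1 - m) * b for f, b, m in zip(frow, brow, mrow)]
        for frow, brow, mrow in zip(foreground, background, mask)
    ]


fg = [[200, 200], [200, 200]]
bg = [[10, 20], [30, 40]]
mask = [[1, 0], [0, 1]]  # 1 = keep the original object pixel
print(compose(fg, bg, mask))  # [[200, 20], [30, 200]]
```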

NeurIPS Conference 2025 Conference Paper

GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling

  • Tianhao Chen
  • Xin Xu
  • Zijing Liu
  • Pengxiang Li
  • Xinyuan Song
  • AJAY JAISWAL
  • Fan Zhang
  • Jishan Hu

Modern Large Language Models, such as the LLaMA, Qwen and DeepSeek series, predominantly adopt the Pre-LayerNorm (Pre-LN) Transformer architecture. While stable during pretraining and scalable to large model sizes, Pre-LN suffers from exponential growth in activation variance across layers, which causes the shortcut to dominate sub-layer outputs in the residual connection and limits the learning capacity of deeper layers. To mitigate this issue, we propose Gradient-Preserving Activation Scaling (GPAS), a simple technique that can be used in combination with existing approaches. GPAS scales down the intermediate activations while keeping their gradients unchanged. This leaves the information in the activations intact and avoids the gradient vanishing problem associated with gradient downscaling. Extensive experiments across model sizes from 71M to 1B parameters show that GPAS achieves consistent performance gains. Beyond enhancing Pre-LN Transformers, GPAS also shows promise in improving alternative architectures such as Sandwich-LN and DeepNorm, demonstrating its versatility and potential for improving training dynamics in a wide range of settings. Our code is available at https://github.com/dandingsky/GPAS.
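The key property of GPAS is that the forward pass scales an activation down while the backward pass leaves its gradient unchanged. One way to get exactly that behavior is a stop-gradient (detach) trick, y = x - sg((1 - alpha) * x). The dependency-free sketch below assumes a fixed scalar `alpha` and uses a toy autodiff class written only for this example; the paper's actual scaling factor and its placement inside the Transformer block may differ.

```python
class Scalar:
    """Tiny reverse-mode autodiff node holding a value and its gradient."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # sequence of (parent node, local derivative)

    def scale(self, k):
        return Scalar(self.value * k, [(self, k)])

    def __sub__(self, other):
        return Scalar(self.value - other.value, [(self, 1.0), (other, -1.0)])

    def detach(self):
        # Stop-gradient: same value, but no parents, so no gradient flows back.
        return Scalar(self.value)

    def backward(self, upstream=1.0):
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


def plain_downscale(x, alpha):
    # Forward: alpha * x. Backward: dy/dx = alpha (the gradient shrinks too).
    return x.scale(alpha)


def gpas_like(x, alpha):
    # Forward: x - (1 - alpha) * x = alpha * x.
    # Backward: the detached term contributes nothing, so dy/dx = 1.
    return x - x.scale(1.0 - alpha).detach()


x = Scalar(4.0)
y = gpas_like(x, 0.5)
y.backward()
print(y.value, x.grad)  # forward value is halved, gradient stays 1.0
```

In a framework with autograd, the same idea is usually written with its built-in stop-gradient or detach operation instead of a hand-rolled class.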

AAAI Conference 2025 Conference Paper

HeGTa: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding

  • Rihui Jin
  • Yu Li
  • Guilin Qi
  • Nan Hu
  • Yuan-Fang Li
  • Jiaoyan Chen
  • Jianan Wang
  • Yongrui Chen

Table Understanding (TU) has achieved promising advancements, but it faces the challenges of the scarcity of manually labeled tables and the presence of complex table structures. To address these challenges, we propose HeGTa, a heterogeneous graph (HG)-enhanced large language model (LLM) designed for few-shot TU tasks. This framework aligns structural table semantics with the LLM's parametric knowledge through soft prompts and instruction tuning. It also addresses complex tables with a multi-task pre-training scheme, incorporating three novel multi-granularity self-supervised HG pre-text tasks. We empirically demonstrate the effectiveness of HeGTa, showing that it outperforms the SOTA for few-shot complex TU on several benchmarks.

JBHI Journal 2025 Journal Article

KGMAEDDI: Knowledge Graph and Molecular-Graph Masked Autoencoder for Drug-Drug Interaction Prediction

  • Yu Li
  • Zhu-Hong You
  • Yuan Yang
  • Cheng-gang Mi

Drug–drug interaction (DDI) prediction is essential for drug development and clinical safety. Early studies mainly relied on large labeled datasets and focused on structural or sequential drug features, often overlooking topological relationships with biomedical entities such as genes, diseases, and pathways. Although recent approaches have leveraged knowledge graphs (KGs), they typically neglect molecular structural information. To address these limitations, we propose KGMAEDDI, a novel framework that integrates molecular structures and semantic knowledge from KGs for DDI prediction. Specifically, KGMAEDDI employs a message-passing neural network to capture intrinsic structural features of drugs and a knowledge-aware attention network to extract semantic-rich representations from KGs. These representations are fused via a reconstruction-driven feature fusion module that combines a masked autoencoder and bi-directional cross-attention. This design enforces mutual reconstruction between modalities, thereby aligning structural and semantic embeddings in a shared latent space. We evaluate KGMAEDDI on the DrugBank dataset under both binary and multi-class settings. Experimental results show that KGMAEDDI consistently outperforms state-of-the-art baselines, validating its effectiveness in modeling complex drug interactions.

ICML Conference 2025 Conference Paper

Learning Cascade Ranking as One Network

  • Yunli Wang
  • Zhen Zhang
  • Zhiqiang Wang
  • Zixuan Yang
  • Yu Li
  • Jian Yang 0003
  • Shiyang Wen
  • Peng Jiang 0002

Cascade Ranking is a prevalent architecture in large-scale top-k selection systems such as recommendation and advertising platforms. Traditional training methods focus on single-stage optimization, neglecting interactions between stages. Recent advances have introduced interaction-aware training paradigms, but these still struggle to 1) align training objectives with the goal of the entire cascade ranking (i.e., end-to-end recall of ground-truth items) and 2) learn effective collaboration patterns for different stages. To address these challenges, we propose LCRON, which introduces a novel surrogate loss function derived from a lower bound on the probability that ground-truth items are selected by the cascade, ensuring alignment with the overall objective of the system. Based on the properties of the derived bound, we further design an auxiliary loss for each stage to drive the reduction of this bound, leading to more robust and effective top-k selection. LCRON enables end-to-end training of the entire cascade ranking system as a unified network. Experimental results demonstrate that LCRON achieves significant improvements over existing methods on public benchmarks and industrial applications, addressing key limitations in cascade ranking training and significantly enhancing system performance.
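The abstract does not give LCRON's exact bound. As a hedged illustration of the general idea (training against a lower bound on the probability that a ground-truth item survives every stage), the sketch below uses the Fréchet/Bonferroni inequality, a standard bound that needs no independence assumption between stages. This choice of bound and the loss built on it are assumptions made for illustration, not the paper's derivation.

```python
import math


def joint_selection_lower_bound(stage_probs):
    """Lower bound on P(item passes every stage).

    For events A_1..A_S: P(A_1 and ... and A_S) >= sum_s P(A_s) - (S - 1).
    This holds without assuming the stages are independent.
    """
    return max(0.0, sum(stage_probs) - (len(stage_probs) - 1))


def surrogate_loss(per_item_stage_probs, eps=1e-8):
    """Hypothetical loss: mean negative log of the bound over ground-truth items."""
    losses = [
        -math.log(max(eps, joint_selection_lower_bound(probs)))
        for probs in per_item_stage_probs
    ]
    return sum(losses) / len(losses)


# Two ground-truth items, each with (retrieval-stage, ranking-stage) selection probabilities.
print(surrogate_loss([[0.9, 0.8], [0.95, 0.99]]))
```

Minimizing this loss pushes every stage to keep the ground-truth items, which is the end-to-end recall goal the abstract describes.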

NeurIPS Conference 2025 Conference Paper

Learning to Watermark: A Selective Watermarking Framework for Large Language Models via Multi-Objective Optimization

  • Chenrui Wang
  • Junyi Shu
  • Billy Chiu
  • Yu Li
  • Saleh Alharbi
  • Min Zhang
  • Jing Li

The rapid development of LLMs has raised concerns about their potential misuse, leading to various watermarking schemes that typically offer high detectability. However, existing watermarking techniques often face a trade-off between watermark detectability and generated text quality. In this paper, we introduce Learning to Watermark (LTW), a novel selective watermarking framework that leverages multi-objective optimization to effectively balance these competing goals. LTW features a lightweight network that adaptively decides when to apply the watermark by analyzing sentence embeddings, token entropy, and the current watermarking ratio. The network is trained with two specifically constructed loss functions that guide the model toward Pareto-optimal solutions, thereby harmonizing watermark detectability and text quality. Experiments integrating LTW with two baseline watermarking methods demonstrate that LTW significantly enhances text quality without compromising detectability. Our selective watermarking approach offers a new perspective on designing watermarks for LLMs and a way to preserve high text quality in watermarked output. The code is publicly available at: https://github.com/fattyray/learning-to-watermark
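To make the gating idea concrete, here is a minimal sketch of a per-token "watermark or not" decision driven by two of the signals the abstract lists: token entropy and the current watermarking ratio. The sentence-embedding input is omitted. The hand-set logistic weights stand in for LTW's trained network and are purely hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of the next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def should_watermark(probs, current_ratio, w_entropy=4.0, w_ratio=-6.0, bias=-1.0):
    """Hypothetical logistic gate standing in for a learned selector.

    Watermarking a high-entropy token costs little text quality, so entropy
    pushes the gate open. A high share of already-watermarked tokens pushes
    it closed, trading a little detectability for quality.
    """
    z = w_entropy * token_entropy(probs) + w_ratio * current_ratio + bias
    return 1.0 / (1.0 + math.exp(-z)) > 0.5


print(should_watermark([0.25, 0.25, 0.25, 0.25], current_ratio=0.1))  # uncertain token
print(should_watermark([0.97, 0.01, 0.01, 0.01], current_ratio=0.1))  # near-deterministic token
```

When the gate opens, the underlying watermarking scheme (for example, a green-list logit bias) would be applied to that token. Otherwise the token is sampled unmodified.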

IROS Conference 2025 Conference Paper

Model Predictive Control for Cable-Driven Remote Actuation Systems with Friction and Compliance

  • Moein Forouhar
  • Hamid Sadeghian
  • Yu Li
  • Sami Haddadin

In this work, we model and control a cable-driven, remote-actuated system whose dynamics include both friction and compliance. The control objective is to solve a regulation problem using a Model Predictive Controller (MPC). Unlike flexible-joint robot models, which typically assume frictionless compliant elements, the proposed model incorporates friction forces between two compliant cable-sheaths that connect the motor to the driven link. Three controllers are developed by integrating cascade control principles with the MPC framework. Their performance is evaluated through both simulations and experiments on a custom-designed testbed. The results demonstrate that the MPC-Cascade control scheme achieves the best overall performance, with fast convergence and low control effort.
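For readers unfamiliar with MPC, the receding-horizon principle can be shown on the simplest possible case. For an unconstrained linear plant with quadratic cost, each finite-horizon problem is solved exactly by a backward Riccati recursion. The sketch below assumes a scalar plant x[k+1] = a*x[k] + b*u[k] with hypothetical parameters. It does not model the paper's friction, cable compliance, constraints, or cascade structure.

```python
def first_step_gain(a, b, q, r, horizon):
    """Finite-horizon LQ feedback gain for x[k+1] = a*x[k] + b*u[k].

    Stage cost q*x^2 + r*u^2, terminal weight q (an assumption). The backward
    Riccati recursion runs over the horizon; the gain for step 0 is returned.
    """
    p = q
    k = 0.0
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * a - a * p * b * k
    return k


def simulate_mpc(x0, a, b, q, r, horizon, steps):
    """Receding horizon: re-solve at every step and apply only the first input.

    For this time-invariant, unconstrained plant the gain is the same each step;
    with constraints or a changing model the re-solve would give different inputs.
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = -first_step_gain(a, b, q, r, horizon) * x
        x = a * x + b * u
        trajectory.append(x)
    return trajectory


traj = simulate_mpc(x0=1.0, a=1.0, b=0.5, q=1.0, r=0.1, horizon=10, steps=30)
print(traj[:4])
```

A real implementation for the system above would replace the closed-form gain with a constrained optimization over the identified friction-and-compliance model.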

ICML Conference 2025 Conference Paper

Non-Stationary Predictions May Be More Informative: Exploring Pseudo-Labels with a Two-Phase Pattern of Training Dynamics

  • Hongbin Pei
  • Jingxin Hai
  • Yu Li
  • Huiqi Deng
  • Denghao Ma
  • Jie Ma 0001
  • Pinghui Wang
  • Jing Tao

Pseudo-labeling is a widely used strategy in semi-supervised learning. Existing methods typically select predicted labels with high confidence scores and high training stationarity as pseudo-labels to augment training sets. In contrast, this paper explores the pseudo-labeling potential of predicted labels that do not exhibit these characteristics. We discover a new type of predicted label suitable for pseudo-labeling, termed two-phase labels, which exhibit a two-phase pattern during training: they are initially predicted as one category in early training stages and switch to another category in subsequent epochs. Case studies show that two-phase labels are informative for decision boundaries. To effectively identify two-phase labels, we design a 2-phasic metric that mathematically characterizes their spatial and temporal patterns. Furthermore, we propose a loss function tailored for learning from two-phase pseudo-labels, allowing models not only to learn correct correlations but also to eliminate false ones. Extensive experiments on eight datasets show that our proposed 2-phasic metric acts as a powerful booster for existing pseudo-labeling methods by additionally incorporating two-phase labels, achieving average classification accuracy gains of 1.73% on image datasets and 1.92% on graph datasets.
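The temporal half of the two-phase pattern (one label early in training, a different label later) can be detected from a per-epoch prediction history. The sketch below is a simplified heuristic assuming each unlabeled sample's argmax prediction is recorded every epoch. It is not the paper's 2-phasic metric, which also characterizes spatial patterns. The `min_phase` threshold is a hypothetical parameter.

```python
def two_phase_label(history, min_phase=2):
    """Detect a single switch in a per-epoch prediction history.

    Returns (early_label, late_label) if the history consists of exactly two
    runs of different labels, each lasting at least `min_phase` epochs;
    otherwise returns None.
    """
    runs = []  # list of [label, run_length]
    for label in history:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    if len(runs) == 2 and all(length >= min_phase for _, length in runs):
        return runs[0][0], runs[1][0]
    return None


print(two_phase_label([0, 0, 0, 1, 1, 1]))  # switches once: candidate two-phase label
print(two_phase_label([0, 1, 0, 1]))        # oscillates: rejected
```

A sample flagged this way would contribute its late-phase label as a pseudo-label candidate, in addition to the high-confidence, stationary labels that existing methods already select.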

NeurIPS Conference 2025 Conference Paper

Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

  • Honglin Lin
  • Qizhi Pei
  • Zhuoshi Pan
  • Yu Li
  • Xin Gao
  • Juntao Li
  • Conghui He
  • Lijun Wu

Reasoning capability is pivotal for Large Language Models (LLMs) to solve complex tasks, yet achieving reliable and scalable reasoning remains challenging. While Chain-of-Thought (CoT) prompting has become a mainstream approach, existing methods often suffer from uncontrolled generation, insufficient quality, and limited diversity in reasoning paths. Recent efforts leverage code to enhance CoT by grounding reasoning in executable steps, but such methods are typically constrained to predefined mathematical problems, hindering scalability and generalizability. In this work, we propose Caco (Code-Assisted Chain-of-ThOught), a novel framework that automates the synthesis of high-quality, verifiable, and diverse instruction-CoT reasoning data through code-driven augmentation. Unlike prior work, Caco first fine-tunes a code-based CoT generator on existing math and programming solutions in a unified code format, then scales data generation to a large number of diverse reasoning traces. Crucially, we introduce automated validation via code execution and rule-based filtering to ensure logical correctness and structural diversity, followed by reverse-engineering the filtered outputs into natural-language instructions and language CoTs to enrich task adaptability. This closed-loop process enables fully automated, scalable synthesis of reasoning data with guaranteed executability. Experiments on our Caco-1.3M dataset demonstrate that Caco-trained models achieve strong performance on mathematical reasoning benchmarks, outperforming existing strong baselines. Further analysis reveals that Caco’s code-anchored verification and instruction diversity contribute to superior generalization across unseen tasks. Our work establishes a paradigm for building self-sustaining, trustworthy reasoning systems without human intervention.
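The validation step, which keeps only code-based CoTs that actually execute, can be sketched as follows. Assumptions: each candidate is a standalone Python snippet that prints its final answer, and the "rule-based filtering" is reduced to whitespace-normalized deduplication plus "must print something". A subprocess with a timeout isolates crashes and hangs, but it is not a security sandbox. Real pipelines should run generated code in a proper sandbox.

```python
import subprocess
import sys


def run_snippet(code, timeout=5.0):
    """Execute a snippet in a fresh interpreter; return its stripped stdout, or None on failure."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    out = proc.stdout.strip()
    return out or None  # a snippet that prints nothing yields no answer


def filter_code_cots(snippets, timeout=5.0):
    """Keep unique snippets that run cleanly and print an answer."""
    kept, seen = [], set()
    for code in snippets:
        key = " ".join(code.split())  # whitespace-normalized dedupe key
        if key in seen:
            continue
        seen.add(key)
        answer = run_snippet(code, timeout)
        if answer is not None:
            kept.append((code, answer))
    return kept


candidates = ["print(2 + 3)", "raise ValueError('bad step')", "print(2  +  3)", "x = 1"]
print(filter_code_cots(candidates))
```

The surviving (code, answer) pairs are what a later stage would translate back into natural-language instructions and CoTs.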

NeurIPS Conference 2025 Conference Paper

SilentStriker: Toward Stealthy Bit-Flip Attacks on Large Language Models

  • Haotian Xu
  • Qingsong Peng
  • Jie Shi
  • Huadi Zheng
  • Yu Li
  • Cheng Zhuo

The rapid adoption of large language models (LLMs) in critical domains has spurred extensive research into their security issues. While input manipulation attacks (e.g., prompt injection) have been well studied, Bit-Flip Attacks (BFAs), which exploit hardware vulnerabilities to corrupt model parameters and cause severe performance degradation, have received far less attention. Existing BFA methods suffer from a key limitation: they fail to balance performance degradation and output naturalness, making them prone to discovery. In this paper, we introduce SilentStriker, the first stealthy bit-flip attack against LLMs that effectively degrades task performance while maintaining output naturalness. Our core contribution lies in addressing the challenge of designing effective loss functions for LLMs with variable output lengths and vast output spaces. Unlike prior approaches that rely on output perplexity for the attack loss, which inadvertently degrades output naturalness, we reformulate the attack objective by using key output tokens as targets for suppression, enabling effective joint optimization of attack effectiveness and stealthiness. Additionally, we employ an iterative, progressive search strategy to maximize attack efficacy. Experiments show that SilentStriker significantly outperforms existing baselines, achieving successful attacks without compromising the naturalness of generated text.

AAAI Conference 2025 Conference Paper

TAMER: Tree-Aware Transformer for Handwritten Mathematical Expression Recognition

  • Jianhua Zhu
  • Wenqi Zhao
  • Yu Li
  • Xingjian Hu
  • Liangcai Gao

Handwritten Mathematical Expression Recognition (HMER) has extensive applications in automated grading and office automation. However, existing sequence-based decoding methods, which directly predict LaTeX sequences, struggle to understand and model the inherent tree structure of LaTeX and often fail to ensure syntactic correctness in the decoded results. To address these challenges, we propose a novel model named TAMER (Tree-Aware Transformer) for handwritten mathematical expression recognition. TAMER introduces an innovative Tree-aware Module while maintaining the flexibility and efficient training of Transformer. TAMER combines the advantages of both sequence decoding and tree decoding models by jointly optimizing sequence prediction and tree structure prediction tasks, which enhances the model's understanding and generalization of complex mathematical expression structures. During inference, TAMER employs a Tree Structure Prediction Scoring Mechanism to improve the structural validity of the generated LaTeX sequences. Experimental results on CROHME datasets demonstrate that TAMER outperforms traditional sequence decoding and tree decoding models, especially in handling complex mathematical structures, achieving state-of-the-art (SOTA) performance.

JBHI Journal 2025 Journal Article

Task-Aware Effective Connectivity Modeling for Cognitive Function Prediction

  • Wantong Zou
  • Yu Li
  • Xiang Hu
  • Xun Chen
  • Aiping Liu

Effective connectivity (EC) derived from resting-state Functional Magnetic Resonance Imaging (rs-fMRI) has emerged as a critical tool for deepening our understanding of brain function in both health and disease. However, most studies estimate EC on an individual basis, treating it as a hidden parameter within the model and requiring the model to be retrained for each subject. They often overlook valuable population-level information, which limits their generalizability. Additionally, EC is typically obtained independently of downstream tasks, reducing its capacity to capture task-specific variations. To address these limitations, we propose a flexible Task-Aware Effective Connectivity (TAEC) model, designed to construct individualized, task-aware, and nonlinear causal brain networks without subject-specific retraining. In this framework, a Causal Discovery Module (CDM) captures the implicit neural representation of EC through a spatial-temporal attention mechanism, producing an estimate of individual EC. We then propose a Task-Aware Graph Neural Network (GNN) Predictor, which incorporates a task-aware penalty to enable end-to-end prediction, enhancing task performance and the identification of task-dependent EC patterns. Extensive experiments on twelve cognitive tasks from the Human Connectome Project (HCP) dataset demonstrate that the proposed method achieves state-of-the-art performance, validating its effectiveness in task-aware effective connectivity modeling. Furthermore, the framework discovers discriminative, task-specific EC patterns that offer additional insights into cognitive functions.

ICLR Conference 2025 Conference Paper

TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction

  • Yun Fei Liu
  • Lei Zhu
  • Lijian Lin
  • Ye Zhu
  • Ailing Zhang
  • Yu Li

3D facial reconstruction from a single in-the-wild image is a crucial task in human-centered computer vision. While existing methods can recover accurate facial shapes, there remains significant room for improvement in fine-grained expression capture. Current approaches struggle with irregular mouth shapes, exaggerated expressions, and asymmetrical facial movements. We present TEASER (Token EnhAnced Spatial modeling for Expressions Reconstruction), which addresses these challenges and improves 3D facial geometry performance. TEASER tackles two main limitations of existing methods: insufficient photometric loss for self-reconstruction and inaccurate localization of subtle expressions. We introduce a multi-scale tokenizer to extract facial appearance information. Combined with a neural renderer, these tokens provide precise geometric guidance for expression reconstruction. Furthermore, TEASER incorporates a pose-dependent landmark loss to further improve geometric performance. Our approach not only significantly enhances expression reconstruction quality but also offers interpretable tokens suitable for various downstream applications, such as photorealistic facial video driving, expression transfer, and identity swapping. Quantitative and qualitative experimental results across multiple datasets demonstrate that TEASER achieves state-of-the-art performance in precise expression reconstruction.

NeurIPS Conference 2025 Conference Paper

Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition

  • Yu Li
  • Jin Jiang
  • Jianhua Zhu
  • Shuai Peng
  • Yuxuan Zhou
  • Liangcai Gao

Handwritten Mathematical Expression Recognition (HMER) remains a persistent challenge in Optical Character Recognition (OCR) due to the inherent freedom of symbol layouts and variability in handwriting styles. Prior methods have hit performance bottlenecks because they rely on isolated architectural modifications that are difficult to integrate coherently into a unified framework. Meanwhile, recent advances in pretrained vision-language models (VLMs) have demonstrated strong cross-task generalization, offering a promising foundation for developing unified solutions. In this paper, we introduce Uni-MuMER, which fully fine-tunes a VLM for the HMER task without modifying its architecture, effectively injecting domain-specific knowledge into a generalist framework. Our method integrates three data-driven tasks: Tree-Aware Chain-of-Thought (Tree-CoT) for structured spatial reasoning, Error-Driven Learning (EDL) for reducing confusion among visually similar characters, and Symbol Counting (SC) for improving recognition consistency in long expressions. Experiments on the CROHME and HME100K datasets show that Uni-MuMER achieves state-of-the-art performance, outperforming the best lightweight specialized model SSAN by 16.31% and the top-performing VLM Gemini2.5-flash by 24.42% in the zero-shot setting. Our datasets, models, and code are open-sourced at: https://github.com/BFlameSwift/Uni-MuMER

AAAI Conference 2024 Conference Paper

Blind Face Restoration under Extreme Conditions: Leveraging 3D-2D Prior Fusion for Superior Structural and Texture Recovery

  • Zhengrui Chen
  • Liying Lu
  • Ziyang Yuan
  • Yiming Zhu
  • Yu Li
  • Chun Yuan
  • Weihong Deng

Blind face restoration under extreme conditions involves reconstructing high-quality face images from severely degraded inputs. These input images are often of poor quality and have extreme facial poses, leading to errors in facial structure and unnatural artifacts in the restored images. In this paper, we show that utilizing 3D priors effectively compensates for the lack of structural knowledge in 2D priors while preserving texture details. Based on this, we introduce FREx (Face Restoration under Extreme conditions), which combines structure-accurate 3D priors and texture-rich 2D priors in pretrained generative networks for blind face restoration under extreme conditions. To fuse the different information in 3D and 2D priors, we introduce an adaptive weight module that adjusts the importance of features based on the input image's condition. With this approach, our model can restore structure-accurate and natural-looking faces even when the images have lost much of their information due to degradation and extreme pose. Extensive experimental results on synthetic and real-world datasets validate the effectiveness of our method.

NeurIPS Conference 2024 Conference Paper

CNCA: Toward Customizable and Natural Generation of Adversarial Camouflage for Vehicle Detectors

  • Linye Lyu
  • Jiawei Zhou
  • Daojing He
  • Yu Li

Prior work on physical adversarial camouflage against vehicle detectors mainly focuses on the effectiveness and robustness of the attack. The most successful current methods optimize 3D vehicle texture at the pixel level. However, this produces conspicuous, attention-grabbing camouflage patterns that humans can easily identify. To address this issue, we propose a Customizable and Natural Camouflage Attack (CNCA) method that leverages an off-the-shelf pre-trained diffusion model. By sampling the optimal texture image from the diffusion model with a user-specified text prompt, our method can generate natural and customizable adversarial camouflage while maintaining high attack performance. Extensive experiments in the digital and physical worlds, together with user studies, demonstrate that our proposed method generates significantly more natural-looking camouflage than state-of-the-art baselines while achieving competitive attack performance.

NeurIPS Conference 2024 Conference Paper

DAPE: Data-Adaptive Positional Encoding for Length Extrapolation

  • Chuanyang Zheng
  • Yihang Gao
  • Han Shi
  • Minbin Huang
  • Jingyao Li
  • Jing Xiong
  • Xiaozhe Ren
  • Michael Ng

Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to distinguish token positions in given sequences. However, both APE and RPE remain fixed after model training regardless of input data, limiting their adaptability and flexibility. We therefore argue that positional encoding should be data-adaptive and dynamically adjustable given the attention. In this paper, we propose Data-Adaptive Positional Encoding (DAPE), which adjusts dynamically and semantically based on the input context and learned fixed priors. Experimental validation on real-world datasets (Arxiv, Books3, and CHE) demonstrates that DAPE enhances model performance in terms of both trained length and length generalization, with statistically significant improvements. Model visualization suggests that our model can keep both local and anti-local information. Finally, we successfully train the model on sequence length 128 and achieve better performance at evaluation sequence length 8192 than other static positional encoding methods, revealing the benefit of adaptive positional encoding.

EAAI Journal 2024 Journal Article

DRNAS: Differentiable RBF neural architecture search method considering computation load in adaptive control

  • Ruichen Ming
  • XiaoXiong Liu
  • Yu Li
  • Wei Huang
  • Weiguo Zhang

In this work, we investigate an online differentiable neural network search control algorithm using a backstepping method with a radial basis function (RBF) neural network (NN) framework. In this approach, we mainly focus on searching for a neural network architecture with optimal control performance and optimal computation load by learning NN parameters among a finite number of RBF NNs with different architectures. Previous work on RBF NNs and backstepping methods mainly considered the control performance of systems and rarely considered the computation load limitations of control computers. In this paper, we propose a differentiable RBF neural architecture search (DRNAS) method. First, we build a hypernetwork and construct an optimization objective function that combines tracking error and computation load. This hypernetwork consists of different candidate networks with weight parameters. Then, using backpropagation and gradient descent, we update the parameters of the hypernetwork and determine the optimal RBF NN architecture in the search space. Finally, we perform simulations to verify the effectiveness of the proposed method, designing an RBF NN adaptive backstepping controller for aircraft pitch rate dynamics and using the DRNAS method to train the hypernetwork for different mission scenarios. The simulation results verify that the proposed method can effectively balance the controller’s tracking capability with its computation load.
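The "differentiable" part of an architecture search like DRNAS is usually a continuous relaxation. Each candidate network gets a logit, a softmax turns the logits into mixing weights, and gradient descent on the weighted objective concentrates weight on the best candidate. The sketch below assumes each candidate RBF network's tracking error and computation load are already known constants (hypothetical numbers), so only the architecture logits are learned. The paper trains network weights and architecture jointly inside a hypernetwork.

```python
import math


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def search(errors, loads, lam=0.1, lr=1.0, steps=200):
    """Return the index of the candidate favored after relaxed gradient descent.

    Objective: sum_i w_i * (error_i + lam * load_i) with w = softmax(alpha).
    Its gradient w.r.t. alpha_j is w_j * (L_j - expected loss).
    """
    per_arch = [e + lam * c for e, c in zip(errors, loads)]
    alpha = [0.0] * len(per_arch)
    for _ in range(steps):
        w = softmax(alpha)
        expected = sum(wi * li for wi, li in zip(w, per_arch))
        alpha = [a - lr * wj * (lj - expected) for a, wj, lj in zip(alpha, w, per_arch)]
    return max(range(len(alpha)), key=lambda j: alpha[j])


# Hypothetical candidates: small, medium, and large RBF networks.
errors = [0.30, 0.10, 0.09]  # tracking error
loads = [1, 2, 8]            # relative computation load
print(search(errors, loads, lam=0.1))  # load-aware: picks the medium network
print(search(errors, loads, lam=0.0))  # load-blind: picks the largest network
```

The weight `lam` is the knob that trades tracking accuracy against computation load, which is the balance the abstract says DRNAS achieves.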

ICLR Conference 2024 Conference Paper

GPAvatar: Generalizable and Precise Head Avatar from Image(s)

  • Xuangeng Chu
  • Yu Li
  • Ailing Zeng
  • Tianyu Yang
  • Lijian Lin
  • Yun Fei Liu
  • Tatsuya Harada

Head avatar reconstruction, crucial for applications in virtual reality, online meetings, gaming, and film industries, has garnered substantial attention within the computer vision community. The fundamental objective of this field is to faithfully recreate the head avatar and precisely control expressions and postures. Existing methods, categorized into 2D-based warping, mesh-based, and neural rendering approaches, present challenges in maintaining multi-view consistency, incorporating non-facial information, and generalizing to new identities. In this paper, we propose a framework named GPAvatar that reconstructs 3D head avatars from one or several images in a single forward pass. The key idea of this work is to introduce a dynamic point-based expression field driven by a point cloud to precisely and effectively capture expressions. Furthermore, we use a Multi Tri-planes Attention (MTA) fusion module in tri-planes canonical field to leverage information from multiple input images. The proposed method achieves faithful identity reconstruction, precise expression control, and multi-view consistency, demonstrating promising results for free-viewpoint rendering and novel view synthesis.

NeurIPS Conference 2024 Conference Paper

HORSE: Hierarchical Representation for Large-Scale Neural Subset Selection

  • Binghui Xie
  • Yixuan Wang
  • Yongqiang Chen
  • Kaiwen Zhou
  • Yu Li
  • Wei Meng
  • James Cheng

Subset selection tasks, such as anomaly detection and compound selection in AI-assisted drug discovery, are crucial for a wide range of applications. Learning subset-valued functions with neural networks has achieved great success by incorporating permutation invariance symmetry into the architecture. However, existing neural set architectures often struggle to either capture comprehensive information from the superset or address complex interactions within the input. Additionally, they often fail to perform in scenarios where superset sizes surpass available memory capacity. To address these challenges, we introduce the novel concept of the Identity Property, which requires models to integrate information from the originating set, resulting in the development of neural networks that excel at performing effective subset selection from large supersets. Moreover, we present the Hierarchical Representation of Neural Subset Selection (HORSE), an attention-based method that learns complex interactions and retains information from both the input set and the optimal subset supervision signal. Specifically, HORSE enables the partitioning of the input ground set into manageable chunks that can be processed independently and then aggregated, ensuring consistent outcomes across different partitions. Through extensive experimentation, we demonstrate that HORSE significantly enhances neural subset selection performance by capturing more complex information and surpasses state-of-the-art methods in handling large-scale inputs by a margin of up to 20%.
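The chunking property the abstract describes (split a huge ground set into manageable chunks, process them independently, aggregate, and get the same result however the set was partitioned) holds for any permutation-invariant, additive summary. The sketch below demonstrates that property with plain sum pooling, a Deep Sets-style stand-in rather than HORSE's attention-based representation. The per-element features and the "farthest from the mean" selection rule are hypothetical.

```python
def encode(x):
    # Hypothetical per-element features; integers keep the sums exact.
    return (x, x * x)


def chunk_summary(chunk):
    """Permutation-invariant summary of one chunk (sum pooling)."""
    s0 = s1 = 0
    for x in chunk:
        f0, f1 = encode(x)
        s0 += f0
        s1 += f1
    return s0, s1


def ground_set_summary(ground_set, chunk_size):
    """Process the ground set chunk by chunk and merge the partial summaries."""
    t0 = t1 = 0
    for start in range(0, len(ground_set), chunk_size):
        c0, c1 = chunk_summary(ground_set[start:start + chunk_size])
        t0 += c0
        t1 += c1
    return t0, t1


def select_subset(ground_set, k, chunk_size):
    """Hypothetical anomaly-style selection: the k elements farthest from the set mean."""
    n = len(ground_set)
    s0, s1 = ground_set_summary(ground_set, chunk_size)
    mean = s0 / n
    var = s1 / n - mean * mean

    def score(x):
        return (x - mean) ** 2 / var if var > 0 else 0.0

    return sorted(ground_set, key=score, reverse=True)[:k]


data = [1, 2, 3, 4, 100]
print(ground_set_summary(data, 2) == ground_set_summary(data, 5))  # partition-invariant
print(select_subset(data, 1, chunk_size=2))
```

Only one chunk needs to be in memory at a time, which is what makes supersets larger than available memory tractable.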

TMLR Journal 2024 Journal Article

Lyra: Orchestrating Dual Correction in Automated Theorem Proving

  • Chuanyang Zheng
  • Haiming Wang
  • Enze Xie
  • Zhengying Liu
  • Jiankai Sun
  • Huajian Xin
  • Jianhao Shen
  • Zhenguo Li

Large Language Models (LLMs) present an intriguing avenue for exploration in the field of formal theorem proving. Nevertheless, their full potential, particularly regarding hallucination mitigation and refinement through prover error messages, has yet to be thoroughly investigated. To enhance the effectiveness of LLMs in this field, we introduce Lyra, a new framework that employs two distinct correction mechanisms: Tool Correction (TC) and Conjecture Correction (CC). To implement Tool Correction in the post-processing of formal proofs, we leverage prior knowledge to use predefined prover tools (e.g., Sledgehammer) to guide the replacement of incorrect tools. Tool Correction significantly contributes to mitigating hallucinations, thereby improving the overall accuracy of the proof. In addition, we introduce Conjecture Correction, an error feedback mechanism designed to interact with the prover to refine formal proof conjectures using prover error messages. Compared with previous refinement frameworks, the proposed Conjecture Correction refines generation with instructions but does not collect paired (generation, error & refinement) prompts. Our method achieves state-of-the-art (SOTA) performance on both miniF2F validation (48.0% → 55.3%) and test (45.5% → 51.2%). We also present 3 IMO problems solved by Lyra. We believe Tool Correction (post-processing for hallucination mitigation) and Conjecture Correction (subgoal adjustment from interaction with the environment) could provide a promising avenue for future research in this field.

NeurIPS Conference 2024 Conference Paper

MSA Generation with Seqs2Seqs Pretraining: Advancing Protein Structure Predictions

  • Le Zhang
  • Jiayang Chen
  • Tao Shen
  • Yu Li
  • Siqi Sun

Deep learning models like AlphaFold2 have revolutionized protein structure prediction, achieving unprecedented accuracy. However, the dependence on robust multiple sequence alignments (MSAs) continues to pose a challenge, especially for proteins that lack a wealth of homologous sequences. To overcome this limitation, we introduce MSA-Generator, a self-supervised generative protein language model. Trained on a sequence-to-sequence task using an automatically constructed dataset, MSA-Generator employs protein-specific attention mechanisms to harness large-scale protein databases, generating virtual MSAs that enrich existing ones and boost prediction accuracy. Our experiments on CASP14 and CASP15 benchmarks reveal significant improvements in LDDT scores, particularly for complex and challenging sequences, enhancing the performance of both AlphaFold2 and RoseTTAFold. The code is released at https://github.com/lezhang7/MSAGen.

ICML Conference 2024 Conference Paper

Multi-Track Message Passing: Tackling Oversmoothing and Oversquashing in Graph Learning via Preventing Heterophily Mixing

  • Hongbin Pei
  • Yu Li
  • Huiqi Deng
  • Jingxin Hai
  • Pinghui Wang
  • Jie Ma 0001
  • Jing Tao
  • Yuheng Xiong

The advancement toward deeper graph neural networks is currently hindered by two inherent issues in message passing: oversmoothing and oversquashing. We identify the root cause of these issues as information loss due to heterophily mixing in aggregation, where messages of diverse category semantics are mixed. We propose a novel multi-track graph convolutional network to address oversmoothing and oversquashing effectively. Our basic idea is intuitive: if messages are separated and independently propagated according to their category semantics, heterophily mixing can be prevented. Consequently, we present a novel multi-track message passing scheme capable of preventing heterophily mixing, enhancing long-distance information flow, and improving the separation condition. Empirical validations show that our model achieved state-of-the-art performance on several graph datasets and effectively tackled oversmoothing and oversquashing, setting a new benchmark of 86.4% accuracy on Cora.

IJCAI Conference 2024 Conference Paper

Protecting Object Detection Models from Model Extraction Attack via Feature Space Coverage

  • Zeyu Li
  • Yuwen Pu
  • Xuhong Zhang
  • Yu Li
  • Jinbao Li
  • Shouling Ji

The model extraction attack is an attack pattern aimed at stealing well-trained machine learning models' functionality or privacy information. With the gradual popularization of AI-related technologies in daily life, various well-trained models are being deployed. As a result, these models are considered valuable assets and attractive to model extraction attackers. Currently, the academic community primarily focuses on defense for model extraction attacks in the context of classification, with little attention to the more commonly used task scenario of object detection. Therefore, we propose a detection framework targeting model extraction attacks against object detection models in this paper. The framework first locates suspicious users based on feature coverage in query traffic and uses an active verification module to confirm whether the identified suspicious users are attackers. Through experiments conducted in multiple task scenarios, we validate the effectiveness and detection efficiency of the proposed method.

NeurIPS Conference 2024 Conference Paper

SubgDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning

  • Jiying Zhang
  • Zijing Liu
  • Yu Wang
  • Bin Feng
  • Yu Li

Molecular representation learning has shown great success in advancing AI-based drug discovery. A key insight of many recent works is that the 3D geometric structure of molecules provides essential information about their physicochemical properties. Recently, denoising diffusion probabilistic models have achieved impressive performance in molecular 3D conformation generation. However, most existing molecular diffusion models treat each atom as an independent entity, overlooking the dependency among atoms within substructures. This paper introduces a novel approach that enhances molecular representation learning by incorporating substructural information into the diffusion model framework. We propose a novel diffusion model, termed SubgDiff, that incorporates molecular subgraph information into diffusion. Specifically, SubgDiff adopts three vital techniques: i) subgraph prediction, ii) expectation state, and iii) k-step same subgraph diffusion, to enhance the perception of molecular substructure in the denoising network. Experiments on extensive downstream tasks, especially molecular force prediction, demonstrate the superior performance of our approach.

ICRA Conference 2024 Conference Paper

Torque Transmission in Double-Tendon Sheath Driven Actuators for Application in Exoskeletons

  • Daniel Pérez-Suay
  • Yu Li
  • Hamid Sadeghian
  • Abdeldjallil Naceri
  • Sami Haddadin

Bowden cables serve as essential components in various mechanical systems, facilitating power transmission from remote actuators to specific destinations. The pretension of Bowden cables profoundly influences system performance, notably in terms of friction. This study investigates the effects of cable pretension and shape on friction and torque efficiency. A custom-designed testbed, comprising integrated actuator units, pulleys, and a novel pretension mechanism connected by Bowden cables, is used to conduct experimental tests under varying parameters. This work adopts an integrated approach of experimentation, modeling, and validation, offering preliminary insights into the torque transmission characteristics of tendon-driven actuator systems. Additionally, the precise model exhibits excellent conformity across a broad range of shapes and provides initial insights into hysteresis modeling attributable to cable material properties.

NeurIPS Conference 2024 Conference Paper

Towards Stable Representations for Protein Interface Prediction

  • Ziqi Gao
  • Zijing Liu
  • Yu Li
  • Jia Li

The knowledge of protein interactions is crucial but challenging for drug discovery applications. This work focuses on protein interface prediction, which aims to determine whether a pair of residues from different proteins interact. Existing data-driven methods have made significant progress in effectively learning protein structures. Nevertheless, they overlook the conformational changes (i.e., flexibility) within proteins upon binding, leading to poor generalization ability. In this paper, we regard the protein flexibility as an attack on the trained model and aim to defend against it for improved generalization. To fulfill this purpose, we propose ATProt, an adversarial training framework for protein representations to robustly defend against the attack of protein flexibility. ATProt can theoretically guarantee protein representation stability under complicated protein flexibility. Experiments on various benchmarks demonstrate that ATProt consistently improves the performance for protein interface prediction. Moreover, our method demonstrates broad applicability, performing the best even when provided with testing structures from structure prediction models like ESMFold and AlphaFold2.

NeurIPS Conference 2024 Conference Paper

Vector Quantization Prompting for Continual Learning

  • Li Jiao
  • Qiuxia Lai
  • Yu Li
  • Qiang Xu

Continual learning requires overcoming catastrophic forgetting when training a single model on a sequence of tasks. Recent top-performing approaches are prompt-based methods that utilize a set of learnable parameters (i.e., prompts) to encode task knowledge, from which appropriate ones are selected to guide the fixed pre-trained model in generating features tailored to a certain task. However, existing methods rely on predicting prompt identities for prompt selection, where the identity prediction process cannot be optimized with task loss. This limitation leads to sub-optimal prompt selection and inadequate adaptation of pre-trained features for a specific task. Previous efforts have tried to address this by directly generating prompts from input queries instead of selecting from a set of candidates. However, these prompts are continuous and lack sufficient abstraction for task knowledge representation, making them less effective for continual learning. To address these challenges, we propose VQ-Prompt, a prompt-based continual learning method that incorporates Vector Quantization (VQ) into end-to-end training of a set of discrete prompts. In this way, VQ-Prompt can optimize the prompt selection process with task loss and meanwhile achieve effective abstraction of task knowledge for continual learning. Extensive experiments show that VQ-Prompt outperforms state-of-the-art continual learning methods across a variety of benchmarks under the challenging class-incremental setting.
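The core vector-quantization step can be sketched as a nearest-neighbour lookup into a codebook of discrete prompts. This is a minimal sketch assuming a continuous query vector and a prompt codebook of shape (K, d); the names `vq_select_prompt`, `query`, and `codebook` are hypothetical, and the straight-through gradient trick (standard in VQ-VAE-style training) is only described in a comment, since plain NumPy has no autograd. It is not the authors' implementation.

```python
import numpy as np


def vq_select_prompt(query, codebook):
    """Select the codebook entry (discrete prompt) nearest to a continuous query.

    query:    array of shape (d,), a continuous key derived from the input (hypothetical)
    codebook: array of shape (K, d), the set of discrete prompt candidates (hypothetical)
    Returns (index, quantized_vector).
    """
    query = np.asarray(query, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from the query to every codebook entry.
    dists = np.sum((codebook - query) ** 2, axis=1)
    idx = int(np.argmin(dists))
    quantized = codebook[idx]
    # In an autograd framework, a straight-through estimator would return
    # query + stop_gradient(quantized - query), so the task-loss gradient
    # still reaches the query. Without gradients, the forward value is `quantized`.
    return idx, quantized


idx, prompt = vq_select_prompt([0.9, 1.2], [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
```

Because the selection is an argmin over distances, the prompt choice itself is non-differentiable; the straight-through estimator is what lets the task loss shape it end to end.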

EAAI Journal 2024 Journal Article

“Will artificial intelligence platforms replace designers in the future?” analyzing the impact of artificial intelligence platforms on the engineering design industry through color perception

  • Yu Li

This research investigates the impact of artificial intelligence platforms on the engineering design industry by analyzing the perceptions of color and views on artificial intelligence development among designers and non-designers. The study utilized a two-stage approach: the first involved artificial intelligence-assisted design tasks and focus group discussions with 24 participants, capturing both visual and textual data. K-means color clustering and thematic analysis were applied to images and texts. The second stage involved surveying 222 individuals and conducting a chi-square test to analyze the frequency of different color dimensions used by the two groups. The results demonstrate that differences in color perception influenced design decision-making outcomes. There was widespread acknowledgment across all groups of the efficiency and inspirational potential of artificial intelligence platforms. However, variations in design education resulted in differing opinions on the replaceability of designers. Future trajectories for artificial intelligence platforms are likely to focus on specialization while addressing challenges such as echo chamber effects, copyright disputes, and the prevention of private information leakage.
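The second-stage analysis relies on a chi-square test of independence between group (designer vs. non-designer) and color dimension. As a minimal sketch, the Pearson statistic for an r × c contingency table can be computed as below; the counts in the usage line are hypothetical, not the study's data.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows = designers / non-designers, columns = two color dimensions.
chi2, dof = chi_square_statistic([[10, 20], [20, 10]])  # chi2 = 20/3, dof = 1
```

For 1 degree of freedom, a statistic above roughly 3.841 is significant at the 0.05 level.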

EAAI Journal 2023 Journal Article

A multi-tasking model of speaker-keyword classification for keeping human in the loop of drone-assisted inspection

  • Yu Li
  • Anisha Parsan
  • Bill Wang
  • Penghao Dong
  • Shanshan Yao
  • Ruwen Qin

Audio commands are a preferred communication medium to keep inspectors in the loop of civil infrastructure inspection performed by a semi-autonomous drone. To understand job-specific commands from a group of heterogeneous and dynamic inspectors, a model must be developed cost-effectively for the group and easily adapted when the group changes. This paper builds a multi-tasking deep learning model with a Share–Split–Collaborate architecture. This architecture allows the two classification tasks to share the feature extractor and then split the subject-specific and keyword-specific features intertwined in the extracted features through feature projection and collaborative training. A base model for a group of five authorized subjects is trained and tested on the inspection keyword dataset collected by this study. The model achieved a mean accuracy of 95.3% or higher in classifying the keywords of any authorized inspector. Its mean accuracy in speaker classification is 99.2%. Owing to the richer keyword representations that the model learns from the pooled training data, adapting the base model to a new inspector requires only a little training data from that inspector, such as five utterances per keyword. Using the speaker classification scores for inspector verification achieves a success rate of at least 93.9% in verifying authorized inspectors and 76.1% in detecting unauthorized ones. Further, the paper demonstrates the applicability of the proposed model to larger groups on a public dataset. This paper provides a solution to challenges facing AI-assisted human–robot interaction, including worker heterogeneity, worker dynamics, and job heterogeneity.

NeurIPS Conference 2023 Conference Paper

HiBug: On Human-Interpretable Model Debug

  • Muxi Chen
  • Yu Li
  • Qiang Xu

Machine learning models can frequently produce systematic errors on critical subsets (or slices) of data that share common attributes. Discovering and explaining such model bugs is crucial for reliable model deployment. However, existing bug discovery and interpretation methods usually involve heavy human intervention and annotation, which can be cumbersome and have low bug coverage. In this paper, we propose HiBug, an automated framework for interpretable model debugging. Our approach utilizes large pre-trained models, such as ChatGPT, to suggest human-understandable attributes that are related to the targeted computer vision tasks. By leveraging pre-trained vision-language models, we can efficiently identify common visual attributes of underperforming data slices using human-understandable terms. This enables us to uncover rare cases in the training data, identify spurious correlations in the model, and use the interpretable debug results to select or generate new training data for model improvement. Experimental results demonstrate the efficacy of the HiBug framework.

NeurIPS Conference 2023 Conference Paper

L-CAD: Language-based Colorization with Any-level Descriptions using Diffusion Priors

  • Zheng Chang
  • Shuchen Weng
  • Peixuan Zhang
  • Yu Li
  • Si Li
  • Boxin Shi

Language-based colorization produces plausible and visually pleasing colors under the guidance of user-friendly natural language descriptions. Previous methods implicitly assume that users provide comprehensive color descriptions for most of the objects in the image, which leads to suboptimal performance. In this paper, we propose a unified model to perform language-based colorization with any-level descriptions. We leverage the pretrained cross-modality generative model for its robust language understanding and rich color priors to handle the inherent ambiguity of any-level descriptions. We further design modules to align with input conditions to preserve local spatial structures and prevent the ghosting effect. With the proposed novel sampling strategy, our model achieves instance-aware colorization in diverse and complex scenarios. Extensive experimental results demonstrate our advantages of effectively handling any-level descriptions and outperforming both language-based and automatic colorization methods. The code and pretrained models are available at: https://github.com/changzheng123/L-CAD.

AAAI Conference 2023 Conference Paper

Learning Single Image Defocus Deblurring with Misaligned Training Pairs

  • Yu Li
  • Dongwei Ren
  • Xinya Shu
  • Wangmeng Zuo

By adopting popular pixel-wise losses, existing methods for defocus deblurring heavily rely on well-aligned training image pairs. Although training pairs of ground-truth and blurry images are carefully collected, e.g., in the DPDD dataset, misalignment between training pairs is inevitable, so existing methods may suffer from deformation artifacts. In this paper, we propose a joint deblurring and reblurring learning (JDRL) framework for single image defocus deblurring with misaligned training pairs. JDRL consists of a deblurring module and a spatially invariant reblurring module, through which the deblurred result can be adaptively supervised by the ground-truth image to recover sharp textures while maintaining spatial consistency with the blurry image. First, in the deblurring module, a bi-directional optical flow-based deformation is introduced to tolerate spatial misalignment between the deblurred and ground-truth images. Second, in the reblurring module, the deblurred result is reblurred to be spatially aligned with the blurry image by predicting a set of isotropic blur kernels and weighting maps. Moreover, we establish a new single image defocus deblurring (SDD) dataset, further validating JDRL and benefiting future research. JDRL can be applied to boost defocus deblurring networks in terms of both quantitative metrics and visual quality on the DPDD, RealDOF, and SDD datasets.
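The reblurring step (a set of isotropic blur kernels combined through per-pixel weighting maps) can be sketched as a weighted sum of Gaussian-blurred copies of the sharp image. This is a minimal sketch under stated assumptions: the kernels are taken to be Gaussians with hypothetical widths `sigmas`, the weighting maps are given rather than predicted by a network, and borders are zero-padded. The helper names are hypothetical and this is not the JDRL code.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian kernel with radius 3*sigma (at least 1)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()


def isotropic_blur(img, sigma):
    """Separable isotropic Gaussian blur with zero padding.

    Assumes both image dimensions are at least the kernel length.
    """
    k = gaussian_kernel_1d(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def reblur(sharp, sigmas, weights):
    """Per-pixel weighted sum of blurred copies.

    sharp:   (H, W) image
    sigmas:  K hypothetical kernel widths
    weights: (K, H, W) weighting maps, assumed to sum to 1 over K at each pixel
    """
    blurred = np.stack([isotropic_blur(sharp, s) for s in sigmas])
    return (weights * blurred).sum(axis=0)


img = np.ones((20, 20))
out = reblur(img, [1.0, 2.0], np.full((2, 20, 20), 0.5))
```

Spatially varying weights let a small bank of isotropic kernels approximate a defocus blur whose size changes across the image.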

AAAI Conference 2023 Conference Paper

Signed Laplacian Graph Neural Networks

  • Yu Li
  • Meng Qu
  • Jian Tang
  • Yi Chang

This paper studies learning meaningful node representations for signed graphs, where both positive and negative links exist. This problem has been widely studied by meticulously designing expressive signed graph neural networks, as well as capturing the structural information of the signed graph through traditional structure decomposition methods, e.g., spectral graph theory. In this paper, we propose a novel signed graph representation learning framework, called Signed Laplacian Graph Neural Network (SLGNN), which combines the advantages of both. Specifically, based on spectral graph theory and graph signal processing, we first design different low-pass and high-pass graph convolution filters to extract low-frequency and high-frequency information on positive and negative links, respectively, and then combine them into a unified message passing framework. To effectively model signed graphs, we further propose a self-gating mechanism to estimate the impacts of low-frequency and high-frequency information during message passing. We mathematically establish the relationship between the aggregation process in SLGNN and signed Laplacian regularization in signed graphs, and theoretically analyze the expressiveness of SLGNN. Experimental results demonstrate that SLGNN outperforms various competitive baselines and achieves state-of-the-art performance.
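Low-pass and high-pass graph filters are often built from the symmetrically normalized adjacency. The sketch below assumes one common choice, ε·I + D^{-1/2} A D^{-1/2} (low-pass) and ε·I − D^{-1/2} A D^{-1/2} (high-pass), applied to a positive-link or negative-link adjacency; the scalar `gate` in `gated_combine` is a hypothetical stand-in for SLGNN's learned self-gating, not the paper's formulation.

```python
import numpy as np


def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes get zero rows."""
    deg = np.asarray(A, dtype=float).sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = deg[nz] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def low_high_pass(A, eps=1.0):
    """Return (low-pass, high-pass) filters eps*I + A_hat and eps*I - A_hat."""
    A_hat = normalized_adjacency(A)
    I = np.eye(A.shape[0])
    return eps * I + A_hat, eps * I - A_hat


def gated_combine(low_out, high_out, gate):
    """Hypothetical convex mix of low- and high-frequency outputs (gate in [0, 1])."""
    return gate * low_out + (1.0 - gate) * high_out


A_pos = np.array([[0, 1], [1, 0]])  # toy positive-link adjacency
low, high = low_high_pass(A_pos)
X = np.ones((2, 1))                 # a constant (zero-frequency) node signal
```

On a constant signal the high-pass filter returns zeros while the low-pass filter preserves (and scales) it, which is the sense in which they separate low- and high-frequency information.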

JBHI Journal 2022 Journal Article

A Joint Constrained CCA Model for Network-Dependent Brain Subregion Parcellation

  • Qinrui Ling
  • Aiping Liu
  • Yu Li
  • Xueyang Fu
  • Xun Chen
  • Martin J. McKeown
  • Feng Wu

Connectivity-based brain region parcellation from functional magnetic resonance imaging (fMRI) data is complicated by heterogeneity among aged and diseased subjects, particularly when the data are spatially transformed to a common space. Here, we propose a group-guided functional brain region parcellation model capable of obtaining subregions from a target region with consistent connectivity profiles across multiple subjects, even when the fMRI signals are kept in their native spaces. The model is based on a joint constrained canonical correlation analysis (JC-CCA) method that achieves group-guided parcellation while allowing the data dimension of the parcellated regions for each subject to vary. We performed extensive experiments on synthetic and real data to demonstrate the superiority of the proposed model compared to other classical methods. When applied to fMRI data of subjects with and without Parkinson's disease (PD) to estimate the subregions in the putamen, significant between-group differences were found in the derived subregions and the connectivity patterns. Superior classification and regression results were obtained, demonstrating its potential in clinical practice.

AAAI Conference 2022 Conference Paper

Contact-Distil: Boosting Low Homologous Protein Contact Map Prediction by Self-Supervised Distillation

  • Qin Wang
  • Jiayang Chen
  • Yuzhe Zhou
  • Yu Li
  • Liangzhen Zheng
  • Sheng Wang
  • Zhen Li
  • Shuguang Cui

Accurate protein contact map prediction (PCMP) is essential for precise protein structure estimation and further biological studies. Recent works achieve strong performance on this task with high-quality multiple sequence alignments (MSAs). However, PCMP accuracy drops dramatically when only a poor MSA (e.g., an absolute MSA count of less than 10) is available. Therefore, in this paper, we propose Contact-Distil to improve low-homology PCMP accuracy through knowledge distillation on a self-supervised model. In particular, two pre-trained transformers are exploited to learn high-quality and low-quality MSA representations in parallel for the teacher and student models, respectively. In addition, co-evolution information is extracted from the pure sequence through a pretrained ESM-1b model, which provides auxiliary knowledge to improve student performance. Extensive experiments show that Contact-Distil outperforms previous state-of-the-art methods by large margins on the CAMEO-L dataset for low-homology PCMP, i.e., around 13.3% and 9.5% improvements over AlphaFold2 and MSA Transformer, respectively, when the MSA count is less than 10.

YNIMG Journal 2022 Journal Article

No smoking signs with strong smoking symbols induce weak cravings: an fMRI and EEG study

  • Wanwan Lü
  • Qichao Wu
  • Ying Liu
  • Ying Wang
  • Zhengde Wei
  • Yu Li
  • Chuan Fan
  • An-Li Wang

No smoking signs (NSSs) that combine smoking symbols (SSs) and prohibition symbols (PSs) represent common examples of reward and prohibition competition. To evaluate how SSs within NSSs influence their effectiveness in guiding reward vs. prohibition, we studied 93 male smokers. We collected self-reported craving ratings (N=30), cue reactivity under fMRI/EEG (N=33), and smoking-behavior anticipation for paired NSSs and SSs (N=30). We found that NSS-induced cravings were negatively correlated with SS-induced cravings and PS-induced inhibition. fMRI indicated that both correlations were mediated by activation of the inferior frontal gyrus and precuneus, suggesting that the effects of SSs and PSs interact with each other. EEG revealed that the prohibition response occurs after the cigarette response, indicating that the cigarette response might be precluded by the prohibition, supporting the effect of SSs in discouraging smoking. Moreover, stronger SSs induced stronger slow positive waves and late positive potentials. Both the amplitudes of late positive potentials and slow positive waves were positively correlated with the amplitude of N2, which was positively correlated with the attention-grabbing score of the NSS. In addition, the weaker the NSS-induced craving, the greater the reduction in smoking-behavior anticipation, indicating the capability of NSSs to decrease smoking behavior. Our study provides empirical evidence for selecting the most effective NSSs: those combining strong SSs and PSs, offering insights about competition between cigarette reward and prohibition and providing neural evidence on how cigarette reward and prohibition interact.

AAAI Conference 2022 Conference Paper

Orthogonal Graph Neural Networks

  • Kai Guo
  • Kaixiong Zhou
  • Xia Hu
  • Yu Li
  • Yi Chang
  • Xin Wang

Graph neural networks (GNNs) have received tremendous attention due to their superiority in learning node representations. These models rely on message passing and feature transformation functions to encode the structural and feature information from neighbors. However, stacking more convolutional layers significantly decreases the performance of GNNs. Most recent studies attribute this limitation to the over-smoothing issue, where node embeddings converge to indistinguishable vectors. Through a number of experimental observations, we argue that the main factor degrading the performance is the unstable forward normalization and backward gradient resulting from the improper design of the feature transformation, especially for shallow GNNs where over-smoothing has not yet happened. Therefore, we propose a novel orthogonal feature transformation, named Ortho-GConv, which can generally augment existing GNN backbones to stabilize model training and improve the model’s generalization performance. Specifically, we maintain the orthogonality of the feature transformation comprehensively from three perspectives, namely hybrid weight initialization, orthogonal transformation, and orthogonal regularization. By equipping existing GNNs (e.g., GCN, JKNet, GCNII) with Ortho-GConv, we demonstrate the generality of the orthogonal feature transformation in enabling stable training, and show its effectiveness for node and graph classification tasks.
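Two of the three ingredients named above have widely used generic forms: orthogonal initialization and a soft orthogonality penalty. The sketch below shows those generic forms, assuming the penalty is the squared Frobenius norm ‖WᵀW − I‖²_F and the initialization is a sign-corrected QR decomposition of a Gaussian matrix; the helper names are hypothetical, and Ortho-GConv's hybrid initialization and orthogonal transformation are not reproduced here.

```python
import numpy as np


def orthogonal_regularization(W):
    """Soft orthogonality penalty ||W^T W - I||_F^2 for a weight matrix W (d_in x d_out)."""
    gram = W.T @ W
    return float(np.sum((gram - np.eye(gram.shape[0])) ** 2))


def orthogonal_init(d, seed=0):
    """Square orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    # Flip column signs so the diagonal of R is positive; Q stays orthogonal.
    return Q * np.sign(np.diag(R))


W = orthogonal_init(4)
penalty = orthogonal_regularization(W)  # close to 0 for an orthogonal W
```

An orthogonal W preserves vector norms, which is why keeping the transformation near-orthogonal helps stabilize forward activations and backward gradients as layers are stacked.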

AAAI Conference 2022 Conference Paper

SMINet: State-Aware Multi-Aspect Interests Representation Network for Cold-Start Users Recommendation

  • Wanjie Tao
  • Yu Li
  • Liangyue Li
  • Zulong Chen
  • Hong Wen
  • Peilin Chen
  • Tingting Liang
  • Quan Lu

Online travel platforms (OTPs), e.g., bookings.com and Ctrip.com, deliver travel experiences to online users by providing travel-related products. Although much progress has been made, state-of-the-art methods for cold-start problems remain largely sub-optimal for user representation, since they do not take into account the unique characteristics exhibited by user travel behaviors. In this work, we propose a State-aware Multi-aspect Interests representation Network (SMINet) for cold-start user recommendation on OTPs, which consists of a multi-aspect interests extractor, a co-attention layer, and a state-aware gating layer. The key component of the model is the multi-aspect interests extractor, which extracts representations of the user’s multi-aspect interests. Furthermore, to learn the interactions between the user behaviors in the current session and the above multi-aspect interests, we carefully design a co-attention layer that allows cross attention between the two modules. Additionally, we propose a travel state-aware gating layer to attentively select the multi-aspect interests. The final user representation is obtained by fusing the three components. Comprehensive experiments conducted both offline and online demonstrate the superior performance of the proposed model at user representation, especially for cold-start users, compared with state-of-the-art methods.

IJCAI Conference 2022 Conference Paper

“Think Before You Speak”: Improving Multi-Action Dialog Policy by Planning Single-Action Dialogs

  • Shuo Zhang
  • Junzhou Zhao
  • Pinghui Wang
  • Yu Li
  • Yi Huang
  • Junlan Feng

Multi-action dialog policy (MADP), which generates multiple atomic dialog actions per turn, has been widely applied in task-oriented dialog systems to provide expressive and efficient system responses. Existing MADP models usually imitate action combinations from labeled multi-action dialog samples. Due to data limitations, they generalize poorly to unseen dialog flows. While interactive learning and reinforcement learning algorithms can be applied to incorporate external data sources of real users and user simulators, they take significant manual effort to build and suffer from instability. To address these issues, we propose Planning Enhanced Dialog Policy (PEDP), a novel multi-task learning framework that learns single-action dialog dynamics to enhance multi-action prediction. Our PEDP method employs model-based planning to conceive what to express before deciding the current response, by simulating single-action dialogs. Experimental results on the MultiWOZ dataset demonstrate that our fully supervised learning-based method achieves a solid task success rate of 90.6%, an improvement of 3% over state-of-the-art methods. The source code and the appendix of this paper can be obtained from https://github.com/ShuoZhangXJTU/PEDP.

ICLR Conference 2021 Conference Paper

Beyond Categorical Label Representations for Image Classification

  • Boyuan Chen 0001
  • Yu Li
  • Sunand Raghupathi
  • Hod Lipson

We find that the way we choose to represent data labels can have a profound effect on the quality of trained models. For example, training an image classifier to regress audio labels rather than traditional categorical probabilities produces a more reliable classification. This result is surprising, considering that audio labels are more complex than simpler numerical probabilities or text. We hypothesize that high dimensional, high entropy label representations are generally more useful because they provide a stronger error signal. We support this hypothesis with evidence from various label representations including constant matrices, spectrograms, shuffled spectrograms, Gaussian mixtures, and uniform random matrices of various dimensionalities. Our experiments reveal that high dimensional, high entropy labels achieve comparable accuracy to text (categorical) labels on standard image classification tasks, but features learned through our label representations exhibit more robustness under various adversarial attacks and better effectiveness with a limited amount of training data. These results suggest that label representation may play a more important role than previously thought.

YNIMG Journal 2021 Journal Article

Connectome-based evidence for creative thinking as an emergent property of ordinary cognitive operations

  • Kaixiang Zhuang
  • Wenjing Yang
  • Yu Li
  • Jie Zhang
  • Qunlin Chen
  • Jie Meng
  • Dongtao Wei
  • Jiangzhou Sun

Creative thinking is a hallmark of human cognition, which enables us to generate novel and useful ideas. Nevertheless, its emergence within the macro-scale neurocognitive circuitry remains largely unknown. Using resting-state fMRI data from two large population samples (SWU: n = 931; HCP: n = 1001) and a novel "travelling pattern prediction analysis", here we identified the modularized functional connectivity patterns linked to creative thinking ability, which concurrently explained individual variability across ordinary cognitive abilities such as episodic memory, working memory and relational processing. Further interrogation of this neural pattern with graph theoretical tools revealed both hub-like brain structures and globally-efficient information transfer paths that together may facilitate higher creative thinking ability through the convergence of distinct cognitive operations. Collectively, our results provide reliable evidence for the hypothesized emergence of creative thinking from core cognitive components through neural integration, and thus allude to a significant theoretical advancement in the study of creativity.

JBHI Journal 2021 Journal Article

Hand Gesture Recognition based on Surface Electromyography using Convolutional Neural Network with Transfer Learning Method

  • Xiang Chen
  • Yu Li
  • Ruochen Hu
  • Xu Zhang
  • Xun Chen

This paper presents an effective transfer learning (TL) strategy for surface electromyography (sEMG)-based gesture recognition with high generalization and low training burden. To use a well-trained model as the feature extractor of the target networks, 30 hand gestures involving various states of the finger joints, elbow joint, and wrist joint are selected to compose the source task, and a convolutional neural network (CNN)-based source network is designed and trained as the general gesture EMG feature extraction network. Then, two types of target networks, in the form of CNN-only and CNN+LSTM (long short-term memory), respectively, are designed with the same CNN architecture as the feature extraction network. Finally, gesture recognition experiments on three different target gesture datasets are carried out under the TL and non-TL strategies, respectively. The experimental results verify the validity of the proposed TL strategy in improving hand gesture recognition accuracy and reducing training burden. For both the CNN-only and the CNN+LSTM target networks, on the three target datasets covering new users, new gestures, and a different collection scheme, the proposed TL strategy improves recognition accuracy by 10% to 38%, reduces the training time by a factor of tens, and guarantees a recognition accuracy of more than 90% when only 2 repetitions of each gesture are used to fine-tune the parameters of the target networks. The proposed TL strategy has important application value for promoting the development of myoelectric control systems.

IJCAI Conference 2021 Conference Paper

Information Bottleneck Approach to Spatial Attention Learning

  • Qiuxia Lai
  • Yu Li
  • Ailing Zeng
  • Minhao Liu
  • Hanqiu Sun
  • Qiang Xu

The selective visual attention mechanism in the human visual system (HVS) restricts the amount of information that reaches visual awareness when perceiving natural scenes, allowing near real-time information processing with limited computational capacity. This kind of selectivity acts as an ‘Information Bottleneck (IB)’, which seeks a trade-off between information compression and predictive accuracy. However, such information constraints are rarely explored in the attention mechanisms of deep neural networks (DNNs). In this paper, we propose an IB-inspired spatial attention module for DNN structures built for visual recognition. The module takes as input an intermediate representation of the input image, and outputs a variational 2D attention map that minimizes the mutual information (MI) between the attention-modulated representation and the input, while maximizing the MI between the attention-modulated representation and the task label. To further restrict the information bypassed by the attention map, we quantize the continuous attention scores to a set of learnable anchor values during training. Extensive experiments show that the proposed IB-inspired spatial attention mechanism can yield attention maps that neatly highlight the regions of interest while suppressing backgrounds, and bootstrap standard DNN structures for visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification). The experiments also verify that the attention maps help interpret the decision making of the DNNs. Our code is available at this https URL.
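The quantization step described above, mapping each continuous attention score to one of a small set of anchor values, can be sketched as a nearest-anchor lookup. This minimal sketch assumes fixed anchors for illustration (in the paper they are learnable) and omits the gradient handling needed during training; `quantize_to_anchors` is a hypothetical name.

```python
import numpy as np


def quantize_to_anchors(scores, anchors):
    """Replace each continuous attention score with its nearest anchor value.

    scores:  array of any shape, e.g. an (H, W) attention map
    anchors: 1D array of anchor values (learnable in the paper; fixed here)
    """
    scores = np.asarray(scores, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # Distance from every score to every anchor, then pick the closest anchor.
    idx = np.argmin(np.abs(scores[..., None] - anchors), axis=-1)
    return anchors[idx]


attn = np.array([[0.1, 0.45], [0.8, 0.95]])
q = quantize_to_anchors(attn, [0.0, 0.5, 1.0])  # [[0.0, 0.5], [1.0, 1.0]]
```

Restricting the map to a few discrete levels caps how much information it can carry, which complements the mutual-information objective.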

AAAI Conference 2021 Conference Paper

RevMan: Revenue-aware Multi-task Online Insurance Recommendation

  • Yu Li
  • Yi Zhang
  • Lu Gan
  • Gengwei Hong
  • Zimu Zhou
  • Qiang Li

Online insurance is a new type of e-commerce with exponential growth. An effective recommendation model that maximizes the total revenue of insurance products listed in multiple customized sales scenarios is crucial for the success of online insurance business. Prior recommendation models are ineffective for two reasons. They fail to characterize the complex relatedness of insurance products across multiple sales scenarios, and they maximize the overall conversion rate rather than the total revenue. Even worse, the business logic of online insurance makes it impractical to collect training data online for total revenue maximization. We propose RevMan, a Revenue-aware Multi-task Network for online insurance recommendation. RevMan adopts an adaptive attention mechanism to allow effective feature sharing among complex insurance products and sales scenarios. It also designs an efficient offline learning mechanism that learns the ranking that maximizes the expected total revenue by reusing the training data and model for conversion rate maximization. Extensive offline and online evaluations show that RevMan outperforms state-of-the-art recommendation systems for e-commerce.
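
To make the distinction between conversion-rate and revenue objectives concrete, the sketch below ranks the same products in two ways. It uses the common heuristic of scoring each item by expected revenue, predicted conversion probability times premium. This is not RevMan's learned offline mechanism. The class, field names, and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InsuranceProduct:
    name: str
    p_convert: float  # predicted conversion probability (assumed to come from a CVR model)
    premium: float    # revenue earned if the user buys (hypothetical field)

    @property
    def expected_revenue(self) -> float:
        # Expected revenue of showing this product = P(convert) * revenue on conversion.
        return self.p_convert * self.premium


def rank_by_conversion(products: List[InsuranceProduct]) -> List[str]:
    """Order products by predicted conversion probability only."""
    return [p.name for p in sorted(products, key=lambda p: p.p_convert, reverse=True)]


def rank_by_expected_revenue(products: List[InsuranceProduct]) -> List[str]:
    """Order products by expected revenue instead."""
    return [p.name for p in sorted(products, key=lambda p: p.expected_revenue, reverse=True)]
```

With illustrative items A (0.10, 100), B (0.02, 2000), and C (0.05, 300), ranking by conversion gives A, C, B. Ranking by expected revenue gives B, C, A. The two objectives can disagree sharply when premiums vary widely.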

JBHI Journal 2021 Journal Article

Striatal Subdivisions Estimated via Deep Embedded Clustering With Application to Parkinson's Disease

  • Yu Li
  • Aiping Liu
  • Taomian Mi
  • Runyu Yang
  • Piu Chan
  • Martin J. McKeown
  • Xun Chen
  • Feng Wu

Recent fMRI connectivity-based parcellation (CBP) methods have been developed to obtain homogeneous and functionally coherent brain parcels. However, most of these studies utilize traditional clustering methods that neglect hidden nonlinear features. To enhance parcellation performance, here we propose a deep embedded connectivity-based parcellation (DECBP) framework and apply it to determine functional subdivisions of the striatum in public resting state fMRI data sets. This framework integrates fMRI connectivity features into deep embedded clustering (DEC), a deep neural network based on a stacked autoencoder. Compared to three prevalent clustering methods and their combinations with principal component analysis (PCA), the DECBP exhibited a significantly higher similarity between scans, individuals, and groups, indicating enhanced reproducibility. The generated reliable parcellations were also largely consistent with other public atlases. We further explored the functional subunits in the striatum in a data set from 23 Parkinson's disease (PD) subjects and 27 age-matched healthy controls (HC). All putaminal subregions of PD demonstrated lower interhemispheric connectivity than those of HC, which might reflect imbalance in the pathological progression of PD. Such hypo-connectivity was also observed between putaminal subregions and other brain regions, reflecting neuroimaging manifestations of the altered cortico-striato-thalamo-cortical circuit. These observed weaker couplings were associated with PD severity and duration. Our results support the utilization of the DECBP framework and suggest that abnormal connectivity in putaminal subregions may be a potential indicator of PD.
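
Deep embedded clustering (DEC), on which the framework above builds, softly assigns each embedded point to cluster centroids with a Student's t kernel. It then sharpens those assignments into an auxiliary target distribution. The stdlib sketch below shows only these two steps. It assumes the autoencoder embeddings and centroids are already available as plain lists. The fMRI connectivity features and the stacked autoencoder of DECBP are not modeled, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def soft_assign(embeddings: Sequence[Vector], centroids: Sequence[Vector],
                alpha: float = 1.0) -> List[List[float]]:
    """Student's t soft assignment q_ij of embedding i to cluster j (each row sums to 1)."""
    q = []
    for z in embeddings:
        kernel = []
        for mu in centroids:
            dist2 = sum((a - b) ** 2 for a, b in zip(z, mu))
            kernel.append((1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(kernel)
        q.append([k / total for k in kernel])
    return q


def target_distribution(q: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sharpened targets: p_ij proportional to q_ij**2 / f_j, where f_j = sum_i q_ij."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]  # soft cluster sizes
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p
```

For two embeddings at (0, 0) and (4, 0) with centroids at the same points, the first row of `q` is [17/18, 1/18], roughly [0.944, 0.056]. The target distribution sharpens it to about [0.997, 0.003]. In DEC, training then pulls `q` toward these targets.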

NeurIPS Conference 2021 Conference Paper

TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

  • Yu Li
  • Min Li
  • Qiuxia Lai
  • Yannan Liu
  • Qiang Xu

Deep learning (DL) systems are notoriously difficult to test and debug due to the lack of correctness proofs and the huge test input space to cover. Given the ubiquity of unlabeled test data and the high cost of labeling, in this paper we propose a novel test prioritization technique, namely TestRank, which aims to reveal more model failures with less labeling effort. TestRank orders the unlabeled test data according to their likelihood of being a failure, i.e., their failure-revealing capability. Unlike existing solutions, TestRank leverages both intrinsic and contextual attributes of the unlabeled test data when prioritizing them. Specifically, we first build a similarity graph over both unlabeled test samples and labeled samples (e.g., training or previously labeled test samples). Then, we conduct graph-based semi-supervised learning to extract contextual features from the correctness of similar labeled samples. For a particular test instance, we combine two kinds of features to predict its failure-revealing capability: the contextual features extracted with the graph neural network, and the intrinsic features obtained from the DL model itself. Finally, TestRank prioritizes unlabeled test inputs in descending order of this predicted probability. We evaluate TestRank on three popular image classification datasets, and the results show that TestRank significantly outperforms existing test prioritization techniques.
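
The sketch below illustrates the contextual-plus-intrinsic idea behind this kind of prioritization. It swaps in classic iterative label propagation for the graph neural network the abstract describes. It also replaces the learned combination with a fixed weight `beta`, using 1 minus the model's top softmax confidence as the intrinsic signal. All function names, the graph format, and `beta` are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

# node id -> list of (neighbor id, similarity weight); every neighbor must also be a key
Graph = Dict[int, List[Tuple[int, float]]]


def propagate_failure_scores(graph: Graph, labeled: Dict[int, float],
                             iterations: int = 50) -> Dict[int, float]:
    """Label propagation (a stand-in for the paper's GNN).

    Labeled nodes stay clamped to 1.0 (model failed) or 0.0 (model correct);
    each unlabeled node repeatedly takes the weighted mean score of its neighbors.
    """
    scores = {node: labeled.get(node, 0.5) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in labeled:
                updated[node] = labeled[node]
                continue
            total_w = sum(w for _, w in neighbors)
            if total_w == 0:
                updated[node] = scores[node]  # isolated node keeps its prior
            else:
                updated[node] = sum(w * scores[m] for m, w in neighbors) / total_w
        scores = updated
    return scores


def prioritize(graph: Graph, labeled: Dict[int, float],
               confidence: Dict[int, float], beta: float = 0.5) -> List[int]:
    """Rank unlabeled inputs by a fixed blend of contextual and intrinsic failure evidence.

    `confidence` maps each unlabeled input to the model's top softmax probability.
    """
    context = propagate_failure_scores(graph, labeled)
    combined = {
        node: beta * context[node] + (1.0 - beta) * (1.0 - confidence[node])
        for node in confidence
    }
    return sorted(combined, key=combined.get, reverse=True)
```

In a toy graph, node 0 is a known failure and node 1 is known to be correct. Unlabeled node 2 is similar only to node 0, node 3 only to node 1, and node 4 to both. With equal model confidence on nodes 2 to 4, the ranking is 2, 4, 3.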

AIJ Journal 2020 Journal Article

Clause vivification by unit propagation in CDCL SAT solvers

  • Chu-Min Li
  • Fan Xiao
  • Mao Luo
  • Felip Manyà
  • Zhipeng Lü
  • Yu Li

Original and learnt clauses in Conflict-Driven Clause Learning (CDCL) SAT solvers often contain redundant literals. This may have a negative impact on solver performance, because redundant literals may deteriorate both the effectiveness of Boolean constraint propagation and the quality of subsequent learnt clauses. To overcome this drawback, we propose a clause vivification approach that eliminates redundant literals by applying unit propagation. The proposed clause vivification is activated before the SAT solver triggers some selected restarts, and only affects a subset of original and learnt clauses, which are considered to be more relevant according to metrics like the literal block distance (LBD). Moreover, we conducted an empirical investigation with instances coming from the hard combinatorial and application categories of recent SAT competitions. The results show that a significant number of additional instances are solved when the proposed approach is incorporated into five of the best performing CDCL SAT solvers (Glucose, TC_Glucose, COMiniSatPS, MapleCOMSPS and MapleCOMSPS_LRB). More importantly, the empirical investigation includes an in-depth analysis of the effectiveness of clause vivification. It is worth mentioning that one of the SAT solvers described here was ranked first in the main track of SAT Competition 2017 thanks to the incorporation of the proposed clause vivification. That solver was further improved in this paper and won the bronze medal in the main track of SAT Competition 2018.
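
The core vivification step can be sketched as follows. Falsify the clause's literals one at a time and unit-propagate over the remaining clauses. Stop early on a conflict or when a later literal is already implied true, and drop any literal that is implied false. This is a naive standalone sketch using DIMACS-style integer literals, without watched literals. It does not model the paper's restart-triggered scheduling or its LBD-based choice of which clauses to vivify. The function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Clause = List[int]  # DIMACS-style literals: 3 means x3 is true, -3 means x3 is false


def unit_propagate(clauses: Sequence[Clause], assigned: Set[int]) -> Tuple[Set[int], bool]:
    """Naive unit propagation; returns (set of true literals, conflict flag)."""
    assigned = set(assigned)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assigned for lit in clause):
                continue  # clause already satisfied
            # Literals not yet falsified by the current assignment.
            open_lits = [lit for lit in clause if -lit not in assigned]
            if not open_lits:
                return assigned, True  # every literal is false: conflict
            if len(open_lits) == 1:
                assigned.add(open_lits[0])  # unit clause forces its last literal
                changed = True
    return assigned, False


def vivify(clause: Clause, others: Sequence[Clause]) -> Clause:
    """Shorten `clause` using unit propagation over the other clauses."""
    assigned: Set[int] = set()
    kept: Clause = []
    for lit in clause:
        if lit in assigned:
            # Falsifying the kept literals already implies lit, so kept + [lit] is implied.
            kept.append(lit)
            return kept
        if -lit in assigned:
            # lit is implied false under the same assumptions: it is redundant.
            continue
        kept.append(lit)
        assigned.add(-lit)  # assume lit is false and propagate
        assigned, conflict = unit_propagate(others, assigned)
        if conflict:
            # Falsifying all kept literals is contradictory, so they form an implied clause.
            return kept
    return kept
```

For instance, `vivify([-1, 3, 4], [[-1, 2], [-2, 3]])` returns `[-1, 3]`. Assuming x1 is true forces x2 and then x3, so literal 4 is redundant.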

AAAI Conference 2020 Conference Paper

End-to-End Trainable Non-Collaborative Dialog System

  • Yu Li
  • Kun Qian
  • Weiyan Shi
  • Zhou Yu

End-to-end task-oriented dialog models have achieved promising performance on collaborative tasks, where users willingly coordinate with the system to complete a given task. In non-collaborative settings such as negotiation and persuasion, however, users and systems do not share a common goal. As a result, people in these non-collaborative settings use social content to build rapport and trust in order to advance their goals, which is less common in collaborative tasks. To handle social content, we introduce a hierarchical intent annotation scheme, which can be generalized to different non-collaborative dialog tasks. Building upon TransferTransfo (Wolf et al. 2019), we propose an end-to-end neural network model to generate diverse, coherent responses. Our model uses intents and semantic slots as the intermediate sentence representation to guide the generation process. In addition, we design a filter to select appropriate responses based on whether these intermediate representations fit the designed task and conversation constraints. Our non-collaborative dialog model guides users to complete the task while simultaneously keeping them engaged. We test our approach on our newly proposed ANTISCAM dataset and the existing PERSUASIONFORGOOD dataset. Both automatic and human evaluations suggest that our model outperforms multiple baselines on these two non-collaborative tasks.

AAAI Conference 2020 Conference Paper

Learning Signed Network Embedding via Graph Attention

  • Yu Li
  • Yuan Tian
  • Jiawei Zhang
  • Yi Chang

Learning low-dimensional representations of graphs (i.e., network embedding) plays a critical role in network analysis and facilitates many downstream tasks. Recently, graph convolutional networks (GCNs) have revolutionized the field of network embedding and led to state-of-the-art performance in network analysis tasks such as link prediction and node classification. Nevertheless, most existing GCN-based network embedding methods are proposed for unsigned networks. In the real world, however, some networks are signed, meaning their links are annotated with different polarities, e.g., positive vs. negative. Negative links may have different properties from positive ones and can significantly affect the quality of network embedding. Thus, in this paper, we propose a novel network embedding framework, SNEA, to learn Signed Network Embedding via graph Attention. In particular, we propose a masked self-attentional layer, which leverages the self-attention mechanism to estimate the importance coefficient for each pair of nodes connected by different types of links during the embedding aggregation process. SNEA then uses the masked self-attentional layers to aggregate more important information from neighboring nodes to generate node embeddings based on balance theory. Experimental results on the signed link prediction task over several real-world signed network datasets demonstrate the effectiveness of the proposed framework.
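
Balance theory, which the aggregation above relies on, says that a cycle of signed links is balanced when the product of its edge signs is positive. Informally, "the friend of my friend is my friend" and "the enemy of my enemy is my friend." The stdlib sketch below checks this rule for every triangle in a small signed graph. It only illustrates the balance rule. It does not implement SNEA's masked self-attentional layer, and the function name and edge format are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

SignedEdges = Dict[FrozenSet[int], int]  # undirected edge {u, v} -> +1 (positive) or -1 (negative)


def triangle_balance(edges: SignedEdges) -> List[Tuple[Tuple[int, int, int], bool]]:
    """Return every triangle and whether it is balanced (product of its edge signs > 0)."""
    nodes = sorted({node for edge in edges for node in edge})
    result = []
    for a, b, c in combinations(nodes, 3):
        sides = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(side in edges for side in sides):
            sign = edges[sides[0]] * edges[sides[1]] * edges[sides[2]]
            result.append(((a, b, c), sign > 0))
    return result
```

In a graph where 0-1 and 1-2 are positive but 0-2 is negative, triangle (0, 1, 2) is unbalanced. Triangle (1, 2, 3), with one positive and two negative edges, is balanced.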

AIIM Journal 2020 Journal Article

Temporal tree representation for similarity computation between medical patients

  • Suresh Pokharel
  • Guido Zuccon
  • Xue Li
  • Chandra Prasetyo Utomo
  • Yu Li

Objective: The aim of this study is to compute similarities between patient records in an electronic health record (EHR). This is an important problem because effective methods for computing patient similarity would support and automate tasks such as patient stratification, medical prognosis, and cohort selection. They would also unlock the potential of medical analytics methods for healthcare intelligence. However, health data in EHRs present many challenges that make the automatic computation of patient similarity difficult. These include temporal aspects; multivariate, heterogeneous, and irregular data; and data sparsity. Materials and methods: We propose a new method for EHR data representation called Temporal Tree. It is a temporal hierarchical representation that, based on temporal co-occurrence, preserves the compound information found at different levels in health data. In addition, this representation is augmented with the doc2vec embedding technique, which is exploited here for patient similarity computation. We empirically investigate our proposed method, along with several state-of-the-art benchmarks, on a dataset of real-world Intensive Care Unit (ICU) EHRs, for the task of identifying patients with a specific target diagnosis. Results: Our empirical results show that the Temporal Tree representation is significantly better than other traditional and state-of-the-art methods for representing patients and computing their similarities. Conclusion: Temporal Trees capture the temporal relationships between medical, hierarchical data. This enables effective modeling of the rich information provided within EHRs and thus the identification of similar patients.

AAAI Conference 2019 Conference Paper

Approximate Kernel Selection with Strong Approximate Consistency

  • Lizhong Ding
  • Yong Liu
  • Shizhong Liao
  • Yu Li
  • Peng Yang
  • Yijie Pan
  • Chao Huang
  • Ling Shao

Kernel selection is fundamental to the generalization performance of kernel-based learning algorithms. Approximate kernel selection is an efficient kernel selection approach that exploits the convergence property of the kernel selection criteria and the computational virtue of kernel matrix approximation. The convergence property is measured by the notion of approximate consistency. For the existing Nyström approximations, whose sampling distributions are independent of the specific learning task at hand, it is difficult to establish the strong approximate consistency. They mainly focus on the quality of the low-rank matrix approximation, rather than the performance of the kernel selection criterion used in conjunction with the approximate matrix. In this paper, we propose a novel Nyström approximate kernel selection algorithm by customizing a criterion-driven adaptive sampling distribution for the Nyström approximation, which adaptively reduces the error between the approximate and accurate criteria. We theoretically derive the strong approximate consistency of the proposed Nyström approximate kernel selection algorithm. Finally, we empirically evaluate the approximate consistency of our algorithm as compared to state-of-the-art methods.

IJCAI Conference 2019 Conference Paper

Learning Network Embedding with Community Structural Information

  • Yu Li
  • Ying Wang
  • Tingting Zhang
  • Jiawei Zhang
  • Yi Chang

Network embedding is an effective approach to learn the low-dimensional representations of vertices in networks, aiming to capture and preserve the structure and inherent properties of networks. The vast majority of existing network embedding methods exclusively focus on vertex proximity of networks, while ignoring the network internal community structure. However, the homophily principle indicates that vertices within the same community are more similar to each other than those from different communities, thus vertices within the same community should have similar vertex representations. Motivated by this, we propose a novel network embedding framework NECS to learn the Network Embedding with Community Structural information, which preserves the high-order proximity and incorporates the community structure in vertex representation learning. We formulate the problem into a principled optimization framework and provide an effective alternating algorithm to solve it. Extensive experimental results on several benchmark network datasets demonstrate the effectiveness of the proposed framework in various network analysis tasks including network reconstruction, link prediction and vertex classification.

AAAI Conference 2019 Conference Paper

Linear Kernel Tests via Empirical Likelihood for High-Dimensional Data

  • Lizhong Ding
  • Zhi Liu
  • Yu Li
  • Shizhong Liao
  • Yong Liu
  • Peng Yang
  • Ge Yu
  • Ling Shao

We propose a framework for analyzing and comparing distributions via empirical likelihood methods, without imposing any parametric assumptions. Our framework is used to study two fundamental statistical test problems: the two-sample test and the goodness-of-fit test. For the two-sample test, we need to determine whether two groups of samples come from different distributions. For the goodness-of-fit test, we examine how likely it is that a set of samples is generated from a known target distribution. Specifically, we propose empirical likelihood ratio (ELR) statistics for the two-sample test and the goodness-of-fit test. Both have linear time complexity and show higher power (i.e., the probability of correctly rejecting the null hypothesis) than existing linear statistics for high-dimensional data. We prove nonparametric Wilks’ theorems for the ELR statistics, which show that the limiting distributions of the proposed ELR statistics are chi-square distributions. With these limiting distributions, we can avoid bootstrapping or simulation to determine the threshold for rejecting the null hypothesis. This makes the ELR statistics more efficient than the recently proposed linear statistic, the finite set Stein discrepancy (FSSD). We also prove the consistency of the ELR statistics, which guarantees that the test power goes to 1 as the number of samples goes to infinity. In addition, we show experimentally and analyze theoretically that FSSD performs poorly, or even fails, on high-dimensional data. Finally, we conduct a series of experiments to evaluate the performance of our ELR statistics compared with state-of-the-art linear statistics.

NeurIPS Conference 2019 Conference Paper

Two Generator Game: Learning to Sample via Linear Goodness-of-Fit Test

  • Lizhong Ding
  • Mengyang Yu
  • Li Liu
  • Fan Zhu
  • Yong Liu
  • Yu Li
  • Ling Shao

Learning the probability distribution of high-dimensional data is a challenging problem. To solve this problem, we formulate a deep energy adversarial network (DEAN), which casts the energy model learned from real data into an optimization of a goodness-of-fit (GOF) test statistic. DEAN can be interpreted as a GOF game between two generative networks. One, an explicit generative network, learns an energy-based distribution that fits the real data. The other, an implicit generative network, is trained by minimizing a GOF test statistic between the energy-based distribution and the generated data, so that the underlying distribution of the generated data is close to the energy-based distribution. We design a two-level alternating optimization procedure to train the explicit and implicit generative networks, such that the hyper-parameters can also be learned automatically. Experimental results show that DEAN achieves high-quality generation compared with state-of-the-art approaches.

IROS Conference 2015 Conference Paper

Tripedal walking robot with fixed coxa driven by radially stretchable legs

  • Kentaro Oki
  • Masato Ishikawa
  • Yu Li
  • Naoto Yasutani
  • Koichi Osuka

In this paper, we propose a new kind of tripedal locomotion, namely robotic walking with three legs. We develop a prototype robot, which we call Martian III, with three stretchable legs fixed to each other at the center of the body. The coxa (hip joint) is therefore completely fixed in this robot, and the only control inputs are the linear actuators that stretch the three legs in the radial directions. The robot is developed as a minimal model to examine our basic idea: make the robot swing by actuating the legs, and then generate locomotion by exploiting the geometric effect of rigid-body rolling on a flat floor. In experiments, the robot achieves forward and rotational locomotion with appropriate choices of oscillation along each leg. Further analysis of the movement of Martian III indicates that the direction of the swing leg depends on the deformation of each leg. Moreover, from the results of the measurement experiment and the movement analysis of rotational locomotion, we identified two typical types of gait.

TCS Journal 2013 Journal Article

A combinatorial 2.375-approximation algorithm for the facility location problem with submodular penalties

  • Yu Li
  • Donglei Du
  • Naihua Xiu
  • Dachuan Xu

We offer the currently best approximation ratio 2.375 for the facility location problem with submodular penalties (FLPSP), improving not only the previous best combinatorial ratio 3, but also the previous best non-combinatorial ratio 2.488. We achieve this improved ratio by combining the primal–dual scheme with the greedy augmentation technique.

YNIMG Journal 2007 Journal Article

Phantom calibration method for improved temporal characterization of hemodynamic response in event-related fMRI

  • Yu Li
  • Shahed Reza
  • Mark K. Limkeman

In event-related functional MRI, there are limits on both the duration of experiments on human subjects and the imaging speed. Due to these limitations, data truncation and undersampling have to be used in functional MRI signal acquisition. We investigate the effect of these factors on hemodynamic deconvolution experimentally and develop a phantom calibration method to improve the estimated hemodynamic response. We observe that the high-frequency components generated by data truncation can fold back into low frequencies when the sampling rate is not sufficiently high. This aliasing can introduce significant noise into the hemodynamic deconvolution and reduce the accuracy of the temporal characterization of the hemodynamic response. A SMARTPHANTOM™ BOLD simulator is used to calibrate the aliasing effect in an event-related functional MRI experiment. With this calibration, an anti-aliasing method is used to suppress the aliasing, resulting in an improved temporal characterization of the hemodynamic response in event-related fMRI.
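
The fold-back described above follows the standard sampling rule. A sinusoid at frequency f, sampled at rate fs, appears at |f - fs * round(f / fs)|, which always lies in [0, fs/2]. The sketch below computes this apparent frequency and checks it by sampling two cosines. The repetition time of 2 s (fs = 0.5 Hz) and the function names are hypothetical and are not taken from the study.

```python
import math
from typing import List


def aliased_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency, folded into [0, fs/2], of a sinusoid at f_hz sampled at fs_hz."""
    if fs_hz <= 0:
        raise ValueError("sampling rate must be positive")
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))


def sample_cosine(f_hz: float, fs_hz: float, n_samples: int) -> List[float]:
    """Sample cos(2*pi*f*t) at times t = k / fs for k = 0 .. n_samples - 1."""
    return [math.cos(2 * math.pi * f_hz * k / fs_hz) for k in range(n_samples)]
```

With a hypothetical TR of 2 s, the Nyquist frequency is 0.25 Hz. A 0.4 Hz component then aliases to about 0.1 Hz, and its samples are indistinguishable from those of a 0.1 Hz cosine. This is how truncation-induced high-frequency content can corrupt the low-frequency band that carries the hemodynamic response.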