Author name cluster

Yiming Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

45 papers

2 author rows

AAAI Conference 2026 Conference Paper

Learning Better UAV-Based Cross-View Object Geo-Localization from Multi-Modal Prompts: MoP-UAV Benchmark and MoPT Framework

Xiaohan Zhang
Zhangkai Shen
Si-Yuan Cao
Xiaokai Bai
Yiming Li
Zheheng Han
Zhe Wu
Qi Ming

We present MoP-UAV, a new benchmark for UAV-based cross-view object geo-localization guided by multi-modal prompts. MoP-UAV supports fine-grained object-level cross-view localization under diverse prompt modalities, including natural language, bounding boxes, and click points. It offers potential for incorporating large foundation models like large language models (LLMs) and promotes the building of more flexible and intelligent UAV agents. Based on the benchmark, we propose MoPT, a multi-modal-prompt-guided transformer that embeds prompts as token sequences and extracts object location from UAV and satellite features via cross-attention. To enhance semantic consistency and performance, we further adopt a cross-view contrastive loss and propose a RefCOCOg-based pre-training strategy. Extensive experiments show that MoPT achieves robust localization under arbitrary prompt combinations. Notably, multi-modal-prompt training significantly boosts unimodal-prompt inference performance, highlighting the generalization benefits of multi-modal learning. MoPT trained with multi-modal prompts outperforms prior unimodal prompt works under the same setting.

AAAI Conference 2026 Conference Paper

Listening Between the Frames: Bridging Temporal Gaps in Large Audio-Language Models

Hualei Wang
Yiming Li
Shuo Ma
Hong Liu
Xiangdong Wang

Recent Large Audio-Language Models (LALMs) exhibit impressive capabilities in understanding audio content for conversational QA tasks. However, these models struggle to accurately understand timestamps for temporal localization (e.g., Temporal Audio Grounding) and are restricted to short audio perception, leading to constrained capabilities on fine-grained tasks. We identify three key aspects that limit their temporal localization and long audio understanding: (i) timestamp representation, (ii) architecture, and (iii) data. To address this, we introduce TimeAudio, a novel method that empowers LALMs to connect their understanding of audio content with precise temporal perception. Specifically, we incorporate unique temporal markers to improve time-sensitive reasoning and apply an absolute time-aware encoding that explicitly grounds the acoustic features with absolute time information. Moreover, to realize end-to-end long audio understanding, we introduce a segment-level token merging module to substantially reduce audio token redundancy and enhance the efficiency of information extraction. Due to the lack of suitable datasets and evaluation metrics, we consolidate existing audio datasets into a new dataset focused on temporal tasks and establish a series of metrics to evaluate the fine-grained performance. Evaluations show strong performance across a variety of fine-grained tasks, such as dense captioning, temporal grounding, and timeline speech summarization, which demonstrates TimeAudio's robust temporal localization and reasoning capabilities.

AAAI Conference 2026 Conference Paper

Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model

Xuankun Rong
Wenke Huang
Wenzheng Jiang
Yiming Li
Wenxuan Wang
Mang Ye

The massive scale of data and computation required for training Multimodal Large Language Models (MLLMs) has fueled the rise of Fine-Tuning as a Service (FTaaS), enabling users to rapidly customize models for diverse real-world tasks. While FTaaS democratizes access to advanced multimodal intelligence, it also introduces serious security concerns, particularly backdoor attacks. In this work, we systematically analyze backdoor vulnerabilities in MLLMs under the FTaaS paradigm, revealing two key phenomena: (1) markedly reduced sensitivity to textual variations when a visual trigger is present, and (2) abnormally stable model confidence even under strong semantic perturbations. Building on these insights, we propose Trap on Text (ToT), a novel inference-time backdoor detection framework. ToT applies controlled semantic perturbations to textual prompts and jointly analyzes the semantic consistency and confidence drift of the model’s responses, enabling robust detection of backdoor activations without requiring model parameters, architectures or clean reference data. Extensive experiments across architectures and datasets show that ToT achieves strong attack mitigation and preserves clean accuracy, offering a practical solution for safeguarding FTaaS workflows.

JBHI Journal 2026 Journal Article

Remote PPG Measurement Using a Synergistic Time-Frequency Network

Yiming Li
Qinglin He
Yihan Yang
Yuguang Chu
Yuanhui Hu
Zhe Wu
Xiaokai Bai
Xiaohan Zhang

Remote photoplethysmography (rPPG) aims to estimate the blood volume pulse (BVP) signal from facial videos. Existing rPPG approaches still suffer from limitations. We attribute this issue to two primary problems: (1) the reliance solely on time-domain processing, which makes the signal susceptible to interference, and (2) the presence of a phase discrepancy between the supervision signal and the ground-truth PPG. To address these problems, we propose TFSNet, a novel time-frequency synergy network for rPPG signal estimation and heart rate prediction. Specifically, we leverage a time-frequency fusion (TFF) module, which integrates frequency-domain information into the learning process to enrich the feature representations. Additionally, we introduce the amplitude-phase decoupling (APD) module, which applies phase compensation in the frequency domain to mitigate the adverse effects of incorrect phase supervision. Extensive experiments demonstrate that TFSNet achieves state-of-the-art performance, significantly outperforming current approaches in both accuracy and robustness.

EAAI Journal 2026 Journal Article

Text-based three-dimensional geometric person retrieval

Fanzhi Jiang
Kexin Wang
Hanchi Ren
Yiming Li
Liumei Zhang
Yuanjiao Hu
Xianghua Xie
Su Yang

Person Re-identification (Re-ID) is crucial in computer vision, widely applied in forensic investigation, intelligent surveillance, and video retrieval. Recent text-based Re-ID methods leverage eyewitness descriptions to enhance retrieval flexibility but still face challenges in accurately characterizing individuals under complex conditions. To address issues like low resolution, viewpoint variations, and occlusions, this paper proposes a novel text-based person Re-ID approach that integrates textual descriptions with synthesized Three-dimensional (3D) geometric pedestrian data derived from existing Two-dimensional (2D) images. Specifically, the semantic richness of text compensates for the lack of color and texture details in 3D data, while the robustness of geometric and pose information significantly enhances retrieval performance. Despite current 3D pedestrian data being generated through reconstruction algorithms, this work serves as a pioneering exploration of text-to-3D pedestrian retrieval, offering substantial potential for real-world applications in multimodal biometrics, forensic investigations, and privacy protection. Experiments on three public datasets demonstrate that our method achieves competitive performance, confirming its practical applicability and significance.

TMLR Journal 2025 Journal Article

A Survey on Large Language Model Acceleration based on KV Cache Management

Haoyang Li
Yiming Li
Anxin Tian
Tianhao Tang
Zhanchao Xu
Xuejia Chen
Nicole HU
Wei Dong

Large Language Models (LLMs) have revolutionized a wide range of domains such as natural language processing, computer vision, and multi-modal tasks due to their ability to comprehend context and perform logical reasoning. However, the computational and memory demands of LLMs, particularly during inference, pose significant challenges when scaling them to real-world, long-context, and real-time applications. Key-Value (KV) cache management has emerged as a critical optimization technique for accelerating LLM inference by reducing redundant computations and improving memory utilization. This survey provides a comprehensive overview of KV cache management strategies for LLM acceleration, categorizing them into token-level, model-level, and system-level optimizations. Token-level strategies include KV cache selection, budget allocation, merging, quantization, and low-rank decomposition, while model-level optimizations focus on architectural innovations and attention mechanisms to enhance KV reuse. System-level approaches address memory management, scheduling, and hardware-aware designs to improve efficiency across diverse computing environments. Additionally, the survey provides an overview of both text and multimodal datasets and benchmarks used to evaluate these strategies. By presenting detailed taxonomies and comparative analyses, this work aims to offer useful insights for researchers and practitioners to support the development of efficient and scalable KV cache management techniques, contributing to the practical deployment of LLMs in real-world applications.

NeurIPS Conference 2025 Conference Paper

Accurate KV Cache Eviction via Anchor Direction Projection for Efficient LLM Inference

Zijie Geng
Jie Wang
Ziqi Liu
Feng Ju
Yiming Li
Xing Li
Mingxuan Yuan
Jianye Hao

Key-Value (KV) cache eviction---which retains the KV pairs of the most important tokens while discarding less important ones---is a critical technique for optimizing both memory usage and inference latency in large language models (LLMs). However, existing approaches often rely on simple heuristics---such as attention weights---to measure token importance, overlooking the spatial relationships between token value states in the vector space. This often leads to suboptimal token selections and thus performance degradation. To tackle this problem, we propose a novel method, namely **AnDPro** (**An**chor **D**irection **Pro**jection), which introduces a projection-based scoring function to more accurately measure token importance. Specifically, AnDPro operates in the space of value vectors and leverages the projections of these vectors onto an *``Anchor Direction''*---the direction of the pre-eviction output---to measure token importance and guide more accurate token selection. Experiments on $16$ datasets from the LongBench benchmark demonstrate that AnDPro can maintain $96.07\%$ of the full cache accuracy using only $3.44\%$ KV cache budget, reducing KV cache budget size by $46.0\%$ without compromising quality compared to previous state-of-the-art methods.
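
A minimal sketch of projection-based token scoring, under stated assumptions rather than as the paper's exact formulation: a single attention head, a pre-eviction output equal to the attention-weighted sum of value vectors, and a score equal to the projection of each token's weighted value vector onto that output's direction. The names `anchor_projection_scores` and `select_tokens`, the attention weighting, and the top-k rule are all hypothetical.

```python
import math


def anchor_projection_scores(attn, values):
    """Score tokens by projecting their attention-weighted value vectors
    onto the direction of the pre-eviction output (assumed scoring form)."""
    dim = len(values[0])
    # Pre-eviction output: attention-weighted sum of the value vectors.
    output = [sum(a * v[d] for a, v in zip(attn, values)) for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in output))
    if norm == 0.0:
        # No meaningful direction to project onto.
        return [0.0] * len(values)
    anchor = [x / norm for x in output]  # unit-length anchor direction
    return [
        a * sum(v[d] * anchor[d] for d in range(dim))
        for a, v in zip(attn, values)
    ]


def select_tokens(attn, values, budget):
    """Return the indices of the `budget` highest-scoring tokens, ascending."""
    scores = anchor_projection_scores(attn, values)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


attn = [0.5, 0.3, 0.2]
values = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
kept = select_tokens(attn, values, budget=2)
```

In this sketch the scores sum to the norm of the output vector, so a token whose value vector points against the output direction receives a negative score even when its attention weight is nonzero. A purely attention-based heuristic cannot express that distinction.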

NeurIPS Conference 2025 Conference Paper

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Xiaojun Jia
Sensen Gao
Simeng Qin
Tianyu Pang
Chao Du
Yihao Huang
Xinfeng Li
Yiming Li

Multimodal large language models (MLLMs) remain vulnerable to transferable adversarial examples. While existing methods typically achieve targeted attacks by aligning global features—such as CLIP’s [CLS] token—between adversarial and target samples, they often overlook the rich local information encoded in patch tokens. This leads to suboptimal alignment and limited transferability, particularly for closed-source models. To address this limitation, we propose a targeted transferable adversarial attack method based on feature optimal alignment, called FOA-Attack, to improve adversarial transfer capability. Specifically, at the global level, we introduce a global feature loss based on cosine similarity to align the coarse-grained features of adversarial samples with those of target samples. At the local level, given the rich local representations within Transformers, we leverage clustering techniques to extract compact local patterns to alleviate redundant local features. We then formulate local feature alignment between adversarial and target samples as an optimal transport (OT) problem and propose a local clustering optimal transport loss to refine fine-grained feature alignment. Additionally, we propose a dynamic ensemble model weighting strategy to adaptively balance the influence of multiple models during adversarial example generation, thereby further improving transferability. Extensive experiments across various models demonstrate the superiority of the proposed method, outperforming state-of-the-art methods, especially in transferring to closed-source MLLMs.

EAAI Journal 2025 Journal Article

Algorithm for surface flow velocity measurement in trunk canal based on improved YOLOv8 and DeepSORT

Yuhui Zhou
Xiaojie Wu
Yiming Li
Huimin Sun
Di Fan

Velocity measurement in trunk canals and rivers plays an important role in agricultural and forestry irrigation scheduling, water resources management, and flood prediction. Particle flow measurement technology can realize non-contact, high-precision flow measurement, but in practical applications the small size, varying shape, and dynamic change of the particles pose great challenges to this method. To solve these problems, this paper proposes a surface velocity measurement method for trunk canals based on improved YOLOv8 (You Only Look Once Version 8) and DeepSORT (Deep Simple Online and Realtime Tracking), and introduces a tiny detection layer and a channel attention mechanism to improve YOLOv8's detection capability for small targets. In DeepSORT, the IBN-Net (Intent-Based Networking-Network) network structure and GIoU (Generalized Intersection over Union) matching are introduced to solve the problem of discontinuity or loss of target tracking in complex cases, which improves the accuracy and robustness of target tracking. The experimental results show that the improved YOLOv8 improves AP (Average Precision) and mAP (mean Average Precision) by nearly 5% and 0.2% respectively. The performance of the improved DeepSORT has improved across the board, especially IDP and MOTA, which have improved by 25.2% and 5.6% respectively. The algorithm also has good accuracy in actual velocity measurement.
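
For reference, the GIoU measure named above can be sketched with its standard definition for axis-aligned boxes: IoU minus the fraction of the smallest enclosing box not covered by the union. How the paper wires GIoU into DeepSORT's matching step is not stated in the abstract, so this block shows only the metric. The function name `giou` and the `(x1, y1, x2, y2)` box layout are assumptions.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    if area_a <= 0 or area_b <= 0:
        raise ValueError("boxes must have positive area")

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    return inter / union - (enclose - union) / enclose


same = giou((0, 0, 1, 1), (0, 0, 1, 1))
apart = giou((0, 0, 1, 1), (2, 0, 3, 1))
```

Unlike plain IoU, which is zero for every pair of non-overlapping boxes, GIoU keeps decreasing as the boxes move apart. That can help when a small, fast-moving target does not overlap its predicted position.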

NeurIPS Conference 2025 Conference Paper

Backdoor Cleaning without External Guidance in MLLM Fine-tuning

Xuankun Rong
Wenke Huang
Jian Liang
Jinhe Bi
Xun Xiao
Yiming Li
Bo Du
Mang Ye

Multimodal Large Language Models (MLLMs) are increasingly deployed in fine-tuning-as-a-service (FTaaS) settings, where user-submitted datasets adapt general-purpose models to downstream tasks. This flexibility, however, introduces serious security risks, as malicious fine-tuning can implant backdoors into MLLMs with minimal effort. In this paper, we observe that backdoor triggers systematically disrupt cross-modal processing by causing abnormal attention concentration on non-semantic regions, a phenomenon we term attention collapse. Based on this insight, we propose Believe Your Eyes (BYE), a data filtering framework that leverages attention entropy patterns as self-supervised signals to identify and filter backdoor samples. BYE operates via a three-stage pipeline: (1) extracting attention maps using the fine-tuned model, (2) computing entropy scores and profiling sensitive layers via bimodal separation, and (3) performing unsupervised clustering to remove suspicious samples. Unlike prior defenses, BYE requires no clean supervision, auxiliary labels, or model modifications. Extensive experiments across various datasets, models, and diverse trigger types validate BYE's effectiveness: it achieves near-zero attack success rates while maintaining clean-task performance, offering a robust and generalizable solution against backdoor threats in MLLMs.
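
The entropy-then-cluster idea in stages (2) and (3) can be illustrated with a small sketch. It assumes each sample has already been reduced to one flattened, non-negative attention map, and it uses a one-dimensional two-means split as a stand-in for the unsupervised clustering, which the abstract does not specify. Layer profiling is omitted, and the names `attention_entropy` and `flag_low_entropy` are hypothetical.

```python
import math


def attention_entropy(attn_map):
    """Shannon entropy (nats) of a flattened, non-negative attention map."""
    total = sum(attn_map)
    if total <= 0:
        raise ValueError("attention map must have positive mass")
    probs = [a / total for a in attn_map if a > 0]
    return -sum(p * math.log(p) for p in probs)


def flag_low_entropy(scores, iters=20):
    """Split scores into two clusters (1-D two-means) and flag the samples
    that fall in the low-entropy cluster as suspicious."""
    lo, hi = min(scores), max(scores)
    if lo == hi:
        return [False] * len(scores)  # no separation, flag nothing
    for _ in range(iters):
        low = [s for s in scores if abs(s - lo) <= abs(s - hi)]
        high = [s for s in scores if abs(s - lo) > abs(s - hi)]
        if not high:
            break
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return [abs(s - lo) <= abs(s - hi) for s in scores]


maps = [
    [0.25, 0.25, 0.25, 0.25],  # spread-out attention
    [0.30, 0.20, 0.25, 0.25],  # spread-out attention
    [0.97, 0.01, 0.01, 0.01],  # attention collapsed onto one region
]
flags = flag_low_entropy([attention_entropy(m) for m in maps])
```

Here only the third map, whose attention mass sits almost entirely on one position, lands in the low-entropy cluster.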

JBHI Journal 2025 Journal Article

CellCircLoc: Deep Neural Network for Predicting and Explaining Cell Line-Specific CircRNA Subcellular Localization

Min Zeng
Jingwei Lu
Yiming Li
Chengqian Lu
Shichao Kan
Fei Guo
Min Li

The subcellular localization of circular RNAs (circRNAs) is crucial for understanding their functional relevance and regulatory mechanisms. CircRNA subcellular localization exhibits variations across different cell lines, demonstrating the diversity and complexity of circRNA regulation within distinct cellular contexts. However, existing computational methods for predicting circRNA subcellular localization often ignore the importance of cell line specificity and instead train a general model on aggregated data from all cell lines. Considering the diversity and context-dependent behavior of circRNAs across different cell lines, it is imperative to develop cell line-specific models to accurately predict circRNA subcellular localization. In this study, we propose CellCircLoc, a sequence-based deep learning model for circRNA subcellular localization prediction, which is trained separately for different cell lines. CellCircLoc utilizes a combination of convolutional neural networks, Transformer blocks, and bidirectional long short-term memory to capture both local sequence features and long-range dependencies within the sequences. In the Transformer blocks, CellCircLoc uses an attentive convolution mechanism to capture the importance of individual nucleotides. Extensive experiments demonstrate the effectiveness of CellCircLoc in accurately predicting circRNA subcellular localization across different cell lines, outperforming other computational models that do not consider cell line specificity. Moreover, the interpretability of CellCircLoc facilitates the discovery of important motifs associated with circRNA subcellular localization.

IROS Conference 2025 Conference Paper

Cockroach's Turning Strategy Enhanced Hexapod Robot with Flexible Torso

Yiming Li
Xingyu Li
Jie Zhou
Chenfeng Xie
Yao Li
Bing Li

The design and control of hexapod robots have become an active research field due to the ability to achieve adaptive and stable multi-terrain locomotion. However, existing hexapod robots focus on the integration of flexible pitch joints to enhance their obstacle-crossing and slope-climbing abilities, and few biological observations have been made to gain insight into the agile steering mechanisms of hexapod insects. Herein, we observed the steering movements of Madagascar cockroaches. Observations showed that cockroaches exhibited specific phase relationships in addition to the regular tripod gait pattern during steering. Moreover, we also found that a smaller steering radius resulted in a larger lateral bending angle of the thoracic segments. Inspired by this, a hexapod robot with a flexible torso (F-RHex) was designed and fabricated. Bio-inspired gait patterns were abstracted and simplified into two steering strategies: gait-based and mix-based. Compared to the purely gait-based strategy, the F-RHex testing results demonstrated a ~27.4% reduction in turning radius and ~40% enhancement in steering velocity, implying that the mix-based strategy offers superior steering capability.

IJCAI Conference 2025 Conference Paper

Graph Prompts: Adapting Video Graph for Video Question Answering

Yiming Li
Xiaoshan Yang
Bing-Kun Bao
Changsheng Xu

Due to the dynamic nature of videos, perceiving and reasoning about temporal information are the key focus of Video Question Answering (VideoQA). In recent years, several methods have explored relationship-level temporal modeling with graph-structured video representation. Unfortunately, these methods heavily rely on the question text, thus making it challenging to perceive and reason about video content that is not explicitly mentioned in the question. To address the above challenge, we propose Graph Prompts-based VideoQA (GP-VQA), which adopts a video-based graph structure for enhanced video understanding. The proposed GP-VQA contains two stages, i.e., pre-training and prompt tuning. In pre-training, we define the pretext task that requires GP-VQA to reason about the randomly masked nodes or edges in the video graph, thus prompting GP-VQA to learn the reasoning ability with video-guided information. In prompt tuning, we organize the textual question into a question graph and implement message passing from the video graph to the question graph, thereby inheriting the video-based reasoning ability from video graph completion to VideoQA. Extensive experiments on various datasets have demonstrated the promising performance of GP-VQA.

ICML Conference 2025 Conference Paper

KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference

Xing Li 0023
Zeyu Xing 0002
Yiming Li
Linping Qu
Hui-Ling Zhen
Yiwu Yao
Wulong Liu
Sinno Jialin Pan

KV cache quantization can improve Large Language Models (LLMs) inference throughput and latency in long contexts and large batch-size scenarios while preserving LLMs effectiveness. However, current methods have three unsolved issues: overlooking layer-wise sensitivity to KV cache quantization, high overhead of online fine-grained decision-making, and low flexibility to different LLMs and constraints. Therefore, we theoretically analyze the inherent correlation of layer-wise transformer attention patterns to KV cache quantization errors and study why key cache is generally more important than value cache for quantization error reduction. We further propose a simple yet effective framework KVTuner to adaptively search for the optimal hardware-friendly layer-wise KV quantization precision pairs for coarse-grained KV cache with multi-objective optimization and directly utilize the offline searched configurations during online inference. To reduce the computational cost of offline calibration, we utilize the intra-layer KV precision pair pruning and inter-layer clustering to reduce the search space. Experimental results show that we can achieve nearly lossless 3.25-bit mixed precision KV cache quantization for LLMs like Llama-3.1-8B-Instruct and 4.0-bit for sensitive models like Qwen2.5-7B-Instruct on mathematical reasoning tasks. The maximum inference throughput can be improved by 21.25% compared with KIVI-KV8 quantization over various context lengths. Our code and searched configurations are available at https://github.com/cmd2001/KVTuner.

ECAI Conference 2025 Conference Paper

Personalized Subgraph Federated Learning with Sheaf Collaboration

Wenfei Liang 0001
Yanan Zhao 0003
Rui She 0001
Yiming Li
Wee Peng Tay

Graph-structured data is prevalent in many applications. In subgraph federated learning (FL), this data is distributed across clients, each with a local subgraph. Personalized subgraph FL aims to develop a customized model for each client to handle diverse data distributions. However, performance variation across clients remains a key issue due to the heterogeneity of local subgraphs. To overcome the challenge, we propose FedSheafHN, a novel framework built on a sheaf collaboration mechanism to unify enhanced client descriptors with efficient personalized model generation. Specifically, FedSheafHN embeds each client’s local subgraph into a server-constructed collaboration graph by leveraging graph-level embeddings and employing sheaf diffusion within the collaboration graph to enrich client representations. Subsequently, FedSheafHN generates customized client models via a server-optimized hypernetwork. Empirical evaluations demonstrate that FedSheafHN outperforms existing personalized subgraph FL methods on various graph datasets. Additionally, it exhibits fast model convergence and effectively generalizes to new clients.

NeurIPS Conference 2025 Conference Paper

Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack

Yukun Chen
Boheng Li
Yu Yuan
Leyi Qi
Yiming Li
Tianwei Zhang
Zhan Qin
Kui Ren

Knowledge distillation (KD) is a vital technique for deploying deep neural networks (DNNs) on resource-constrained devices by transferring knowledge from large teacher models to lightweight student models. While teacher models from third-party platforms may undergo security verification (e.g., backdoor detection), we uncover a novel and critical threat: distillation-conditional backdoor attacks (DCBAs). DCBA injects dormant and undetectable backdoors into teacher models, which become activated in student models via the KD process, even with clean distillation datasets. While the direct extension of existing methods is ineffective for DCBA, we implement this attack by formulating it as a bilevel optimization problem and proposing a simple yet effective method (i.e., SCAR). Specifically, the inner optimization simulates the KD process by optimizing a surrogate student model, while the outer optimization leverages outputs from this surrogate to optimize the teacher model for implanting the conditional backdoor. Our SCAR addresses this complex optimization utilizing an implicit differentiation algorithm with a pre-optimized trigger injection function. Extensive experiments across diverse datasets, model architectures, and KD techniques validate the effectiveness of our SCAR and its resistance against existing backdoor detection, highlighting a significant yet previously overlooked vulnerability in the KD process. Our code is available at https://github.com/WhitolfChen/SCAR.

EAAI Journal 2025 Journal Article

The hybrid velocity prediction model for pipeline detection based on bidirectional long short-term memory and an improved attention mechanism

Junjie Ma
Yiming Li
Zhongchao Zhang
Tongshan Liu
Guiqiu Song

Speed prediction of pipeline detectors is crucial for accurate pipeline positioning and defect detection. This paper proposes a novel hybrid prediction model for this purpose, combining dual-layer Bidirectional Long Short-Term Memory, a bidirectional input attention mechanism, Singular Spectrum Analysis, and Bayesian Optimization. The dual-layer Bidirectional Long Short-Term Memory captures both forward and backward information in time series for prediction. The attention mechanism assigns weights to multiple input features. Singular Spectrum Analysis reconstructs and extracts features from time series data, while Bayesian Optimization is used to obtain the optimal hyperparameters for the Singular Spectrum Analysis and Bidirectional Long Short-Term Memory algorithms. A pipeline experimental platform was constructed to conduct comparative tests of the proposed model under constant speed, variable speed, non-lubricated, and lubricated conditions, assessing both operation and prediction. The results indicate that, compared to the baseline model, the hybrid prediction model proposed in this paper achieves improvements of over 9% in Root Mean Square Error and over 0.7% in R-Square under the most severe variable-speed conditions, demonstrating superior performance in prediction accuracy and generalization capability.

NeurIPS Conference 2025 Conference Paper

Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning

Boheng Li
Renjie Gu
Junjie Wang
Leyi Qi
Yiming Li
Run Wang
Zhan Qin
Tianwei Zhang

Text-to-image (T2I) diffusion models have achieved impressive image generation quality and are increasingly fine-tuned for personalized applications. However, these models often inherit unsafe behaviors from toxic pretraining data, raising growing safety concerns. While recent safety-driven unlearning methods have made promising progress in suppressing model toxicity, they are found to be fragile to downstream fine-tuning, as we reveal that state-of-the-art methods largely fail to retain their effectiveness even when fine-tuned on entirely benign datasets. To mitigate this problem, in this paper, we propose ResAlign, a safety-driven unlearning framework with enhanced resilience against downstream fine-tuning. By modeling downstream fine-tuning as an implicit optimization problem with a Moreau envelope-based reformulation, ResAlign enables efficient gradient estimation to minimize the recovery of harmful behaviors. Additionally, a meta-learning strategy is proposed to simulate a diverse distribution of fine-tuning scenarios to improve generalization. Extensive experiments across a wide range of datasets, fine-tuning methods, and configurations demonstrate that ResAlign consistently outperforms prior unlearning approaches in retaining safety, while effectively preserving benign generation capability. Our code and pretrained models are publicly available at https://github.com/AntigoneRandy/ResAlign.

EAAI Journal 2025 Journal Article

Transfer learning approach to modeling multichannel gate-all-around nanosheet field-effect transistors under work function fluctuations

Sagarika Dash
Yiming Li

Deep learning (DL) has significantly advanced various industries, including semiconductors, by providing sophisticated methods for analyzing emerging device data. Transfer learning (TL), a prominent DL topology, leverages knowledge from a pre-trained model to improve the performance of a specific target task. This study aims to apply TL to predict the impact of work function fluctuation (WKF) on the variability of threshold voltage ($\sigma V_{TH}$) in gate-all-around silicon nanosheet field-effect transistors (GAA Si NS FETs). We used TL to transfer knowledge from models trained on 1-channel (1ch) GAA Si NS FETs to more complex 4-channel (4ch) counterparts. The prediction accuracy is evaluated by using the mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE) metrics. Our TL approach achieved an error rate of less than 1%, demonstrating high accuracy in predicting WKF effects on $\sigma V_{TH}$ of 4ch GAA Si NS FETs using only 500 samples. This study presents a pioneering application of TL using a one-dimensional (1D) convolutional neural network (CNN) long short-term memory (LSTM) topology in semiconductor device technology. The results effectively address the challenge of predicting $\sigma V_{TH}$ variations in advanced GAA Si NS FETs because of WKF, which eliminates the need for extensive semiconductor device simulations.
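
The three error metrics named in this abstract have standard definitions, sketched below for paired lists of true and predicted values. The function name `regression_errors` and the sample numbers are illustrative only and are not taken from the paper.

```python
import math


def regression_errors(y_true, y_pred):
    """Return MSE, RMSE, and MAE for paired true and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n   # mean square error
    mae = sum(abs(r) for r in residuals) / n  # mean absolute error
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


errors = regression_errors([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

RMSE is in the same units as the predicted quantity, and both MSE and RMSE penalize the single large residual in this example more heavily than MAE does.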

ICLR Conference 2024 Conference Paper

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection

Tinghao Xie
Xiangyu Qi
Ping He
Yiming Li
Jiachen T. Wang
Prateek Mittal

We present a novel defense against backdoor attacks on Deep Neural Networks (DNNs), wherein adversaries covertly implant malicious behaviors (backdoors) into DNNs. Our defense falls within the category of post-development defenses that operate independently of how the model was generated. The proposed defense is built upon a novel reverse engineering approach that can directly extract **backdoor functionality** of a given backdoored model to a *backdoor expert* model. The approach is straightforward --- finetuning the backdoored model over a small set of intentionally mislabeled clean samples, such that it unlearns the normal functionality while still preserving the backdoor functionality, and thus resulting in a model (dubbed a backdoor expert model) that can only recognize backdoor inputs. Based on the extracted backdoor expert model, we show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference. Further augmented by an ensemble strategy with a finetuned auxiliary model, our defense, **BaDExpert** (**Ba**ckdoor Input **D**etection with Backdoor **Expert**), effectively mitigates 17 SOTA backdoor attacks while minimally impacting clean utility. The effectiveness of BaDExpert has been verified on multiple datasets (CIFAR10, GTSRB and ImageNet) across various model architectures (ResNet, VGG, MobileNetV2 and Vision Transformer). Our code is integrated into our research toolbox: [https://github.com/vtu81/backdoor-toolbox](https://github.com/vtu81/backdoor-toolbox).

Details

IROS Conference 2024 Conference Paper

Calibration-Free Vision-Assisted Container Loading of RTG Cranes

Jianbing Yang
Yuanzhe Wang
Hao Jiang
Bin Zhao
Yiming Li
Danwei Wang

Vision-assisted container loading of Rubber Tyred Gantry (RTG) cranes faces two primary challenges. Firstly, the uncertainty inherent in Convolutional Neural Network (CNN) based detection hinders its direct application in the safety-critical operation of such heavy-duty machinery. Secondly, sensor calibration introduces additional complexities and errors into the system. However, existing studies have not adequately addressed these challenges. Motivated by this gap, this paper proposes an integrated approach for target detection and alignment control in container loading of RTG cranes. To ensure reliable target marker identification, a heuristic post-processing algorithm is developed as a complement to CNN-based foreground segmentation, thereby ensuring safety during the container handling process. On this basis, a pixel-based control scheme is designed to align the container with the target markers, which eliminates the need for offline or online sensor calibrations. The proposed approach has been successfully implemented on a real RTG crane manufactured by Shanghai Zhenhua Heavy Industries Co., Ltd. (ZPMC) and validated at the Port of Ningbo, China. Experimental results demonstrate the superiority of the proposed approach over current manual operations in port industries, highlighting its potential for crane automation.

Details

IJCAI Conference 2024 Conference Paper

Defending Against Backdoor Attacks by Layer-wise Feature Analysis (Extended Abstract)

Najeeb Moharram Jebreel
Josep Domingo-Ferrer
Yiming Li

Training deep neural networks (DNNs) usually requires massive training data and computational resources. Users who cannot afford this may prefer to outsource training to a third party or resort to publicly available pre-trained models. Unfortunately, doing so facilitates a new training-time attack (i.e., backdoor attack) against DNNs. This attack aims to induce misclassification of input samples containing adversary-specified trigger patterns. In this paper, we first conduct a layer-wise feature analysis of poisoned and benign samples from the target class. We find that the feature difference between benign and poisoned samples tends to be maximum at a critical layer, which is not always the one typically used in existing defenses, namely the layer before fully-connected layers. We also demonstrate how to locate this critical layer based on the behaviors of benign samples. We then propose a simple yet effective method to filter poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer. Extensive experiments on two benchmark datasets are reported which confirm the effectiveness of our defense.

PDF Details DOI

ICRA Conference 2024 Conference Paper

Learning Realistic and Reasonable Grasps for Anthropomorphic Hand in Cluttered Scenes

Haonan Duan 0001
Yiming Li
Daheng Li
Wei Wei 0062
Yayu Huang
Peng Wang 0024

Grasping is one of the most fundamental skills for humans to interact with objects. However, it remains a challenging problem for anthropomorphic hands, due to the lack of object affordance understanding and high-dimensional grasp planning. In this work, we propose an anthropomorphic hand grasping framework to learn realistic and reasonable grasps in cluttered scenes, which tackles the problem in three steps: 1) graspable point segmentation; 2) hand grasp generation; and 3) grasp optimization. Specifically, our method generates high-quality hand grasps efficiently without complete object models by learning graspable points and associated grasp configurations from the observed point cloud in a parallel manner, and by optimizing predicted grasps based on hand-object contacts. Simulation experiments show that our model generates physically plausible grasps for the anthropomorphic hand effectively, with a success rate of over 70%. Real-world experiments demonstrate that the model trained in simulation performs satisfactorily in real-world scenarios for unseen objects.

Details

NeurIPS Conference 2024 Conference Paper

Memorize What Matters: Emergent Scene Decomposition from Multitraverse

Yiming Li
Zehong Wang
Yue Wang
Zhiding Yu
Zan Gojcic
Marco Pavone
Chen Feng
Jose M. Alvarez

Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation. Our key observation is that the environment remains consistent across traversals, while objects frequently change. This allows us to exploit self-supervision from repeated traversals to achieve environment-object decomposition. More specifically, 3DGM formulates multitraverse environmental mapping as a robust 3D representation learning problem, treating pixels of the environment and objects as inliers and outliers, respectively. Using robust feature distillation, feature residual mining, and robust optimization, 3DGM simultaneously performs 2D segmentation and 3D mapping without human intervention. We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering. Extensive results verify the effectiveness and potential of our method for self-driving and robotics.

PDF Details DOI

AAAI Conference 2024 Conference Paper

Multi-Region Text-Driven Manipulation of Diffusion Imagery

Yiming Li
Peng Zhou
Jun Sun
Yi Xu

Text-guided image manipulation has attracted significant attention recently. Prevailing techniques concentrate on image attribute editing for individual objects but encounter challenges when it comes to multi-object editing. The main reason is the lack of consistency constraints on the spatial layout. This work presents a multi-region guided image manipulation framework, enabling manipulation through region-level textual prompts. With MultiDiffusion as a baseline, we are dedicated to the automatic generation of a rational multi-object spatial distribution, where disparate regions are fused as a unified entity. To mitigate interference from regional fusion, we employ an off-the-shelf model (CLIP) to impose region-aware spatial guidance on multi-object manipulation. Moreover, when applied to Stable Diffusion, the presence of quality-related yet object-agnostic lengthy words hampers the manipulation. To ensure focus on meaningful object-specific words for efficient guidance and generation, we introduce a keyword selection method. Furthermore, we demonstrate a downstream application of our method for multi-region inversion, which is tailored for manipulating multiple objects in real images. Our approach, compatible with variants of Stable Diffusion models, is readily applicable for manipulating diverse objects in extensive images with high-quality generation, showing superb image control capabilities. Code is available at https://github.com/liyiming09/multi-region-guided-diffusion.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar

Fangqiang Ding
Xiangyu Wen
Yunzhou Zhu
Yiming Li
Chris Xiaoxuan Lu

3D occupancy-based perception pipeline has significantly advanced autonomous driving by capturing detailed scene descriptions and demonstrating strong generalizability across various object categories and shapes. Current methods predominantly rely on LiDAR or camera inputs for 3D occupancy prediction. These methods are susceptible to adverse weather conditions, limiting the all-weather deployment of self-driving cars. To improve perception robustness, we leverage the recent advances in automotive radars and introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction. Our method, RadarOcc, circumvents the limitations of sparse radar point clouds by directly processing the 4D radar tensor, thus preserving essential scene details. RadarOcc innovatively addresses the challenges associated with the voluminous and noisy 4D radar data by employing Doppler bins descriptors, sidelobe-aware spatial sparsification, and range-wise self-attention mechanisms. To minimize the interpolation errors associated with direct coordinate transformations, we also devise a spherical-based feature encoding followed by spherical-to-Cartesian feature aggregation. We benchmark various baseline methods based on distinct modalities on the public K-Radar dataset. The results demonstrate RadarOcc's state-of-the-art performance in radar-based 3D occupancy prediction and promising results even when compared with LiDAR- or camera-based methods. Additionally, we present qualitative evidence of the superior performance of 4D radar in adverse weather conditions and explore the impact of key pipeline components through ablation studies.

PDF Details DOI

ICRA Conference 2024 Conference Paper

Representing Robot Geometry as Distance Fields: Applications to Whole-body Manipulation

Yiming Li
Yan Zhang
Amirreza Razmjoo
Sylvain Calinon

In this work, we propose a novel approach to represent robot geometry as distance fields (RDF) that extends the principle of signed distance fields (SDFs) to articulated kinematic chains. Our method employs a combination of Bernstein polynomials to encode the signed distance for each robot link with high accuracy and efficiency while ensuring the mathematical continuity and differentiability of SDFs. We further leverage the kinematic chain of the robot to produce the SDF representation in joint space, allowing robust distance queries in arbitrary joint configurations. The proposed RDF representation is differentiable and smooth in both task and joint spaces, enabling its direct integration into optimization problems. Additionally, the 0-level set of the robot corresponds to the robot surface, which can be seamlessly integrated into whole-body manipulation tasks. We conduct various experiments both in simulation and with 7-axis Franka Emika robots, comparing against baseline methods and demonstrating its effectiveness in collision avoidance and whole-body manipulation tasks. Project page: https://sites.google.com/view/lrdf/home

Details

ICRA Conference 2024 Conference Paper

Robust Collaborative Perception against Temporal Information Disturbance

Xunjie He
Yiming Li
Te Cui
Meiling Wang
Tong Liu
Yufeng Yue

Collaborative perception facilitates a more comprehensive representation of the environment by leveraging complementary information shared among various agents and sensors. However, practical applications often encounter information disturbance, which includes perception packet loss and time delays, and a comprehensive framework that can simultaneously address such issues is absent. In addition, the feature extraction process prior to fusion is not sufficient, as it lacks exploration of the local semantics and context dependencies of individual features. To enhance both accuracy and robustness, this paper introduces a novel framework named Robust Collaborative Perception against Temporal Information Disturbance, which predicts perception information when disturbance occurs. Specifically, the Historical Frame Prediction (HFP) module is introduced to compensate for information loss through temporal association excavation of historical features. Based on the predicted features generated by the HFP module, the Pyramid Attention Integration (PAI) module is introduced to augment local semantics and incorporate global long-range dependencies through multi-scale window attention. Compared with existing methods on the publicly available dataset OPV2V, our approach exhibits superior performance and expanded robustness in the 3D object detection task. The code will be publicly available at https://github.com/hexunjie/Ro-temd.

Details

NeurIPS Conference 2024 Conference Paper

ZeroMark: Towards Dataset Ownership Verification without Disclosing Watermark

Junfeng Guo
Yiming Li
Ruibo Chen
Yihan Wu
Chenxi Liu
Heng Huang

High-quality public datasets significantly promote the prosperity of deep neural networks (DNNs). Currently, dataset ownership verification (DOV), which consists of dataset watermarking and ownership verification, is the only feasible solution to protect their copyright by preventing unauthorized use. In this paper, we revisit existing DOV methods and find that they all mainly focused on the first stage by designing different types of dataset watermarks and directly exploiting watermarked samples as the verification samples for ownership verification. As such, their success relies on an underlying assumption that verification is a *one-time* and *privacy-preserving* process, which does not necessarily hold in practice. To alleviate this problem, we propose *ZeroMark* to conduct ownership verification without disclosing dataset-specified watermarks. Our method is inspired by our empirical and theoretical findings of the intrinsic property of DNNs trained on the watermarked dataset. Specifically, ZeroMark first generates the closest boundary version of given benign samples and calculates their boundary gradients under the label-only black-box setting. After that, it examines whether the given suspicious method has been trained on the protected dataset by performing a hypothesis test, based on the cosine similarity measured on the boundary gradients and the watermark pattern. Extensive experiments on benchmark datasets verify the effectiveness of our ZeroMark and its resistance to potential adaptive attacks. The codes for reproducing our main experiments are publicly available on GitHub: https://github.com/JunfengGo/ZeroMark.git.

PDF Details DOI

YNICL Journal 2023 Journal Article

A presurgical voxel-wise predictive model for cerebellar mutism syndrome in children with posterior fossa tumors

Wei Yang
Yiming Li
Zesheng Ying
Yingjie Cai
Xiaojiao Peng
HaiLang Sun
Jiashu Chen
Kaiyi Zhu

BACKGROUND: This study aimed to investigate cerebellar mutism syndrome (CMS)-related voxels and build a voxel-wise predictive model for CMS. METHODS: From July 2013 to January 2022, 188 pediatric patients diagnosed with posterior fossa tumor were included in this study, including 38 from a prospective cohort recruited between 2020 and January 2022, and the remainder from a retrospective cohort recruited between July 2013 and August 2020. The retrospective cohort was divided into the training and validation sets; the prospective cohort served as a prospective validation set. Voxel-based lesion symptoms were assessed to identify voxels related to CMS, and a predictive model was constructed and tested in the validation and prospective validation sets. RESULTS: No significant differences were detected among these three data sets in CMS rate, gender, age, tumor size, tumor consistency, presence of hydrocephalus and paraventricular edema. Voxels related to CMS were mainly located in bilateral superior and inferior cerebellar peduncles and the superior part of the cerebellum. The areas under the curves for the model in the training, validation and prospective validation sets were 0.889, 0.784 and 0.791, respectively. CONCLUSIONS: Superior and inferior cerebellar peduncles and the superior part of the cerebellum were related to CMS, especially the right side, and voxel-based lesion-symptom analysis could provide valuable predictive information before surgery.

Details DOI

NeurIPS Conference 2023 Conference Paper

Domain Watermark: Effective and Harmless Dataset Copyright Protection is Closed at Hand

Junfeng Guo
Yiming Li
Lixu Wang
Shu-Tao Xia
Heng Huang
Cong Liu
Bo Li

The prosperity of deep neural networks (DNNs) has largely benefited from open-source datasets, based on which users can evaluate and improve their methods. In this paper, we revisit backdoor-based dataset ownership verification (DOV), which is currently the only feasible approach to protect the copyright of open-source datasets. We reveal that these methods are fundamentally harmful given that they could introduce malicious misclassification behaviors to watermarked DNNs by the adversaries. In this paper, we design DOV from another perspective by making watermarked models (trained on the protected dataset) correctly classify some 'hard' samples that will be misclassified by the benign model. Our method is inspired by the generalization property of DNNs, where we find a *hardly-generalized domain* for the original dataset (as its *domain watermark*). It can be easily learned with the protected dataset containing modified samples. Specifically, we formulate the domain generation as a bi-level optimization and propose to optimize a set of visually-indistinguishable clean-label modified data with similar effects to domain-watermarked samples from the hardly-generalized domain to ensure watermark stealthiness. We also design a hypothesis-test-guided ownership verification via our domain watermark and provide the theoretical analyses of our method. Extensive experiments on three benchmark datasets are conducted, which verify the effectiveness of our method and its resistance to potential adaptive methods.

PDF Details

AAAI Conference 2023 Conference Paper

Generating Transferable 3D Adversarial Point Cloud via Random Perturbation Factorization

Bangyan He
Jian Liu
Yiming Li
Siyuan Liang
Jingzhi Li
Xiaojun Jia
Xiaochun Cao

Recent studies have demonstrated that existing deep neural networks (DNNs) on 3D point clouds are vulnerable to adversarial examples, especially under the white-box settings where the adversaries have access to model parameters. However, adversarial 3D point clouds generated by existing white-box methods have limited transferability across different DNN architectures. They pose only minor threats in real-world scenarios under the black-box settings where the adversaries can only query the deployed victim model. In this paper, we revisit the transferability of adversarial 3D point clouds. We observe that an adversarial perturbation can be randomly factorized into two sub-perturbations, which are also likely to be adversarial perturbations. This motivates us to consider the effects of the perturbation and its sub-perturbations simultaneously to increase the transferability, because the sub-perturbations also contain helpful information. In this paper, we propose a simple yet effective attack method to generate more transferable adversarial 3D point clouds. Specifically, rather than simply optimizing the loss of perturbation alone, we combine it with its random factorization. We conduct experiments on a benchmark dataset, verifying our method's effectiveness in increasing transferability while preserving high efficiency.
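The idea of randomly factorizing a perturbation into two sub-perturbations can be sketched as below. The abstract does not say how the split is performed, so this sketch assumes a per-coordinate random binary mask; the function name `random_factorize`, the mask probability, and the sample values are all hypothetical and should not be read as the authors' procedure.

```python
import random


def random_factorize(perturbation, rng):
    """Split a perturbation into two sub-perturbations that sum back to it.

    Assumption for illustration: each coordinate is assigned to exactly
    one of the two parts by a fair coin flip (a random binary mask).
    """
    part_a, part_b = [], []
    for value in perturbation:
        keep = rng.random() < 0.5  # hypothetical mask probability of 0.5
        part_a.append(value if keep else 0.0)
        part_b.append(0.0 if keep else value)
    return part_a, part_b


# Made-up perturbation over three coordinates.
delta = [0.1, -0.2, 0.3]
sub_a, sub_b = random_factorize(delta, random.Random(0))
```

Under this assumption, an attack in the spirit of the abstract would evaluate its loss on `delta`, `sub_a`, and `sub_b` together rather than on `delta` alone, redrawing the mask at each optimization step.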

PDF Details DOI

AAAI Conference 2023 Short Paper

Poisoning-Based Backdoor Attacks in Computer Vision

Yiming Li

Recent studies demonstrated that the training process of deep neural networks (DNNs) is vulnerable to backdoor attacks if third-party training resources (e.g., samples) are adopted. Specifically, the adversaries intend to embed hidden backdoors into DNNs, where the backdoor can be activated by pre-defined trigger patterns, leading to malicious model predictions. My dissertation focuses on poisoning-based backdoor attacks in computer vision. Firstly, I study and propose more stealthy and effective attacks against image classification tasks in both physical and digital spaces. Secondly, I reveal the backdoor threats in visual object tracking, which is representative of critical video-related tasks. Thirdly, I explore how to exploit backdoor attacks as watermark techniques for positive purposes. Finally, I design a Python toolbox (i.e., BackdoorBox) that implements representative and advanced backdoor attacks and defenses under a unified and flexible framework, based on which I provide a comprehensive benchmark of existing methods.

PDF Details DOI

ICLR Conference 2023 Conference Paper

Revisiting the Assumption of Latent Separability for Backdoor Defenses

Xiangyu Qi
Tinghao Xie
Yiming Li
Saeed Mahloujifar
Prateek Mittal

Recent studies revealed that deep learning is susceptible to backdoor poisoning attacks. An adversary can embed a hidden backdoor into a model to manipulate its predictions by only modifying a few training data, without controlling the training process. Currently, a tangible signature has been widely observed across a diverse set of backdoor poisoning attacks: models trained on a poisoned dataset tend to learn separable latent representations for poison and clean samples. This latent separation is so pervasive that a family of backdoor defenses directly take it as a default assumption (dubbed latent separability assumption), based on which to identify poison samples via cluster analysis in the latent space. An intriguing question consequently follows: is the latent separation unavoidable for backdoor poisoning attacks? This question is central to understanding whether the assumption of latent separability provides a reliable foundation for defending against backdoor poisoning attacks. In this paper, we design adaptive backdoor poisoning attacks to present counter-examples against this assumption. Our methods include two key components: (1) a set of trigger-planted samples correctly labeled to their semantic classes (other than the target class) that can regularize backdoor learning; (2) asymmetric trigger planting strategies that help to boost attack success rate (ASR) as well as to diversify latent representations of poison samples. Extensive experiments on benchmark datasets verify the effectiveness of our adaptive attacks in bypassing existing latent-separation-based backdoor defenses. Moreover, our attacks still maintain a high attack success rate with negligible clean accuracy drop. Our studies call for defense designers to take caution when leveraging latent separation as an assumption in their defenses. Our codes are available at https://github.com/Unispac/Circumventing-Backdoor-Defenses.

Details

NeurIPS Conference 2023 Conference Paper

Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots

Ruixiang (Ryan) Tang
Jiayi Yuan
Yiming Li
Zirui Liu
Rui Chen
Xia Hu

In the field of natural language processing, the prevalent approach involves fine-tuning pretrained language models (PLMs) using local samples. Recent research has exposed the susceptibility of PLMs to backdoor attacks, wherein the adversaries can embed malicious prediction behaviors by manipulating a few training samples. In this study, our objective is to develop a backdoor-resistant tuning procedure that yields a backdoor-free model, no matter whether the fine-tuning dataset contains poisoned samples. To this end, we propose and integrate a *honeypot module* into the original PLM, specifically designed to absorb backdoor information exclusively. Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features while carrying minimal information about the original tasks. Consequently, we can impose penalties on the information acquired by the honeypot module to inhibit backdoor creation during the fine-tuning process of the stem network. Comprehensive experiments conducted on benchmark datasets substantiate the effectiveness and robustness of our defensive strategy. Notably, these results indicate a substantial reduction in the attack success rate ranging from 10% to 40% when compared to prior state-of-the-art methods.

PDF Details

TCS Journal 2023 Journal Article

Simulatable verifiable random function from the LWE assumption

Yiming Li
Shengli Liu
Shuai Han
Dawu Gu
Jian Weng

A verifiable random function (VRF) is a pseudorandom function F that can be publicly verified. A simulatable VRF (sVRF) is an important variant of a VRF, which additionally provides simulatability. Informally, the simulatability of a VRF depicts the ability to simulate a valid proof π that y = F(sk, x) for any input x and any output value y. A (simulatable) VRF can be used in e-cash, e-lottery, and blockchain applications, and in constructing multi-theorem non-interactive zero-knowledge (NIZK) proofs. However, up to now, the existing constructions of an sVRF either rely on non-standard assumptions (e.g., the Q-type ones), or are built in the random oracle model, or resort to time-consuming techniques like the Cook-Levin reduction. In this paper, we design the first sVRF from the LWE assumption in the standard model (free of a random oracle) without using a Cook-Levin reduction. In our construction of an sVRF, we take as building blocks a pseudorandom function, a trapdoor fully homomorphic commitment (FHC) scheme, and a NIZK proof system for a language specified by FHC. Our trapdoor FHC is the key technical tool, which helps simplify the underlying NIZK language, thus making possible an instantiation of a NIZK proof from LWE without a Cook-Levin reduction. Together with an LWE-based PRF, we obtain an sVRF scheme from LWE.

Details DOI

AAAI Conference 2022 Conference Paper

Defending against Model Stealing via Verifying Embedded External Features

Yiming Li
Linghui Zhu
Xiaojun Jia
Yong Jiang
Shu-Tao Xia
Xiaochun Cao

Obtaining a well-trained model involves expensive data collection and training procedures; therefore, the model is valuable intellectual property. Recent studies revealed that adversaries can ‘steal’ deployed models even when they have no training samples and cannot get access to the model parameters or structures. Currently, there are some defense methods to alleviate this threat, mostly by increasing the cost of model stealing. In this paper, we explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features. Specifically, we embed the external features by tampering with a few training samples via style transfer. We then train a meta-classifier to determine whether a model is stolen from the victim. This approach is inspired by the understanding that the stolen models should contain the knowledge of features learned by the victim model. We examine our method on both CIFAR-10 and ImageNet datasets. Experimental results demonstrate that our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process. The codes for reproducing main results are available at GitHub (https://github.com/zlh-thu/StealingVerification).

PDF Details

ICRA Conference 2022 Conference Paper

HGC-Net: Deep Anthropomorphic Hand Grasping in Clutter

Yiming Li
Wei Wei 0062
Daheng Li
Peng Wang 0024
Wanyi Li 0002
Jun Zhong

Grasping in cluttered environments is one of the most fundamental skills in robotic manipulation. Most of the current works focus on estimating grasp poses for parallel-jaw or suction-cup end effectors. However, the study for dexterous anthropomorphic hand grasping in clutter remains a great challenge. In this paper, we propose HGC-Net, a single-shot network that learns to predict dense hand grasp configurations in clutter from single-view point cloud input. Our end-to-end neural network can predict hand grasp proposals efficiently and effectively. To enhance generalization, we built a large-scale synthetic grasping dataset with 179 household objects, 5K cluttered scenes and over 10M hand annotations. Experiments in simulation show that our model can predict dense and robust hand grasps and clear over 78% of unseen objects in clutter without any post-processing and outperform baseline methods by a large margin. Experiments on the real robot platform also demonstrate that the model trained on synthetic data performs well in natural environments. Code is available at https://github.com/yimingli1998/hgc_net.

Details

NeurIPS Conference 2022 Conference Paper

Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection

Yiming Li
Yang Bai
Yong Jiang
Yong Yang
Shu-Tao Xia
Bo Li

Deep neural networks (DNNs) have demonstrated their superiority in practice. Arguably, the rapid development of DNNs has largely benefited from high-quality (open-sourced) datasets, based on which researchers and developers can easily evaluate and improve their learning methods. Since the data collection is usually time-consuming or even expensive, how to protect their copyrights is of great significance and worth further exploration. In this paper, we revisit dataset ownership verification. We find that existing verification methods introduce new security risks in DNNs trained on the protected dataset, due to the targeted nature of poison-only backdoor watermarks. To alleviate this problem, in this work, we explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic. Specifically, we introduce two dispersibilities and prove their correlation, based on which we design the untargeted backdoor watermark under both poisoned-label and clean-label settings. We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification. Experiments on benchmark datasets verify the effectiveness of our methods and their resistance to existing backdoor defenses.

PDF Details

NeurIPS Conference 2021 Conference Paper

Learning Distilled Collaboration Graph for Multi-Agent Perception

Yiming Li
Shunli Ren
Pengxiang Wu
Siheng Chen
Chen Feng
Wenjun Zhang

To promote better performance-bandwidth trade-off for multi-agent perception, we propose a novel distilled collaboration graph (DiscoGraph) to model trainable, pose-aware, and adaptive collaboration among agents. Our key novelties lie in two aspects. First, we propose a teacher-student framework to train DiscoGraph via knowledge distillation. The teacher model employs an early collaboration with holistic-view inputs; the student model is based on intermediate collaboration with single-view inputs. Our framework trains DiscoGraph by constraining post-collaboration feature maps in the student model to match the correspondences in the teacher model. Second, we propose a matrix-valued edge weight in DiscoGraph. In such a matrix, each element reflects the inter-agent attention at a specific spatial region, allowing an agent to adaptively highlight the informative regions. During inference, we only need to use the student model named as the distilled collaboration network (DiscoNet). Attributed to the teacher-student framework, multiple agents with the shared DiscoNet could collaboratively approach the performance of a hypothetical teacher model with a holistic view. Our approach is validated on V2X-Sim 1.0, a large-scale multi-agent perception dataset that we synthesized using CARLA and SUMO co-simulation. Our quantitative and qualitative experiments in multi-agent 3D object detection show that DiscoNet could not only achieve a better performance-bandwidth trade-off than the state-of-the-art collaborative perception methods, but also bring more straightforward design rationale. Our code is available on https://github.com/ai4ce/DiscoNet.

PDF Details

EAAI Journal 2021 Journal Article

Learning dynamic regression with automatic distractor repression for real-time UAV tracking

Changhong Fu
Fangqiang Ding
Yiming Li
Jin Jin
Chen Feng

With high efficiency and efficacy, the trackers based on the discriminative correlation filter have experienced rapid development in the field of unmanned aerial vehicle (UAV) over the past decade. In literature, these trackers aim at solving a regression problem in which the circulated samples are mapped into a Gaussian label for online filter training. However, the fixed target label for regression makes trackers lose adaptivity in uncertain tracking scenarios. One of the typical failure cases is that the distractors, e.g., background clutter, camouflage, and similar objects, are prone to confuse these trackers. In this work, an efficient approach to instantly monitor the local maximums of the response map for discovering distractors automatically is proposed. In addition, the regression target is accordingly learned, i.e., the location possessing local maximum indicates latent distractor and thus should be repressed by reducing its target response value in filter training. Qualitative and quantitative experiments performed on three challenging well-known benchmarks demonstrate that the presented method not only outperforms the state-of-the-art handcrafted feature-based trackers but also exhibits comparable performance compared to deep learning-based approaches. Specifically, the presented tracker has phenomenal practicability in real-time UAV applications with an average speed of ~50 frames per second on an affordable CPU.

IROS Conference 2021 Conference Paper

Simultaneous Semantic and Collision Learning for 6-DoF Grasp Pose Estimation

Yiming Li
Tao Kong
Ruihang Chu
Yifeng Li
Peng Wang 0024
Lei Li 0005

Grasping in cluttered scenes has always been a great challenge for robots, due to the requirement of the ability to well understand the scene and object information. Previous works usually assume that the geometry information of the objects is available, or utilize a step-wise, multi-stage strategy to predict the feasible 6-DoF grasp poses. In this work, we propose to formalize the 6-DoF grasp pose estimation as a simultaneous multi-task learning problem. In a unified framework, we jointly predict the feasible 6-DoF grasp poses, instance semantic segmentation, and collision information. The whole framework is jointly optimized and end-to-end differentiable. Our model is evaluated on large-scale benchmarks as well as the real robot system. On the public dataset, our method outperforms prior state-of-the-art methods by a large margin (+4.08 AP). We also demonstrate the implementation of our model on a real robotic platform and show that the robot can accurately grasp target objects in cluttered scenarios with a high success rate. Project link: https://openbyterobotics.github.io/sscl.

YNICL Journal 2019 Journal Article

A quantitative SVM approach potentially improves the accuracy of magnetic resonance spectroscopy in the preoperative evaluation of the grades of diffuse gliomas

Chong Qi
Yiming Li
Xing Fan
Yin Jiang
Rui Wang
Song Yang
Lanxi Meng
Tao Jiang

OBJECTIVES: H-MRS) metabolic features and the grade of gliomas, and to establish a machine-learning model to predict the glioma grade. METHODS: H-MRS image. The Student's t-test was conducted to screen for differentially expressed features between low- and high-grade gliomas (WHO grades II and III/IV, respectively). Next, the minimum Redundancy Maximum Relevance (mRMR) algorithm was performed to further select features for a support vector machine (SVM) classifier building. Performance of the predictive model was evaluated both in the training and validation sets using ROC curve analysis. RESULTS: H-MRS metabolic features, thirteen features were differentially expressed. Four features were further selected as grade-predictive imaging signatures using the mRMR algorithm. The predictive performance of the machine-learning model measured by the AUC was 0.825 and 0.820 in the training and validation sets, respectively. This was better than the predictive performances of individual metabolic features, the best of which was 0.812. CONCLUSIONS: H-MRS metabolic features could help in predicting the grade of gliomas. The machine-learning model achieved a better prediction performance in grading gliomas than individual features, indicating that it could complement the traditionally used metabolic features.

YNICL Journal 2018 Journal Article

A radiomic signature as a non-invasive predictor of progression-free survival in patients with lower-grade gliomas

Xing Liu
Yiming Li
Zenghui Qian
Zhiyan Sun
Kaibin Xu
Kai Wang
Shuai Liu
Xing Fan

OBJECTIVE: The aim of this study was to develop a radiomics signature for prediction of progression-free survival (PFS) in lower-grade gliomas and to investigate the genetic background behind the radiomics signature. METHODS: In this retrospective study, training (n = 216) and validation (n = 84) cohorts were collected from the Chinese Glioma Genome Atlas and the Cancer Genome Atlas, respectively. For each patient, a total of 431 radiomics features were extracted from preoperative T2-weighted magnetic resonance images. A radiomics signature was generated in the training cohort, and its prognostic value was evaluated in both the training and validation cohorts. The genetic characteristics of the group with high-risk scores were identified by radiogenomic analysis, and a nomogram was established for prediction of PFS. RESULTS: There was a significant association between the radiomics signature (including 9 screened radiomics features) and PFS, which was independent of other clinicopathologic factors in both the training (P < 0.001, multivariable Cox regression) and validation (P = 0.045, multivariable Cox regression) cohorts. Radiogenomic analysis revealed that the radiomics signature was associated with the immune response, programmed cell death, cell proliferation, and vasculature development. A nomogram established using the radiomics signature and clinicopathologic risk factors demonstrated high accuracy and good calibration for prediction of PFS in both the training (C-index, 0.684) and validation (C-index, 0.823) cohorts. CONCLUSIONS: PFS can be predicted non-invasively in patients with LGGs by a group of radiomics features that could reflect the biological processes of these tumors.

YNICL Journal 2018 Journal Article

MRI features predict p53 status in lower-grade gliomas via a machine-learning approach

Yiming Li
Zenghui Qian
Kaibin Xu
Kai Wang
Xing Fan
Shaowu Li
Tao Jiang
Xing Liu

Background: P53 mutation status is a pivotal biomarker for gliomas. Here, we developed a machine-learning model to predict p53 status in lower-grade gliomas based on radiomic features extracted from conventional magnetic resonance (MR) images. Methods: = 92) set. A total of 431 radiomic features were extracted from each patient. The least absolute shrinkage and selection operator (LASSO) method was used for feature selection and radiomic signature construction. Subsequently, a machine-learning model to predict p53 status was established using the selected features and a Support Vector Machine classifier. The predictive performance of all individual features and the model was calculated using receiver operating characteristic curves in both the training and validation sets. Results: The p53-related radiomic signature was built using the LASSO algorithm; this procedure consisted of four first-order statistics or related wavelet features (including Maximum, Median, Minimum, and Uniformity), a shape and size-based feature (Spherical Disproportion), and ten textural features or related wavelet features (including Correlation, Run Percentage, and Sum Entropy). The prediction accuracies based on the area under the curve were 89.6% in the training set and 76.3% in the validation set, which were better than individual features. Conclusions: These results demonstrate that MR image texture features are predictive of p53 mutation status in lower-grade gliomas. Thus, our procedure can be conveniently used to facilitate presurgical molecular pathological diagnosis.
