Arrow Research search

Author name cluster

Teng Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers

Possible papers (13)

AAAI Conference 2026 Conference Paper

Bidirectional Counterfactual Distillation for Review-Based Recommendation

  • Sheng Sang
  • Shujie Li
  • Shuaiyang Li
  • Kang Liu
  • Teng Li
  • Wei Jia
  • Dan Guo
  • Feng Xue

Review-based recommendation methods typically integrate multiple behaviors, including interactions, reviews, and ratings, to model user preferences. To effectively extract preference signals from diverse behaviors, some studies train multiple student models to capture distinct behavioral patterns, and leverage online distillation to facilitate collaborative learning among them. However, we argue that these techniques suffer from bias contamination from rating distributions and feature homogenization during cross-behavior knowledge transfer: (1) Rating distribution bias, arising from non-uniform historical ratings, propagates across behaviors through distillation, contaminating the true preference representations of other behaviors. (2) Static distillation strategies often lead to homogenized behavioral features, hindering the learning of behavior-specific preferences. To address these issues, we propose a novel Bidirectional Counterfactual Distillation (BiCoD) framework for review-based recommendation. In BiCoD, we first design an adversarial counterfactual distillation module to suppress the impact of non-uniform rating distributions on distillation, thereby preventing it from contaminating the user's true preference representations across behaviors. Subsequently, we introduce a stage-aware bidirectional distillation strategy to enhance the distinctiveness of behavioral features, facilitating the effective learning of behavior-specific preferences. Extensive experiments on five real-world datasets validate the effectiveness and superiority of the proposed framework.
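
As a rough illustration of the bidirectional-distillation idea in this abstract (two student models exchanging soft targets), here is a minimal PyTorch sketch. The function name, temperature, and symmetric-KL form are assumptions for illustration; the adversarial counterfactual module and the stage-aware scheduling from the paper are omitted.

```python
import torch
import torch.nn.functional as F

def bidirectional_kd_loss(logits_a, logits_b, temperature=2.0):
    """Symmetric KL distillation between two student models.

    Each student treats the other's (detached) softened logits as a
    teacher signal, so knowledge flows in both directions.
    """
    log_p_a = F.log_softmax(logits_a / temperature, dim=-1)
    log_p_b = F.log_softmax(logits_b / temperature, dim=-1)
    # Stop gradients through the "teacher" side of each term.
    loss_a = F.kl_div(log_p_a, F.softmax(logits_b.detach() / temperature, dim=-1),
                      reduction="batchmean")
    loss_b = F.kl_div(log_p_b, F.softmax(logits_a.detach() / temperature, dim=-1),
                      reduction="batchmean")
    return (temperature ** 2) * (loss_a + loss_b)
```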

JBHI Journal 2026 Journal Article

JS-RegNeXt: A ConvNeXt-based few-shot JSR framework with correlation awareness and multi-scale prediction consistency

  • Teng Li
  • Runing Xiao
  • Tongtong Xie
  • Wei Huang
  • Jialong Hou
  • Yuchuan Qiao
  • Changyan Xiao

Conventional label-constrained (LC) medical image registration methods depend heavily on the number of available labels and overfit when labels are insufficient. Recently, joint segmentation and registration (JSR) methods have demonstrated promising results for LC registration tasks in few-shot settings. However, these methods typically lack global correlation awareness of the images to be registered and cannot robustly perceive global semantic information, leading to suboptimal registration performance on anatomy with low contrast or blurred boundaries. Therefore, we propose a novel JS-RegNeXt framework for few-shot label-constrained registration of medical images, which consists of segmentation and registration modules. Specifically, the segmentation module perceives global semantic information, and the registration module generates synthetic labeled data to fine-tune it. For the segmentation module, a SegNet with multi-scale prediction consistency is designed to mitigate the uncertainty introduced by synthesized data and to improve the robustness of semantic perception, even in low-contrast regions. For the registration module, a RegNeXt is proposed that achieves correlation awareness between images and leverages the large receptive field of ConvNeXt to enhance global perception. This design improves robustness in low-contrast regions, leading to more accurate and reliable image registration. Experiments on two public 3D medical image datasets, cardiac CT and brain MRI, show that our JS-RegNeXt achieves improvements in both segmentation and registration tasks compared to many state-of-the-art methods, demonstrating great potential for clinical application.
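
The multi-scale prediction consistency idea can be sketched as agreement between segmentation outputs at different resolutions. The PyTorch snippet below is a hypothetical 2D simplification (the paper works on 3D volumes), and anchoring coarser scales to the finest prediction is an assumption, not the paper's stated design.

```python
import torch
import torch.nn.functional as F

def multiscale_consistency_loss(preds):
    """Encourage segmentation predictions at different scales to agree.

    `preds` is a list of logits [(B, C, H_i, W_i), ...] ordered coarse to
    fine; each coarser map is upsampled to the finest resolution and
    penalized for diverging from the (detached) finest prediction.
    """
    target = F.softmax(preds[-1], dim=1).detach()  # finest scale as anchor
    loss = 0.0
    for p in preds[:-1]:
        up = F.interpolate(p, size=target.shape[2:], mode="bilinear",
                           align_corners=False)
        loss = loss + F.kl_div(F.log_softmax(up, dim=1), target,
                               reduction="batchmean")
    return loss / max(len(preds) - 1, 1)
```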

AAAI Conference 2026 Conference Paper

UniScene-MoTion: Unified Scene & Motion-aware Diffusion Transition Framework

  • Rui Jiang
  • Chongmian Wang
  • Xinghe Fu
  • Yehao Lu
  • Teng Li
  • Xi Li

Video transitions are critical for ensuring temporal coherence in edited media, yet existing methods often rely on handcrafted effects or relative-scale trajectories that fail to capture the physical structure of real-world scenes. In this work, we introduce a scale-aware video transition framework that explicitly incorporates depth-aware 3D reasoning into a diffusion-based generation pipeline. Built upon a powerful I2V foundation, our method leverages single-image depth prediction to align camera motion with metric-scale geometry, enabling physically consistent transitions. To reduce reliance on precise camera inputs, we propose a bidirectional conditional control module and a progressive training strategy with conditional dropout, enhancing generalization to loosely specified or missing camera trajectories. Extensive experiments demonstrate that our approach achieves state-of-the-art performance, delivering realistic, geometrically coherent transitions across diverse scenes and applications with minimal input guidance.
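
The conditional dropout described above (randomly removing camera conditions during training so the model generalizes to missing trajectories) can be sketched as follows; the null-token substitution and drop rate are illustrative assumptions rather than the paper's exact mechanism.

```python
import torch

def drop_camera_condition(camera_emb, p_drop=0.3, null_emb=None):
    """Randomly replace camera-trajectory embeddings with a null token
    during training (classifier-free-guidance-style conditional dropout),
    so the model also learns to generate transitions when camera inputs
    are loosely specified or absent.
    """
    if null_emb is None:
        null_emb = torch.zeros_like(camera_emb)  # placeholder null token
    keep = (torch.rand(camera_emb.shape[0], device=camera_emb.device)
            > p_drop).float()
    keep = keep.view(-1, *([1] * (camera_emb.dim() - 1)))  # broadcast over batch
    return keep * camera_emb + (1.0 - keep) * null_emb
```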

AAAI Conference 2025 Conference Paper

AIM: Additional Image Guided Generation of Transferable Adversarial Attacks

  • Teng Li
  • Xingjun Ma
  • Yu-Gang Jiang

Transferable adversarial examples highlight the vulnerability of deep neural networks (DNNs) to imperceptible perturbations across various real-world applications. While there have been notable advancements in untargeted transferable attacks, targeted transferable attacks remain a significant challenge. In this work, we focus on generative approaches for targeted transferable attacks. Current generative attacks focus on reducing overfitting to surrogate models and the source data domain, but they often overlook the importance of enhancing transferability through additional semantics. To address this issue, we introduce a novel plug-and-play module into the general generator architecture to enhance adversarial transferability. Specifically, we propose a Semantic Injection Module (SIM) that utilizes the semantics contained in an additional guiding image to improve transferability. The guiding image provides a simple yet effective method to incorporate target semantics from the target class to create targeted and highly transferable attacks. Additionally, we propose new loss formulations that can integrate the semantic injection module more effectively for both targeted and untargeted attacks. We conduct comprehensive experiments under both targeted and untargeted attack settings to demonstrate the efficacy of our proposed approach.
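
A plug-and-play semantic-injection module of this kind might fuse guiding-image features into generator features. The cross-attention formulation below is one plausible realization with illustrative dimensions, not the paper's actual SIM.

```python
import torch
import torch.nn as nn

class SemanticInjection(nn.Module):
    """Hypothetical sketch: inject guiding-image semantics into generator
    features via cross-attention with a residual connection."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, gen_feat, guide_feat):
        # gen_feat:   (B, N, C) flattened generator features
        # guide_feat: (B, M, C) features of the additional guiding image
        injected, _ = self.attn(query=gen_feat, key=guide_feat,
                                value=guide_feat)
        return self.norm(gen_feat + injected)  # residual fusion
```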

AAAI Conference 2025 Conference Paper

Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models

  • Rui Jiang
  • Xinghe Fu
  • Guangcong Zheng
  • Teng Li
  • Taiping Yao
  • Xi Li

The rapid advancement of pretrained text-driven diffusion models has significantly enriched applications in image generation and editing. However, as the demand for personalized content editing increases, new challenges emerge, especially when dealing with arbitrary objects and complex scenes. Existing methods usually mistake the mask for an object shape prior and thus struggle to achieve seamless integration. The commonly used inversion-noise initialization also hinders identity consistency with the target object. To address these challenges, we propose a novel training-free framework that formulates personalized content editing as the optimization of edited images in the latent space, using diffusion models as the energy-function guidance conditioned on reference text-image pairs. A coarse-to-fine strategy is proposed that employs text energy guidance at the early stage to achieve a natural transition toward the target class and uses point-to-point feature-level image energy guidance to perform fine-grained appearance alignment with the target object. Additionally, we introduce latent-space content composition to enhance overall identity consistency with the target. Extensive experiments demonstrate that our method excels in object replacement even with a large domain gap, highlighting its potential for high-quality, personalized image editing.
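
The core training-free loop, optimizing an edited latent under an energy function, can be sketched generically. The optimizer choice, step count, and the opaque `energy_fn` callable are placeholders standing in for the paper's diffusion-based guidance.

```python
import torch

def optimize_latent(latent, energy_fn, steps=50, lr=0.05):
    """Energy-guided latent optimization: treat the edited image's latent
    as the optimization variable and descend the gradient of a scalar
    energy (lower = edit better aligned with the reference conditions).
    """
    z = latent.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        energy = energy_fn(z)  # scalar energy from the guidance model
        energy.backward()
        opt.step()
    return z.detach()
```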

NeurIPS Conference 2024 Conference Paper

AdaPKC: PeakConv with Adaptive Peak Receptive Field for Radar Semantic Segmentation

  • Teng Li
  • Liwen Zhang
  • Youcheng Zhang
  • Zijun Hu
  • Pengcheng Pi
  • Zongqing Lu
  • Qingmin Liao
  • Zhe Ma

Deep learning-based radar detection technology is receiving increasing attention in areas such as autonomous driving, UAV surveillance, and marine monitoring. Among recent efforts, PeakConv (PKC) retains the peak-response characteristics of radar signals while exploiting the strengths of deep convolution, thereby improving radar semantic segmentation (RSS). However, because it uses a pre-set fixed sampling rule for the peak receptive field, PKC is limited in handling problems such as inconsistent broadening of target frequency-domain responses and the non-homogeneous, time-varying distribution of noise and clutter. Therefore, this paper proposes the idea of an adaptive peak receptive field and upgrades PKC to AdaPKC on this basis. Beyond that, a novel fine-tuning technique to further boost the performance of AdaPKC-based RSS networks is presented. Through experimental verification on various real-measured radar data (including a publicly available low-cost millimeter-wave radar dataset for autonomous driving and a self-collected Ku-band surveillance radar dataset), we find that AdaPKC-based models surpass other SoTA methods in RSS tasks. The code is available at https://github.com/lihua199710/AdaPKC.
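
As a toy picture of what an adaptive peak receptive field might mean, the snippet below keeps only the k strongest responses in each local window rather than a fixed sampling pattern. This is an illustrative simplification, not the AdaPKC operator itself.

```python
import torch
import torch.nn.functional as F

def adaptive_peak_sampling(x, window=5, k=3):
    """Toy adaptive peak receptive field: at each position, keep only the
    k strongest responses inside a local window and average them, instead
    of applying a fixed sampling rule. x: (B, C, H, W)."""
    b, c, h, w = x.shape
    pad = window // 2
    patches = F.unfold(x, kernel_size=window, padding=pad)  # (B, C*w*w, H*W)
    patches = patches.view(b, c, window * window, h * w)
    topk, _ = patches.topk(k, dim=2)                        # strongest responses
    return topk.mean(dim=2).view(b, c, h, w)
```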

NeurIPS Conference 2024 Conference Paper

TARSS-Net: Temporal-Aware Radar Semantic Segmentation Network

  • Youcheng Zhang
  • Liwen Zhang
  • Zijun Hu
  • Pengcheng Pi
  • Teng Li
  • Yuanpei Chen
  • Shi Peng
  • Zhe Ma

Radar signal interpretation plays a crucial role in remote detection and ranging. As the advantages of neural networks in signal processing become increasingly apparent, learning-based radar signal interpretation has become a research hotspot and has made great progress. Since radar semantic segmentation (RSS) provides more fine-grained target information, it has drawn particular attention in this field. However, temporal information, which is an important clue for analyzing radar data, has not been sufficiently exploited in present RSS frameworks. In this work, we propose a novel temporal information learning paradigm, i.e., data-driven temporal information aggregation with learned target-history relations. Following this idea, a flexible learning module, called the Temporal Relation-Aware Module (TRAM), is carefully designed. TRAM contains two main blocks: i) an encoder for capturing target-history temporal relations (TH-TRE), and ii) a learnable temporal relation attentive pooling (TRAP) for aggregating temporal information. Based on TRAM, an end-to-end Temporal-Aware RSS Network (TARSS-Net) is presented, which achieves outstanding performance on publicly available and our collected real-measured datasets. Code and supplementary materials are available at https://github.com/zlw9161/TARSS-Net.
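
TRAM's two blocks, relation encoding followed by attentive pooling, can be sketched as scoring each history frame against the current one and aggregating. Shapes and the linear scoring head below are illustrative assumptions, not the paper's exact TH-TRE/TRAP design.

```python
import torch
import torch.nn as nn

class TemporalAttentivePooling(nn.Module):
    """Sketch of attention-weighted pooling over history frames: score
    each historical feature against the current frame, then aggregate."""

    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, current, history):
        # current: (B, C); history: (B, T, C)
        cur = current.unsqueeze(1).expand_as(history)    # (B, T, C)
        rel = torch.cat([cur, history], dim=-1)          # target-history relations
        weights = torch.softmax(self.score(rel), dim=1)  # (B, T, 1) over time
        return (weights * history).sum(dim=1)            # aggregated feature
```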

AAAI Conference 2023 Conference Paper

AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio

  • Xiaoyang Huang
  • Yanjun Wang
  • Yang Liu
  • Bingbing Ni
  • Wenjun Zhang
  • Jinxian Liu
  • Teng Li

Spatial audio, which focuses on immersive 3D sound rendering, is widely applied in the acoustic industry. One of the key problems of current spatial audio rendering methods is the lack of personalization based on the different anatomies of individuals, which is essential to produce accurate sound source positions. In this work, we address this problem from an interdisciplinary perspective. The rendering of spatial audio is strongly correlated with the 3D shape of human bodies, particularly ears. To this end, we propose to achieve personalized spatial audio by reconstructing 3D human ears from single-view images. First, to benchmark the ear reconstruction task, we introduce AudioEar3D, a high-quality 3D ear dataset consisting of 112 point cloud ear scans with RGB images. To train a reconstruction model in a self-supervised manner, we further collect a 2D ear dataset composed of 2,000 images, each with manual annotations of occlusion and 55 landmarks, named AudioEar2D. To our knowledge, both datasets are the largest and highest-quality of their kinds available for public use. Further, we propose AudioEarM, a reconstruction method guided by a depth estimation network trained on synthetic data, with two loss functions tailored for ear data. Lastly, to bridge the gap between the vision and acoustics communities, we develop a pipeline to integrate the reconstructed ear mesh with an off-the-shelf 3D human body and simulate a personalized Head-Related Transfer Function (HRTF), which is the core of spatial audio rendering. Code and data are publicly available at https://github.com/seanywang0408/AudioEar.
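
Since the HRTF is described as the core of spatial audio rendering, a minimal sketch of how a (personalized) HRTF is applied may help: convolve a mono source with the left/right head-related impulse responses. The function and array names are hypothetical.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Apply an HRTF in the time domain: convolving a mono signal with
    the left/right head-related impulse responses (the time-domain form
    of the HRTF) yields a binaural signal for a given source direction."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)  # (2, N) binaural output
```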

AAAI Conference 2023 Conference Paper

Boosting Point Clouds Rendering via Radiance Mapping

  • Xiaoyang Huang
  • Yi Zhang
  • Bingbing Ni
  • Teng Li
  • Kai Chen
  • Wenjun Zhang

Recent years have witnessed rapid development in NeRF-based image rendering due to its high quality; point cloud rendering, by contrast, remains less explored. Compared to NeRF-based rendering, which suffers from dense spatial sampling, point cloud rendering is naturally less computation-intensive, enabling deployment on mobile computing devices. In this work, we focus on boosting the image quality of point cloud rendering with a compact model design. We first analyze the adaptation of the volume rendering formulation to point clouds. Based on this analysis, we simplify the NeRF representation to a spatial mapping function that requires only a single evaluation per pixel. Further, motivated by ray marching, we rectify the noisy raw point clouds to the estimated intersections between rays and surfaces as queried coordinates, which avoids spatial frequency collapse and neighbor-point disturbance. Composed of rasterization, spatial mapping, and refinement stages, our method achieves state-of-the-art performance on point cloud rendering, outperforming prior works by notable margins with a smaller model size. We obtain a PSNR of 31.74 on NeRF-Synthetic, 25.88 on ScanNet, and 30.81 on DTU. Code and data are publicly available at https://github.com/seanywang0408/RadianceMapping.
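
The "single evaluation per pixel" simplification can be sketched as a small MLP queried once with a rectified surface coordinate and view direction per pixel, in place of NeRF's many samples along each ray. Layer sizes and the absence of positional encoding are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RadianceMappingSketch(nn.Module):
    """Minimal sketch of the spatial-mapping idea: one rectified 3D
    coordinate plus a view direction per pixel, mapped directly to RGB."""

    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, view_dir):
        # xyz, view_dir: (N_pixels, 3) — a single query per pixel
        return self.mlp(torch.cat([xyz, view_dir], dim=-1))
```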

AAAI Conference 2022 Conference Paper

Bi-volution: A Static and Dynamic Coupled Filter

  • Xiwei Hu
  • Xuanhong Chen
  • Bingbing Ni
  • Teng Li
  • Yutian Liu

Dynamic convolution has achieved significant gains in performance and computational complexity, thanks to its powerful representation capability given a limited number of filters/layers. However, SOTA dynamic convolution operators are sensitive to input noise (e.g., Gaussian noise, shot noise, etc.) and lack sufficient spatial contextual information in filter generation. To alleviate this inherent weakness, we propose a lightweight, heterogeneous-structure (i.e., static and dynamic) operator, named Bi-volution. On the one hand, Bi-volution is designed as a dual-branch structure to fully leverage the complementary properties of static/dynamic convolution, which endows it with greater robustness and higher performance. On the other hand, a Spatial Augmented Kernel Generation module is proposed to improve dynamic convolution, realizing the learning of spatial context information with negligible additional computational complexity. Extensive experiments illustrate that ResNet-50 equipped with Bi-volution achieves a highly competitive boost in performance (+2.8% top-1 accuracy on ImageNet classification, +2.4% box AP and +2.2% mask AP on COCO detection and instance segmentation) while maintaining extremely low FLOPs (i.e., ResNet-50@2.7 GFLOPs). Furthermore, Bi-volution shows better robustness than dynamic convolution against various noise and input corruptions. Our code is available at https://github.com/neuralchen/Bivolution.
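
A dual-branch static/dynamic operator of this kind might look like the sketch below, pairing an ordinary convolution with an involution-style branch that predicts a per-position kernel. Fusion by summation and sharing the dynamic kernel across channels are assumptions; the Spatial Augmented Kernel Generation module is not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BivolutionSketch(nn.Module):
    """Illustrative dual-branch operator: a static 3x3 convolution plus a
    dynamic branch that predicts one spatial kernel per position from the
    input itself."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        self.static = nn.Conv2d(channels, channels, kernel_size,
                                padding=kernel_size // 2)
        # Predict one k*k kernel per spatial position, shared across channels.
        self.kernel_gen = nn.Conv2d(channels, kernel_size * kernel_size, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        kernels = self.kernel_gen(x).view(b, 1, self.k * self.k, h, w)
        patches = F.unfold(x, self.k, padding=self.k // 2)
        patches = patches.view(b, c, self.k * self.k, h, w)
        dynamic = (kernels.softmax(dim=2) * patches).sum(dim=2)  # per-position filtering
        return self.static(x) + dynamic                          # branch fusion by sum
```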

AAAI Conference 2020 Conference Paper

Adversarial Domain Adaptation with Domain Mixup

  • Minghao Xu
  • Jian Zhang
  • Bingbing Ni
  • Teng Li
  • Chengjie Wang
  • Qi Tian
  • Wenjun Zhang

Recent works on domain adaptation reveal the effectiveness of adversarial learning in bridging the discrepancy between source and target domains. However, two common limitations exist in current adversarial-learning-based methods. First, samples from the two domains alone are not sufficient to ensure domain invariance over most of the latent space. Second, the domain discriminator in these methods can only judge real or fake under the guidance of hard labels, whereas it is more reasonable to use soft scores to evaluate the generated images or features, i.e., to fully utilize the inter-domain information. In this paper, we present adversarial domain adaptation with domain mixup (DM-ADA), which guarantees domain invariance in a more continuous latent space and guides the domain discriminator in judging samples' difference relative to the source and target domains. Domain mixup is jointly conducted at the pixel and feature levels to improve the robustness of models. Extensive experiments prove that the proposed approach achieves superior performance on tasks with various degrees of domain shift and data complexity.
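
The pixel-level half of domain mixup, with the mixing ratio reused as a soft domain label for the discriminator, can be sketched directly. The Beta concentration and the assumption of equal source/target batch sizes are illustrative choices.

```python
import torch

def domain_mixup(x_src, x_tgt, alpha=2.0):
    """Pixel-level domain mixup: blend source and target image batches
    (same batch size, shape (B, C, H, W)) and reuse the mixing ratio as a
    soft domain label instead of a hard real/fake target."""
    lam = torch.distributions.Beta(alpha, alpha).sample((x_src.size(0),))
    lam = lam.to(x_src.device).view(-1, 1, 1, 1)
    mixed = lam * x_src + (1.0 - lam) * x_tgt
    soft_domain_label = lam.view(-1)  # 1.0 = pure source, 0.0 = pure target
    return mixed, soft_domain_label
```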

TIST Journal 2016 Journal Article

Multitask Low-Rank Affinity Graph for Image Segmentation and Image Annotation

  • Teng Li
  • Bin Cheng
  • Bingbing Ni
  • Guangchan Liu
  • Shuicheng Yan

This article investigates a low-rank-representation-based graph, which can be used in graph-based vision tasks including image segmentation and image annotation. It naturally fuses multiple types of image features in a framework named multitask low-rank affinity pursuit. Given image patches described with multiple types of features, we aim to infer a unified affinity matrix that implicitly encodes the relations among these patches. This is achieved by seeking sparsity-consistent low-rank affinities from the joint decompositions of multiple feature matrices into pairs of sparse and low-rank matrices, the latter of which is expressed as the product of the image feature matrix and its corresponding image affinity matrix. The inference process is formulated as a minimization problem and solved efficiently with the augmented Lagrange multiplier method. Considering image patches as vertices, a graph can be built from the resulting affinity matrix. Compared to previous methods, which are usually based on a single type of feature, the proposed method seamlessly integrates multiple types of features to jointly produce the affinity matrix in a single inference step. The proposed method is applied to graph-based image segmentation and graph-based image annotation. Experiments on benchmark datasets validate the superiority of using multiple features over a single feature, as well as the superiority of our method over conventional feature-fusion methods.
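
Augmented-Lagrange solvers for problems of this form repeatedly apply singular value thresholding as the proximal step for the low-rank terms; a minimal sketch of that step (not the full multitask solver) is below.

```python
import torch

def singular_value_thresholding(matrix, tau):
    """Proximal operator of the nuclear norm, the core step for the
    low-rank blocks in augmented-Lagrange-multiplier solvers: shrink the
    singular values by tau and drop those that fall below zero."""
    u, s, vh = torch.linalg.svd(matrix, full_matrices=False)
    s_shrunk = torch.clamp(s - tau, min=0.0)
    return u @ torch.diag(s_shrunk) @ vh
```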