Arrow Research

Author name cluster

Jianfei Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers (25)

AAAI Conference 2026 Conference Paper

Mask2IV: Interaction-Centric Video Generation via Mask Trajectories

  • Gen Li
  • Bo Zhao
  • Jianfei Yang
  • Laura Sevilla-Lara

Generating interaction-centric videos, such as those depicting humans or robots interacting with objects, is crucial for embodied intelligence, as they provide rich and diverse visual priors for robot learning, manipulation policy training, and affordance reasoning. However, existing methods often struggle to model such complex and dynamic interactions. While recent studies show that masks can serve as effective control signals and enhance generation quality, obtaining dense and precise mask annotations remains a major challenge for real-world use. To overcome this limitation, we introduce Mask2IV, a novel framework specifically designed for interaction-centric video generation. It adopts a decoupled two-stage pipeline that first predicts plausible motion trajectories for both actor and object, then generates a video conditioned on these trajectories. This design eliminates the need for dense mask inputs from users while preserving the flexibility to manipulate the interaction process. Furthermore, Mask2IV supports versatile and intuitive control, allowing users to specify the target object of interaction and guide the motion trajectory through action descriptions or spatial position cues. To support systematic training and evaluation, we curate two benchmarks covering diverse action and object categories across both human-object interaction and robotic manipulation scenarios. Extensive experiments demonstrate that our method achieves superior visual realism and controllability compared to existing baselines.
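
To make the decoupled pipeline concrete, here is a minimal Python sketch with hypothetical interfaces and a toy stand-in backbone (the class names, shapes, and conditioning are assumptions, not the paper's architecture): stage one maps conditioning cues to actor/object mask trajectories, and stage two generates video conditioned on them.

```python
# Minimal sketch (hypothetical interfaces, toy backbone) of a two-stage design.
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    """Stage 1: conditioning embedding -> per-frame actor/object masks."""
    def __init__(self, frames=8, size=32, cond_dim=512):
        super().__init__()
        self.frames, self.size = frames, size
        self.net = nn.Linear(cond_dim, 2 * frames * size * size)  # toy stand-in

    def forward(self, cond):
        masks = self.net(cond).view(-1, 2, self.frames, self.size, self.size)
        return masks.sigmoid()                # (B, {actor, object}, T, H, W)

class MaskConditionedGenerator(nn.Module):
    """Stage 2: placeholder for a video model conditioned on mask trajectories."""
    def forward(self, first_frame, masks):
        b, _, t, h, w = masks.shape
        return torch.rand(b, t, 3, h, w)      # stand-in for a diffusion decoder

stage1, stage2 = TrajectoryPredictor(), MaskConditionedGenerator()
masks = stage1(torch.randn(1, 512))           # no dense user-provided masks needed
video = stage2(torch.rand(1, 3, 32, 32), masks)
print(video.shape)  # torch.Size([1, 8, 3, 32, 32])
```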

AAAI Conference 2026 Conference Paper

mmPred: Radar-based Human Motion Prediction in the Dark

  • Junqiao Fan
  • Haocong Rao
  • Jiarui Zhang
  • Jianfei Yang
  • Lihua Xie

Existing Human Motion Prediction (HMP) methods based on RGB(D) cameras are sensitive to lighting conditions and raise privacy concerns, limiting their real-world applications such as firefighting and elderly care. Motivated by the robustness and privacy-preserving nature of millimeter-wave (mmWave) radar, this work introduces radar as a sensing modality for HMP for the first time. Nevertheless, radar signals often suffer from specular reflections and multipath effects, resulting in noisy and temporally inconsistent measurements, such as body-part mis-detections. To address these radar-specific artifacts, we propose mmPred, the first diffusion-based framework tailored for radar-based HMP. mmPred introduces a dual-domain historical motion representation to guide the generation process, combining a Time-domain Pose Refinement (TPR) branch for fine-grained details and a Frequency-domain Dominant Motion (FDM) branch for capturing global motion trends and suppressing frame-level inconsistency. Furthermore, we design a Global Skeleton-relational Transformer (GST) as the diffusion backbone to model global inter-joint cooperation, enabling corrupted joints to dynamically aggregate information from others. Extensive experiments show that mmPred achieves state-of-the-art performance, outperforming existing methods by 8.6% on mmBody and 22% on MM-Fi.
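
The frequency-domain idea admits a compact illustration. Below is a minimal sketch, under the assumption (not from the paper) that a dominant-motion signal is obtained by keeping only low-frequency FFT components of the pose history:

```python
# Minimal sketch: low-pass a pose history in the frequency domain to keep the
# global motion trend and suppress frame-level jitter (illustrative only).
import torch

def dominant_motion(history, keep=4):
    """history: (T, J, 3) joint positions -> low-frequency reconstruction."""
    spec = torch.fft.rfft(history, dim=0)        # FFT over the time axis
    spec[keep:] = 0                              # drop high-frequency components
    return torch.fft.irfft(spec, n=history.shape[0], dim=0)

noisy = torch.cumsum(torch.randn(25, 17, 3) * 0.05, dim=0)  # jittery trajectory
smooth = dominant_motion(noisy)                  # trend signal, FDM-branch style
print(smooth.shape)  # torch.Size([25, 17, 3])
```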

AAAI Conference 2026 Conference Paper

Zero-Shot Open-Vocabulary Human Motion Grounding with Test-Time Training

  • Yunjiao Zhou
  • Xinyan Chen
  • Junlang Qian
  • Lihua Xie
  • Jianfei Yang

Understanding complex human activities demands the ability to decompose motion into fine-grained, semantically aligned sub-actions. This motion grounding process is crucial for behavior analysis, embodied AI, and virtual reality. Yet most existing methods rely on dense supervision with predefined action classes, which is infeasible in open-vocabulary, real-world settings. In this paper, we propose ZOMG, a zero-shot, open-vocabulary framework that segments motion sequences into semantically meaningful sub-actions without requiring any annotations or fine-tuning. Technically, ZOMG integrates (1) language semantic partition, which leverages large language models to decompose instructions into ordered sub-action units, and (2) soft masking optimization, which learns instance-specific temporal masks to focus on frames critical to sub-actions, while maintaining intra-segment continuity and enforcing inter-segment separation, all without altering the pretrained encoder. Experiments on three motion-language datasets demonstrate state-of-the-art motion grounding effectiveness and efficiency, outperforming prior methods by 8.7% mAP on the HumanML3D benchmark. Significant improvements also carry over to downstream retrieval, establishing a new paradigm for annotation-free motion understanding.
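
A minimal sketch of soft masking optimization with a hypothetical objective (an alignment term plus a smoothness term; the actual losses and encoder are assumptions): per-frame, per-sub-action weights are learned against frozen features.

```python
# Minimal sketch (hypothetical objective): optimize soft temporal masks so that
# masked motion features align with sub-action text embeddings, encoder frozen.
import torch
import torch.nn.functional as F

T, D, S = 120, 256, 3                        # frames, embed dim, sub-actions
frame_feats = torch.randn(T, D)              # frozen motion-encoder features
text_feats = F.normalize(torch.randn(S, D), dim=-1)  # sub-action unit embeddings

logits = torch.zeros(S, T, requires_grad=True)       # instance-specific masks
opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(100):
    masks = logits.softmax(dim=0)                    # frames compete across sub-actions
    pooled = F.normalize(masks @ frame_feats, dim=-1)        # (S, D) masked pooling
    align = (pooled * text_feats).sum(-1).mean()             # semantic alignment
    smooth = (masks[:, 1:] - masks[:, :-1]).abs().mean()     # intra-segment continuity
    loss = -align + 0.5 * smooth
    opt.zero_grad(); loss.backward(); opt.step()

segments = logits.argmax(dim=0)              # hard sub-action label per frame
```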

NeurIPS Conference 2025 Conference Paper

Benford’s Curse: Tracing Digit Bias to Numerical Hallucination in LLMs

  • Jiandong Shao
  • Yao Lu
  • Jianfei Yang

Large Language Models (LLMs) exhibit impressive performance on complex reasoning tasks, yet they frequently fail on basic numerical problems, producing incorrect outputs. Inspired by Benford’s Law, a statistical pattern in which lower digits occur more frequently as leading digits, we hypothesize that the skewed digit distributions in web-collected corpora may be learned by LLMs during pretraining, leading to biased numerical generation. To investigate this hypothesis, we first examine whether digit frequencies in a pretraining corpus (OLMo2) follow Benford’s law. We then construct an evaluation benchmark in which the ground-truth digits are uniformly distributed within each of seven numerical reasoning tasks. Our evaluation results demonstrate that leading open-source LLMs show a consistent pattern of digit bias that resembles Benford’s law. Through logit-lens tracing and neuron-level dissection, we identify that this bias arises predominantly from a small subset of highly digit-selective feed-forward network (FFN) neurons in the deeper layers. Finally, we demonstrate that pruning these neurons mitigates imbalanced overgeneration and partially corrects erroneous outputs, providing causal evidence that fine-grained pretraining digit bias can propagate into model behavior. Our findings reveal a fundamental connection between corpus-level statistics and symbolic failure modes in LLMs, offering a new lens for diagnosing and mitigating hallucinations in numerical tasks.
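
The corpus-level check is easy to reproduce in spirit. A minimal sketch (illustrative, not the paper's pipeline) that estimates leading-digit frequencies in text and compares them against Benford's law, P(d) = log10(1 + 1/d):

```python
# Minimal sketch: leading-digit frequencies in a toy corpus vs. Benford's law.
import re
import math
from collections import Counter

def leading_digit_freqs(texts):
    """Count leading digits (1-9) of all numbers appearing in the texts."""
    counts = Counter()
    for text in texts:
        for num in re.findall(r"\d+", text.replace(",", "")):
            d = int(num.lstrip("0")[:1] or "0")  # first significant digit
            if d > 0:
                counts[d] += 1
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

corpus = ["The 1,024 samples cost $37 each; 9 of 128 runs failed."]
observed = leading_digit_freqs(corpus)
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)  # Benford's law: P(d) = log10(1 + 1/d)
    print(f"digit {d}: observed {observed[d]:.3f}, Benford {benford:.3f}")
```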

IROS Conference 2025 Conference Paper

CGS-SLAM: Compact 3D Gaussian Splatting for Dense Visual SLAM

  • Tianchen Deng
  • Yaohui Chen 0003
  • Jianfei Yang
  • Shenghai Yuan 0001
  • Jiuming Liu
  • Danwei Wang
  • Weidong Chen 0001

Recent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs and slow training speed. To address this limitation, we propose a compact 3D Gaussian Splatting SLAM system that reduces the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to reduce the redundant ellipsoids. Then, a novel geometry codebook-based quantization method is proposed to further compress 3D Gaussian geometric attributes. Robust and accurate pose estimation is achieved by a local-to-global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training and rendering speeds and lower memory usage while maintaining state-of-the-art (SOTA) scene representation quality.
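
A minimal sketch of the codebook idea (a plain k-means quantizer; the attribute layout and codebook size are assumptions): geometric attributes are replaced by a small codebook plus one index per ellipsoid.

```python
# Minimal sketch: k-means codebook quantization of per-Gaussian attributes.
import torch

def kmeans_codebook(attrs, k=256, iters=10):
    """attrs: (N, D) Gaussian attributes -> (codebook (k, D), indices (N,))."""
    codebook = attrs[torch.randperm(attrs.shape[0])[:k]].clone()
    for _ in range(iters):
        idx = torch.cdist(attrs, codebook).argmin(dim=1)  # assign to nearest code
        for c in range(k):
            members = attrs[idx == c]
            if len(members):
                codebook[c] = members.mean(dim=0)         # update code centers
    return codebook, idx

attrs = torch.rand(100_000, 7)        # e.g. 3 scale + 4 rotation dims (assumed)
codebook, idx = kmeans_codebook(attrs)
recon = codebook[idx]                 # decompressed attributes for rendering
print(recon.shape, codebook.shape)    # storage: one codebook + one index per ellipsoid
```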

ICLR Conference 2025 Conference Paper

Feedback Favors the Generalization of Neural ODEs

  • Jindou Jia
  • Zihan Yang
  • Meng Wang 0044
  • Kexin Guo
  • Jianfei Yang
  • Xiang Yu 0003
  • Lei Guo 0003

The well-known generalization problem hinders the application of artificial neural networks in continuous-time prediction tasks with varying latent dynamics. In sharp contrast, biological systems can neatly adapt to evolving environments, benefiting from real-time feedback mechanisms. Inspired by this feedback philosophy, we present feedback neural networks, showing that a feedback loop can flexibly correct the learned latent dynamics of neural ordinary differential equations (neural ODEs), leading to a prominent generalization improvement. The feedback neural network is a novel two-degree-of-freedom (two-DOF) architecture that performs robustly in unseen scenarios with no loss of accuracy on previous tasks. A linear feedback form with a convergence guarantee is first presented to correct the learned latent dynamics; domain randomization is then utilized to learn a nonlinear neural feedback form. Finally, extensive tests, including trajectory prediction of a real irregular object and model predictive control of a quadrotor under various uncertainties, demonstrate significant improvements over state-of-the-art model-based and learning-based methods.
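
The linear feedback form can be illustrated compactly. A minimal sketch (hypothetical names, a constant scalar gain, and a plain Euler rollout rather than the paper's solver): a learned vector field drives the ODE, and a feedback term pulls the predicted state toward observations when they are available.

```python
# Minimal sketch: neural ODE rollout with a linear feedback correction term.
import torch
import torch.nn as nn

class FeedbackNeuralODE(nn.Module):
    def __init__(self, dim, hidden=64, k=2.0):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.K = k  # feedback gain; constant here, learnable/nonlinear in general

    def forward(self, x0, steps, dt, observe=None):
        """Euler rollout; `observe(i)` returns a measurement or None."""
        x, traj = x0, [x0]
        for i in range(steps):
            dx = self.f(x)  # learned latent dynamics
            if observe is not None and (y := observe(i)) is not None:
                dx = dx + self.K * (y - x)  # feedback correction of model error
            x = x + dt * dx
            traj.append(x)
        return torch.stack(traj)

model = FeedbackNeuralODE(dim=3)
traj = model(torch.zeros(3), steps=50, dt=0.02, observe=lambda i: None)
print(traj.shape)  # torch.Size([51, 3])
```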

ICRA Conference 2025 Conference Paper

GERA: Geometric Embedding for Efficient Point Registration Analysis

  • Geng Li
  • Haozhi Cao
  • Mingyang Liu
  • Shenghai Yuan 0001
  • Jianfei Yang

Point cloud registration aims to provide estimated transformations to align point clouds, which plays a crucial role in pose estimation of various navigation systems, such as surgical guidance systems and autonomous vehicles. Despite the impressive performance of recent models on benchmark datasets, many rely on complex modules like KPConv and Transformers, which impose significant computational and memory demands. These requirements hinder their practical application, particularly in resource-constrained environments such as mobile robotics. In this paper, we propose a novel point cloud registration network that leverages a pure MLP architecture, constructing geometric information offline. This approach eliminates the computational and memory burdens associated with traditional complex feature extractors and significantly reduces inference time and resource consumption. Our method is the first to replace 3D coordinate inputs with offline-constructed geometric encoding, improving generalization and stability, as demonstrated by Maximum Mean Discrepancy (MMD) comparisons. This efficient and accurate geometric representation marks a significant advancement in point cloud analysis, particularly for applications requiring speed and reliability.
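
A minimal sketch of the offline geometric-encoding idea (k-nearest-neighbour distances are one plausible choice; the paper's actual encoding may differ): raw coordinates are replaced by precomputed geometric features fed to a pure MLP.

```python
# Minimal sketch: precomputed geometric features + plain MLP feature extractor.
import torch
import torch.nn as nn

def geometric_encoding(points, k=16):
    """points: (N, 3) -> (N, k) sorted distances to the k nearest neighbours."""
    dists = torch.cdist(points, points)          # (N, N) pairwise distances
    knn, _ = dists.topk(k + 1, largest=False)    # includes distance 0 to self
    return knn[:, 1:]                            # drop the self-distance

mlp = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 256))
points = torch.rand(1024, 3)
features = mlp(geometric_encoding(points))       # (1024, 256) per-point features
print(features.shape)
```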

NeurIPS Conference 2025 Conference Paper

HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning

  • Chuhao Zhou
  • Jianfei Yang

Embodied agents operating in smart homes must understand human behavior through diverse sensory inputs and communicate via natural language. While Vision-Language Models (VLMs) have enabled impressive language-grounded perception, their reliance on visual data limits robustness in real-world scenarios with occlusions, poor lighting, or privacy constraints. In this paper, we introduce HoloLLM, a Multimodal Large Language Model (MLLM) that integrates uncommon but powerful sensing modalities, such as LiDAR, infrared, mmWave radar, and WiFi, to enable seamless human perception and reasoning across heterogeneous environments. We address two key challenges: (1) the scarcity of aligned modality-text data for rare sensors, and (2) the heterogeneity of their physical signal representations. To overcome these, we design a Universal Modality-Injection Projector (UMIP) that enhances pre-aligned modality embeddings with fine-grained, text-aligned features from tailored encoders via coarse-to-fine cross-attention without introducing significant alignment overhead. We further introduce a human-VLM collaborative data curation pipeline to generate paired textual annotations for sensing datasets. Extensive experiments on two newly constructed benchmarks show that HoloLLM significantly outperforms existing MLLMs, improving language-grounded human sensing accuracy by up to 30%. This work establishes a new foundation for real-world, language-informed multisensory embodied intelligence.
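
A minimal sketch of a modality-injection projector in the spirit of UMIP (shapes, names, and the residual form are assumptions): pre-aligned embeddings act as queries that gather fine-grained encoder features via cross-attention.

```python
# Minimal sketch: cross-attention injection of fine-grained modality features.
import torch
import torch.nn as nn

class InjectionProjector(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, coarse_tokens, fine_features):
        # coarse_tokens: (B, Nq, D) pre-aligned embeddings (queries)
        # fine_features: (B, Nk, D) features from a modality-specific encoder
        injected, _ = self.attn(coarse_tokens, fine_features, fine_features)
        return coarse_tokens + self.proj(injected)   # residual injection

proj = InjectionProjector()
out = proj(torch.rand(2, 32, 512), torch.rand(2, 256, 512))
print(out.shape)  # torch.Size([2, 32, 512])
```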

NeurIPS Conference 2025 Conference Paper

Intend to Move: A Multimodal Dataset for Intention-Aware Human Motion Understanding

  • Ryo Umagami
  • Liu Yue
  • Xuangeng Chu
  • Ryuto Fukushima
  • Tetsuya Narita
  • Yusuke Mukuta
  • Tomoyuki Takahata
  • Jianfei Yang

Human motion is inherently intentional, yet most motion modeling paradigms focus on low-level kinematics, overlooking the semantic and causal factors that drive behavior. Existing datasets further limit progress: they capture short, decontextualized actions in static scenes, providing little grounding for embodied reasoning. To address these limitations, we introduce Intend to Move (I2M), a large-scale, multimodal dataset for intention-grounded motion modeling. I2M contains 10.1 hours of two-person 3D motion sequences recorded in dynamic, realistic home environments, accompanied by multi-view RGB-D video, 3D scene geometry, and language annotations of each participant’s evolving intentions. Benchmark experiments reveal a fundamental gap in current motion models: they fail to translate high-level goals into physically and socially coherent motion. I2M thus serves not only as a dataset but as a benchmark for embodied intelligence, enabling research on models that can reason about, predict, and act upon the “why” behind human motion.

IROS Conference 2025 Conference Paper

QLIO: Quantized LiDAR-Inertial Odometry

  • Boyang Lou
  • Shenghai Yuan 0001
  • Jianfei Yang
  • Wenju Su
  • Yingjian Zhang
  • Enwen Hu

LiDAR-Inertial Odometry (LIO) is widely used for autonomous navigation, but its deployment on Size, Weight, and Power (SWaP)-constrained platforms remains challenging due to the computational cost of processing dense point clouds. Conventional LIO frameworks rely on a single onboard processor, leading to computational bottlenecks and high memory demands, making real-time execution difficult on embedded systems. To address this, we propose QLIO, a multi-processor distributed quantized LIO framework that reduces computational load and bandwidth consumption while maintaining localization accuracy. QLIO introduces a quantized state estimation pipeline, where a co-processor pre-processes LiDAR measurements, compressing point-to-plane residuals before transmitting only essential features to the host processor. Additionally, an rQ-vector-based adaptive resampling strategy intelligently selects and compresses key observations, further reducing computational redundancy. Real-world evaluations demonstrate that QLIO achieves a 14.1× reduction in per-observation residual data while preserving localization accuracy. Furthermore, we release an open-source implementation to facilitate further research and real-world deployment. These results establish QLIO as an efficient and scalable solution for real-time autonomous systems operating under computational and bandwidth constraints.
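
A minimal sketch of the quantization step (a uniform scalar quantizer; the paper's actual rQ-vector scheme and bit allocation are not reproduced here): residuals are compressed to small integer codes before crossing the link to the host, which dequantizes before the state update.

```python
# Minimal sketch: uniform quantization of point-to-plane residuals for transport.
import numpy as np

def quantize(residuals, n_bits=4, r_max=0.5):
    """Map residuals in [-r_max, r_max] to unsigned n-bit codes."""
    levels = 2 ** n_bits - 1
    clipped = np.clip(residuals, -r_max, r_max)
    return np.round((clipped + r_max) / (2 * r_max) * levels).astype(np.uint8)

def dequantize(codes, n_bits=4, r_max=0.5):
    levels = 2 ** n_bits - 1
    return codes.astype(np.float32) / levels * (2 * r_max) - r_max

r = np.random.uniform(-0.3, 0.3, size=10_000).astype(np.float32)
codes = quantize(r)                      # 4 bits per residual instead of 32
err = np.abs(dequantize(codes) - r).max()
print(f"max quantization error: {err:.4f} m")
```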

ICLR Conference 2025 Conference Paper

X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing

  • Xinyan Chen 0002
  • Jianfei Yang

Human sensing, which employs various sensors and advanced deep learning technologies to accurately capture and interpret human body information, has significantly impacted fields like public security and robotics. However, current human sensing primarily depends on modalities such as cameras and LiDAR, each of which has its own strengths and limitations. Furthermore, existing multimodal fusion solutions are typically designed for fixed modality combinations, requiring extensive retraining when modalities are added or removed for diverse scenarios. In this paper, we propose a modality-invariant foundation model for all modalities, X-Fi, to address these issues. X-Fi enables the independent or combinatory use of sensor modalities without additional training by utilizing a transformer structure to accommodate variable input sizes and incorporating a novel "X-fusion" mechanism to preserve modality-specific features during multimodal integration. This approach not only enhances adaptability but also facilitates the learning of complementary features across modalities. Extensive experiments conducted on the MM-Fi and XRF55 datasets, employing six distinct modalities, demonstrate that X-Fi achieves state-of-the-art performance in human pose estimation (HPE) and human activity recognition (HAR) tasks. The findings indicate that our proposed model can efficiently support a wide range of human sensing applications, ultimately contributing to the evolution of scalable, multimodal sensing technologies.
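
A minimal sketch of modality-invariant fusion (a hypothetical architecture; the paper's "X-fusion" mechanism is not reproduced): each available modality is tokenized, tokens receive a modality embedding, and a shared transformer consumes whatever subset is present.

```python
# Minimal sketch: one shared transformer over a variable set of modality tokens.
import torch
import torch.nn as nn

class AnyModalityFusion(nn.Module):
    def __init__(self, dim=256, n_modalities=6, layers=4):
        super().__init__()
        self.mod_embed = nn.Embedding(n_modalities, dim)
        enc = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)

    def forward(self, tokens_by_modality):
        # tokens_by_modality: {modality_id: (B, N_m, dim)} for present modalities
        parts = [t + self.mod_embed.weight[m] for m, t in tokens_by_modality.items()]
        fused = self.encoder(torch.cat(parts, dim=1))  # variable total length
        return fused.mean(dim=1)                       # pooled human representation

model = AnyModalityFusion()
out = model({0: torch.rand(2, 50, 256), 3: torch.rand(2, 20, 256)})  # e.g. RGB + radar only
print(out.shape)  # torch.Size([2, 256])
```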

ICLR Conference 2024 Conference Paper

Can We Evaluate Domain Adaptation Models Without Target-Domain Labels?

  • Jianfei Yang
  • Hanjie Qian
  • Yuecong Xu
  • Kai Wang 0036
  • Lihua Xie 0001

Unsupervised domain adaptation (UDA) involves adapting a model trained on a label-rich source domain to an unlabeled target domain. However, in real-world scenarios, the absence of target-domain labels makes it challenging to evaluate the performance of UDA models. Furthermore, prevailing UDA methods relying on adversarial training and self-training could lead to model degeneration and negative transfer, further exacerbating the evaluation problem. In this paper, we propose a novel metric called the Transfer Score to address these issues. The proposed metric enables the unsupervised evaluation of UDA models by assessing the spatial uniformity of the classifier via model parameters, as well as the transferability and discriminability of deep representations. Based on the metric, we achieve three novel objectives without target-domain labels: (1) selecting the best UDA method from a range of available options, (2) optimizing hyperparameters of UDA models to prevent model degeneration, and (3) identifying which checkpoint of a UDA model performs optimally. Our work bridges the gap between data-level UDA research and practical UDA scenarios, enabling a realistic assessment of UDA model performance. We validate the effectiveness of our metric through extensive empirical studies on UDA datasets of different scales and imbalanced distributions. The results demonstrate that our metric robustly achieves the aforementioned goals.
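
A minimal sketch of an unsupervised model-selection score in this spirit (an illustrative stand-in, not the paper's Transfer Score formula), combining classifier-weight uniformity with confidence and diversity of target predictions:

```python
# Minimal sketch: label-free scoring of a UDA model from weights + predictions.
import torch
import torch.nn.functional as F

def unsupervised_score(classifier_weight, target_logits):
    w = F.normalize(classifier_weight, dim=1)            # (C, D) class prototypes
    cos = w @ w.t()
    uniformity = -(cos - torch.eye(len(w))).abs().mean() # prototypes spread apart
    probs = target_logits.softmax(dim=1)                 # (N, C)
    conf = -(-probs * probs.clamp_min(1e-8).log()).sum(1).mean()  # negative entropy
    marg = probs.mean(0)
    diversity = -(marg * marg.clamp_min(1e-8).log()).sum()        # marginal entropy
    return uniformity + conf + diversity

# Higher score -> pick this checkpoint / hyperparameter setting.
score = unsupervised_score(torch.randn(10, 64), torch.randn(500, 10))
print(float(score))
```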

AAAI Conference 2024 Conference Paper

Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data

  • Yucheng Wang
  • Yuecong Xu
  • Jianfei Yang
  • Min Wu
  • Xiaoli Li
  • Lihua Xie
  • Zhenghua Chen

Multivariate Time-Series (MTS) data is crucial in various application fields. With its sequential and multi-source (multiple sensors) properties, MTS data inherently exhibits Spatial-Temporal (ST) dependencies, involving temporal correlations between timestamps and spatial correlations between sensors in each timestamp. To effectively leverage this information, Graph Neural Network-based methods (GNNs) have been widely adopted. However, existing approaches separately capture spatial dependency and temporal dependency and fail to capture the correlations between Different sEnsors at Different Timestamps (DEDT). Overlooking such correlations hinders the comprehensive modelling of ST dependencies within MTS data, thus preventing existing GNNs from learning effective representations. To address this limitation, we propose a novel method called Fully-Connected Spatial-Temporal Graph Neural Network (FC-STGNN), comprising two key components, namely FC graph construction and FC graph convolution. For graph construction, we design a decay graph to connect sensors across all timestamps based on their temporal distances, enabling us to fully model the ST dependencies by considering the correlations between DEDT. Further, we devise FC graph convolution with a moving-pooling GNN layer to effectively capture the ST dependencies for learning effective representations. Extensive experiments show the effectiveness of FC-STGNN on multiple MTS datasets compared to SOTA methods. The code is available at https://github.com/Frank-Wang-oss/FCSTGNN.
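
A minimal sketch of the decay-graph construction (the exponential-decay weighting is an assumption consistent with the description): every (sensor, timestamp) node connects to every other, with edge weights decaying in temporal distance, so DEDT correlations enter the graph explicitly.

```python
# Minimal sketch: fully connected adjacency over (sensor, timestamp) nodes with
# temporal-distance decay, so cross-sensor cross-timestamp edges are modeled.
import torch

def fully_connected_decay_graph(n_sensors, n_timestamps, decay=0.5):
    """Returns an (S*T, S*T) adjacency over all (sensor, timestamp) nodes."""
    t = torch.arange(n_timestamps).repeat_interleave(n_sensors)  # node timestamps
    gap = (t[:, None] - t[None, :]).abs().float()                # temporal distance
    return torch.exp(-decay * gap)                               # decayed connectivity

adj = fully_connected_decay_graph(n_sensors=9, n_timestamps=6)
print(adj.shape)   # torch.Size([54, 54]); same-timestamp edges keep weight 1.0
```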

AAAI Conference 2024 Conference Paper

Graph-Aware Contrasting for Multivariate Time-Series Classification

  • Yucheng Wang
  • Yuecong Xu
  • Jianfei Yang
  • Min Wu
  • Xiaoli Li
  • Lihua Xie
  • Zhenghua Chen

Contrastive learning, as a self-supervised learning paradigm, has become popular for Multivariate Time-Series (MTS) classification. It ensures consistency across different views of unlabeled samples and then learns effective representations for these samples. Existing contrastive learning methods mainly focus on achieving temporal consistency with temporal augmentation and contrasting techniques, aiming to preserve temporal patterns against perturbations for MTS data. However, they overlook spatial consistency, which requires the stability of individual sensors and their correlations. As MTS data typically originate from multiple sensors, ensuring spatial consistency becomes essential for the overall performance of contrastive learning on MTS data. Thus, we propose Graph-Aware Contrasting for spatial consistency across MTS data. Specifically, we propose graph augmentations, including node and edge augmentations, to preserve the stability of sensors and their correlations, followed by graph contrasting with both node- and graph-level contrasting to extract robust sensor- and global-level features. We further introduce multi-window temporal contrasting to ensure temporal consistency in the data for each sensor. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on various MTS classification tasks. The code is available at https://github.com/Frank-Wang-oss/TS-GAC.
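
A minimal sketch of graph-aware views (hypothetical augmentations): a node augmentation perturbs per-sensor signals and an edge augmentation drops entries of the sensor-correlation adjacency; the resulting views would then be contrasted at node and graph level.

```python
# Minimal sketch: node and edge augmentations producing two graph views of MTS.
import torch

def node_augment(x, sigma=0.1):
    """x: (S, T) sensor signals -> noisy view preserving sensor identity."""
    return x + sigma * torch.randn_like(x)

def edge_augment(adj, drop_p=0.2):
    """adj: (S, S) correlation graph -> view with randomly dropped edges."""
    mask = (torch.rand_like(adj) > drop_p).float()
    return adj * mask

x = torch.randn(9, 128)                            # 9 sensors, 128 timestamps
adj = torch.corrcoef(x).abs()                      # sensor-correlation adjacency
view1, view2 = node_augment(x), node_augment(x)    # node-augmented views
adj1, adj2 = edge_augment(adj), edge_augment(adj)  # edge-augmented graphs to contrast
```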

TMLR Journal 2024 Journal Article

Leveraging Endo- and Exo-Temporal Regularization for Black-box Video Domain Adaptation

  • Yuecong Xu
  • Jianfei Yang
  • Haozhi Cao
  • Min Wu
  • Xiaoli Li
  • Lihua Xie
  • Zhenghua Chen

To enable video models to be applied seamlessly across video tasks in different environments, various Video Unsupervised Domain Adaptation (VUDA) methods have been proposed to improve the robustness and transferability of video models. Despite improvements in model robustness, these VUDA methods require access to both source data and source model parameters for adaptation, raising serious data privacy and model portability issues. To cope with these concerns, this paper first formulates Black-box Video Domain Adaptation (BVDA) as a more realistic yet challenging scenario where the source video model is provided only as a black-box predictor. While a few Black-box Domain Adaptation (BDA) methods have been proposed in the image domain, they cannot be applied to the video domain, since the video modality has more complicated temporal features that are harder to align. To address BVDA, we propose a novel Endo- and eXo-TEmporal Regularized Network (EXTERN) that applies mask-to-mix strategies and two video-tailored regularizations: endo-temporal regularization and exo-temporal regularization, performed across both clip and temporal features while distilling knowledge from the predictions of the black-box predictor. Empirical results demonstrate the state-of-the-art performance of EXTERN across various cross-domain closed-set and partial-set action recognition benchmarks, where it even surpasses most existing video domain adaptation methods that have source-data accessibility. Code will be available at https://xuyu0010.github.io/b2vda.html.

ICRA Conference 2024 Conference Paper

MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats

  • Shenghai Yuan 0001
  • Yizhuo Yang 0001
  • Thien Hoang Nguyen
  • Thien-Minh Nguyen
  • Jianfei Yang
  • Fen Liu
  • Jianping Li 0004
  • Han Wang 0001

In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which possess the potential to transport harmful payloads or independently cause damage, we introduce MMAUD: a comprehensive Multi-Modal Anti-UAV Dataset. MMAUD addresses a critical gap in contemporary threat detection methodologies by focusing on drone detection, UAV-type classification, and trajectory estimation. MMAUD stands out by combining diverse sensory inputs, including stereo vision, various LiDARs, radars, and audio arrays. It offers a unique overhead aerial-detection perspective that addresses real-world scenarios with higher fidelity than datasets captured from fixed vantage points using thermal and RGB cameras. Additionally, MMAUD provides accurate Leica-generated ground-truth data, enhancing credibility and enabling confident refinement of algorithms and models, which has not been seen in other datasets. Most existing works do not disclose their datasets, making MMAUD an invaluable resource for developing accurate and efficient solutions. Our proposed modalities are cost-effective and highly adaptable, allowing users to experiment with and implement new UAV threat detection tools. Our dataset closely simulates real-world scenarios by incorporating ambient heavy-machinery sounds. This approach enhances the dataset’s applicability, capturing the exact challenges faced during proximate vehicular operations. It is expected that MMAUD can play a pivotal role in advancing UAV threat detection, classification, and trajectory estimation capabilities, and beyond. Our dataset, code, and designs will be available at https://ntu-aris.github.io/MMAUD.

ICRA Conference 2024 Conference Paper

MoPA: Multi-Modal Prior Aided Domain Adaptation for 3D Semantic Segmentation

  • Haozhi Cao
  • Yuecong Xu
  • Jianfei Yang
  • Pengyu Yin
  • Shenghai Yuan 0001
  • Lihua Xie 0001

Multi-modal unsupervised domain adaptation (MM-UDA) for 3D semantic segmentation is a practical solution for embedding semantic understanding in autonomous systems without expensive point-wise annotations. While previous MM-UDA methods can achieve overall improvement, they suffer from significantly class-imbalanced performance, restricting their adoption in real applications. This imbalanced performance is mainly caused by: 1) self-training with imbalanced data and 2) the lack of pixel-wise 2D supervision signals. In this work, we propose Multi-modal Prior Aided (MoPA) domain adaptation to improve the performance on rare objects. Specifically, we develop Valid Ground-based Insertion (VGI) to rectify the imbalanced supervision signals by inserting prior rare objects collected from the wild, while avoiding the introduction of artificial artifacts that lead to trivial solutions. Meanwhile, our SAM consistency loss leverages the 2D prior semantic masks from SAM as pixel-wise supervision signals to encourage consistent predictions for each object in the semantic mask. The knowledge learned from modality-specific priors is then shared across modalities to achieve better rare-object segmentation. Extensive experiments show that our method achieves state-of-the-art performance on the challenging MM-UDA benchmark. Code will be available at https://github.com/AronCao49/MoPA.

IROS Conference 2023 Conference Paper

AV-PedAware: Self-Supervised Audio-Visual Fusion for Dynamic Pedestrian Awareness

  • Yizhuo Yang 0001
  • Shenghai Yuan 0001
  • Muqing Cao
  • Jianfei Yang
  • Lihua Xie 0001

In this study, we introduce AV-PedAware, a self-supervised audio-visual fusion system designed to improve dynamic pedestrian awareness for robotics applications. Pedestrian awareness is a critical requirement in many robotics applications. However, traditional approaches that rely on cameras and LIDARs to cover multiple views can be expensive and susceptible to issues such as changes in illumination, occlusion, and weather conditions. Our proposed solution replicates human perception for 3D pedestrian detection using low-cost audio and visual fusion. This study represents the first attempt to employ audio-visual fusion to monitor footstep sounds for the purpose of predicting the movements of pedestrians in the vicinity. The system is trained through self-supervised learning based on LIDAR-generated labels, making it a cost-effective alternative to LIDAR-based pedestrian awareness. AV-PedAware achieves comparable results to LIDAR-based systems at a fraction of the cost. By utilizing an attention mechanism, it can handle dynamic lighting and occlusions, overcoming the limitations of traditional LIDAR and camera-based systems. To evaluate our approach's effectiveness, we collected a new multimodal pedestrian detection dataset and conducted experiments that demonstrate the system's ability to provide reliable 3D detection results using only audio and visual data, even in extreme visual conditions. We will make our collected dataset and source code available online for the community to encourage further development in the field of robotics perception systems.

ICLR Conference 2023 Conference Paper

Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors

  • Jianfei Yang
  • Xiangyu Peng
  • Kai Wang 0036
  • Zheng Zhu
  • Jiashi Feng
  • Lihua Xie 0001
  • Yang You 0001

Domain Adaptation of Black-box Predictors (DABP) aims to learn a model on an unlabeled target domain supervised by a black-box predictor trained on a source domain. It does not require access to either the source-domain data or the predictor parameters, thus addressing the data privacy and portability issues of standard domain adaptation methods. Existing DABP approaches mostly rely on knowledge distillation (KD) from the black-box predictor, i.e., training the model with its noisy target-domain predictions, which, however, inevitably introduces confirmation bias accumulated from the prediction noise and degrades performance. To mitigate such bias, we propose a new strategy, divide-to-adapt, that purifies cross-domain knowledge distillation by proper domain division. This is inspired by an observation we make for the first time in domain adaptation: the target domain usually contains easy-to-adapt and hard-to-adapt samples that have different levels of domain discrepancy w.r.t. the source domain, and deep models tend to fit easy-to-adapt samples first. Leveraging easy-to-adapt samples with less noise can help KD alleviate the negative effect of prediction noise from black-box predictors. In this sense, the target domain can be divided into an easy-to-adapt subdomain with less noise and a hard-to-adapt subdomain at the early stage of training. Then the adaptation is achieved by semi-supervised learning. We further reduce the distribution discrepancy between subdomains and develop a weak-strong augmentation strategy to filter the predictor errors progressively. As such, our method is a simple yet effective solution to reduce error accumulation in cross-domain knowledge distillation for DABP. Moreover, we prove that the target error of DABP is bounded by the noise ratio of the two subdomains, i.e., the confirmation bias, which provides theoretical justification for our method. Extensive experiments demonstrate that our method achieves state-of-the-art results on all DABP benchmarks, outperforming the existing best approach by 7.0% on VisDA-17, and is even comparable with standard domain adaptation methods that use the source-domain data.
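
The domain-division step can be sketched with an illustrative criterion (per-sample KD loss; the paper's exact division rule may differ): low-loss target samples form the easy-to-adapt subdomain, the rest the hard-to-adapt one.

```python
# Minimal sketch: split target samples into easy/hard subdomains by KD loss.
import torch
import torch.nn.functional as F

def divide_target(student_logits, blackbox_probs, easy_ratio=0.5):
    """Rank target samples by KD loss; low-loss samples form the easy subdomain."""
    kd_loss = F.kl_div(student_logits.log_softmax(dim=1),
                       blackbox_probs, reduction="none").sum(dim=1)  # per-sample loss
    n_easy = int(easy_ratio * len(kd_loss))
    order = kd_loss.argsort()
    return order[:n_easy], order[n_easy:]      # indices of easy / hard samples

easy_idx, hard_idx = divide_target(torch.randn(1000, 31),
                                   torch.rand(1000, 31).softmax(dim=1))
# Easy samples keep their pseudo-labels; hard samples are treated as unlabeled
# in a semi-supervised objective, as described above.
```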

NeurIPS Conference 2023 Conference Paper

Fast Model DeBias with Machine Unlearning

  • Ruizhe Chen
  • Jianfei Yang
  • Huimin Xiong
  • Jianhong Bai
  • Tianxiang Hu
  • Jin Hao
  • Yang Feng
  • Joey Tianyi Zhou

Recent discoveries have revealed that deep neural networks might behave in a biased manner in many real-world scenarios. For instance, deep networks trained on the large-scale face recognition dataset CelebA tend to predict blonde hair for females and black hair for males. Such biases not only jeopardize the robustness of models but also perpetuate and amplify social biases, which is especially concerning for automated decision-making processes in healthcare, recruitment, etc., as they could exacerbate unfair economic and social inequalities among different groups. Existing debiasing methods suffer from high costs in bias labeling or model re-training, while also exhibiting a deficiency in elucidating the origins of biases within the model. To this end, we propose a fast model debiasing method (FMD), which offers an efficient approach to identify, evaluate, and remove biases inherent in trained models. FMD identifies biased attributes through an explicit counterfactual concept and quantifies the influence of data samples with influence functions. Moreover, we design a machine unlearning-based strategy to efficiently and effectively remove the bias in a trained model with a small counterfactual dataset. Experiments on the Colored MNIST, CelebA, and Adult Income datasets demonstrate that our method achieves superior or competing classification accuracies compared with state-of-the-art retraining-based methods while attaining significantly fewer biases and requiring much less debiasing cost. Notably, our method requires only a small external dataset and updates a minimal number of model parameters, without requiring access to training data, which may be too large or unavailable in practice.

NeurIPS Conference 2023 Conference Paper

MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing

  • Jianfei Yang
  • He Huang
  • Yunjiao Zhou
  • Xinyan Chen
  • Yuecong Xu
  • Shenghai Yuan
  • Han Zou
  • Chris Xiaoxuan Lu

4D human perception plays an essential role in a myriad of applications, such as home automation and metaverse avatar simulation. However, existing solutions which mainly rely on cameras and wearable devices are either privacy intrusive or inconvenient to use. To address these issues, wireless sensing has emerged as a promising alternative, leveraging LiDAR, mmWave radar, and WiFi signals for device-free human sensing. In this paper, we propose MM-Fi, the first multi-modal non-intrusive 4D human dataset with 27 daily or rehabilitation action categories, to bridge the gap between wireless sensing and high-level human perception tasks. MM-Fi consists of over 320k synchronized frames of five modalities from 40 human subjects. Various annotations are provided to support potential sensing tasks, e.g., human pose estimation and action recognition. Extensive experiments have been conducted to compare the sensing capacity of each or several modalities in terms of multiple tasks. We envision that MM-Fi can contribute to wireless sensing research with respect to action recognition, human pose estimation, multi-modal learning, cross-modal supervision, and interdisciplinary healthcare research.

AAAI Conference 2023 Conference Paper

SEnsor Alignment for Multivariate Time-Series Unsupervised Domain Adaptation

  • Yucheng Wang
  • Yuecong Xu
  • Jianfei Yang
  • Zhenghua Chen
  • Min Wu
  • Xiaoli Li
  • Lihua Xie

Unsupervised Domain Adaptation (UDA) methods can reduce label dependency by mitigating the feature discrepancy between labeled samples in a source domain and unlabeled samples in a similar yet shifted target domain. Though achieving good performance, these methods are inapplicable to Multivariate Time-Series (MTS) data. MTS data are collected from multiple sensors, each of which follows its own distribution. However, most UDA methods focus solely on aligning global features and cannot consider the distinct distribution of each sensor. To cope with such concerns, a practical domain adaptation scenario is formulated as Multivariate Time-Series Unsupervised Domain Adaptation (MTS-UDA). In this paper, we propose SEnsor Alignment (SEA) for MTS-UDA to reduce the domain discrepancy at both the local and global sensor levels. At the local sensor level, we design the endo-feature alignment to align sensor features and their correlations across domains, whose information represents the features of each sensor and the interactions between sensors. Further, to reduce domain discrepancy at the global sensor level, we design the exo-feature alignment to enforce restrictions on the global sensor features. Meanwhile, MTS also incorporates essential spatial-temporal dependency information between sensors, which cannot be transferred by existing UDA methods. Therefore, we model the spatial-temporal information of MTS with a multi-branch self-attention mechanism for simple and effective transfer across domains. Empirical results demonstrate the state-of-the-art performance of our proposed SEA on two public MTS datasets for MTS-UDA. The code is available at https://github.com/Frank-Wang-oss/SEA.
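
A minimal sketch of sensor-level alignment (one plausible reading with simple mean matching per sensor plus a correlation-matrix gap; not the paper's exact endo/exo losses):

```python
# Minimal sketch: align per-sensor feature means and sensor-correlation
# structure between source and target batches.
import torch

def sensor_alignment_loss(src, tgt):
    """src, tgt: (B, S, D) per-sensor features from source / target batches."""
    per_sensor = (src.mean(0) - tgt.mean(0)).pow(2).sum(-1).mean()  # each sensor

    def corr(x):
        f = x.mean(-1)                        # (B, S) summary per sensor
        f = f - f.mean(0, keepdim=True)
        return (f.t() @ f) / (len(f) - 1)     # (S, S) sensor correlations

    corr_gap = (corr(src) - corr(tgt)).pow(2).mean()   # sensor interactions
    return per_sensor + corr_gap

loss = sensor_alignment_loss(torch.randn(32, 9, 64), torch.randn(32, 9, 64))
print(float(loss))
```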

IJCAI Conference 2021 Conference Paper

Deep Reinforcement Learning Boosted Partial Domain Adaptation

  • Keyu Wu
  • Min Wu
  • Jianfei Yang
  • Zhenghua Chen
  • Zhengguo Li
  • Xiaoli Li

Domain adaptation is critical for learning transferable features that effectively reduce the distribution difference among domains. In the era of big data, the availability of large-scale labeled datasets motivates partial domain adaptation (PDA), which deals with adaptation from large source domains to small target domains with fewer classes. In the PDA setting, it is crucial to transfer relevant source samples and eliminate irrelevant ones to mitigate negative transfer. In this paper, we propose a deep reinforcement learning (DRL) based source data selector for PDA, which is capable of eliminating less relevant source samples automatically to boost existing adaptation methods. It decides whether to keep or discard source instances based on their feature representations so that more effective knowledge transfer across domains can be achieved by filtering out irrelevant samples. As a general module, the proposed DRL-based data selector can be integrated into any existing domain adaptation or partial domain adaptation model. Extensive experiments on several benchmark datasets demonstrate the superiority of the proposed DRL-based data selector, which leads to state-of-the-art performance on various PDA tasks.

AAAI Conference 2019 Conference Paper

Consensus Adversarial Domain Adaptation

  • Han Zou
  • Yuxun Zhou
  • Jianfei Yang
  • Huihan Liu
  • Hari Prasanna Das
  • Costas J. Spanos

We propose a novel domain adaptation framework, namely Consensus Adversarial Domain Adaptation (CADA), that gives freedom to both the target encoder and the source encoder to embed data from both domains into a common domain-invariant feature space until they achieve consensus during adversarial learning. In this manner, the domain discrepancy can be further minimized in the embedded space, yielding more generalizable representations. The framework is also extended to establish a new few-shot domain adaptation scheme (F-CADA), which remarkably enhances the ADA performance by efficiently propagating a few labeled samples once available in the target domain. Extensive experiments are conducted on the task of digit recognition across multiple benchmark datasets and on a real-world problem involving WiFi-enabled device-free gesture recognition under spatial dynamics. The results show the compelling performance of CADA versus state-of-the-art unsupervised domain adaptation (UDA) and supervised domain adaptation (SDA) methods. Numerical experiments also demonstrate that F-CADA can significantly improve adaptation performance even with sparsely labeled data in the target domain.

AAAI Conference 2018 Conference Paper

WiFi-Based Human Identification via Convex Tensor Shapelet Learning

  • Han Zou
  • Yuxun Zhou
  • Jianfei Yang
  • Weixi Gu
  • Lihua Xie
  • Costas Spanos

We propose AutoID, a human identification system that leverages measurements from existing WiFi-enabled Internet of Things (IoT) devices and produces identity estimates via a novel sparse representation learning technique. The key idea is to use the unique fine-grained gait patterns of each person revealed in WiFi Channel State Information (CSI) measurements, technically referred to as shapelet signatures, as the “fingerprint” for human identification. For this purpose, a novel OpenWrt-based IoT platform is designed to collect CSI data from commercial IoT devices. More importantly, we propose a new optimization-based shapelet learning framework for tensors, namely Convex Clustered Concurrent Shapelet Learning (C3SL), which formulates the learning problem as a convex optimization. The global solution of C3SL can be obtained efficiently with a generalized gradient-based algorithm, and its three concurrent regularization terms reveal the inter-dependence and the clustering effect of the CSI tensor data. Extensive experiments are conducted in multiple real-world indoor environments, showing that AutoID achieves an average human identification accuracy of 91% for a group of 20 people. Combining a novel sensing platform with a new learning technique, AutoID represents substantial progress toward a more accurate, cost-effective, and sustainable human identification system for pervasive deployment.