Arrow Research

Author name cluster

Jiming Chen 0001

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
1 author row

Possible papers (22)

ICRA Conference 2025 Conference Paper

AF-RLIO: Adaptive Fusion of Radar-LiDAR-Inertial Information for Robust Odometry in Challenging Environments

  • Chenglong Qian
  • Yang Xu 0042
  • Xiufang Shi
  • Jiming Chen 0001
  • Liang Li 0010

In robotic navigation, maintaining precise pose estimation and navigation in complex and dynamic environments is crucial. However, environmental challenges such as smoke, tunnels, and adverse weather can significantly degrade the performance of single-sensor systems like LiDAR or GPS, compromising the overall stability and safety of autonomous robots. To address these challenges, we propose AF-RLIO: an adaptive fusion approach that integrates 4D millimeter-wave radar, LiDAR, inertial measurement unit (IMU), and GPS to leverage the complementary strengths of these sensors for robust odometry estimation in complex environments. Our method consists of three key modules. Firstly, the pre-processing module utilizes radar data to assist LiDAR in removing dynamic points and determining when environmental conditions are degraded for LiDAR. Secondly, the dynamic-aware multimodal odometry selects appropriate point cloud data for scan-to-map matching and tightly couples it with the IMU using the Iterative Error State Kalman Filter. Lastly, the factor graph optimization module balances weights between odometry and GPS data, constructing a pose graph for optimization. The proposed approach has been evaluated on datasets and tested in real-world robotic environments, demonstrating its effectiveness and advantages over existing methods in challenging conditions such as smoke and tunnels. Furthermore, we open-source our code at https://github.com/NeSC-IV/AF-RLIO.git to benefit the research community.
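
The abstract describes an adaptive switch between LiDAR and radar point clouds plus a weighted GPS factor. Below is a minimal Python sketch of that idea under stated assumptions: the function names, the degradation threshold, and the covariance-trace weighting are illustrative placeholders, not the paper's released implementation (see the linked repository for the actual code).

```python
# Hypothetical sketch of adaptive source selection and GPS weighting;
# thresholds and interfaces are invented for illustration only.
import numpy as np

def select_odometry_source(lidar_points: np.ndarray, radar_points: np.ndarray,
                           degradation_score: float, threshold: float = 0.6):
    """Pick which point cloud feeds scan-to-map matching.

    degradation_score in [0, 1]: higher means LiDAR is more degraded
    (e.g., smoke or tunnel conditions flagged with the help of radar).
    """
    if degradation_score > threshold:
        return radar_points, "radar"
    return lidar_points, "lidar"

def gps_factor_weight(odom_covariance: np.ndarray, gps_covariance: np.ndarray) -> float:
    """Balance odometry vs. GPS factors by their relative uncertainty."""
    odom_unc = np.trace(odom_covariance)
    gps_unc = np.trace(gps_covariance)
    return float(odom_unc / (odom_unc + gps_unc + 1e-9))

# Usage: a highly degraded LiDAR scan routes radar points to matching.
points, source = select_odometry_source(np.zeros((100, 3)), np.zeros((50, 3)), 0.8)
w_gps = gps_factor_weight(np.eye(6) * 0.5, np.eye(3) * 2.0)
```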

ICRA Conference 2025 Conference Paper

Dashing for the Golden Snitch: Multi-Drone Time-Optimal Motion Planning with Multi-Agent Reinforcement Learning

  • Xian Wang
  • Jin Zhou
  • Yuanli Feng
  • Jiahao Mei
  • Jiming Chen 0001
  • Shuo Li

Recent innovations in autonomous drones have facilitated time-optimal flight in single-drone configurations, and enhanced maneuverability in multi-drone systems by applying optimal control and learning-based methods. However, few studies have achieved time-optimal motion planning for multi-drone systems, particularly during highly agile maneuvers or in dynamic scenarios. This paper presents a decentralized policy network using multi-agent reinforcement learning for time-optimal multi-drone flight. To strike a balance between flight efficiency and collision avoidance, we introduce a soft collision-free mechanism inspired by optimization-based methods. By customizing PPO in a centralized training, decentralized execution (CTDE) fashion, we unlock higher efficiency and stability in training while ensuring lightweight implementation. Extensive simulations show that, despite slight performance tradeoffs compared to single-drone systems, our multi-drone approach maintains near-time-optimal performance with a low collision rate. Real-world experiments validate our method, with two quadrotors using the same network as in simulation achieving a maximum speed of 13.65 m/s and a maximum body rate of 13.4 rad/s in a 5.5 m × 5.5 m × 2.0 m space across various tracks, relying entirely on onboard computation [video: https://youtu.be/KACuFMtGGpo] [code: https://github.com/KafuuChikai/Dashing-for-the-Golden-Snitch-Multi-Drone-RL].
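
The "soft collision-free mechanism" can be pictured as a smooth barrier term added to the per-step reward, so gradients remain usable during PPO training. The sketch below is a hedged illustration with made-up coefficients and a simple exponential barrier; it is not the authors' reward function.

```python
import numpy as np

def soft_collision_penalty(positions: np.ndarray, d_safe: float = 0.5,
                           sharpness: float = 10.0) -> float:
    """Smooth penalty that rises as any pairwise distance drops below d_safe.

    positions: (N, 3) array of drone positions. The exponential barrier is a
    soft stand-in for a hard collision constraint.
    """
    penalty = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(positions[i] - positions[j])
            penalty += np.exp(-sharpness * (d - d_safe))
    return penalty

def step_reward(progress: float, positions: np.ndarray, lam: float = 0.1) -> float:
    """Trade off time-optimal progress against the soft collision term."""
    return progress - lam * soft_collision_penalty(positions)
```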

IROS Conference 2025 Conference Paper

Online Motion Planning for Quadrotor Multi-Point Navigation Using Efficient Imitation Learning-Based Strategy

  • Jin Zhou
  • Jiahao Mei
  • Fangguo Zhao
  • Jiming Chen 0001
  • Shuo Li

Over the past decade, there has been a remarkable surge in utilizing quadrotors for various purposes due to their simple structure and aggressive maneuverability. One of the key challenges is online time-optimal trajectory generation and control. This paper proposes an imitation learning-based online solution to efficiently navigate the quadrotor through multiple waypoints with near-time-optimal performance. The neural networks (WN&CNets) are trained to learn the control law from the dataset generated by the time-consuming CPC algorithm and then deployed to generate the optimal control commands online to guide the quadrotors. To address the challenge of limited training data and the hover maneuver at the final waypoint, we propose a transition phase strategy that utilizes MINCO trajectories to help the quadrotor ‘jump over’ the stop-and-go maneuver when switching waypoints. Our method is demonstrated in both simulation and real-world experiments, achieving a maximum speed of 5.6 m/s while navigating through 7 waypoints in a confined space of 5.5 m × 5.5 m × 2.0 m [video]. The results show that with a slight loss in optimality, the WN&CNets significantly reduce the processing time and enable online control for multi-point flight tasks.

ICRA Conference 2025 Conference Paper

TaskExp: Enhancing Generalization of Multi-Robot Exploration with Multi-Task Pre-Training

  • Shaohao Zhu
  • Yixian Zhao
  • Yang Xu 0042
  • Anjun Chen
  • Jiming Chen 0001
  • Jinming Xu 0002

We aim to develop a general multi-agent reinforcement learning (MARL) policy that enables a group of robots to efficiently explore large-scale, unknown environments with random pose initialization. Existing MARL-based multi-robot exploration methods face challenges in reliably mapping observations to actions in large-scale scenarios and lack zero-shot generalization to unknown environments. To this end, we propose a generic multi-task pre-training algorithm (termed TaskExp) to enhance the generalization of learning-based policies. In particular, we design a decision-related task to guide the policy to focus on valuable subspaces of the action space, improving the reliability of policy mapping. Moreover, two perception-related tasks, Location Estimation and Map Prediction, are designed to enhance the zero-shot capability of the policy by guiding it to extract general invariant features from unknown environments. With TaskExp pre-training, our policy significantly outperforms state-of-the-art planning-based methods in large-scale scenarios and demonstrates strong zero-shot performance in unseen environments. Furthermore, TaskExp can also be easily integrated to improve existing learning-based multi-robot exploration methods.
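
One way to read the multi-task pre-training described above is as a shared encoder trained with a decision head plus two auxiliary perception heads. The following is a minimal sketch under that assumption; the dimensions, head structures, and loss weights are placeholders, not TaskExp's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskExplorer(nn.Module):
    """Shared encoder with one decision head and two perception heads (illustrative)."""
    def __init__(self, obs_dim=128, hidden=256, n_actions=8, map_cells=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)   # decision-related task
        self.loc_head = nn.Linear(hidden, 2)               # location estimation
        self.map_head = nn.Linear(hidden, map_cells)       # map prediction

    def forward(self, obs):
        z = self.encoder(obs)
        return self.policy_head(z), self.loc_head(z), self.map_head(z)

def pretrain_loss(model, obs, action_target, loc_target, map_target,
                  w_loc=0.5, w_map=0.5):
    logits, loc_pred, map_pred = model(obs)
    loss_policy = F.cross_entropy(logits, action_target)
    loss_loc = F.mse_loss(loc_pred, loc_target)
    loss_map = F.binary_cross_entropy_with_logits(map_pred, map_target)
    return loss_policy + w_loc * loss_loc + w_map * loss_map

# Usage with random placeholder data.
model = MultiTaskExplorer()
loss = pretrain_loss(model, torch.randn(16, 128), torch.randint(0, 8, (16,)),
                     torch.randn(16, 2), torch.rand(16, 64))
```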

IROS Conference 2025 Conference Paper

Temporal-Spatial Representation Fusion for Dexterous Manipulation Learning with Unpaired Visual-Action Data

  • Guwen Han
  • Zhengnan Sun
  • Qingtao Liu
  • Yu Cui
  • Anjun Chen
  • Huajin Chen
  • Rong Xiong
  • Jiming Chen 0001

Supervised behavioral cloning using robot visual-action data has been widely investigated in robot manipulation. However, these methods typically require simultaneous acquisition of visual and action data, which makes it difficult for them to utilize unpaired visual-action datasets, e.g., videos on the Internet or action-only data, which raises fewer privacy and security concerns. To take advantage of the action data without synchronized visual observation, we propose UnVALe, a novel dexterous robotic manipulation RL framework that utilizes action data without paired images to learn priors of human dexterous manipulation skills. Specifically, an LSTM-based network is designed to learn the temporal action prior by reconstructing the input trajectories, and a VAE network is designed to learn the spatial action prior by reconstructing the input action. Novel rewards are proposed to incorporate the priors into reinforcement learning, which encourage action outputs from RL policies to maintain low reconstruction errors in the LSTM and VAE networks. We perform extensive validation on three dexterous robot manipulation tasks. The experimental results show that UnVALe can effectively improve robot manipulation performance. Compared with existing visual pretraining methods, our method achieves a more than 30% increase in success rates.
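
The reward design can be pictured as a penalty on the reconstruction errors of the two pre-trained priors. The snippet below is a toy rendering of that idea; the tiny autoencoder standing in for the VAE and the LSTM dimensions are assumptions, not UnVALe's actual networks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionVAEPrior(nn.Module):
    """Tiny autoencoder standing in for the spatial action prior (illustrative)."""
    def __init__(self, act_dim=24, latent=8):
        super().__init__()
        self.enc = nn.Linear(act_dim, latent)
        self.dec = nn.Linear(latent, act_dim)

    def forward(self, a):
        return self.dec(torch.tanh(self.enc(a)))

def prior_reward(action, traj, vae, lstm, head, w_spatial=1.0, w_temporal=1.0):
    """Reward RL actions that the pre-trained priors can reconstruct well."""
    spatial_err = F.mse_loss(vae(action), action)   # spatial prior reconstruction error
    out, _ = lstm(traj)                              # traj: (B, T, act_dim)
    temporal_err = F.mse_loss(head(out), traj)       # temporal prior reconstruction error
    return -(w_spatial * spatial_err + w_temporal * temporal_err)

# Usage with placeholder networks and random data.
lstm = nn.LSTM(input_size=24, hidden_size=32, batch_first=True)
head = nn.Linear(32, 24)
r = prior_reward(torch.randn(4, 24), torch.randn(4, 10, 24), ActionVAEPrior(), lstm, head)
```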

ICRA Conference 2025 Conference Paper

UpViTaL: Unpaired Visual-Tactile Self-Supervised Representation Learning for Dexterous Robotic Manipulation

  • Guwen Han
  • Qingtao Liu
  • Yu Cui
  • Anjun Chen
  • Jiming Chen 0001
  • Qi Ye 0001

Visual and tactile pretraining have been extensively studied in dexterous robot manipulation tasks. However, existing methods typically require the simultaneous acquisition of visual and tactile data, making it difficult to utilize low-cost, unpaired visual-tactile datasets. Moreover, these methods often rely on tactile sensors to provide input data for reinforcement learning (RL) during the physical deployment of robotic dexterous hands, which significantly increases deployment costs. To address these challenges, we propose UpViTaL, an unpaired visual-tactile self-supervised representation learning method for RL-based robot dexterous manipulation. Specifically, we collect low-cost unpaired visual and tactile datasets for manipulation skill learning using a camera and tactile gloves on three robot manipulation tasks. The temporal tactile self-supervised representation learning module of UpViTaL is used to explore efficient tactile representations from time-series tactile data. In parallel, the visual pretraining module of UpViTaL helps to extract efficient visual representations from visual data. In addition, we fuse unpaired visual-tactile representations through an RL reward mechanism, which does not require tactile sensors on the robotic dexterous hands for practical deployment. We validate our approach on three dexterous robot manipulation tasks. Experimental results demonstrate that UpViTaL can efficiently learn robot manipulation skills. Compared to existing approaches for visual pretraining, our method significantly improves the success rate by more than 30%.

IROS Conference 2025 Conference Paper

VTAO-BiManip: Masked Visual-Tactile-Action Pre-training with Object Understanding for Bimanual Dexterous Manipulation

  • Zhengnan Sun
  • Zhaotai Shi
  • Jiayin Chen
  • Qingtao Liu
  • Yu Cui
  • Jiming Chen 0001
  • Qi Ye 0001

Bimanual dexterous manipulation remains a significant challenge in robotics due to the high DoFs of each hand and their coordination. Existing single-hand manipulation techniques often leverage human demonstrations to guide RL methods but fail to generalize to complex bimanual tasks involving multiple sub-skills. In this paper, we propose VTAO-BiManip, a novel framework that integrates visual-tactile-action pre-training with object understanding, aiming to enable human-like bimanual manipulation via curriculum reinforcement learning (RL). We improve prior learning by incorporating hand motion data, providing more effective guidance for dual-hand coordination. Our pretraining model predicts future actions as well as object pose and size using masked multimodal inputs, facilitating cross-modal regularization. To address the multi-skill learning challenge, we introduce a two-stage curriculum RL approach to stabilize training. We evaluate our method on a bimanual bottle-cap twisting task, demonstrating its effectiveness in both simulated and real-world environments. Our approach achieves a success rate that surpasses existing visual-tactile pretraining methods by over 20%.

ICLR Conference 2025 Conference Paper

VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning

  • Qingtao Liu
  • Yu Cui
  • Zhengnan Sun
  • Gaofeng Li
  • Jiming Chen 0001
  • Qi Ye 0001

Vision and touch are the most commonly used senses in human manipulation. While leveraging human manipulation videos for robotic task pretraining has shown promise in prior works, it is limited to image and language modalities and deployment on simple parallel grippers. In this paper, aiming to address these limitations, we collect a visual-tactile dataset of humans performing 10 daily manipulation tasks with 182 objects. In contrast with existing datasets, our dataset is the first visual-tactile dataset for complex robotic manipulation skill learning. Also, we introduce a novel benchmark, featuring six complex dexterous manipulation tasks and a reinforcement learning-based visual-tactile skill learning framework. 18 non-pretraining and pretraining methods within the framework are designed and compared to investigate the effectiveness of different modalities and pretraining strategies. Key findings based on our benchmark results and analysis experiments include: 1) Despite the tactile modality used in our experiments being binary and sparse, including it directly in the policy training boosts the success rate by about 20%, and jointly pretraining it with vision gains a further 20%. 2) Joint pretraining of visual-tactile modalities exhibits strong adaptability in unknown tasks and achieves robust performance among all tasks. 3) Using binary tactile signals with vision is robust to the viewpoint setting, tactile noise, and the binarization threshold, which facilitates deploying the visual-tactile policy in reality. The dataset and benchmark are available at https://github.com/LQTS/VTDexManip.

ICLR Conference 2024 Conference Paper

AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection

  • Qihang Zhou
  • Guansong Pang
  • Yu Tian 0001
  • Shibo He
  • Jiming Chen 0001

Zero-shot anomaly detection (ZSAD) requires detection models trained using auxiliary data to detect anomalies without any training sample in a target dataset. It is a crucial task when training data is not accessible due to various concerns, e.g., data privacy, yet it is challenging since the models need to generalize to anomalies across different domains where the appearance of foreground objects, abnormal regions, and background features, such as defects/tumors on different products/organs, can vary significantly. Recently, large pre-trained vision-language models (VLMs), such as CLIP, have demonstrated strong zero-shot recognition ability in various vision tasks, including anomaly detection. However, their ZSAD performance is weak since the VLMs focus more on modeling the class semantics of the foreground objects rather than the abnormality/normality in the images. In this paper, we introduce a novel approach, namely AnomalyCLIP, to adapt CLIP for accurate ZSAD across different domains. The key insight of AnomalyCLIP is to learn object-agnostic text prompts that capture generic normality and abnormality in an image regardless of its foreground objects. This allows our model to focus on the abnormal image regions rather than the object semantics, enabling generalized normality and abnormality recognition on diverse types of objects. Large-scale experiments on 17 real-world anomaly detection datasets show that AnomalyCLIP achieves superior zero-shot performance of detecting and segmenting anomalies in datasets of highly diverse class semantics from various defect inspection and medical imaging domains. Code will be made available at https://github.com/zqhang/AnomalyCLIP.
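
Conceptually, object-agnostic prompt learning boils down to learning generic "normal" and "abnormal" text embeddings and scoring image patches against them. The toy sketch below replaces CLIP's text encoder with free parameters; the shapes, names, and softmax scoring are assumptions for illustration, not AnomalyCLIP's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectAgnosticPrompts(nn.Module):
    """Two learnable text embeddings: generic 'normal' and 'abnormal'.

    In the actual method these would come from CLIP's text encoder applied to
    learnable prompt tokens; here they are free parameters for illustration.
    """
    def __init__(self, embed_dim=512):
        super().__init__()
        self.normal = nn.Parameter(torch.randn(embed_dim))
        self.abnormal = nn.Parameter(torch.randn(embed_dim))

    def anomaly_map(self, patch_feats: torch.Tensor) -> torch.Tensor:
        """patch_feats: (B, P, D) patch embeddings from a frozen image encoder."""
        feats = F.normalize(patch_feats, dim=-1)
        prompts = F.normalize(torch.stack([self.normal, self.abnormal]), dim=-1)
        logits = feats @ prompts.t()             # (B, P, 2) cosine similarities
        return logits.softmax(dim=-1)[..., 1]    # per-patch probability of 'abnormal'

scores = ObjectAgnosticPrompts().anomaly_map(torch.randn(2, 196, 512))
```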

ICRA Conference 2024 Conference Paper

Autonomous Implicit Indoor Scene Reconstruction with Frontier Exploration

  • Jing Zeng
  • Yanxu Li
  • Jiahao Sun
  • Qi Ye 0001
  • Yunlong Ran
  • Jiming Chen 0001

Implicit neural representations have demonstrated significant promise for 3D scene reconstruction. Recent works have extended their application to autonomous implicit reconstruction through Next Best View (NBV) based methods. However, the NBV method cannot guarantee complete scene coverage and often necessitates extensive viewpoint sampling, particularly in complex scenes. In this paper, we propose to 1) combine frontier-based exploration tasks for global coverage with implicit surface uncertainty-based reconstruction tasks to achieve high-quality reconstruction, and 2) introduce a method to obtain implicit surface uncertainty using color uncertainty, which reduces the time needed for view selection. Further, with these two tasks, we propose an adaptive strategy for switching modes in view path planning to reduce time and maintain superior reconstruction quality. Our method exhibits the highest reconstruction quality among all planning methods and superior planning efficiency among methods involving reconstruction tasks. We deploy our method on a UAV and the results show that our method can plan multi-task views and reconstruct a scene with high quality.
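
The adaptive switch between frontier-driven exploration and uncertainty-driven refinement could look roughly like the following sketch; the threshold, data layout, and selection rule are invented for illustration and are not the paper's strategy.

```python
import numpy as np

def select_next_view(frontier_views, uncertainty_views, frontier_area: float,
                     area_threshold: float = 5.0):
    """Adaptive mode switch (illustrative threshold, not the paper's value).

    While large unexplored frontiers remain, pursue global coverage; once the
    scene is mostly covered, refine with the view of highest surface uncertainty.
    """
    if frontier_area > area_threshold and len(frontier_views) > 0:
        # frontier_views: list of (view_pose, frontier_size)
        return max(frontier_views, key=lambda v: v[1])[0], "exploration"
    # uncertainty_views: list of (view_pose, color_uncertainty_gain)
    return max(uncertainty_views, key=lambda v: v[1])[0], "reconstruction"

view, mode = select_next_view([(np.zeros(6), 3.2)], [(np.ones(6), 0.7)], frontier_area=8.0)
```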

ICRA Conference 2024 Conference Paper

CAMInterHand: Cooperative Attention for Multi-View Interactive Hand Pose and Mesh Reconstruction

  • Guwen Han
  • Qi Ye 0001
  • Anjun Chen
  • Jiming Chen 0001

Interactive hand mesh reconstruction from single-view images poses a significant challenge due to the severe occlusion and depth ambiguity inherent in interactive hand gestures. Recent approaches that employ probabilistic models and token-pruned techniques have shown decent results in multi-view human body reconstruction. Nevertheless, these methods have not fully utilized multi-scale semantic information from multi-view images and are not applicable in scenarios involving severe occlusion during dual-hand interactions. Meanwhile, current single-view methods independently reconstruct the left and right hands, which is ineffective in enhancing the interaction between both hands. To address these challenges, we propose CAMInterHand, a cooperative attention-based method for multi-view interactive hand pose and mesh reconstruction. Specifically, CAMInterHand extracts local pyramid features and global vertex features from multi-scale feature maps of multi-view images, enabling the exploration of rich local semantic information and facilitating effective feature alignment. Furthermore, CAMInterHand employs a cooperative attention fusion module to fuse all features from multi-view images, enhancing interactions among vertices of the two hands within global and local contexts. We conduct extensive experiments on the large-scale multi-view dataset InterHand2.6M, and CAMInterHand achieves a substantial performance improvement over existing methods for multi-view and single-view interactive hand reconstruction.

UAI Conference 2024 Conference Paper

Differentially Private No-regret Exploration in Adversarial Markov Decision Processes

  • Shaojie Bai
  • Lanting Zeng
  • Chengcheng Zhao
  • Xiaoming Duan
  • Mohammad Sadegh Talebi
  • Peng Cheng 0001
  • Jiming Chen 0001

We study learning adversarial Markov decision processes (MDPs) in the episodic setting under the constraint of differential privacy (DP). This is motivated by the widespread applications of reinforcement learning (RL) in non-stationary and even adversarial scenarios, where protecting users’ sensitive information is vital. We first propose two efficient frameworks for adversarial MDPs, spanning full-information and bandit settings. Within each framework, we consider both Joint DP (JDP), where a central agent is trusted to protect the sensitive data, and Local DP (LDP), where the information is protected directly on the user side. Then, we design novel privacy mechanisms to privatize the stochastic transitions and adversarial losses. By instantiating such privacy mechanisms to satisfy JDP and LDP requirements, we obtain near-optimal regret guarantees for both frameworks. To our knowledge, these are the first algorithms to tackle the challenge of private learning in adversarial MDPs.
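
Although the paper designs its own mechanisms for transitions and adversarial losses, the textbook building block for privatizing counts is the Laplace mechanism. A generic sketch (not the authors' algorithm) is shown below; the sensitivity, clipping, and normalization choices are illustrative.

```python
import numpy as np

def privatize_counts(visit_counts: np.ndarray, epsilon: float, sensitivity: float = 1.0):
    """Generic Laplace mechanism applied to state-action visit counts."""
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon,
                              size=visit_counts.shape)
    return visit_counts + noise

def private_transition_estimate(counts_sas: np.ndarray, epsilon: float):
    """Estimate P(s'|s,a) from privatized counts, clipped to stay a distribution."""
    noisy = np.clip(privatize_counts(counts_sas, epsilon), a_min=0.0, a_max=None)
    totals = noisy.sum(axis=-1, keepdims=True) + 1e-9
    return noisy / totals

# Usage: counts indexed as (state, action, next_state).
counts = np.random.randint(0, 20, size=(4, 3, 4)).astype(float)
p_hat = private_transition_estimate(counts, epsilon=1.0)
```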

ICRA Conference 2024 Conference Paper

InterRep: A Visual Interaction Representation for Robotic Grasping

  • Yu Cui
  • Qi Ye 0001
  • Qingtao Liu
  • Anjun Chen
  • Gaofeng Li
  • Jiming Chen 0001

Recently, pre-trained vision models have gained significant attention in motor control, showcasing impressive performance across diverse robotic learning tasks. While previous works predominantly concentrate on the significance of the pre-training phase, the equally important task of extracting more effective representations based on existing pre-trained visual models remains unexplored. To better leverage the representation capabilities of pre-trained models for robotic grasping, we propose InterRep, a novel interaction representation method that possesses not only the strengths of pre-trained models, known for their robustness in noisy environments and their proficiency in recognizing essential features, but also the capacity of capturing dynamic interaction details and local geometric features during the grasping process. Based on the novel representation, we introduce a deep reinforcement learning method to learn generalizable grasping policies. The experimental results demonstrate that our proposed representation outperforms the baselines in terms of both training speed and generalization. For the generalized grasping tasks with dexterous robotic hands, our method boasts a success rate nearly 20% higher than methods using the global features of the entire image from pre-trained models. In addition, our proposed representation method demonstrates promising performance when applied to a different robotic hand and task. It also exhibits excellent performance on real robots with a success rate of 70%.

ICRA Conference 2024 Conference Paper

MAexp: A Generic Platform for RL-based Multi-Agent Exploration

  • Shaohao Zhu
  • Jiacheng Zhou
  • Anjun Chen
  • Mingming Bai
  • Jiming Chen 0001
  • Jinming Xu 0002

The sim-to-real gap poses a significant challenge in RL-based multi-agent exploration due to scene quantization and action discretization. Existing platforms suffer from inefficient sampling and a lack of diversity in Multi-Agent Reinforcement Learning (MARL) algorithms across different scenarios, restricting their widespread application. To fill these gaps, we propose MAexp, a generic platform for multi-agent exploration that integrates a broad range of state-of-the-art MARL algorithms and representative scenarios. Moreover, we employ point clouds to represent our exploration scenarios, leading to high-fidelity environment mapping and a sampling speed approximately 40 times faster than existing platforms. Furthermore, equipped with an attention-based Multi-Agent Target Generator and a Single-Agent Motion Planner, MAexp can work with arbitrary numbers of agents and accommodate various types of robots. Extensive experiments are conducted to establish the first benchmark featuring several high-performance MARL algorithms across typical scenarios for robots with continuous actions, which highlights the distinct strengths of each algorithm in different scenarios.

ICRA Conference 2024 Conference Paper

Masked Visual-Tactile Pre-training for Robot Manipulation

  • Qingtao Liu
  • Qi Ye 0001
  • Zhengnan Sun
  • Yu Cui
  • Gaofeng Li
  • Jiming Chen 0001

Recent works on pretraining for robot manipulation have demonstrated that representations learned from large-scale human manipulation data can generalize well to new manipulation tasks and environments. However, these approaches mainly focus on human vision or natural language, neglecting tactile feedback. In this article, we explore how to pre-train a representation model for robotic manipulation using both visual and tactile data from human manipulation. We develop a system for collecting visual and tactile data, featuring a cost-effective tactile glove to capture human tactile data and a HoloLens 2 to capture visual data. With this system, we collect a dataset of turning bottle caps. Furthermore, we introduce a novel visual-tactile fusion network and learning strategy, M2VTP, with one key module that tokenizes 20 sparse binary tactile signals sensing touch states to learn tactile context, and another key module that applies attention and masking mechanisms to the interaction of visual and tactile tokens for visual-tactile representation learning. We utilize our dataset to pre-train the fusion model and embed the pre-trained model into a reinforcement learning framework for downstream tasks. Experimental results demonstrate that our pre-trained model significantly aids in learning manipulation skills. Compared to methods without pre-training, our approach achieves a success rate increase of over 60%. Additionally, when compared to current visual pre-training methods, our success rate exceeds them by more than 50%.
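
A toy rendering of the tactile tokenization and masked visual-tactile fusion described above is given below. The token dimensions, the projection layers, and the masking-by-zeroing shortcut are simplifications and assumptions, not the M2VTP architecture.

```python
import torch
import torch.nn as nn

class VisualTactileFusion(nn.Module):
    """Toy version of tokenizing 20 binary tactile signals and fusing them
    with visual patch tokens through a shared Transformer encoder."""
    def __init__(self, dim=128, n_tactile=20):
        super().__init__()
        self.tactile_embed = nn.Linear(1, dim)        # one token per taxel
        self.visual_proj = nn.Linear(768, dim)        # project ViT-style patch features
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tactile, visual, mask_ratio=0.5):
        # tactile: (B, 20) binary touch states; visual: (B, P, 768) patch features
        t_tok = self.tactile_embed(tactile.unsqueeze(-1).float())   # (B, 20, dim)
        v_tok = self.visual_proj(visual)                             # (B, P, dim)
        tokens = torch.cat([t_tok, v_tok], dim=1)
        keep = torch.rand(tokens.shape[:2]) > mask_ratio             # random token mask
        tokens = tokens * keep.unsqueeze(-1)                         # zero out masked tokens
        return self.encoder(tokens)

out = VisualTactileFusion()(torch.randint(0, 2, (2, 20)), torch.randn(2, 49, 768))
```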

ICRA Conference 2024 Conference Paper

TPGP: Temporal-Parametric Optimization with Deep Grasp Prior for Dexterous Motion Planning

  • Haoming Li 0004
  • Qi Ye 0001
  • Yuchi Huo
  • Qingtao Liu
  • Shijian Jiang
  • Tao Zhou
  • Xiang Li
  • Yang Zhou

Grasping motion planning aims to find a feasible grasping trajectory in the configuration space given an input target grasp. While optimizing grasp motion with two- or three-fingered grippers has been well studied, natural grasp motion planning with a dexterous hand remains a very challenging problem due to the high-dimensional working space. In this work, we propose a novel temporal-parametric grasp prior (TPGP) optimization method to simplify the difficulty of grasping trajectory optimization for the dexterous hand while maintaining smooth and natural properties of the grasping motion. Specifically, we formulate the discrete trajectory parameters into a temporal parameterization, where a prior constraint provided by a hand poser network is introduced to ensure that the hand pose is natural and reasonable throughout the trajectory. Finally, we present a joint target optimization strategy to enhance the target pose for more feasible trajectories. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods regarding grasp motion on various metrics.

IROS Conference 2023 Conference Paper

Aggressive Trajectory Generation for a Swarm of Autonomous Racing Drones

  • Yuyang Shen
  • Jin Zhou
  • Danzhe Xu
  • Fangguo Zhao
  • Jinming Xu 0002
  • Jiming Chen 0001
  • Shuo Li

Autonomous drone racing is becoming an excellent platform for challenging quadrotors' autonomy techniques, including planning, navigation, and control. However, most research on this topic mainly focuses on single-drone scenarios. In this paper, we describe a novel method for generating time-optimal trajectories for a swarm of quadrotors to fly through pre-defined waypoints at their maximum maneuverability without collision. We verify the method in Gazebo simulations where a swarm of 5 quadrotors can fly through a complex 6-waypoint racing track in a 35 m × 35 m space with a top speed of 14 m/s. Flight tests are performed on two quadrotors passing through 3 waypoints in a 4 m × 2 m flight arena to demonstrate the feasibility of the proposed method in the real world. Both simulations and real-world flight tests show that the proposed method can generate optimal aggressive trajectories for a swarm of autonomous racing drones. The method can also be easily transferred to other types of robot swarms.

IROS Conference 2023 Conference Paper

DexRepNet: Learning Dexterous Robotic Grasping Network with Geometric and Spatial Hand-Object Representations

  • Qingtao Liu
  • Yu Cui
  • Qi Ye 0001
  • Zhengnan Sun
  • Haoming Li 0004
  • Gaofeng Li
  • Lin Shao 0002
  • Jiming Chen 0001

Robotic dexterous grasping is a challenging problem due to the high degree of freedom (DoF) and complex contacts of multi-fingered robotic hands. Existing deep reinforcement learning (DRL) based methods leverage human demonstrations to reduce the sample complexity caused by the high-dimensional action space of dexterous grasping. However, less attention has been paid to hand-object interaction representations for high-level generalization. In this paper, we propose a novel geometric and spatial hand-object interaction representation, named DexRep, to capture object surface features and the spatial relations between hands and objects during grasping. DexRep comprises an Occupancy Feature for rough shapes within the sensing range of the moving hand, a Surface Feature for changing hand-object surface distances, and a LocalGeo Feature for local geometric surface features most related to potential contacts. Based on the new representation, we propose a dexterous deep reinforcement learning method, DexRepNet, to learn a generalizable grasping policy. Experimental results show that our method dramatically outperforms baselines that use existing representations for robotic grasping in both grasp success rate and convergence speed. It achieves a 93% grasping success rate on seen objects and grasping success rates higher than 80% on diverse objects of unseen categories in both simulation and real-world experiments.
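
The three feature groups (Occupancy, Surface, LocalGeo) can be illustrated with simple point-cloud distance computations. The sketch below uses invented radii and neighborhood sizes and is only a rough stand-in for the paper's feature extractor.

```python
import numpy as np

def hand_object_features(finger_pts: np.ndarray, obj_pts: np.ndarray,
                         sense_radius: float = 0.1, k_local: int = 8):
    """Illustrative stand-ins for the three feature groups named in the abstract.

    finger_pts: (F, 3) points sampled on the hand surface.
    obj_pts:    (M, 3) points sampled on the object surface.
    """
    d = np.linalg.norm(finger_pts[:, None, :] - obj_pts[None, :, :], axis=-1)  # (F, M)
    surface_feat = d.min(axis=1)                       # hand-object surface distances
    occupancy_feat = (d < sense_radius).mean(axis=1)   # rough shape within sensing range
    nearest = np.argsort(d, axis=1)[:, :k_local]       # k closest object points per hand point
    local_geo_feat = obj_pts[nearest] - finger_pts[:, None, :]  # local geometry near contacts
    return surface_feat, occupancy_feat, local_geo_feat

s, o, g = hand_object_features(np.random.rand(15, 3), np.random.rand(500, 3))
```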

ICRA Conference 2023 Conference Paper

Efficient View Path Planning for Autonomous Implicit Reconstruction

  • Jing Zeng
  • Yanxu Li
  • Yunlong Ran
  • Shuo Li
  • Fei Gao
  • Lincheng Li
  • Shibo He
  • Jiming Chen 0001

Implicit neural representations have shown promising potential for 3D scene reconstruction. Recent work applies them to autonomous 3D reconstruction by learning information gain for view path planning. Effective as this is, computing the information gain is expensive, and collision checking for a 3D point using the implicit representation is much slower than with volumetric representations. In this paper, we propose to 1) leverage a neural network as an implicit function approximator for the information gain field and 2) combine the implicit fine-grained representation with coarse volumetric representations to improve efficiency. Further, with the improved efficiency, we propose a novel informative path planning method based on a graph-based planner. Our method demonstrates significant improvements in reconstruction quality and planning efficiency compared with autonomous reconstruction using implicit and explicit representations. We deploy the method on a real UAV and the results show that our method can plan informative views and reconstruct a scene with high quality.
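
The "neural network as an implicit function approximator for the information gain field" amounts to fitting a small MLP on sparsely evaluated views and then querying it cheaply for many candidates. A hedged sketch with placeholder data and an assumed 5-dimensional view pose follows.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InfoGainField(nn.Module):
    """Small MLP approximating the information-gain field over view poses,
    so candidate views can be scored without expensive evaluation (illustrative)."""
    def __init__(self, pose_dim=5, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, poses):        # poses: (N, pose_dim), e.g. (x, y, z, yaw, pitch)
        return self.net(poses).squeeze(-1)

field = InfoGainField()
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
poses, gains = torch.randn(256, 5), torch.rand(256)   # sparse supervision (placeholder data)
for _ in range(100):
    opt.zero_grad()
    loss = F.mse_loss(field(poses), gains)
    loss.backward()
    opt.step()
with torch.no_grad():
    best_view = poses[field(poses).argmax()]           # cheap dense querying afterwards
```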

ICRA Conference 2023 Conference Paper

ImmFusion: Robust mmWave-RGB Fusion for 3D Human Body Reconstruction in All Weather Conditions

  • Anjun Chen
  • Xiangyu Wang
  • Kun Shi 0003
  • Shaohao Zhu
  • Bin Fang
  • Yingfeng Chen
  • Jiming Chen 0001
  • Yuchi Huo

3D human reconstruction from RGB images achieves decent results in good weather conditions but degrades dramatically in rough weather. As a complement, mmWave radars have been employed to reconstruct 3D human joints and meshes in rough weather. However, combining RGB and mmWave signals for robust all-weather 3D human reconstruction is still an open challenge, given the sparse nature of mmWave signals and the vulnerability of RGB images. In this paper, we present ImmFusion, the first mmWave-RGB fusion solution to robustly reconstruct 3D human bodies in all weather conditions. Specifically, our ImmFusion consists of image and point backbones for token feature extraction and a Transformer module for token fusion. The image and point backbones refine global and local features from the original data, and the Fusion Transformer Module aims for effective information fusion of the two modalities by dynamically selecting informative tokens. Extensive experiments on a large-scale dataset, mmBody, captured in various environments demonstrate that ImmFusion can efficiently utilize the information of the two modalities to achieve robust 3D human body reconstruction in all weather conditions. In addition, our method's accuracy is significantly superior to that of state-of-the-art Transformer-based LiDAR-camera fusion methods.
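
The token-fusion idea (concatenate image and point tokens, fuse with a Transformer, keep the most informative ones) can be sketched as below. The scoring head and top-k selection are an illustrative guess at "dynamically selecting informative tokens", not ImmFusion's module.

```python
import torch
import torch.nn as nn

class TokenFusion(nn.Module):
    """Illustrative fusion of image tokens and mmWave point tokens with a
    Transformer, followed by a simple learned token-selection step."""
    def __init__(self, dim=256, keep=64):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(dim, 1)
        self.keep = keep

    def forward(self, img_tokens, point_tokens):
        tokens = torch.cat([img_tokens, point_tokens], dim=1)   # (B, Ni+Np, dim)
        fused = self.encoder(tokens)
        scores = self.score(fused).squeeze(-1)                  # informativeness per token
        idx = scores.topk(self.keep, dim=1).indices
        batch = torch.arange(fused.size(0)).unsqueeze(-1)
        return fused[batch, idx]                                # (B, keep, dim) selected tokens

out = TokenFusion()(torch.randn(2, 196, 256), torch.randn(2, 128, 256))
```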

IROS Conference 2023 Conference Paper

InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild

  • Yanyan Shao
  • Qi Ye 0001
  • Wenhan Luo
  • Kaihao Zhang
  • Jiming Chen 0001

Understanding human interaction with objects is an important research topic for embodied Artificial Intelligence, and identifying the objects that humans are interacting with is a primary problem for interaction understanding. Existing methods rely on frame-based detectors to locate interacting objects. However, this approach is susceptible to heavy occlusion, background clutter, and distracting objects. To address these limitations, in this paper, we propose to leverage the spatio-temporal information of hand-object interaction to track interactive objects under these challenging cases. Unlike standard object-tracking problems, which assume prior knowledge of the objects to be tracked, we first utilize the spatial relation between hands and objects to adaptively discover the interacting objects from the scene. Second, the consistency and continuity of the appearance of objects between successive frames are exploited to track the objects. With this tracking formulation, our method also benefits from training on large-scale general object-tracking datasets. We further curate a video-level hand-object interaction dataset from 100DOH for testing and evaluation. The quantitative results demonstrate that our proposed method outperforms the state-of-the-art methods. Specifically, in scenes with continuous interaction with different objects, we achieve an impressive improvement of about 10% as evaluated using the Average Precision (AP) metric. Our qualitative findings also illustrate that our method can produce more continuous trajectories for interacting objects.

ECAI Conference 2020 Conference Paper

IAD: A Benchmark Dataset and a New Method for Illegal Advertising Classification

  • Zebo Liu
  • Kehan Li 0001
  • Xu Tan 0003
  • Jiming Chen 0001

While online advertising has become ubiquitous and a pillar of the Internet economy, there is an increasing number of illegal ads that contain misleading or deceptive content and hinder the healthy development of online advertising. How to detect illegal advertising and classify it according to the provisions it violates is critical for legal supervision. However, due to the difficulty of dataset acquisition and the lack of expert knowledge in advertising, benchmark datasets and methods for illegal advertising classification are scarce. In this paper, we collect and release a large-scale dataset for illegal advertising classification (called IAD, short for illegal ads), which contains the content of illegal ads and the corresponding violated provisions. The IAD dataset has been publicly released. Based on the IAD dataset, we further propose a novel method called IAD-Net to classify the violated provisions of illegal ads. IAD-Net mainly adopts an interactive attention-based parallel LSTM network, where the parallel structure integrates the provisions into the classification process, which is equivalent to using prior information to supervise the classification. Besides, IAD-Net introduces an auxiliary embedding layer to enhance the semantics of lexicons in short ads, and an interactive attention mechanism to capture the relationship between lexicons in ads and the ads' legality. We conduct a comprehensive study on the IAD dataset and benchmark several previous methods as well as the proposed IAD-Net for illegal advertising classification. Experimental results demonstrate that IAD-Net achieves good accuracy and outperforms all previous methods on the IAD dataset. We believe the proposed IAD dataset and IAD-Net will help accelerate research in the area of illegal advertising classification.
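
The interactive attention between ad lexicons and a provision can be pictured as the provision summary attending over the ad tokens before classification. The toy classifier below is an assumption-laden sketch (vocabulary size, dimensions, and pooling are invented), not IAD-Net.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractiveAttentionClassifier(nn.Module):
    """Toy parallel-LSTM classifier: the provision text attends over the ad
    text so provision content takes part in classification (illustrative only)."""
    def __init__(self, vocab=5000, dim=128, n_provisions=30):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.ad_lstm = nn.LSTM(dim, dim, batch_first=True)
        self.prov_lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, n_provisions)

    def forward(self, ad_ids, prov_ids):
        ad, _ = self.ad_lstm(self.embed(ad_ids))          # (B, Ta, dim)
        prov, _ = self.prov_lstm(self.embed(prov_ids))    # (B, Tp, dim)
        q = prov.mean(dim=1, keepdim=True)                # provision summary as the query
        attn = F.softmax(torch.bmm(q, ad.transpose(1, 2)), dim=-1)   # (B, 1, Ta)
        ad_ctx = torch.bmm(attn, ad).squeeze(1)           # attended ad representation
        return self.out(torch.cat([ad_ctx, q.squeeze(1)], dim=-1))

logits = InteractiveAttentionClassifier()(torch.randint(0, 5000, (2, 40)),
                                           torch.randint(0, 5000, (2, 60)))
```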