Arrow Research search

Author name cluster

Jian Pu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

24

AAAI Conference 2026 Conference Paper

Decoupling Scene Perception and Ego Status: A Multi-Context Fusion Approach for Enhanced Generalization in End-to-End Autonomous Driving

  • Jiacheng Tang
  • Mingyue Feng
  • Jiachao Liu
  • Yaonong Wang
  • Jian Pu

Modular design of planning-oriented autonomous driving has markedly advanced end-to-end systems. However, existing architectures remain constrained by an over-reliance on ego status, hindering generalization and robust scene understanding. We identify the root cause as an inherent design flaw: these architectures allow ego status to be easily exploited as a shortcut. Specifically, the premature fusion of ego status in the upstream BEV encoder creates an information flow that lets this strong prior dominate the downstream planning module. To address this challenge, we propose AdaptiveAD, an architecture-level solution based on a multi-context fusion strategy. Its core is a dual-branch structure that explicitly decouples scene perception and ego status. One branch performs scene-driven reasoning based on multi-task learning, with ego status deliberately omitted from the BEV encoder, while the other conducts ego-driven reasoning based solely on the planning task. A scene-aware fusion module then adaptively integrates the complementary decisions from the two branches into the final planning trajectory. To ensure this decoupling does not compromise multi-task learning, we introduce a path attention mechanism for ego-BEV interaction and add two targeted auxiliary tasks: BEV unidirectional distillation and autoregressive online mapping. Extensive evaluations on the nuScenes dataset demonstrate that AdaptiveAD achieves state-of-the-art open-loop planning performance. Crucially, it significantly mitigates the over-reliance on ego status and exhibits impressive generalization across diverse scenarios.
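A minimal sketch of what a scene-aware fusion module of this kind could look like: a learned gate, conditioned on a scene feature, blends the trajectories proposed by the scene-driven and ego-driven branches. All names, shapes, and the gating design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SceneAwareFusion(nn.Module):
    """Hypothetical gated blend of two planning branches (not AdaptiveAD's code)."""
    def __init__(self, scene_dim: int, horizon: int):
        super().__init__()
        # Gate in [0, 1] per future step, conditioned on the scene context.
        self.gate = nn.Sequential(
            nn.Linear(scene_dim, 64), nn.ReLU(),
            nn.Linear(64, horizon), nn.Sigmoid(),
        )

    def forward(self, scene_feat, traj_scene, traj_ego):
        # scene_feat: (B, scene_dim); trajectories: (B, horizon, 2) waypoints.
        g = self.gate(scene_feat).unsqueeze(-1)       # (B, horizon, 1)
        return g * traj_scene + (1.0 - g) * traj_ego  # adaptive blend

fusion = SceneAwareFusion(scene_dim=256, horizon=6)
out = fusion(torch.randn(4, 256), torch.randn(4, 6, 2), torch.randn(4, 6, 2))
print(out.shape)  # torch.Size([4, 6, 2])
```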

ICRA Conference 2025 Conference Paper

A2DO: Adaptive Anti-Degradation Odometry with Deep Multi-Sensor Fusion for Autonomous Navigation

  • Hui Lai
  • Qi Chen
  • Junping Zhang
  • Jian Pu

Accurate localization is essential for the safe and effective navigation of autonomous vehicles, and Simultaneous Localization and Mapping (SLAM) is a cornerstone technology in this context. However, SLAM performance can deteriorate when sensors degrade under challenging conditions such as low light, adverse weather, or obstructions. We present A2DO, a novel end-to-end multi-sensor fusion odometry system that enhances robustness in these scenarios through deep neural networks. A2DO integrates LiDAR and visual data, employing a multilayer, multi-scale feature encoding module augmented by an attention mechanism to mitigate sensor degradation dynamically. The system is pretrained extensively on simulated datasets covering a broad range of degradation scenarios and fine-tuned on a curated set of real-world data, ensuring robust adaptation to complex scenarios. Our experiments demonstrate that A2DO maintains superior localization accuracy and robustness across various degradation conditions, showcasing its potential for practical deployment in autonomous vehicle systems.

IROS Conference 2025 Conference Paper

CasPoinTr: Point Cloud Completion with Cascaded Networks and Knowledge Distillation

  • Yifan Yang
  • Yuxiang Yan 0002
  • Boda Liu 0002
  • Jian Pu

Point clouds collected from real-world environments are often incomplete due to factors such as limited sensor resolution, single viewpoints, occlusions, and noise. These challenges make point cloud completion essential for various applications. A key difficulty in this task is predicting the overall shape and reconstructing missing regions from highly incomplete point clouds. To address this, we introduce CasPoinTr, a novel point cloud completion framework using cascaded networks and knowledge distillation. CasPoinTr decomposes the completion task into two synergistic stages: Shape Reconstruction, which generates auxiliary information, and Fused Completion, which leverages this information alongside knowledge distillation to generate the final output. Through knowledge distillation, a teacher model trained on denser point clouds transfers incomplete-complete associative knowledge to the student model, enhancing its ability to estimate the overall shape and predict missing regions. Together, the cascaded networks and knowledge distillation enhance the model’s ability to capture global shape context while refining local details, effectively bridging the gap between incomplete inputs and complete targets. Experiments on ShapeNet-55 under different difficulty settings demonstrate that CasPoinTr outperforms existing methods in shape recovery and detail preservation, highlighting the effectiveness of our cascaded structure and distillation strategy.

ICLR Conference 2025 Conference Paper

Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs

  • Xin Gao
  • Jian Pu

Multi-View Representation Learning (MVRL) aims to derive a unified representation from multi-view data by leveraging shared and complementary information across views. However, when views are irregularly missing, the incomplete data can lead to representations that lack sufficiency and consistency. To address this, we propose Multi-View Permutation of Variational Auto-Encoders (MVP), which excavates invariant relationships between views in incomplete data. MVP establishes inter-view correspondences in the latent space of Variational Auto-Encoders, enabling the inference of missing views and the aggregation of more sufficient information. To derive a valid Evidence Lower Bound (ELBO) for learning, we apply permutations to randomly reorder variables for cross-view generation and then partition them by views to maintain invariant meanings under permutations. Additionally, we enhance consistency by introducing an informational prior with cyclic permutations of posteriors, which turns the regularization term into a similarity measure across distributions. We demonstrate the effectiveness of our approach on seven diverse datasets with varying missing ratios, achieving superior performance in multi-view clustering and generation tasks.
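One way to read the cyclic-permutation regularizer: each view's posterior is pulled toward the posterior of the next view in a cycle, turning the usual KL-to-prior term into a cross-view similarity measure. The sketch below illustrates that idea for diagonal Gaussian posteriors; distributions, shapes, and the stop-gradient choice are assumptions, not MVP's exact objective.

```python
import torch
from torch.distributions import Normal, kl_divergence

def cyclic_kl(mus, logvars):
    """mus, logvars: lists of (B, D) tensors, one per view."""
    V = len(mus)
    loss = 0.0
    for v in range(V):
        q_v = Normal(mus[v], (0.5 * logvars[v]).exp())
        nxt = (v + 1) % V  # cyclic permutation of views
        # Next view's posterior acts as an informational prior (frozen here).
        q_next = Normal(mus[nxt].detach(), (0.5 * logvars[nxt]).detach().exp())
        loss = loss + kl_divergence(q_v, q_next).sum(-1).mean()
    return loss / V

mus = [torch.randn(8, 16) for _ in range(3)]
logvars = [torch.zeros(8, 16) for _ in range(3)]
print(cyclic_kl(mus, logvars).item())
```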

NeurIPS Conference 2025 Conference Paper

GMV: A Unified and Efficient Graph Multi-View Learning Framework

  • Qipeng Zhu
  • Jie Chen
  • Jian Pu
  • Junping Zhang

Graph Neural Networks (GNNs) are pivotal in graph classification but often struggle with generalization and overfitting. We introduce a unified and efficient Graph Multi-View (GMV) learning framework that integrates multi-view learning into GNNs to enhance robustness and efficiency. Leveraging the lottery ticket hypothesis, GMV activates diverse sub-networks within a single GNN through a novel training pipeline that includes mixed-view generation followed by multi-view decomposition and learning. This approach simultaneously broadens "views" from the data, model, and optimization perspectives during training to enhance the generalization capabilities of GNNs. During inference, GMV only incorporates additional prediction heads into standard GNNs, thereby achieving multi-view learning at minimal cost. Our experiments demonstrate that GMV surpasses other augmentation and ensemble techniques for GNNs and Graph Transformers across various graph classification scenarios.
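An illustrative sketch of the inference-time side of this design: a single GNN body with several lightweight prediction heads whose outputs are averaged. The two-layer mean-aggregation GNN below is a stand-in for the paper's backbone, and the training-time view decomposition is omitted; everything here is an assumption for illustration.

```python
import torch
import torch.nn as nn

class MultiHeadGNN(nn.Module):
    """Toy GNN with multiple prediction heads averaged at inference."""
    def __init__(self, in_dim, hid, n_classes, n_heads=4):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid)
        self.lin2 = nn.Linear(hid, hid)
        self.heads = nn.ModuleList(nn.Linear(hid, n_classes) for _ in range(n_heads))

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) row-normalized adjacency.
        h = torch.relu(self.lin1(adj @ x))     # mean-neighbor aggregation
        h = torch.relu(self.lin2(adj @ h))
        g = h.mean(dim=0)                      # mean-pool to a graph embedding
        logits = torch.stack([head(g) for head in self.heads])
        return logits.mean(dim=0)              # average the per-head "views"
```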

NeurIPS Conference 2025 Conference Paper

Learning Spatial-Aware Manipulation Ordering

  • Yuxiang Yan
  • Zhiyuan Zhou
  • Xin Gao
  • Guanghao Li
  • Shenglin Li
  • Jiaqi Chen
  • Qunyan Pu
  • Jian Pu

Manipulation in cluttered environments is challenging due to spatial dependencies among objects, where an improper manipulation order can cause collisions or blocked access. Existing approaches often overlook these spatial relationships, limiting their flexibility and scalability. To address these limitations, we propose OrderMind, a unified spatial-aware manipulation ordering framework that directly learns object manipulation priorities based on spatial context. Our architecture integrates a spatial context encoder with a temporal priority structuring module. We construct a spatial graph using k-Nearest Neighbors to aggregate geometric information from the local layout and encode both object-object and object-manipulator interactions to support accurate manipulation ordering in real-time. To generate physically and semantically plausible supervision signals, we introduce a spatial prior labeling method that guides a vision-language model to produce reasonable manipulation orders for distillation. We evaluate OrderMind on our Manipulation Ordering Benchmark, comprising 163,222 samples of varying difficulty. Extensive experiments in both simulation and real-world environments demonstrate that our method significantly outperforms prior approaches in effectiveness and efficiency, enabling robust manipulation in cluttered scenes.
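A small sketch of the kNN spatial-graph step described above: object centroids are connected to their k nearest neighbors so that a downstream encoder can aggregate local-layout information. This is purely illustrative; OrderMind's actual graph features and encoder are not shown.

```python
import torch

def knn_graph(centroids: torch.Tensor, k: int = 4):
    """centroids: (N, 3) object positions -> (2, N*k) edge index."""
    dist = torch.cdist(centroids, centroids)    # (N, N) pairwise distances
    dist.fill_diagonal_(float("inf"))           # exclude self-loops
    nbr = dist.topk(k, largest=False).indices   # (N, k) nearest neighbors
    src = torch.arange(len(centroids)).repeat_interleave(k)
    return torch.stack([src, nbr.reshape(-1)])  # edges src -> neighbor

edges = knn_graph(torch.randn(10, 3), k=4)
print(edges.shape)  # torch.Size([2, 40])
```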

AAAI Conference 2025 Conference Paper

PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation

  • Shoumeng Qiu
  • Xinrun Li
  • Xiangyang Xue
  • Jian Pu

Although multiview fusion has demonstrated potential in LiDAR segmentation, its dependence on computationally intensive point-based interactions, arising from the lack of fixed correspondences between views such as range view and Bird's-Eye View (BEV), hinders its practical deployment. This paper challenges the prevailing notion that multiview fusion is essential for achieving high performance. We demonstrate that significant gains can be realized by directly fusing Polar and Cartesian partitioning strategies within the BEV space. Our proposed BEV-only segmentation model leverages the inherent fixed grid correspondences between these partitioning schemes, enabling a fusion process that is orders of magnitude faster (170x speedup) than conventional point-based methods. Furthermore, our approach facilitates dense feature fusion, preserving richer contextual information compared to sparse point-based alternatives. To enhance scene understanding while maintaining inference efficiency, we also introduce a hybrid Transformer-CNN architecture. Extensive evaluation on the SemanticKITTI and nuScenes datasets provides compelling evidence that our method outperforms previous multiview fusion approaches in terms of both performance and inference speed, highlighting the potential of BEV-based fusion for LiDAR segmentation.
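A sketch of the fixed grid correspondence that makes Polar/Cartesian fusion cheap: for every Cartesian BEV cell we precompute its polar bin once, so fusion at run time is a single gather plus channel concatenation, with no point-based interaction. Grid sizes, ranges, and the concatenation choice are arbitrary assumptions.

```python
import numpy as np

H = W = 128                      # Cartesian BEV grid
R_BINS, A_BINS = 64, 128         # polar radius / angle bins
MAX_R = 50.0                     # metres covered by the grid

ys, xs = np.meshgrid(np.linspace(-MAX_R, MAX_R, H),
                     np.linspace(-MAX_R, MAX_R, W), indexing="ij")
r = np.clip(np.hypot(xs, ys) / MAX_R * (R_BINS - 1), 0, R_BINS - 1).astype(int)
a = ((np.arctan2(ys, xs) + np.pi) / (2 * np.pi) * (A_BINS - 1)).astype(int)
cart_to_polar = r * A_BINS + a   # (H, W) flat polar index, computed once

def fuse(cart_feat, polar_feat):
    """cart_feat: (C, H, W); polar_feat: (C, R_BINS*A_BINS) flattened polar BEV."""
    resampled = polar_feat[:, cart_to_polar]          # gather, no point ops
    return np.concatenate([cart_feat, resampled], 0)  # dense channel fusion

fused = fuse(np.zeros((32, H, W)), np.zeros((32, R_BINS * A_BINS)))
print(fused.shape)  # (64, 128, 128)
```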

NeurIPS Conference 2024 Conference Paper

Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs

  • Rong Ma
  • Jie Chen
  • Xiangyang Xue
  • Jian Pu

Deep supervised models possess significant capability to assimilate extensive training data, thereby presenting an opportunity to enhance model performance through training on multiple datasets. However, conflicts arising from different label spaces among datasets may adversely affect model performance. In this paper, we propose a novel approach to automatically construct a unified label space across multiple datasets using graph neural networks. This enables semantic segmentation models to be trained simultaneously on multiple datasets, resulting in performance improvements. Unlike existing methods, our approach facilitates seamless training without the need for additional manual reannotation or taxonomy reconciliation. This significantly enhances the efficiency and effectiveness of multi-dataset segmentation model training. The results demonstrate that our method significantly outperforms other multi-dataset training methods when trained on seven datasets simultaneously, and achieves state-of-the-art performance on the WildDash 2 benchmark. Our code can be found at https://github.com/Mrhonor/AutoUniSeg.

ICRA Conference 2024 Conference Paper

FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View

  • Jiawei Hou
  • Xiaoyan Li
  • Wenhao Guan
  • Gang Zhang
  • Di Feng
  • Yuheng Du
  • Xiangyang Xue 0001
  • Jian Pu

In autonomous driving, 3D occupancy prediction outputs voxel-wise occupancy status and semantic labels, giving a more comprehensive understanding of 3D scenes than traditional perception tasks such as 3D object detection and bird’s-eye view (BEV) semantic segmentation. Recent work has extensively explored various aspects of this task, including view transformation techniques, ground-truth label generation, and elaborate network design, aiming to achieve superior performance. However, inference speed, which is crucial for deployment on an autonomous vehicle, has been neglected. To this end, a new method, dubbed FastOcc, is proposed. By carefully analyzing the accuracy and latency contributions of four components, namely the input image resolution, image backbone, view transformation, and occupancy prediction head, we find that the occupancy prediction head holds considerable potential for accelerating the model while preserving its accuracy. Targeting this component, the time-consuming 3D convolution network is replaced with a novel residual-like architecture, where features are mainly digested by a lightweight 2D BEV convolution network and compensated by integrating 3D voxel features interpolated from the original image features. Experiments on the Occ3D-nuScenes benchmark demonstrate that FastOcc achieves state-of-the-art results with a fast inference speed.
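A rough sketch of such a residual-like occupancy head: a cheap 2D convolution digests the BEV feature, is reshaped into a voxel grid, and is then compensated by adding interpolated 3D voxel features. Module names, shapes, and the exact compensation path are assumptions, not FastOcc's released architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightOccHead(nn.Module):
    """Hypothetical lightweight 2D-BEV head with 3D voxel compensation."""
    def __init__(self, c_bev, c_vox, n_classes, z):
        super().__init__()
        self.n_classes, self.z = n_classes, z
        self.bev_conv = nn.Conv2d(c_bev, n_classes * z, 3, padding=1)  # cheap 2D path
        self.vox_proj = nn.Conv3d(c_vox, n_classes, 1)                 # pointwise 3D

    def forward(self, bev, vox):
        # bev: (B, c_bev, H, W); vox: (B, c_vox, z', h', w') coarse voxel features.
        B, _, H, W = bev.shape
        main = self.bev_conv(bev).view(B, self.n_classes, self.z, H, W)
        aux = F.interpolate(self.vox_proj(vox), size=(self.z, H, W),
                            mode="trilinear", align_corners=False)
        return main + aux  # residual compensation from interpolated 3D features

head = LightOccHead(c_bev=128, c_vox=32, n_classes=18, z=16)
out = head(torch.randn(1, 128, 100, 100), torch.randn(1, 32, 8, 50, 50))
print(out.shape)  # torch.Size([1, 18, 16, 100, 100])
```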

AAAI Conference 2024 Conference Paper

FD3D: Exploiting Foreground Depth Map for Feature-Supervised Monocular 3D Object Detection

  • Zizhang Wu
  • Yuanzhu Gan
  • Yunzhe Wu
  • Ruihao Wang
  • Xiaoquan Wang
  • Jian Pu

Monocular 3D object detection usually adopts direct or hierarchical label supervision. Recently, distillation-based supervision has been used to transfer spatial knowledge from LiDAR- or stereo-based teacher networks to monocular detectors, but a domain gap remains. To mitigate this issue and make fuller use of the available labels, we exploit foreground depth maps for feature-supervised monocular 3D object detection in a framework named FD3D, which develops high-quality instructive intermediate features to conduct auxiliary feature supervision using only the original image and an annotation-derived foreground object-wise depth map (AFOD) as input. We build an instructive feature generation network that creates instructive spatial features from the correlation between image features and the pre-processed AFOD, where the AFOD focuses attention only on foreground objects to give clearer guidance in the detection task. Moreover, we apply auxiliary feature supervision at both the pixel and distribution levels to achieve comprehensive spatial knowledge guidance. Extensive experiments demonstrate that our method achieves state-of-the-art performance on both the KITTI and nuScenes datasets, with no external data and no extra inference cost. We also conduct quantitative and qualitative studies to verify the effectiveness of our designs.

IROS Conference 2024 Conference Paper

HP³: Hierarchical Prediction-Pretrained Planning for Unprotected Left Turn

  • Zhihao Ou
  • Zhibo Wang
  • Yue Hua
  • Jinsheng Dou
  • Di Feng
  • Jian Pu

Trajectory planning for unprotected left turns poses a significant challenge in autonomous driving. Reinforcement learning (RL) offers potential, but existing methods often rely on scenario-specific state representations, limiting their adaptability. This paper introduces Hierarchical Prediction-Pretrained Planning (HP³), a generalizable hierarchical RL framework designed for unprotected left turns. HP³ leverages historical trajectories of all vehicles and complete map information to achieve versatile state representation and generalizable scene understanding. Its two-layer architecture predicts semantic behavior (upper layer) and generates corresponding trajectories (lower layer). A scene encoder comprehends trajectories and roads, while a trajectory decoder outputs sequential points. To accelerate convergence, we pretrain the main network on a modified trajectory prediction dataset. Evaluation on a CARLA-based map with complex, unprotected left-turn intersections demonstrates HP³’s superiority over rule-based and simple RL-based methods, highlighting the effectiveness of our pretraining approach for this critical autonomous driving task.

ICRA Conference 2024 Conference Paper

Mitigating Causal Confusion in Vector-Based Behavior Cloning for Safer Autonomous Planning

  • Jiayu Guo 0001
  • Mingyue Feng
  • Pengfei Zhu
  • Jinsheng Dou
  • Di Feng
  • Chengjun Li
  • Ru Wan
  • Jian Pu

Vector-based deep learning techniques hold great promise for autonomous driving, particularly in prediction and planning tasks. However, applying vector-based backbones to prediction and planning can lead to causal confusion. Previous studies have explored this phenomenon, with particular emphasis on visual imitation learning. For vector-based models, we observe that the states of surrounding vehicles can act as a nuisance shortcut. In this work, we propose an off-policy approach that alleviates the issue by incorporating de-confounding supervision. Additionally, to better capture environmental cues, such as the route and traffic lights, in the vectorized representation, we devise a decoder that uses iterative route fusion. By incorporating auxiliary supervision and employing a dedicated decoder, we demonstrate the effectiveness of our methods in reducing causal confusion and improving planning performance through reactive and non-reactive closed-loop simulations on the nuPlan dataset.

ICRA Conference 2024 Conference Paper

Multi-LIO: A Lightweight Multiple LiDAR-Inertial Odometry System

  • Qi Chen 0025
  • Guanghao Li 0001
  • Xiangyang Xue 0001
  • Jian Pu

The integration of multiple LiDAR sensors has the potential to significantly enhance odometry systems by providing comprehensive environmental measurements. However, current multiple LiDAR-inertial odometry frameworks face challenges in real-time processing due to the voluminous data generated. This paper introduces a real-time, computationally efficient multiple LiDAR-inertial odometry system (Multi-LIO) that outperforms existing state-of-the-art solutions in accuracy and scalability. Utilizing a novel parallel strategy for state updates and a voxelized map format, Multi-LIO optimizes computational efficiency. Furthermore, we introduce a point-wise uncertainty estimation method to augment the accuracy of scan-to-map registration, particularly in large-scale and complex scenarios. We validate our system’s performance through extensive experiments on various challenging sequences. Multi-LIO emerges as a robust, scalable, and extensible solution, adaptable to various LiDAR configurations.

ICRA Conference 2024 Conference Paper

PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion

  • Yuxiang Yan 0002
  • Boda Liu 0002
  • Jianfei Ai
  • Qinbu Li
  • Ru Wan
  • Jian Pu

Semantic Scene Completion (SSC) aims to jointly generate space occupancies and semantic labels for complex 3D scenes. Most existing SSC models focus on volumetric representations, which are memory-inefficient for large outdoor spaces. Point clouds provide a lightweight alternative, but existing benchmarks lack outdoor point cloud scenes with semantic labels. To address this, we introduce PointSSC, the first cooperative vehicle-infrastructure point cloud benchmark for semantic scene completion. These scenes exhibit long-range perception and minimal occlusion. We develop an automated annotation pipeline leveraging Semantic Segment Anything to efficiently assign semantics. To benchmark progress, we propose a LiDAR-based model with a Spatial-Aware Transformer for global and local feature extraction and a Completion and Segmentation Cooperative Module for joint completion and segmentation. PointSSC provides a challenging testbed to drive advances in semantic point cloud completion for real-world navigation. The code and datasets are available at https://github.com/yyxssm/PointSSC.

TMLR Journal 2024 Journal Article

SA-MLP: Distilling Graph Knowledge from GNNs into Structure-Aware MLP

  • Jie Chen
  • Mingyuan Bai
  • Shouzhen Chen
  • Junbin Gao
  • Junping Zhang
  • Jian Pu

The recursive node fetching and aggregation in message-passing cause inference latency when deploying Graph Neural Networks (GNNs) to large-scale graphs. One promising inference acceleration direction is to distill GNNs into message-passing-free student Multi-Layer Perceptrons (MLPs). However, an MLP student without graph dependency cannot fully learn the structure knowledge from GNNs, which causes inferior performance in heterophilic and online scenarios. To address this problem, we first design a simple yet effective Structure-Aware MLP (SA-MLP) as a student model. It utilizes linear layers as encoders and decoders to capture features and graph structures without message-passing among nodes. Furthermore, we introduce a novel structure-mixing knowledge distillation technique. It generates virtual samples imbued with a hybrid of structure knowledge from teacher GNNs, thereby enhancing the learning ability of MLPs for structure information. Extensive experiments on eight benchmark datasets under both transductive and online settings show that our SA-MLP can consistently achieve similar or even better results than teacher GNNs while maintaining inference speeds as fast as MLPs. Our findings reveal that SA-MLP efficiently assimilates graph knowledge through distillation from GNNs in an end-to-end manner, eliminating the need for complex model architectures and preprocessing of features/structures. Our code is available at https://github.com/JC-202/SA-MLP.
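A minimal sketch in the spirit of this design, under stated assumptions: linear encoders consume node features and the node's adjacency row separately, so no message passing is needed at inference, and a mixing step blends two nodes' structures to create a virtual distillation sample. The exact mixing and loss in SA-MLP may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAMLPStudent(nn.Module):
    """Hypothetical structure-aware student: features + adjacency row, no message passing."""
    def __init__(self, feat_dim, n_nodes, hid, n_classes):
        super().__init__()
        self.feat_enc = nn.Linear(feat_dim, hid)
        self.struct_enc = nn.Linear(n_nodes, hid)  # encodes one adjacency row
        self.dec = nn.Linear(hid, n_classes)

    def forward(self, x, adj_rows):
        return self.dec(torch.relu(self.feat_enc(x) + self.struct_enc(adj_rows)))

def structure_mixing_kd(student, x, adj, teacher_logits, lam=0.7):
    # Mix adjacency rows of shuffled node pairs; distill the mixed teacher output.
    perm = torch.randperm(x.size(0))
    mixed_adj = lam * adj + (1 - lam) * adj[perm]
    mixed_t = lam * teacher_logits + (1 - lam) * teacher_logits[perm]
    s = student(x, mixed_adj)
    return F.kl_div(F.log_softmax(s, -1), F.softmax(mixed_t, -1),
                    reduction="batchmean")
```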

AAAI Conference 2023 Conference Paper

Attention-Based Depth Distillation with 3D-Aware Positional Encoding for Monocular 3D Object Detection

  • Zizhang Wu
  • Yunzhe Wu
  • Jian Pu
  • Xianzhi Li
  • Xiaoquan Wang

Monocular 3D object detection is a low-cost but challenging task, as it requires generating accurate 3D localization solely from a single image input. Recently developed depth-assisted methods show promising results by using explicit depth maps as intermediate features, which are either precomputed by monocular depth estimation networks or jointly evaluated with 3D object detection. However, inevitable errors from estimated depth priors may lead to misaligned semantic information and 3D localization, hence resulting in feature smearing and suboptimal predictions. To mitigate this issue, we propose ADD, an Attention-based Depth knowledge Distillation framework with 3D-aware positional encoding. Unlike previous knowledge distillation frameworks that adopt stereo- or LiDAR-based teachers, we build our teacher with an architecture identical to the student but with extra ground-truth depth as input. Thanks to this teacher design, our framework is seamless, free of domain gap, easily implementable, and compatible with object-wise ground-truth depth. Specifically, we leverage intermediate features and responses for knowledge distillation. Considering long-range 3D dependencies, we propose 3D-aware self-attention and target-aware cross-attention modules for student adaptation. Extensive experiments are performed to verify the effectiveness of our framework on the challenging KITTI 3D object detection benchmark. We implement our framework on three representative monocular detectors, and we achieve state-of-the-art performance with no additional inference computational cost relative to baseline models. Our code is available at https://github.com/rockywind/ADD.

ECAI Conference 2023 Conference Paper

Instance-Aware Diffusion Implicit Process for Box-Based Instance Segmentation

  • Hao Ren 0002
  • Xingsong Liu
  • Junjian Huang
  • Ru Wan
  • Jian Pu
  • Hong Lu 0001

The diffusion model has demonstrated impressive performance in image generation, but its potential for discriminative tasks such as instance segmentation remains unexplored. In this paper, we propose an Instance-aware Diffusion Implicit Process (IDIP) framework for box-based instance segmentation. During training, IDIP diffuses ground-truth boxes across various time steps, extracting corresponding Region of Interest (RoI) features. Dynamic convolution is then used to predict boxes and categories for each RoI, and the mask head generates masks from these predictions. During inference, IDIP iteratively refines randomly generated boxes with the denoising diffusion implicit model, while the mask head derives final masks from RoIs based on the refined boxes. Our method surpasses existing approaches on the COCO benchmark, requiring fewer training steps and less memory due to its dynamic design and instance-aware characteristics.

IJCAI Conference 2023 Conference Paper

Learning Monocular Depth in Dynamic Environment via Context-aware Temporal Attention

  • Zizhang Wu
  • Zhuozheng Li
  • Zhi-Gang Fan
  • Yunzhe Wu
  • Yuanzhu Gan
  • Jian Pu

The monocular depth estimation task has recently revealed encouraging prospects, especially for the autonomous driving task. To tackle the ill-posed problem of 3D geometric reasoning from 2D monocular images, multi-frame monocular methods are developed to leverage the perspective correlation information from sequential temporal frames. However, moving objects such as cars and trains usually violate the static scene assumption, leading to feature inconsistency deviation and misaligned cost values, which would mislead the optimization algorithm. In this work, we present CTA-Depth, a Context-aware Temporal Attention guided network for multi-frame monocular Depth estimation. Specifically, we first apply a multi-level attention enhancement module to integrate multi-level image features to obtain an initial depth and pose estimation. Then the proposed CTA-Refiner is adopted to alternately optimize the depth and pose. During the CTA-Refiner process, context-aware temporal attention (CTA) is developed to capture the global temporal-context correlations to maintain the feature consistency and estimation integrity of moving objects. In particular, we propose a long-range geometry embedding (LGE) module to produce a long-range temporal geometry prior. Our approach achieves significant improvements (e.g., 13.5% for the Abs Rel metric on the KITTI dataset) over state-of-the-art approaches on three benchmark datasets.

ICRA Conference 2023 Conference Paper

MonoPGC: Monocular 3D Object Detection with Pixel Geometry Contexts

  • Zizhang Wu
  • Yuanzhu Gan
  • Lei Wang
  • Guilian Chen
  • Jian Pu

Monocular 3D object detection is an economical but challenging task in autonomous driving. Recently, center-based monocular methods have developed rapidly, offering a good trade-off between speed and accuracy; they usually estimate the depth of the object center from 2D features. However, visual semantic features that lack sufficient pixel-level geometry information can provide only weak cues for spatial 3D detection. To alleviate this, we propose MonoPGC, a novel end-to-end Monocular 3D object detection framework with rich Pixel Geometry Contexts. We introduce pixel depth estimation as an auxiliary task and design a depth cross-attention pyramid module (DCPM) to inject local and global depth geometry knowledge into visual features. In addition, we present a depth-space-aware transformer (DSAT) to integrate 3D space position and depth-aware features efficiently. Besides, we design a novel depth-gradient positional encoding (DGPE) to bring more distinct pixel geometry contexts into the transformer for better object detection. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the KITTI dataset.

ICRA Conference 2023 Conference Paper

Multi-to-Single Knowledge Distillation for Point Cloud Semantic Segmentation

  • Shoumeng Qiu
  • Feng Jiang
  • Haiqiang Zhang
  • Xiangyang Xue 0001
  • Jian Pu

3D point cloud semantic segmentation is one of the fundamental tasks for environmental understanding. Although significant progress has been made in recent years, the performance of classes with few examples or few points is still far from satisfactory. In this paper, we propose a novel multi-to-single knowledge distillation framework for the 3D point cloud semantic segmentation task to boost the performance of those hard classes. Instead of fusing all the points of multi-scans directly, only the instances that belong to the previously defined hard classes are fused. To effectively and sufficiently distill valuable knowledge from multi-scans, we leverage a multilevel distillation framework, i.e., feature representation distillation, logit distillation, and affinity distillation. We further develop a novel instance-aware affinity distillation algorithm for capturing high-level structural knowledge to enhance the distillation efficacy for hard classes. Finally, we conduct experiments on the SemanticKITTI dataset, and the results on both the validation and test sets demonstrate that our method yields substantial improvements compared with the baseline method. The code is available at https://github.com/skyshoumeng/M2SKD.
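A compact sketch of affinity distillation as a loss: both teacher and student features are turned into pairwise similarity (affinity) matrices over the points of an instance, and the student is trained to match the teacher's affinities. The exact instance-aware formulation in M2SKD may differ; this is the generic pattern.

```python
import torch
import torch.nn.functional as F

def affinity_distill(feat_s, feat_t):
    """feat_s, feat_t: (N, C) per-point student/teacher features of one instance."""
    s = F.normalize(feat_s, dim=-1)
    t = F.normalize(feat_t, dim=-1)
    aff_s = s @ s.t()             # (N, N) student pairwise affinity
    aff_t = (t @ t.t()).detach()  # (N, N) teacher affinity, frozen target
    return F.mse_loss(aff_s, aff_t)

loss = affinity_distill(torch.randn(256, 64), torch.randn(256, 64))
print(loss.item())
```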

ICRA Conference 2023 Conference Paper

MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion

  • Zizhang Wu
  • Guilian Chen
  • Yuanzhu Gan
  • Lei Wang
  • Jian Pu

Multi-view radar-camera fused 3D object detection provides a longer detection range and more helpful features for autonomous driving, especially under adverse weather. Current radar-camera fusion methods propose various designs to fuse radar information with camera data. However, these fusion approaches usually adopt a straightforward concatenation operation between multi-modal features, which ignores semantic alignment with the radar features and sufficient correlation across modalities. In this paper, we present MVFusion, a novel Multi-View radar-camera Fusion method that achieves semantic-aligned radar features and enhances cross-modal information interaction. To this end, we inject semantic alignment into the radar features via a semantic-aligned radar encoder (SARE) to produce image-guided radar features. Then, we propose a radar-guided fusion transformer (RGFT) that fuses the radar and image features to strengthen the correlation between the two modalities at a global scope via the cross-attention mechanism. Extensive experiments show that MVFusion achieves state-of-the-art performance (51.7% NDS and 45.3% mAP) on the nuScenes dataset. We shall release our code and trained networks upon publication.
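A sketch of a cross-attention fusion step of the kind RGFT describes: radar tokens query image tokens so the fused radar features attend to globally relevant image context. Token counts, dimensions, and the residual connection are assumptions for illustration.

```python
import torch
import torch.nn as nn

d = 256
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)

radar_tokens = torch.randn(2, 100, d)  # (B, N_radar, d)
image_tokens = torch.randn(2, 600, d)  # (B, N_image, d)

# Radar queries image context; keys/values come from the image features.
fused, _ = cross_attn(query=radar_tokens, key=image_tokens, value=image_tokens)
fused = fused + radar_tokens           # residual connection around attention
print(fused.shape)                     # torch.Size([2, 100, 256])
```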

AAAI Conference 2019 Conference Paper

The Kelly Growth Optimal Portfolio with Ensemble Learning

  • Weiwei Shen
  • Bin Wang
  • Jian Pu
  • Jun Wang

As a competitive alternative to the Markowitz mean-variance portfolio, the Kelly growth optimal portfolio has drawn considerable attention in investment science. While the growth optimal portfolio is theoretically guaranteed to dominate any other portfolio with probability 1 in the long run, it tends in practice to be highly risky in the short term. Moreover, empirical analyses and performance enhancement studies under practical settings are surprisingly scarce. In particular, how to handle the challenging but realistic condition of insufficient training data has barely been investigated. To fill this void, and especially to grapple with the difficulty of small samples, we propose a growth optimal portfolio strategy equipped with ensemble learning. We synergistically leverage the bootstrap aggregating algorithm and the random subspace method in portfolio construction to mitigate estimation error. We analyze the behavior and hyperparameter selection of the proposed strategy by simulation, and then corroborate its effectiveness by comparing its out-of-sample performance with those of 10 competing strategies on four datasets. Experimental results clearly confirm that the new strategy is superior across extensive evaluation criteria.
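A toy sketch of the ensemble idea: each member sees a bootstrap resample of the return history and a random subset of assets, solves an approximate Kelly problem (the usual quadratic approximation w ≈ Σ⁻¹μ), and the members' weights are averaged. This illustrates bagging plus random subspaces only; it is not the authors' estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def kelly_ensemble(returns, n_members=50, subspace=0.5):
    """returns: (T, N) history of simple returns -> (N,) averaged Kelly weights."""
    T, N = returns.shape
    k = max(2, int(subspace * N))
    avg = np.zeros(N)
    for _ in range(n_members):
        rows = rng.integers(0, T, size=T)            # bootstrap resample in time
        cols = rng.choice(N, size=k, replace=False)  # random asset subspace
        sub = returns[np.ix_(rows, cols)]
        mu, cov = sub.mean(axis=0), np.cov(sub, rowvar=False)
        w = np.linalg.pinv(cov) @ mu                 # quadratic Kelly approximation
        avg[cols] += w                               # embed back into full vector
    return avg / n_members

weights = kelly_ensemble(rng.normal(0.001, 0.02, size=(500, 8)))
print(weights.round(3))
```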

AAAI Conference 2018 Conference Paper

Tau-FPL: Tolerance-Constrained Learning in Linear Time

  • Ao Zhang
  • Nan Li
  • Jian Pu
  • Jun Wang
  • Junchi Yan
  • Hongyuan Zha

In many real-world applications, learning a classifier whose false-positive rate stays under a specified tolerance is appealing. Existing approaches either introduce prior-knowledge-dependent label costs or tune parameters on top of traditional classifiers, which is methodologically limited since the false-positive rate tolerance is not directly incorporated. In this paper, we propose a novel scoring-thresholding approach, τ-False Positive Learning (τ-FPL), to address this problem. We show that the scoring problem, which takes the false-positive rate tolerance into account, can be solved efficiently in linear time, and that an out-of-bootstrap thresholding method can transform the learned ranking function into a low false-positive classifier. Both theoretical analysis and experimental results show the superior performance of the proposed τ-FPL over existing approaches.
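A sketch of out-of-bootstrap thresholding: for each bootstrap sample, score the negatives left out of that sample and pick the threshold so the empirical false-positive rate on those held-out negatives stays within tolerance τ, then average. The scorer is assumed given; this is not the τ-FPL learner itself.

```python
import numpy as np

rng = np.random.default_rng(1)

def oob_threshold(scores_neg, tau=0.05, n_boot=100):
    """scores_neg: (n,) classifier scores of negative examples."""
    n = len(scores_neg)
    thresholds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)     # out-of-bootstrap negatives
        if len(oob) == 0:
            continue
        # Smallest threshold whose FPR on held-out negatives is <= tau.
        thresholds.append(np.quantile(scores_neg[oob], 1.0 - tau))
    return float(np.mean(thresholds))

thr = oob_threshold(rng.normal(size=2000), tau=0.05)
print(thr)  # predictions with score > thr are flagged positive
```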

IJCAI Conference 2013 Conference Paper

Multiple Task Learning Using Iteratively Reweighted Least Square

  • Jian Pu
  • Yu-Gang Jiang
  • Jun Wang
  • Xiangyang Xue

Multiple task learning (MTL) is becoming popular due to its theoretical advances and empirical successes. The key idea of MTL is to exploit the hidden relationships among multiple tasks to enhance learning performance. Recently, many MTL algorithms have been developed and applied to various problems such as feature selection and kernel learning. However, most existing methods rely heavily on specific assumptions about the task relationships. For instance, several works assume that there is a major task group plus several outlier tasks, and use a decomposition approach to identify the group structure and outlier tasks simultaneously. In this paper, we adopt a more general formulation for MTL without making specific structural assumptions. Instead of performing model decomposition, we directly impose an elastic-net regularization with a mixture of the structure and outlier penalties and formulate the objective as an unconstrained convex problem. To derive the optimal solution efficiently, we propose an Iteratively Reweighted Least Square (IRLS) method with a preconditioned conjugate gradient, which is computationally affordable for high-dimensional data. Extensive experiments are conducted on both synthetic and real data, and comparisons with several state-of-the-art algorithms clearly show the superior performance of the proposed method.
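A minimal IRLS sketch for an elastic-net style objective ½‖Xw − y‖² + λ₁‖w‖₁ + λ₂‖w‖²: the L1 term is majorized by a reweighted quadratic, and each inner linear system is solved matrix-free with a Jacobi-preconditioned conjugate gradient. This is the generic textbook scheme, not the paper's exact multi-task formulation.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def irls_elastic_net(X, y, l1=0.1, l2=0.1, iters=20, eps=1e-6):
    n, d = X.shape
    w = np.zeros(d)
    XtX_diag = (X ** 2).sum(axis=0)  # diagonal of X^T X, for the preconditioner
    Xty = X.T @ y
    for _ in range(iters):
        # Reweighted penalty diagonal: l1*|w_i| majorized by (l1/|w_i|) w_i^2 / 2.
        dvec = l1 / (np.abs(w) + eps) + 2 * l2

        def matvec(v, dvec=dvec):
            return X.T @ (X @ v) + dvec * v  # (X^T X + D) v, matrix-free

        A = LinearOperator((d, d), matvec=matvec)
        M = LinearOperator((d, d), matvec=lambda v, dvec=dvec: v / (XtX_diag + dvec))
        w, _ = cg(A, Xty, x0=w, M=M)  # Jacobi-preconditioned CG inner solve
    return w

rng = np.random.default_rng(2)
w = irls_elastic_net(rng.normal(size=(200, 50)), rng.normal(size=200))
print(np.abs(w).max())
```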