Arrow Research

Author name cluster

Ao Luo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

AAAI Conference 2026 Conference Paper

I2CD: An Invertible Causal Framework for Compositional Zero-Shot Learning via Disentangle-Compose-Disentangle

  • Zhaoquan Yuan
  • Zining Wang
  • Yuankang Pan
  • Ao Luo
  • Wei Li
  • Xiao Wu
  • Changsheng Xu

Compositional Zero-Shot Learning (CZSL) addresses the challenge of recognizing unseen attribute-object compositions in images, representing a fundamental challenge in artificial intelligence. Current approaches, which primarily focus on semantic alignment or distribution independence of primitives, have not achieved effective state-object decoupling and causal interventional invariance, limiting their performance on unseen compositions. To tackle this challenge, this study introduces I2CD (Invertible Causal framework via Disentangle-Compose-Disentangle), a novel framework that integrates invertible neural networks with causal intervention techniques to achieve state-object disentanglement. The framework employs a disentangle-compose-disentangle mechanism for counterfactual generation within the disentangled representation space, ensuring that modifications to one primitive (attribute or object) maintain independence from the other, thus enabling robust causal disentanglement. Representational consistency is maintained through semantic alignment between initial disentangled representations and their recomposed-then-disentangled counterparts with corresponding textual concepts. Comprehensive evaluations on three benchmark datasets—MIT-States, UT-Zappos, and C-GQA—demonstrate the framework's effectiveness in achieving both disentanglement and compositional generalization in CZSL tasks.
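
For orientation, a minimal sketch of the disentangle-compose-disentangle cycle the abstract describes, in PyTorch. The module names, dimensions, and losses below are illustrative assumptions, not the paper's actual I2CD architecture (which uses invertible networks and causal interventions).

    # Minimal sketch of a disentangle-compose-disentangle cycle, assuming a
    # feature extractor has already produced an image embedding z. Module
    # names and shapes are illustrative, not the paper's architecture.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Disentangler(nn.Module):
        """Splits an embedding into attribute and object factors."""
        def __init__(self, dim=512):
            super().__init__()
            self.attr_head = nn.Linear(dim, dim // 2)
            self.obj_head = nn.Linear(dim, dim // 2)

        def forward(self, z):
            return self.attr_head(z), self.obj_head(z)

    class Composer(nn.Module):
        """Recomposes attribute and object factors into a joint embedding."""
        def __init__(self, dim=512):
            super().__init__()
            self.fc = nn.Linear(dim, dim)

        def forward(self, attr, obj):
            return self.fc(torch.cat([attr, obj], dim=-1))

    disentangle = Disentangler()
    compose = Composer()

    z = torch.randn(4, 512)                          # image embeddings
    attr, obj = disentangle(z)                       # first disentanglement
    attr_swap = attr[torch.randperm(attr.size(0))]   # counterfactual: swap attributes across the batch
    z_cf = compose(attr_swap, obj)                   # compose a counterfactual embedding
    attr2, obj2 = disentangle(z_cf)                  # second disentanglement

    # Consistency losses: the untouched object factor should survive the
    # compose-then-disentangle round trip, and the swapped attribute should match.
    loss = F.mse_loss(obj2, obj.detach()) + F.mse_loss(attr2, attr_swap.detach())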

NeurIPS Conference 2024 Conference Paper

Advancing Open-Set Domain Generalization Using Evidential Bi-Level Hardest Domain Scheduler

  • Kunyu Peng
  • Di Wen
  • Kailun Yang
  • Ao Luo
  • Yufan Chen
  • Jia Fu
  • M. Saquib Sarfraz
  • Alina Roitberg

In Open-Set Domain Generalization (OSDG), the model is exposed to both new variations of data appearance (domains) and open-set conditions, where both known and novel categories are present at test time. The challenges of this task arise from the dual need to generalize across diverse domains and accurately quantify category novelty, which is critical for applications in dynamic environments. Recently, meta-learning techniques have demonstrated superior results in OSDG, effectively orchestrating the meta-train and -test tasks by employing varied random categories and predefined domain partition strategies. These approaches prioritize a well-designed training schedule over traditional methods that focus primarily on data augmentation and the enhancement of discriminative feature learning. The prevailing meta-learning models in OSDG typically utilize a predefined sequential domain scheduler to structure data partitions. However, a crucial aspect that remains inadequately explored is the influence brought by strategies of domain schedulers during training. In this paper, we observe that an adaptive domain scheduler benefits more in OSDG compared with prefixed sequential and random domain schedulers. We propose the Evidential Bi-Level Hardest Domain Scheduler (EBiL-HaDS) to achieve an adaptive domain scheduler. This method strategically sequences domains by assessing their reliabilities in utilizing a follower network, trained with confidence scores learned in an evidential manner, regularized by max rebiasing discrepancy, and optimized in a bilevel manner. We verify our approach on three OSDG benchmarks, i.e., PACS, DigitsDG, and OfficeHome. The results show that our method substantially improves OSDG performance and achieves more discriminative embeddings for both the seen and unseen categories, underscoring the advantage of a judicious domain scheduler for the generalizability to unseen domains and unseen categories. The source code is publicly available at https://github.com/KPeng9510/EBiL-HaDS.
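
As a rough illustration of the scheduling idea, the sketch below ranks source domains by evidential uncertainty from a follower network. The Dirichlet-based uncertainty formula is the standard one from evidential deep learning; the paper's bi-level optimization and max rebiasing regularizer are omitted, and all shapes and names are assumptions.

    # Minimal sketch: pick the "hardest" source domain from evidential
    # confidence, assuming a follower network that outputs per-class evidence.
    import torch
    import torch.nn as nn

    num_classes = 7
    follower = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, num_classes))

    def domain_uncertainty(features):
        """Mean evidential uncertainty of a batch of features from one domain."""
        evidence = torch.relu(follower(features))   # non-negative evidence
        alpha = evidence + 1.0                       # Dirichlet parameters
        u = num_classes / alpha.sum(dim=-1)          # per-sample uncertainty u = K / sum(alpha)
        return u.mean().item()

    # features for each candidate source domain (illustrative random data)
    domain_feats = {"photo": torch.randn(32, 128),
                    "art": torch.randn(32, 128),
                    "sketch": torch.randn(32, 128)}

    # schedule the domain the follower is least reliable on as the next meta-test split
    hardest = max(domain_feats, key=lambda d: domain_uncertainty(domain_feats[d]))
    print("next meta-test domain:", hardest)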

AAAI Conference 2024 Conference Paper

SCP: Spherical-Coordinate-Based Learned Point Cloud Compression

  • Ao Luo
  • Linxin Song
  • Keisuke Nonaka
  • Kyohei Unno
  • Heming Sun
  • Masayuki Goto
  • Jiro Katto

In recent years, the task of learned point cloud compression has gained prominence. An important type of point cloud, LiDAR point cloud, is generated by spinning LiDAR on vehicles. This process results in numerous circular shapes and azimuthal angle invariance features within the point clouds. However, these two features have been largely overlooked by previous methodologies. In this paper, we introduce a model-agnostic method called Spherical-Coordinate-based learned Point cloud compression (SCP), designed to fully leverage the features of circular shapes and azimuthal angle invariance. Additionally, we propose a multi-level Octree for SCP to mitigate the reconstruction error for distant areas within the Spherical-coordinate-based Octree. SCP exhibits excellent universality, making it applicable to various learned point cloud compression techniques. Experimental results demonstrate that SCP surpasses previous state-of-the-art methods by up to 29.14% in point-to-point PSNR BD-Rate.
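
A minimal sketch of the Cartesian-to-spherical conversion that such a spherical-coordinate representation builds on; the paper's multi-level Octree and any origin-offset or quantization details are not reproduced here.

    # Convert LiDAR points from Cartesian to spherical coordinates
    # (radius, azimuth, elevation) -- the representation SCP-style methods exploit.
    import numpy as np

    def cartesian_to_spherical(points):
        """points: (N, 3) array of x, y, z -> (N, 3) array of radius, azimuth, elevation."""
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        r = np.sqrt(x**2 + y**2 + z**2)
        azimuth = np.arctan2(y, x)                   # angle around the spinning axis
        elevation = np.arcsin(np.clip(z / np.maximum(r, 1e-12), -1.0, 1.0))
        return np.stack([r, azimuth, elevation], axis=-1)

    pts = np.random.randn(1024, 3) * 10.0            # stand-in for a LiDAR sweep
    spherical = cartesian_to_spherical(pts)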

IROS Conference 2022 Conference Paper

Attention-Based Deep Driving Model for Autonomous Vehicles with Surround-View Cameras

  • Yang Zhao 0024
  • Jie Li
  • Rui Huang 0008
  • Boqi Li 0001
  • Ao Luo
  • Yaochen Li
  • Hong Cheng 0002

Experienced human drivers always make safe driving decisions by selectively observing the front, rear and side-view mirrors. Several end-to-end methods have been proposed to learn driving models with multi-view visual information. However, these benchmark methods lack semantic understanding of multi-view image contents, whereas human drivers usually reason about this information for decision making with different visual regions of interest. In this paper, we propose an attention-based deep learning method to learn a driving model with input of surround-view visual information and the route planner, in which a multi-view attention module is designed to obtain the regions of interest attended to by human drivers. We evaluate our model on the Drive360 dataset in comparison with benchmark deep driving models. Results demonstrate that our model achieves competitive accuracy in both steering angle and speed prediction compared with the benchmark methods. Code is available at https://github.com/jet-uestc/MVA-Net.
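
For illustration, a minimal attention-pooling sketch over per-view features; the actual MVA-Net module and its coupling with the route planner are not shown, and the shapes and the downstream head below are assumptions.

    # Minimal sketch of attention pooling over features from several camera views.
    import torch
    import torch.nn as nn

    class MultiViewAttention(nn.Module):
        def __init__(self, dim=256):
            super().__init__()
            self.score = nn.Linear(dim, 1)           # one attention score per view

        def forward(self, view_feats):               # (B, num_views, dim)
            weights = torch.softmax(self.score(view_feats), dim=1)  # (B, V, 1)
            return (weights * view_feats).sum(dim=1)                 # (B, dim) fused feature

    mva = MultiViewAttention()
    feats = torch.randn(2, 4, 256)                   # e.g. front / rear / left / right view features
    fused = mva(feats)
    steering = nn.Linear(256, 1)(fused)              # illustrative steering head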

AAAI Conference 2022 Conference Paper

Learning Optical Flow with Adaptive Graph Reasoning

  • Ao Luo
  • Fan Yang
  • Kunming Luo
  • Xin Li
  • Haoqiang Fan
  • Shuaicheng Liu

Estimating per-pixel motion between video frames, known as optical flow, is a long-standing problem in video understanding and analysis. Most contemporary optical flow techniques largely focus on addressing the cross-image matching with feature similarity, with few methods considering how to explicitly reason over the given scene for achieving a holistic motion understanding. In this work, taking a fresh perspective, we introduce a novel graph-based approach, called adaptive graph reasoning for optical flow (AGFlow), to emphasize the value of scene/context information in optical flow. Our key idea is to decouple the context reasoning from the matching procedure, and exploit scene information to effectively assist motion estimation by learning to reason over the adaptive graph. The proposed AGFlow can effectively exploit the context information and incorporate it within the matching procedure, producing more robust and accurate results. On both Sintel clean and final passes, our AGFlow achieves the best accuracy with EPE of 1.43 and 2.47 pixels, outperforming state-of-the-art approaches by 11.2% and 13.6%, respectively. Code is publicly available at https://github.com/megvii-research/AGFlow.
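
The reported numbers are average end-point error (EPE); below is a minimal sketch of that metric, not of the AGFlow graph-reasoning module itself.

    # End-point error: mean Euclidean distance between predicted and ground-truth flow.
    import numpy as np

    def epe(flow_pred, flow_gt):
        """Both inputs are (H, W, 2) flow fields; returns the mean per-pixel error in pixels."""
        return np.linalg.norm(flow_pred - flow_gt, axis=-1).mean()

    pred = np.random.randn(64, 64, 2)                # stand-in predictions
    gt = np.random.randn(64, 64, 2)                  # stand-in ground truth
    print(f"EPE: {epe(pred, gt):.2f} px")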

IROS Conference 2021 Conference Paper

TemporalFusion: Temporal Motion Reasoning with Multi-Frame Fusion for 6D Object Pose Estimation

  • Fengjun Mu
  • Rui Huang 0008
  • Ao Luo
  • Xin Li 0079
  • Jing Qiu 0004
  • Hong Cheng 0002

6D object pose estimation is an essential task in vision-based robotic grasping and manipulation. Prior works extract spatial features by fusing the RGB image and depth without considering the temporal motion information, limiting their performance in heavy occlusion robotic grasping scenarios. In this paper, we present an end-to-end model named TemporalFusion, which integrates the temporal motion information from RGB-D images for 6D object pose estimation. The core of the proposed TemporalFusion model is to embed and fuse the temporal motion information from multi-frame RGB-D sequences, which can handle heavy occlusion in robotic grasping tasks. Furthermore, the proposed deep model can also obtain stable pose sequences, which is essential for real-time robotic grasping tasks. We evaluated the proposed method on the YCB-Video dataset, and experimental results show our model outperforms state-of-the-art approaches. Our code is available at https://github.com/mufengjun260/TemporalFusion21.
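
As a rough illustration of fusing per-frame features over time before pose regression, a minimal sketch follows; the recurrent fusion and pose head below are assumptions, not the paper's actual TemporalFusion design.

    # Minimal sketch: aggregate per-frame RGB-D features over time, then regress a pose.
    import torch
    import torch.nn as nn

    class NaiveTemporalFusion(nn.Module):
        def __init__(self, dim=128):
            super().__init__()
            self.gru = nn.GRU(dim, dim, batch_first=True)  # temporal aggregation
            self.pose_head = nn.Linear(dim, 7)             # quaternion (4) + translation (3)

        def forward(self, frame_feats):                     # (B, T, dim) per-frame features
            fused, _ = self.gru(frame_feats)
            return self.pose_head(fused[:, -1])             # pose from the latest fused state

    model = NaiveTemporalFusion()
    seq = torch.randn(2, 5, 128)                            # 5-frame feature sequence
    pose = model(seq)                                       # (2, 7)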

AAAI Conference 2020 Conference Paper

Hybrid Graph Neural Networks for Crowd Counting

  • Ao Luo
  • Fan Yang
  • Xin Li
  • Dong Nie
  • Zhicheng Jiao
  • Shangchen Zhou
  • Hong Cheng

Crowd counting is an important yet challenging task due to the large scale and density variation. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is still a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn), which aims to relieve the problem by interweaving the multi-scale features for crowd density as well as its auxiliary task (localization) together and performing joint reasoning over a graph. Specifically, HyGnn integrates a hybrid graph to jointly represent the task-specific feature maps of different scales as nodes, and two types of relations as edges: (i) multi-scale relations capturing the feature dependencies across scales and (ii) mutual beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, providing robust and accurate results. Our HyGnn performs significantly well on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF-QNRF, outperforming the state-of-the-art algorithms by a large margin.
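
For intuition, one message-passing step over a small hybrid graph whose nodes stand in for multi-scale, task-specific features; the fully connected adjacency and GRU-style update below are illustrative assumptions, not the paper's exact formulation.

    # One message-passing step over a toy hybrid graph.
    import torch
    import torch.nn as nn

    dim = 64
    num_nodes = 6                                    # e.g. 3 scales x 2 tasks (counting, localization)
    msg_fn = nn.Linear(dim, dim)                     # message transform
    upd_fn = nn.GRUCell(dim, dim)                    # node update

    nodes = torch.randn(num_nodes, dim)              # pooled multi-scale task features
    adj = torch.ones(num_nodes, num_nodes) - torch.eye(num_nodes)  # fully connected, no self-loops
    adj = adj / adj.sum(dim=1, keepdim=True)         # row-normalize edge weights

    messages = adj @ msg_fn(nodes)                   # aggregate transformed neighbor features
    nodes = upd_fn(messages, nodes)                  # GRU-style node update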

IROS Conference 2019 Conference Paper

End-to-End Driving Model for Steering Control of Autonomous Vehicles with Future Spatiotemporal Features

  • Tianhao Wu
  • Ao Luo
  • Rui Huang 0008
  • Hong Cheng 0002
  • Yang Zhao 0024

End-to-end deep learning has gained considerable interest for autonomous driving vehicles in both academic and industrial fields, especially in the decision-making process. One critical issue in the decision-making process of autonomous driving vehicles is steering control. Researchers have already trained different artificial neural networks to predict the steering angle from a front-facing camera data stream. However, existing end-to-end methods only consider the spatiotemporal relation on a single layer and lack the ability to extract future spatiotemporal information. In this paper, we propose an end-to-end driving model based on a Convolutional Long Short-Term Memory (Conv-LSTM) neural network with a Multi-scale Spatiotemporal Integration (MSI) module, which aims to encode spatiotemporal information from different scales for steering angle prediction. Moreover, we employ future sequential information to enhance the spatiotemporal features of the end-to-end driving model. We demonstrate the efficiency of the proposed end-to-end driving model on the public Udacity dataset in comparison with several existing methods. Experimental results show that the proposed model performs better than existing methods, especially in some complex scenarios. Furthermore, we evaluate the proposed driving model on a real-time autonomous vehicle, and results show that the proposed driving model is able to predict the steering angle with high accuracy compared to a skilled human driver.
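
A minimal ConvLSTM cell in the spirit of the described model, using the standard gate definitions; the MSI module and future-frame supervision from the paper are not included, and the steering head at the end is purely illustrative.

    # Minimal ConvLSTM cell: gates computed by a single convolution over [x, h].
    import torch
    import torch.nn as nn

    class ConvLSTMCell(nn.Module):
        def __init__(self, in_ch, hid_ch, k=3):
            super().__init__()
            self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

        def forward(self, x, state):
            h, c = state
            i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
            i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
            c = f * c + i * g                         # cell state update
            h = o * torch.tanh(c)                     # hidden state update
            return h, c

    cell = ConvLSTMCell(in_ch=3, hid_ch=16)
    frames = torch.randn(8, 2, 3, 64, 64)             # (time, batch, channels, H, W)
    h = torch.zeros(2, 16, 64, 64)
    c = torch.zeros_like(h)
    for t in range(frames.size(0)):
        h, c = cell(frames[t], (h, c))
    steering = nn.Conv2d(16, 1, 1)(h).mean(dim=(1, 2, 3))  # illustrative head: one angle per sample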