Arrow Research search

Author name cluster

Ze Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers

IROS Conference 2024 Conference Paper

Enhancing Leg Odometry in Legged Robots with Learned Contact Bias: An LSTM Recurrent Neural Network Approach

  • Yaru Gu
  • Ze Liu
  • Ting Zou

To address leg odometry drift caused by non-stationary foot contact, this paper introduces a novel data-driven leg odometry technique for legged robots. Using a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN), the method learns the biases in the robot's foot contact locations from sequential IMU measurements and ground reaction forces (GRF). This learned contact bias is then incorporated into the state estimation process through a Kalman filter (KF), significantly improving the precision of real-time leg odometry for legged robots. The method, which combines deep learning with conventional filtering, is named the Deep Learning Kalman Filter (DLKF). Its effectiveness is demonstrated in simulation and in experimental trials with a Unitree Go1 robot across challenging environments, including uneven terrain, slopes, and stairs, where foot slippage occurs frequently. Our results show an average 64.93% reduction in translational error in leg odometry when the learned contact bias is applied. Further improvements are observed in a fused LiDAR and leg odometry state estimation system, especially in feature-deprived areas, indicating that the proposed leg odometry can readily be fused with other sensor measurements for more precise state estimation.
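
As a rough illustration of how a learned contact bias can enter a Kalman filter, the sketch below tracks body velocity with a scalar filter and subtracts a per-step bias estimate from the raw leg-odometry velocity before each update. It is not the DLKF: the one-dimensional state, the random-walk velocity model, the noise values, and the helper names `ScalarKF` and `integrate_position` are all hypothetical. The bias here is expressed as a velocity offset (the paper learns biases in foot contact locations), and the given bias sequence merely stands in for what the LSTM would predict.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScalarKF:
    """Scalar Kalman filter for body velocity under a random-walk model (illustrative)."""
    v: float = 0.0   # velocity estimate [m/s]
    p: float = 1.0   # variance of the estimate
    q: float = 1e-3  # process-noise variance (hypothetical value)
    r: float = 1e-2  # leg-odometry measurement variance (hypothetical value)

    def step(self, z: float) -> float:
        # Predict: velocity is modeled as a random walk, so only the variance grows.
        self.p += self.q
        # Update: blend in the measurement z using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.v += k * (z - self.v)
        self.p *= 1.0 - k
        return self.v


def integrate_position(leg_vel: Sequence[float], learned_bias: Sequence[float], dt: float = 0.01) -> float:
    """Dead-reckon position from leg odometry, removing a per-step bias estimate.

    learned_bias is a hypothetical stand-in for the network's predicted bias.
    """
    kf = ScalarKF()
    x = 0.0
    for z_raw, bias in zip(leg_vel, learned_bias):
        v = kf.step(z_raw - bias)  # correct the slip-biased velocity before fusing
        x += v * dt
    return x
```

For example, with a true speed of 1.0 m/s, slip that inflates the leg velocity to 1.2 m/s, and a perfectly learned bias of 0.2 m/s, 1000 steps at `dt=0.01` integrate to about 10 m with the correction and about 12 m without it.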

NeurIPS Conference 2024 Conference Paper

Generalization Error Bounds for Two-stage Recommender Systems with Tree Structure

  • Jin Zhang
  • Ze Liu
  • Defu Lian
  • Enhong Chen

Two-stage recommender systems play a crucial role in efficiently identifying relevant items and personalizing recommendations from a vast array of options. This paper, based on an error decomposition framework, analyzes the generalization error for two-stage recommender systems with a tree structure, which consist of an efficient tree-based retriever and a more precise yet time-consuming ranker. We use the Rademacher complexity to establish the generalization upper bound for various tree-based retrievers using beam search, as well as for different ranker models under a shifted training distribution. Both theoretical insights and practical experiments on real-world datasets indicate that increasing the branches in tree-based retrievers and harmonizing distributions across stages can enhance the generalization performance of two-stage recommender systems.
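
The retriever side of this analysis concerns beam search over a tree. A minimal sketch, assuming a complete tree with a fixed branching factor, nodes identified by their path of child indices from the root, and a hypothetical node-scoring function `score`: at each level every child of the current beam is expanded and only the `beam` highest-scoring children are kept, so the leaves returned are the items that would be passed to the ranker. The function name `beam_search` and this tree layout are illustrative assumptions, not the paper's retrievers.

```python
import heapq
from typing import Callable, List, Tuple

Node = Tuple[int, ...]  # path of child indices from the root; () is the root


def beam_search(score: Callable[[Node], float], branching: int, depth: int, beam: int) -> List[Node]:
    """Return up to `beam` leaves of a complete tree, best first, via level-by-level beam search."""
    frontier: List[Node] = [()]
    for _ in range(depth):
        # Expand every node in the beam into all of its children...
        children = [node + (c,) for node in frontier for c in range(branching)]
        # ...and keep only the highest-scoring ones for the next level.
        frontier = heapq.nlargest(beam, children, key=score)
    return frontier
```

With `score=sum`, branching factor 2, depth 3, and beam width 1, the search follows the highest-index child at each level and returns `[(1, 1, 1)]`. Raising `branching` increases the number of candidates compared at each level.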

ICLR Conference 2024 Conference Paper

V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection

  • Yichao Shen 0001
  • Zigang Geng
  • Yuhui Yuan
  • Yutong Lin
  • Ze Liu
  • Chunyu Wang 0001
  • Han Hu 0001
  • Nanning Zheng 0001

We introduce a high-performance 3D object detector for point clouds based on the DETR framework. Prior attempts all yield suboptimal results because they fail to learn accurate inductive biases from the limited scale of training data. In particular, the queries often attend to points far from the target objects, violating the locality principle in object detection. To address this limitation, we introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method, which computes a position encoding for each point based on its position relative to the 3D boxes predicted by the queries in each decoder layer. This provides clear information that guides the model to focus on points near the objects, in accordance with the locality principle. We have also systematically refined our pipeline, including data normalization, to better align with the task requirements. Our approach performs strongly on the challenging ScanNetV2 benchmark, substantially improving on the prior state of the art, CAGroup3D: $\mathrm{AP}_{25}$ rises from $75.1\%$ to $77.8\%$ and $\mathrm{AP}_{50}$ from $61.3\%$ to $66.0\%$.
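
The geometric quantity behind 3DV-RPE is each point's position relative to a predicted box. As a sketch, assuming axis-aligned boxes given by center and size (rotated boxes would first need a rotation into the box frame), the code below computes the offset from each of a box's eight vertices to a point. How V-DETR turns these offsets into an attention encoding is not reproduced here; `inside_from_offsets` only shows that the offsets already encode whether a point lies in the box, which is the locality cue the decoder needs. All function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def box_vertices(center: Vec3, size: Vec3) -> List[Vec3]:
    """Eight corners of an axis-aligned box; index 0 is the min corner, index 7 the max corner."""
    cx, cy, cz = center
    hx, hy, hz = (s / 2 for s in size)
    return [(cx + sx * hx, cy + sy * hy, cz + sz * hz)
            for sx, sy, sz in product((-1, 1), repeat=3)]


def vertex_offsets(point: Vec3, center: Vec3, size: Vec3) -> List[Vec3]:
    """Offset from each box vertex to the point: 8 x 3 values per (point, box) pair."""
    px, py, pz = point
    return [(px - vx, py - vy, pz - vz) for vx, vy, vz in box_vertices(center, size)]


def inside_from_offsets(offsets: List[Vec3]) -> bool:
    """A point is inside iff it is above the min corner and below the max corner on every axis."""
    lo, hi = offsets[0], offsets[-1]
    return all(a >= 0 for a in lo) and all(b <= 0 for b in hi)
```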

NeurIPS Conference 2022 Conference Paper

Could Giant Pre-trained Image Models Extract Universal Representations?

  • Yutong Lin
  • Ze Liu
  • Zheng Zhang
  • Han Hu
  • Nanning Zheng
  • Stephen Lin
  • Yue Cao

Frozen pretrained models have become a viable alternative to the pretraining-then-finetuning paradigm for transfer learning. However, with frozen models there are relatively few parameters available for adapting to downstream tasks, which is problematic in computer vision where tasks vary significantly in input/output format and the type of information that is of value. In this paper, we present a study of frozen pretrained models when applied to diverse and representative computer vision tasks, including object detection, semantic segmentation and video action recognition. From this empirical analysis, our work answers the questions of what pretraining task fits best with this frozen setting, how to make the frozen setting more flexible to various downstream tasks, and the effect of larger model sizes. We additionally examine the upper bound of performance using a giant frozen pretrained model with 3 billion parameters (SwinV2-G) and find that it reaches competitive performance on a varied set of major benchmarks with only one shared frozen base network: 60.0 box mAP and 52.2 mask mAP on COCO object detection test-dev, 57.6 val mIoU on ADE20K semantic segmentation, and 81.7 top-1 accuracy on Kinetics-400 action recognition. With this work, we hope to bring greater attention to this promising path of freezing pretrained image models.
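
To make the frozen setting concrete, the sketch below keeps a stand-in backbone fixed and trains only a linear head on its features with per-example stochastic gradient descent. The tiny ReLU "backbone", the squared-error loss, and the names `frozen_backbone` and `train_head` are assumptions for illustration; in the paper the frozen network is a 3-billion-parameter SwinV2-G, and its task-specific heads are not reproduced here.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def frozen_backbone(x: Sequence[float], w: Sequence[Sequence[float]]) -> Vector:
    """Stand-in for a pretrained network: fixed weights w, ReLU features (never trained)."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def train_head(data: Sequence[Tuple[Sequence[float], float]], w_backbone: Sequence[Sequence[float]],
               lr: float = 0.05, epochs: int = 200) -> Tuple[Vector, float]:
    """Fit a linear head on frozen features with per-example SGD on squared error."""
    # Features are computed once: the backbone is frozen, so they never change.
    feats = [(frozen_backbone(x, w_backbone), y) for x, y in data]
    head = [0.0] * len(w_backbone)
    bias = 0.0
    for _ in range(epochs):
        for f, y in feats:
            err = sum(h * fi for h, fi in zip(head, f)) + bias - y
            # Gradient step on 0.5 * err**2 for the head weights and bias only.
            head = [h - lr * err * fi for h, fi in zip(head, f)]
            bias -= lr * err
    return head, bias
```

Because the features are computed once before training, only `len(head) + 1` parameters are ever updated, which mirrors the limited adaptation capacity the abstract describes.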

AAAI Conference 2021 Conference Paper

RGB-D Salient Object Detection via 3D Convolutional Neural Networks

  • Qian Chen
  • Ze Liu
  • Yi Zhang
  • Keren Fu
  • Qijun Zhao
  • Hongwei Du

RGB-D salient object detection (SOD) has recently attracted increasing research interest, and many deep learning methods based on encoder-decoder architectures have emerged. However, most existing RGB-D SOD models conduct feature fusion in either the encoder stage alone or the decoder stage alone, which hardly guarantees sufficient cross-modal fusion. In this paper, we make the first attempt to address RGB-D SOD with 3D convolutional neural networks. The proposed model, named RD3D, performs pre-fusion in the encoder stage and in-depth fusion in the decoder stage to effectively promote full integration of the RGB and depth streams. Specifically, RD3D first conducts pre-fusion across the RGB and depth modalities through an inflated 3D encoder, and then provides in-depth feature fusion through a 3D decoder equipped with rich back-projection paths (RBPP) that leverage the extensive aggregation ability of 3D convolutions. With such a progressive fusion strategy involving both the encoder and decoder, effective and thorough interaction between the two modalities can be exploited to boost detection accuracy. Extensive experiments on six widely used benchmark datasets demonstrate that RD3D performs favorably against 14 state-of-the-art RGB-D SOD approaches in terms of four key evaluation metrics. Our code will be made publicly available: https://github.com/PPOLYpubki/RD3D.
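
An "inflated" 3D encoder is commonly built by inflating 2D kernels into 3D in the style of I3D: the 2D kernel is repeated `t` times along a new depth axis and divided by `t`, so a volume whose slices are identical produces the same response as the original 2D convolution. The single-channel sketch below illustrates that idea, assuming the RGB and depth inputs are stacked as two slices along the depth axis. Whether RD3D inflates its encoder exactly this way is an assumption, and the names `conv2d`, `inflate`, and `conv3d_valid` are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one 2D slice, indexed [row][col]


def conv2d(img: Grid, k: Grid) -> Grid:
    """Valid (no padding) 2D cross-correlation, as computed by deep-learning conv layers."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [sum(k[i][j] * img[r + i][c + j] for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


def inflate(k: Grid, t: int) -> List[Grid]:
    """I3D-style inflation: repeat the 2D kernel t times along depth and divide by t."""
    return [[[v / t for v in row] for row in k] for _ in range(t)]


def conv3d_valid(vol: List[Grid], k3: List[Grid]) -> List[Grid]:
    """Valid 3D cross-correlation over (depth, row, col) for a single channel."""
    t = len(k3)
    out = []
    for d in range(len(vol) - t + 1):
        # Each depth tap is a 2D correlation; the 3D response is their sum.
        taps = [conv2d(vol[d + z], k3[z]) for z in range(t)]
        out.append([
            [sum(tap[r][c] for tap in taps) for c in range(len(taps[0][0]))]
            for r in range(len(taps[0]))
        ])
    return out
```

Stacking a grayscale image and a depth map as `[gray, depth]` and applying `conv3d_valid` with `inflate(k, 2)` yields one output slice in which every value mixes both modalities, the kind of encoder-stage pre-fusion the abstract describes.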