Arrow Research search

Author name cluster

Yike Ma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
1 author row

Possible papers

8

AAAI Conference 2026 Conference Paper

IQGS: Instance Query-based Gaussian Segmentation

  • Yichao Gao
  • Xinyuan Liu
  • Yike Ma
  • Yucheng Zhang
  • Feng Dai

In recent years, Gaussian scene representations have achieved a series of promising results in 3D reconstruction. Compared to the earlier 3DGS paradigm, the more recent reconstruction approach 2DGS achieves more accurate geometric representation using fewer Gaussian points. Accordingly, developing a panoramic segmentation algorithm suited to 2DGS-reconstructed scenes is of significant importance. However, existing segmentation methods are primarily designed for 3DGS: they either fail to account for all objects in complex segmentation scenes or suffer significant performance degradation when applied to 2D Gaussian scenes. Moreover, these methods consistently exhibit poor cross-dataset generalization. To address these issues, we propose IQGS, a segmentation framework applicable to 2DGS representations. Specifically, IQGS employs per-instance queries and relaxed object-level supervision instead of strict pixel-level ID supervision, effectively mitigating the segmentation performance degradation that occurs when applied to 2DGS. At the same time, by learning features independent of specific object ID assignments, IQGS enhances its ability to generalize across diverse datasets. Our method achieves impressive panoramic segmentation results across multiple datasets, with an average mIoU of 66.6%, surpassing the state-of-the-art method Gaussian Grouping, which achieves 57.17%.

NeurIPS Conference 2025 Conference Paper

TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving

  • Yanping Fu
  • Xinyuan Liu
  • Tianyu Li
  • Yike Ma
  • Yucheng Zhang
  • Feng Dai

Topology reasoning, which unifies perception and structured reasoning, plays a vital role in understanding intersections for autonomous driving. However, its performance heavily relies on the accuracy of lane detection, particularly at connected lane endpoints. Existing methods often suffer from lane endpoint deviation, leading to incorrect topology construction. To address this issue, we propose TopoPoint, a novel framework that explicitly detects lane endpoints and jointly reasons over endpoints and lanes for robust topology reasoning. During training, we independently initialize point and lane queries, and propose Point-Lane Merge Self-Attention to enhance global context sharing by incorporating geometric distances between points and lanes as an attention mask. We further design a Point-Lane Graph Convolutional Network to enable mutual feature aggregation between point and lane queries. During inference, we introduce a Point-Lane Geometry Matching algorithm that computes distances between detected points and lanes to refine lane endpoints, effectively mitigating endpoint deviation. Extensive experiments on the OpenLane-V2 benchmark demonstrate that TopoPoint achieves state-of-the-art performance in topology reasoning (48.8 on OLS). Additionally, we propose DET$_p$ to evaluate endpoint detection, under which our method significantly outperforms existing approaches (52.6 vs. 45.2 on DET$_p$). The code is released at https://github.com/Franpin/TopoPoint.
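
The inference-time refinement described in the abstract, matching detected points to lane endpoints by distance, can be sketched as a nearest-neighbor snap. This is a minimal illustration of the idea, not the authors' implementation; the threshold value and coordinates below are hypothetical.

```python
import math

def match_and_refine(lane_endpoints, detected_points, max_dist=1.5):
    """Snap each lane endpoint to the nearest detected point within
    max_dist, illustrating point-lane geometry matching. Endpoints with
    no nearby detection are left unchanged."""
    refined = []
    for ex, ey in lane_endpoints:
        best, best_d = None, max_dist
        for px, py in detected_points:
            d = math.hypot(ex - px, ey - py)
            if d < best_d:
                best, best_d = (px, py), d
        refined.append(best if best is not None else (ex, ey))
    return refined

endpoints = [(0.0, 0.0), (10.0, 5.0)]
points = [(0.3, -0.2), (9.6, 5.4), (20.0, 20.0)]
print(match_and_refine(endpoints, points))
# → [(0.3, -0.2), (9.6, 5.4)]
```

In this toy example each endpoint snaps to its nearest detection, while the far-away detection at (20, 20) is ignored.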

IJCAI Conference 2024 Conference Paper

MISA: MIning Saliency-Aware Semantic Prior for Box Supervised Instance Segmentation

  • Hao Zhu
  • Yan Zhu
  • Jiayu Xiao
  • Yike Ma
  • Yucheng Zhang
  • Jintao Li
  • Feng Dai

Box supervised instance segmentation (BSIS) aims to achieve an effective trade-off between annotation cost and model performance by relying solely on bounding box annotations during training. However, we observe that BSIS models are bottlenecked by this intricate objective under limited guidance, and tend to sacrifice segmentation capability in order to recognize multiple instances effectively. To boost the BSIS model's perceptual ability for object shape and contour, we introduce MISA, that is, MIning Saliency-Aware semantic priors from a well-optimized box supervised semantic segmentation (BSSS) network, and incorporating cross-model guidance into the learning process of BSIS. Specifically, we first design a Frequency-Space Distillation (FSD) module to extract assorted salient prior knowledge from the BSSS model and perform cross-model alignment for transferring the prior to the BSIS model. Furthermore, we introduce Semantic-Enhanced Pairwise Affinity (SEPA), which borrows the object perception ability of the BSSS model to emphasize the contribution of salient objects to pairwise affinity, providing more accurate guidance for the BSIS network. Extensive experiments show that our proposed MISA consistently surpasses existing state-of-the-art methods by a large margin in the BSIS scenario.

NeurIPS Conference 2024 Conference Paper

TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes

  • Yanping Fu
  • Wenbin Liao
  • Xinyuan Liu
  • Hang Xu
  • Yike Ma
  • Yucheng Zhang
  • Feng Dai

As an emerging task that integrates perception and reasoning, topology reasoning in autonomous driving scenes has recently garnered widespread attention. However, existing work often emphasizes "perception over reasoning": it typically boosts reasoning performance by enhancing the perception of lanes and directly adopts vanilla MLPs to learn lane topology from lane queries. This paradigm overlooks the geometric features intrinsic to the lanes themselves and is prone to being influenced by inherent endpoint shifts in lane detection. To tackle this issue, we propose an interpretable method for lane topology reasoning based on lane geometric distance and lane query similarity, named TopoLogic. This method mitigates the impact of endpoint shifts in geometric space and introduces explicit similarity calculation in semantic space as a complement. By integrating results from both spaces, our method provides more comprehensive information for lane topology. Ultimately, our approach significantly outperforms existing state-of-the-art methods on the mainstream benchmark OpenLane-V2 (23.9 vs. 10.9 in TOP$_{ll}$ and 44.1 vs. 39.8 in OLS on subsetA). Additionally, our proposed geometric distance topology reasoning method can be incorporated into well-trained models without re-training, significantly enhancing the performance of lane topology reasoning. The code is released at https://github.com/Franpin/TopoLogic.
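
The core idea of combining a geometric endpoint-distance score with a semantic query-similarity score can be sketched as below. This is an illustrative reading of the abstract, not the paper's actual formulation: the exponential decay, the cosine-similarity rescaling, and the equal fusion weight are all assumptions.

```python
import math

def geometric_score(end_a, start_b, tau=1.0):
    # Map the distance between lane A's endpoint and lane B's start point
    # to a (0, 1] connectivity score; tau is a hypothetical decay constant.
    d = math.dist(end_a, start_b)
    return math.exp(-d / tau)

def semantic_score(q_a, q_b):
    # Cosine similarity between two lane query embeddings, rescaled to [0, 1].
    dot = sum(x * y for x, y in zip(q_a, q_b))
    na = math.sqrt(sum(x * x for x in q_a))
    nb = math.sqrt(sum(y * y for y in q_b))
    return 0.5 * (1.0 + dot / (na * nb))

def topology_score(end_a, start_b, q_a, q_b, w=0.5):
    # Fuse geometric and semantic evidence; equal weighting is an assumption.
    return w * geometric_score(end_a, start_b) + (1 - w) * semantic_score(q_a, q_b)
```

Two lanes whose endpoints coincide and whose queries agree score 1.0; the geometric term degrades smoothly with endpoint shift rather than collapsing on a hard threshold, which matches the abstract's motivation.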

IJCAI Conference 2023 Conference Paper

Sph2Pob: Boosting Object Detection on Spherical Images with Planar Oriented Boxes Methods

  • Xinyuan Liu
  • Hang Xu
  • Bin Chen
  • Qiang Zhao
  • Yike Ma
  • Chenggang Yan
  • Feng Dai

Object detection on panoramic/spherical images has developed rapidly in the past few years, where the IoU calculator is a fundamental part of various detector components, i.e., label assignment, loss, and NMS. Due to the low efficiency and non-differentiability of the spherical Unbiased IoU, spherical approximate IoU methods have been proposed recently. We find that the key to these approximate methods is mapping spherical boxes to planar boxes. However, there are two problems with these methods: (1) they do not eliminate the influence of panoramic image distortion; (2) they break the original pose between bounding boxes. These problems lead to the low accuracy of these methods. Taking both problems into account, we propose a new sphere-plane box transform, called Sph2Pob. Based on Sph2Pob, we propose (1) a differentiable IoU, Sph2Pob-IoU, for spherical boxes with low time cost and high accuracy, and (2) an agent loss, Sph2Pob-Loss, for spherical detection with high flexibility and extensibility. Extensive experiments verify the effectiveness and generality of our approaches, and Sph2Pob-IoU and Sph2Pob-Loss together boost the performance of spherical detectors. The source code is available at https://github.com/AntXinyuan/sph2pob.
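
The abstract's key move is mapping spherical boxes to planar ones. The exact Sph2Pob transform is defined in the paper; as a general illustration of sphere-to-plane mapping, the gnomonic projection below maps a spherical point onto the tangent plane at a chosen center, a standard projection that is nearly distortion-free close to the tangent point (this is not necessarily the transform the authors use).

```python
import math

def gnomonic(lon, lat, lon0, lat0):
    """Project a spherical point (radians) onto the tangent plane at
    (lon0, lat0) via the gnomonic projection -- one standard
    sphere-to-plane mapping, shown for illustration only."""
    c = (math.sin(lat0) * math.sin(lat)
         + math.cos(lat0) * math.cos(lat) * math.cos(lon - lon0))
    x = math.cos(lat) * math.sin(lon - lon0) / c
    y = (math.cos(lat0) * math.sin(lat)
         - math.sin(lat0) * math.cos(lat) * math.cos(lon - lon0)) / c
    return x, y

# The tangent point itself maps to the plane origin.
print(gnomonic(0.0, 0.0, 0.0, 0.0))  # → (0.0, 0.0)
```

Near the tangent point the projected coordinates behave like tangents of the angular offsets, which is why a planar IoU computed there can stay close to the spherical one; far from it the distortion the abstract warns about grows quickly.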

AAAI Conference 2022 Conference Paper

Unbiased IoU for Spherical Image Object Detection

  • Feng Dai
  • Bin Chen
  • Hang Xu
  • Yike Ma
  • Xiaodong Li
  • Bailan Feng
  • Peng Yuan
  • Chenggang Yan

As one of the fundamental components of object detection, intersection-over-union (IoU) calculation between two bounding boxes plays an important role in sample selection, NMS, and the evaluation of object detection algorithms. This procedure is well defined and solved for planar images, but it is challenging for spherical ones. Some existing methods use planar bounding boxes to represent spherical objects; however, they are biased due to the distortions of spherical objects. Others use spherical rectangles as unbiased representations but adopt excessively approximate algorithms when computing the IoU. In this paper, we propose an unbiased IoU as a novel evaluation criterion for spherical image object detection, which is based on the unbiased representation and uses an unbiased analytical method for IoU calculation. This is the first time an exactly accurate IoU calculation has been applied to the evaluation criterion, so object detection algorithms can be correctly evaluated for spherical images. With the unbiased representation and calculation, we also present Spherical CenterNet, an anchor-free object detection algorithm for spherical images. Experiments show that our unbiased IoU gives accurate results and that the proposed Spherical CenterNet achieves better performance than existing methods on one real-world and two synthetic spherical object detection datasets.
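
The paper computes the spherical IoU analytically and exactly; as a way to see what such an IoU measures, the sketch below estimates it by Monte Carlo sampling for field-of-view-style spherical boxes (center longitude/latitude plus angular extents). The box parameterization and membership test here are assumptions for illustration; the estimate converges to the true value but is neither exact nor differentiable, which is precisely why an analytical method is preferable.

```python
import math, random

def inside(lon, lat, box):
    """Membership test for an FoV-style spherical box given as
    (center_lon, center_lat, fov_x, fov_y), all in radians."""
    lon0, lat0, fx, fy = box
    # Rotate the point so the box center sits at azimuth 0, elevation 0.
    x = math.cos(lat) * math.cos(lon - lon0)
    y = math.cos(lat) * math.sin(lon - lon0)
    z = math.sin(lat)
    xr = x * math.cos(lat0) + z * math.sin(lat0)
    zr = -x * math.sin(lat0) + z * math.cos(lat0)
    az = math.atan2(y, xr)
    el = math.asin(max(-1.0, min(1.0, zr)))
    return abs(az) <= fx / 2 and abs(el) <= fy / 2

def mc_spherical_iou(box_a, box_b, n=200_000, seed=0):
    """Monte Carlo IoU estimate over points drawn uniformly on the sphere
    (uniform z, uniform longitude). Illustrative only."""
    rng = random.Random(seed)
    inter = union = 0
    for _ in range(n):
        lon = rng.uniform(0.0, 2 * math.pi)
        lat = math.asin(rng.uniform(-1.0, 1.0))  # uniform on the sphere
        a, b = inside(lon, lat, box_a), inside(lon, lat, box_b)
        inter += a and b
        union += a or b
    return inter / union if union else 0.0
```

Identical boxes yield an IoU of exactly 1.0 and disjoint boxes 0.0 even under sampling, while partially overlapping boxes get a noisy estimate whose error shrinks with the sample count.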

IJCAI Conference 2021 Conference Paper

Bipartite Matching for Crowd Counting with Point Supervision

  • Hao Liu
  • Qiang Zhao
  • Yike Ma
  • Feng Dai

For the crowd counting task, it has been demonstrated that imposing Gaussians on point annotations hurts generalization performance. Several methods attempt to use point annotations directly as supervision, and they have achieved significant improvements over density-map based methods. However, these point based methods ignore inevitable annotation noise and still suffer from low robustness to noisy annotations. To address this problem, we propose a bipartite matching based method for crowd counting with only point supervision (BM-Count). In BM-Count, we select a subset of the most similar pixels from the predicted density map to match annotated pixels via bipartite matching. Loss functions can then be defined on the matched pairs to alleviate the adverse effect of annotated dots with incorrect positions. Under noisy annotations, our method reduces MAE and RMSE by 9% and 11.2%, respectively. Moreover, we propose a novel ranking distribution learning framework to address the imbalanced distribution of head counts, which encodes head counts as a classification distribution in the ranking domain and refines the estimated count map in the continuous domain. Extensive experiments on four datasets show that our method achieves state-of-the-art performance and better crowd localization.
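
The matching step at the heart of the abstract, pairing annotated points with predicted pixels at minimum total cost, can be sketched with a brute-force minimum-cost bipartite matching. Real systems use polynomial-time solvers such as the Hungarian algorithm; exhaustive search over permutations, and the toy coordinates below, are for illustration only.

```python
import math
from itertools import permutations

def min_cost_matching(preds, annots):
    """Exhaustive minimum-cost bipartite matching: assign each annotated
    point a distinct predicted location so total distance is minimized.
    Returns (pairs, cost) with pairs as (annot_index, pred_index)."""
    best_pairs, best_cost = None, math.inf
    for perm in permutations(range(len(preds)), len(annots)):
        cost = sum(math.dist(annots[i], preds[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(i, j) for i, j in enumerate(perm)]
    return best_pairs, best_cost

annots = [(1.0, 1.0), (5.0, 5.0)]
preds = [(5.2, 4.9), (0.8, 1.1), (9.0, 9.0)]
pairs, cost = min_cost_matching(preds, annots)
print(pairs)  # → [(0, 1), (1, 0)]
```

Because the matching is one-to-one, a slightly misplaced annotation is still paired with the nearby prediction rather than pulling the loss toward its noisy coordinates, which is the robustness argument the abstract makes.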

IJCAI Conference 2018 Conference Paper

Distortion-aware CNNs for Spherical Images

  • Qiang Zhao
  • Chen Zhu
  • Feng Dai
  • Yike Ma
  • Guoqing Jin
  • Yongdong Zhang

Convolutional neural networks are widely used in computer vision applications. Although they have achieved great success, these networks cannot be applied directly to 360° spherical images due to varying distortion effects. In this paper, we present a distortion-aware convolutional network for spherical images. For each pixel, our network samples a non-regular grid based on the pixel's distortion level, and convolves the sampled grid using square kernels shared by all pixels. The network successively approximates large image patches from different tangent planes of the viewing sphere with small local sampling grids, thus improving computational efficiency. Our method also handles the boundary problem, an inherent issue for spherical images. To evaluate our method, we apply the network to spherical image classification on transformed MNIST and CIFAR-10 datasets. Compared with the baseline method, our method achieves much better performance. We also analyze variants of our network.
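
In an equirectangular image, horizontal stretching grows roughly as 1/cos(latitude), so a distortion-aware layer must widen its horizontal sampling reach toward the poles. The sketch below computes such latitude-dependent offsets for a 3-tap kernel; it is a simplified stand-in for the paper's tangent-plane sampling grids, and the clamp value is an assumption.

```python
import math

def row_sampling_offsets(row, height, base_offsets=(-1, 0, 1)):
    """For a row of an equirectangular image, stretch the horizontal
    offsets of a 3-tap kernel by 1/cos(latitude) -- a simplified version
    of latitude-dependent sampling (the paper samples tangent-plane grids)."""
    # Latitude of this row, from +pi/2 (top) to -pi/2 (bottom).
    lat = math.pi * (0.5 - (row + 0.5) / height)
    stretch = 1.0 / max(math.cos(lat), 1e-3)  # clamp near the poles
    return [o * stretch for o in base_offsets]

# Near the equator the grid stays regular; near the poles it widens.
print(row_sampling_offsets(64, 128))  # ~[-1.0, 0.0, 1.0]
print(row_sampling_offsets(5, 128))   # much wider horizontal reach
```

Because the square kernel weights are shared across all pixels while only the sampling locations change, the layer stays as cheap as an ordinary convolution, which matches the efficiency claim in the abstract.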