Arrow Research · Search

Author name cluster

Youqi Song

Possible papers associated with this exact author name in Arrow. This page groups papers by case-insensitive exact name match; it is not a full identity-disambiguation profile.
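
The grouping rule is mechanical: rows fall into the same cluster exactly when their name strings match after case folding. A minimal sketch of that rule, assuming a simple row structure (field names and IDs are hypothetical, not Arrow's actual schema):

```python
from collections import defaultdict

# Hypothetical author rows; only the name field drives clustering.
rows = [
    {"name": "Youqi Song", "paper_id": "p1"},
    {"name": "YOUQI SONG", "paper_id": "p2"},
]

clusters = defaultdict(list)
for row in rows:
    # Case-insensitive exact match: case-fold the name, compare nothing else.
    clusters[row["name"].casefold()].append(row)

print(len(clusters))  # 1: both rows share the key "youqi song"
```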

3 papers
2 author rows

Possible papers (3)

IROS 2025 · Conference Paper

DPSN: Dual Prior Knowledge Induced Tactile Paving and Obstacle Joint Segmentation Network

  • Youqi Song
  • Wenqi Li
  • Zhao Zhang
  • Yu Wu
  • Zilong Jin
  • Changbo Wang
  • Gaoqi He

Accurate semantic segmentation of both tactile paving and obstacles is crucial for the safe mobility of visually impaired individuals. However, existing methods face two major challenges: (i) discontinuous segmentation fragments; (ii) inaccurate obstacle recognition. To address challenge (i), we propose incorporating appearance priors of complete tactile pavings to prevent the model from directly learning irregular ground-truth masks. To tackle challenge (ii), we propose introducing cross-modal semantic priors to complement the semantic information of obstacles. We implement these strategies in the proposed Dual Prior knowledge induced tactile paving and obstacle joint Segmentation Network (DPSN). Built on a bilateral network architecture, DPSN merges obstacle category masks into the tactile paving category, constructing a complete tactile paving mask. Using the complete mask, DPSN transfers appearance prior knowledge to detail features from boundary and structural perspectives. Concurrently, DPSN leverages the CLIP text encoder to guide visual feature decoding via attention mechanisms, transferring rich cross-modal semantic prior knowledge to the visual feature maps. Furthermore, we propose the TPO-Dataset, the first dataset for joint tactile paving and obstacle segmentation acquired from real scenes. Experiments demonstrate that DPSN achieves state-of-the-art results on the TPO-Dataset, with relative gains of 27.16% in obstacle IoU and 30.53% in accuracy metrics compared to baseline methods. Notably, DPSN achieves real-time performance at 88.25 FPS at the maximum input resolution of 2048×512.
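
To make two of the described steps concrete, the merging of obstacle masks into a complete tactile paving mask and the CLIP-text-guided decoding, here is a minimal PyTorch sketch; the class IDs, tensor shapes, and function names are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn.functional as F

def complete_paving_mask(labels: torch.Tensor,
                         paving_id: int = 1,
                         obstacle_id: int = 2) -> torch.Tensor:
    """Fold obstacle pixels into the tactile-paving class so the target
    mask shows the paving as if unoccluded (class IDs are assumptions)."""
    merged = labels.clone()
    merged[labels == obstacle_id] = paving_id
    return merged

def text_guided_decode(visual: torch.Tensor,  # (B, HW, C) visual tokens
                       text: torch.Tensor     # (K, C) CLIP text embeddings
                       ) -> torch.Tensor:
    """One plausible realization of text-guided decoding: cross-attention
    from visual tokens to per-class text embeddings, added residually."""
    attn = F.softmax(visual @ text.t() / text.shape[-1] ** 0.5, dim=-1)
    return visual + attn @ text  # inject cross-modal semantic prior
```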

AAAI 2024 · Conference Paper

Kumaraswamy Wavelet for Heterophilic Scene Graph Generation

  • Lianggangxu Chen
  • Youqi Song
  • Shaohui Lin
  • Changbo Wang
  • Gaoqi He

Graph neural networks (GNNs) have demonstrated their capabilities in the field of scene graph generation (SGG) by updating node representations from neighboring nodes. This updating can be viewed as a form of low-pass filtering in the spatial domain, which smooths node feature representations and retains commonalities among nodes. However, spatial GNNs do not work well for heterophilic SGG, in which fine-grained predicates are always connected to a large number of coarse-grained predicates. Blind smoothing undermines the discriminative information of the fine-grained predicates, resulting in failure to predict them accurately. To address the heterophily, our key idea is to design tailored filters via wavelet transforms in the spectral domain. First, we prove rigorously that as the heterophily on the scene graph increases, the spectral energy gradually shifts towards the high-frequency part. Inspired by this observation, we subsequently propose the Kumaraswamy Wavelet Graph Neural Network (KWGNN). KWGNN leverages complementary multi-group Kumaraswamy wavelets to cover all frequency bands. Finally, KWGNN adaptively generates band-pass filters and then integrates the filtering results to better accommodate varying levels of smoothness on the graph. Comprehensive experiments on the Visual Genome and Open Images datasets show that our method achieves state-of-the-art performance.
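
The spectral recipe in this abstract can be sketched in a few lines: rescale the normalized-Laplacian spectrum to [0, 1], shape band-pass kernels with Kumaraswamy densities, and sum the filtered signals. A minimal dense-matrix sketch follows; the (a, b) band parameters are illustrative placeholders, and the paper's adaptive filter generation is not reproduced here:

```python
import torch

def kumaraswamy_pdf(x: torch.Tensor, a: float, b: float) -> torch.Tensor:
    """Kumaraswamy density a*b*x^(a-1)*(1 - x^a)^(b-1) on [0, 1]."""
    x = x.clamp(1e-6, 1 - 1e-6)
    return a * b * x ** (a - 1) * (1 - x ** a) ** (b - 1)

def wavelet_filter(L_norm: torch.Tensor, X: torch.Tensor,
                   bands=((1.5, 5.0), (3.0, 3.0), (5.0, 1.5))) -> torch.Tensor:
    """Filter node features X (N, C) with Kumaraswamy-shaped kernels over the
    spectrum of the normalized Laplacian L_norm (N, N); each (a, b) pair
    picks which frequency band its kernel emphasizes."""
    lam, U = torch.linalg.eigh(L_norm)   # eigenvalues lie in [0, 2]
    s = lam / 2.0                        # rescale spectrum to [0, 1]
    X_hat = U.t() @ X                    # graph Fourier transform
    out = [U @ (kumaraswamy_pdf(s, a, b).unsqueeze(1) * X_hat)
           for a, b in bands]            # one band-pass response per kernel
    return torch.stack(out).sum(0)       # integrate the band responses
```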

AAAI 2024 · Conference Paper

Multi-Prototype Space Learning for Commonsense-Based Scene Graph Generation

  • Lianggangxu Chen
  • Youqi Song
  • Yiqing Cai
  • Jiale Lu
  • Yang Li
  • Yuan Xie
  • Changbo Wang
  • Gaoqi He

In the domain of scene graph generation, commonsense has typically been modeled as a single-prototype representation to facilitate the recognition of infrequent predicates. However, a fundamental challenge lies in the large intra-class variation in the visual appearance of predicates, which produces subclasses within a predicate class. This challenge typically leads to misclassification of diverse predicates because the predicate space is clustered too coarsely. In this paper, inspired by cognitive science, we maintain multi-prototype representations for each predicate class, which can accurately locate the multiple class centers of the predicate space. Technically, we propose a novel multi-prototype learning framework consisting of three main steps: prototype-predicate matching, prototype updating, and prototype space optimization. We first design a triple-level optimal transport to match each predicate feature within the same class to a specific prototype. The prototypes are then updated with momentum to track the class centers according to the matching results. Finally, we enhance the prototype space by iteratively applying an inter-class separability loss and an intra-class compactness loss. Extensive evaluations demonstrate that our approach significantly outperforms state-of-the-art methods on the Visual Genome dataset.
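
A minimal sketch of the matching-and-updating loop the abstract outlines, with the triple-level optimal transport simplified to nearest-prototype assignment; the shapes, momentum rate, and function name are assumptions rather than the authors' implementation:

```python
import torch
import torch.nn.functional as F

def update_prototypes(protos: torch.Tensor,   # (num_classes, K, D)
                      feats: torch.Tensor,    # (N, D) predicate features
                      labels: torch.Tensor,   # (N,) predicate class ids
                      momentum: float = 0.9) -> torch.Tensor:
    """Match each feature to its nearest same-class prototype, then move
    that prototype toward the matched features with a momentum update."""
    protos = F.normalize(protos, dim=-1)
    feats = F.normalize(feats, dim=-1)
    for c in labels.unique():
        fc = feats[labels == c]                      # features of class c
        assign = (fc @ protos[c].t()).argmax(dim=1)  # nearest of K prototypes
        for k in range(protos.shape[1]):
            matched = fc[assign == k]
            if len(matched):                 # momentum update toward matches
                protos[c, k] = momentum * protos[c, k] + \
                               (1 - momentum) * matched.mean(0)
    return F.normalize(protos, dim=-1)
```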