TIST 2026 Journal Article
LKAFormer: A Lightweight Kolmogorov-Arnold Transformer Model for Image Semantic Segmentation
- Shoulin Yin
- Liguo Wang
- Tao Chen
- Huafei Huang
- Jing Gao
- Jianing Zhang
- Meng Liu
- Peng Li
Transformer-based semantic segmentation methods have demonstrated outstanding performance by leveraging global self-attention to effectively capture long-range dependencies. However, two issues remain in existing works: (1) Most of them rely on full-rank weight matrices in the self-attention mechanism and feed-forward network to model long-range dependencies between patches/pixels, which incurs a high computational cost during both training and inference. (2) Most of them ignore information interactions between high-level semantics and low-level structures during image resolution recovery, which degrades performance when segmenting objects with complex boundaries. To tackle these challenges, a lightweight Kolmogorov-Arnold Transformer model (LKAFormer) is proposed for image semantic segmentation, comprising a two-stream lightweight Transformer encoder and a graph feature pyramid aggregation KAN-decoder. The former constructs a hierarchical cross-scale feature fusion pipeline that gathers comprehensive multi-scale semantics by processing coarse-grained and fine-grained streams with image patches of different sizes. Within this pipeline, feature lightweight focusing modules model complex, long-range dependencies across patches/pixels and refine image semantics at a lower computational cost through lightweight multi-head self-attention and lightweight feed-forward network designs. The latter leverages the learnable nonlinear transformation mechanism of the Kolmogorov-Arnold Transformer architecture to adaptively capture the spatial structure dependencies of distinct image sub-regions. It then jointly performs intra-scale and cross-scale graph fusion during image resolution recovery to enhance information interactions between high-level semantics and low-level structures, achieving robust boundary localization and texture refinement of segmented objects.
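The lightweight designs mentioned above hinge on replacing full-rank projection matrices with low-rank factorizations. The sketch below illustrates this general idea with NumPy; the rank, dimensions, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def full_rank_params(d_in, d_out):
    # parameter count of a dense projection W: (d_in, d_out)
    return d_in * d_out

def low_rank_params(d_in, d_out, r):
    # W is approximated as U @ V with U: (d_in, r), V: (r, d_out), r << min(d_in, d_out)
    return d_in * r + r * d_out

d, r = 512, 32  # illustrative model width and rank (assumptions)
rng = np.random.default_rng(0)
U = rng.standard_normal((d, r)) / np.sqrt(d)
V = rng.standard_normal((r, d)) / np.sqrt(r)
x = rng.standard_normal((10, d))        # 10 token embeddings

# Applying the factors sequentially costs O(N*d*r) instead of O(N*d^2)
y = (x @ U) @ V

print(full_rank_params(d, d))   # 262144
print(low_rank_params(d, d, r)) # 32768  (8x fewer parameters)
```

The same factorization applies to the query/key/value projections of multi-head self-attention and to the expansion layers of the feed-forward network, which is where the bulk of a Transformer's parameters and FLOPs reside.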
Finally, extensive experiments are conducted on three challenging datasets, and the results show that LKAFormer sets a new baseline for image semantic segmentation in comparison with 11 competing methods.
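The "learnable nonlinear transformation mechanism" of Kolmogorov-Arnold layers replaces fixed activations with a learnable univariate function on each edge. A minimal NumPy sketch of this idea follows; the Gaussian basis, grid, and class name are assumptions for illustration and do not reproduce the paper's KAN-decoder.

```python
import numpy as np

class KANEdgeLayer:
    """Toy KAN-style layer: each (input, output) edge carries its own
    learnable univariate function, parameterized here by coefficients
    over a fixed Gaussian basis (an illustrative choice)."""

    def __init__(self, d_in, d_out, n_basis=8, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(-2.0, 2.0, n_basis)  # basis grid
        # one coefficient vector per edge: shape (d_in, d_out, n_basis)
        self.coef = rng.standard_normal((d_in, d_out, n_basis)) * 0.1

    def __call__(self, x):
        # x: (batch, d_in) -> basis responses phi: (batch, d_in, n_basis)
        phi = np.exp(-((x[..., None] - self.centers) ** 2))
        # output_j = sum_i f_{i,j}(x_i): contract input and basis axes
        return np.einsum('bik,ijk->bj', phi, self.coef)

layer = KANEdgeLayer(d_in=4, d_out=3)
out = layer(np.zeros((2, 4)))
print(out.shape)  # (2, 3)
```

Because each edge function is learned, such a layer can adapt its nonlinearity to the spatial statistics of different image sub-regions, which is the property the KAN-decoder exploits during resolution recovery.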