Arrow Research search
Back to AAAI

AAAI 2026

LC3: Long Cross-Language Code Clone Detection Enhanced by Opcode Sequences and Affinity Aggregation

Conference Paper AAAI Technical Track on Natural Language Processing II Artificial Intelligence

Abstract

Cross-language code clone detection, which identifies functionally similar code across programming languages, is critical for ensuring synchronized evolution and reducing maintenance costs in multi-platform software development. While zero-shot approaches have emerged as a practical solution to data scarcity, state-of-the-art methods still face two major limitations: an insufficiency in learning language-agnostic representations and information loss during the processing of long code. To address these challenges, we propose LC3, a novel framework for robust zero-shot cross-language code clone detection. To overcome the language-agnostic representation insufficiency, LC3 fuses source code with its underlying opcode sequences, leveraging a bimodal architecture and adversarial training to learn a language-agnostic representation. To resolve long-code information loss, LC3 introduces a semantic affinity aggregation strategy. This strategy synthesizes a robust clone score from a complete pairwise similarity matrix computed between segmented code blocks, overcoming the limitations of both simple truncation and aggregation. Extensive experiments show that LC3 significantly outperforms state-of-the-art zero-shot baselines, especially in challenging long-code scenarios.

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
AAAI Conference on Artificial Intelligence
Archive span
1980-2026
Indexed papers
28718
Paper id
452508344458108725