Arrow Research search

Author name cluster

Tong Ye

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers

AAAI Conference 2025 Conference Paper

Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting

  • Tong Ye
  • Yangkai Du
  • Tengfei Ma
  • Lingfei Wu
  • Xuhong Zhang
  • Shouling Ji
  • Wenhai Wang

Large Language Models (LLMs) have demonstrated remarkable proficiency in generating code. However, the misuse of LLM-generated (synthetic) code has raised concerns in both educational and industrial contexts, underscoring the urgent need for synthetic code detectors. Existing methods for detecting synthetic content are primarily designed for general text and struggle with code due to the unique grammatical structure of programming languages and the presence of numerous "low-entropy" tokens. Building on this, our work proposes a novel zero-shot synthetic code detector based on the similarity between the original code and its LLM-rewritten variants. Our method is based on the observation that differences between LLM-rewritten and original code tend to be smaller when the original code is synthetic. We utilize self-supervised contrastive learning to train a code similarity model and evaluate our approach on two synthetic code detection benchmarks. Our results demonstrate a significant improvement over existing SOTA synthetic content detectors, delivering notable gains in both performance and robustness on the APPS and MBPP benchmarks.
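The scoring idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper trains a contrastive code similarity model, while this sketch swaps in a token-level `difflib.SequenceMatcher` ratio as a stand-in. The function names, the threshold value, and the assumption that the rewrites are produced elsewhere by an LLM are all hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] over whitespace tokens.

    Assumption: the paper uses a learned contrastive model here instead.
    """
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def rewrite_score(original: str, rewrites: list[str]) -> float:
    """Mean similarity between the original code and its LLM rewrites."""
    if not rewrites:
        raise ValueError("need at least one rewrite")
    return mean(similarity(original, r) for r in rewrites)


def looks_synthetic(original: str, rewrites: list[str], threshold: float = 0.8) -> bool:
    """Flag code whose rewrites stay close to it.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    return rewrite_score(original, rewrites) >= threshold
```

A higher score means the rewrites changed little, which the abstract associates with synthetic code. The rewrites themselves would come from prompting an LLM to rewrite the input, a step this sketch leaves to the caller.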

AAAI Conference 2023 Conference Paper

On the Calibration and Uncertainty with Pólya-Gamma Augmentation for Dialog Retrieval Models

  • Tong Ye
  • Shijing Si
  • Jianzong Wang
  • Ning Cheng
  • Zhitao Li
  • Jing Xiao

Deep neural retrieval models have amply demonstrated their power, but estimating the reliability of their predictions remains challenging. Most dialog response retrieval models output a single score indicating how relevant a response is to a given question. However, the poor calibration of deep neural networks makes this single score uncertain, so unreliable predictions can misinform user decisions. To investigate these issues, we present PG-DRR, an efficient calibration and uncertainty estimation framework for dialog response retrieval models, which adds a Gaussian Process layer to a deterministic deep neural network and recovers conjugacy for tractable posterior inference via Pólya-Gamma augmentation. PG-DRR achieves the lowest empirical calibration error (ECE) on the in-domain datasets and the distributional shift task while maintaining R10@1 and MAP performance.
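For readers unfamiliar with the ECE metric mentioned here, the following Python sketch shows one common binned formulation for binary relevance scores. It is an illustration under assumptions, not the paper's evaluation code: the function name, the equal-width binning, and the default of 10 bins are choices made for this example.

```python
def calibration_error(confidences, labels, n_bins=10):
    """Binned calibration error for binary relevance predictions.

    confidences: predicted probability that each response is relevant.
    labels: 1 if the response is actually relevant, else 0.
    Equal-width bins over [0, 1] are an assumption of this sketch.
    """
    if len(confidences) != len(labels):
        raise ValueError("confidences and labels must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(confidences, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    # Weighted average gap between mean confidence and observed relevance rate.
    total = len(confidences)
    error = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(p for p, _ in bucket) / len(bucket)
        frac_relevant = sum(y for _, y in bucket) / len(bucket)
        error += (len(bucket) / total) * abs(frac_relevant - avg_conf)
    return error
```

A perfectly calibrated model scores 0 under this definition: among responses given a score near 0.9, about 90% would actually be relevant.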