Author name cluster

Teng Fu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
1 author row

Possible papers

AAAI Conference 2026 Conference Paper

From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation

  • Ke Niu
  • Haiyang Yu
  • Zhuofan Chen
  • Mengyang Zhao
  • Teng Fu
  • Bin Li
  • Xiangyang Xue

Computer-Aided Design (CAD) plays a vital role in engineering and manufacturing, yet current CAD workflows require extensive domain expertise and manual modeling effort. Recent advances in large language models (LLMs) have made it possible to generate code from natural language, opening new opportunities for automating parametric 3D modeling. However, directly translating human design intent into executable CAD code remains highly challenging, due to the need for logical reasoning, syntactic correctness, and numerical precision. In this work, we propose CAD-RL, a multimodal Chain-of-Thought (CoT) guided reinforcement learning post-training framework for CAD modeling code generation. Our method combines a CoT-based Cold Start with goal-driven reinforcement learning post-training using three task-specific rewards: executability reward, geometric accuracy reward, and external evaluation reward. To ensure stable policy learning under sparse and high-variance reward conditions, we introduce three targeted optimization strategies: Trust Region Stretch for improved exploration, Precision Token Loss for enhanced accuracy of dimension parameters, and Overlong Filtering to reduce noisy supervision. To support training and benchmarking, we release ExeCAD, a novel dataset comprising 16,540 real-world CAD examples with paired natural language and structured design language descriptions, executable CADQuery scripts, and rendered 3D models. Experiments demonstrate that CAD-RL achieves significant improvements in reasoning quality, output precision, and code executability over existing VLMs.

AAAI Conference 2026 Conference Paper

MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement

  • Weitao Jia
  • Jinghui Lu
  • Haiyang Yu
  • Siqi Wang
  • Guozhi Tang
  • An-Lan Wang
  • Weijie Yin
  • Dingkang Yang

Recent advances demonstrate that reinforcement learning with verifiable rewards (RLVR) significantly enhances the reasoning capabilities of large language models (LLMs). However, standard RLVR faces challenges with reward sparsity, where zero rewards from consistently incorrect candidate answers provide no learning signal, particularly in challenging tasks. To address this, we propose Multi-Expert Mutual Learning GRPO (MEML-GRPO), an innovative framework that utilizes diverse expert prompts as system prompts to generate a broader range of responses, substantially increasing the likelihood of identifying correct solutions. Additionally, we introduce an inter-expert mutual learning mechanism that facilitates knowledge sharing and transfer among experts, further boosting the model’s performance through RLVR. Extensive experiments across multiple reasoning benchmarks show that MEML-GRPO delivers significant improvements, achieving an average performance gain of 4.89% with Qwen and 11.33% with Llama, effectively overcoming the core limitations of traditional RLVR methods.
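The reward-sparsity problem mentioned in this abstract can be illustrated with a minimal sketch. It assumes the group-normalized advantage commonly associated with GRPO (each reward minus the group mean, divided by the group standard deviation). The function name, the `eps` constant, and the sample rewards are illustrative and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group of responses.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Every candidate answer is wrong: all advantages are zero, so the
# policy gradient for this prompt carries no learning signal.
all_wrong = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# One candidate is correct: it receives a positive advantage and the
# incorrect ones receive negative advantages.
one_right = group_relative_advantages([0.0, 0.0, 0.0, 1.0])
```

Under this assumption, a group needs at least one correct response before any update signal appears, which is why sampling a broader range of responses can help on hard prompts.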

AAAI Conference 2026 Conference Paper

OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding

  • Teng Fu
  • Mengyang Zhao
  • Ke Niu
  • Kaixin Peng
  • Bin Li

LVLMs have been shown to perform excellently in image-level tasks such as VQA and captioning. However, in many instance-level tasks, such as visual grounding and object detection, LVLMs still show performance gaps compared to previous expert models. Meanwhile, although pedestrian tracking is a classical task, a number of new topics have emerged that combine object tracking and natural language, such as Referring MOT, Cross-view Referring MOT, and Semantic MOT. These tasks emphasize that models should understand the tracked object at an advanced semantic level, which is exactly where LVLMs excel. In this paper, we propose a new unified Pedestrian Tracking framework, namely OmniPT, which can track objects, track them based on a reference, and interactively generate a semantic understanding of tracked objects. We address two issues: how to model the tracking task as a task that foundation models can perform, and how to make the model output formatted answers. To this end, we implement a training phase consisting of RL-Mid Training-SFT-RL. Based on the pre-trained weights of the LVLM, we first perform a simple RL phase to enable the model to output a fixed and supervisable bounding box format. Subsequently, we conduct a mid-training phase using a large number of pedestrian-related datasets. Finally, we perform supervised fine-tuning on several pedestrian tracking datasets, and then carry out another RL phase to improve the model's tracking performance and enhance its ability to follow instructions. We conduct experiments on tracking benchmarks, and the experimental results demonstrate that the proposed method outperforms previous methods.

NeurIPS Conference 2025 Conference Paper

CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning

  • Ke Niu
  • Zhuofan Chen
  • Haiyang Yu
  • Yuwen Chen
  • Teng Fu
  • Mengyang Zhao
  • Bin Li
  • Xiangyang Xue

Computer-Aided Design (CAD) is pivotal in industrial manufacturing, with orthographic projection reasoning foundational to its entire workflow, encompassing design, manufacturing, and simulation. However, prevailing deep-learning approaches employ standard 3D reconstruction pipelines as an alternative, which often introduce imprecise dimensions and limit the parametric editability required for CAD workflows. Recently, some researchers have adopted vision-language models (VLMs), particularly with supervised fine-tuning (SFT), to tackle CAD-related challenges. SFT shows promise but often devolves into pattern memorization, resulting in poor out-of-distribution (OOD) performance on complex reasoning tasks. To tackle these limitations, we introduce CReFT-CAD, a two-stage fine-tuning paradigm: first, a curriculum-driven reinforcement learning stage with difficulty-aware rewards to steadily build reasoning abilities; second, supervised post-tuning to refine instruction following and semantic extraction. Complementing this, we release TriView2CAD, the first large-scale, open-source benchmark for orthographic projection reasoning, comprising 200,000 synthetic and 3,000 real-world orthographic projections with precise dimensional annotations and six interoperable data modalities. Benchmarking leading VLMs on orthographic projection reasoning, we show that CReFT-CAD significantly improves reasoning accuracy and OOD generalizability in real-world scenarios, providing valuable insights to advance CAD reasoning research. The code and adopted datasets are available at https://github.com/KeNiu042/CReFT-CAD.

AAAI Conference 2025 Conference Paper

Foundation Model Driven Appearance Extraction for Robust Multiple Object Tracking

  • Teng Fu
  • Haiyang Yu
  • Ke Niu
  • Bin Li
  • Xiangyang Xue

Multiple Object Tracking (MOT) is a fundamental task in computer vision. Existing methods utilize motion information or appearance information to perform object tracking. However, these algorithms still struggle with special circumstances, such as occlusion and blurring in complex scenes. Inspired by the fact that people can pinpoint objects through verbal descriptions, we explore performing long-term robust tracking using semantic features of objects. Motivated by the success of multimodal foundation models in text-image alignment, we reconsider the appearance feature extraction module in MOT and propose a Foundation model Driven multi-object tracker (FDTracker). Specifically, we propose a two-stage trained appearance feature extractor. In the first stage, using a single image of the object as input, the model captures the attributes of objects with the assistance of natural language instructions. In the second stage, using a sequence of images of objects as input, the model learns how to use these attributes to distinguish between different objects and connect the same object at different times. Finally, to coordinate appearance and motion information, we propose a combined strategy that better facilitates trajectory assignment and reconnection. Extensive experiments on benchmarks demonstrate the robustness of FDTracker.

TIST Journal 2021 Journal Article

Modeling Customer Experience in a Contact Center through Process Log Mining

  • Teng Fu
  • Guido Zampieri
  • David Hodgson
  • Claudio Angione
  • Yifeng Zeng

The use of data mining and modeling methods in the service industry is a promising avenue for optimizing current processes in a targeted manner, ultimately reducing costs and improving customer experience. However, when such tools are introduced into already established pipelines, they must often adapt to the way data is sampled and to its content. In this study, we tackle the challenge of characterizing and predicting customer experience when only process log data with time-stamp information is available, without any ground truth feedback from the customers. As a case study, we consider the context of a contact center managed by TeleWare and analyze phone call logs covering a two-month span. We develop an approach to interpret the phone call process events registered in the logs and infer concrete points of improvement in the service management. Our approach is based on latent tree modeling and multi-class Naïve Bayes classification, which jointly allow us to infer a spectrum of customer experiences and test their predictability based on the current data sampling strategy. Moreover, such an approach can overcome limitations in customer feedback collection and sharing across organizations, thus having wide applicability and being complementary to tools relying on more heavily constrained data.