Arrow Research search

Author name cluster

Jing Xiong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers (20)

TMLR 2026 · Journal Article

A Survey on Federated Fine-Tuning of Large Language Models

  • Yebo Wu
  • Chunlin Tian
  • Jingguang Li
  • He Sun
  • KaHou Tam
  • Zhanting Zhou
  • Haicheng Liao
  • Jing Xiong

Large Language Models (LLMs) have demonstrated impressive success across various tasks. Integrating LLMs with Federated Learning (FL), a paradigm known as FedLLM, offers a promising avenue for collaborative model adaptation while preserving data privacy. This survey provides a systematic and comprehensive review of FedLLM. We begin by tracing the historical development of both LLMs and FL, summarizing relevant prior research to set the context. Subsequently, we delve into an in-depth analysis of the fundamental challenges inherent in deploying FedLLM. Addressing these challenges often requires efficient adaptation strategies; therefore, we conduct an extensive examination of existing Parameter-Efficient Fine-tuning (PEFT) methods and explore their applicability within the FL framework. To rigorously evaluate the performance of FedLLM, we undertake a thorough review of existing fine-tuning datasets and evaluation benchmarks. Furthermore, we discuss FedLLM's diverse real-world applications across multiple domains. Finally, we identify critical open challenges and outline promising research directions to foster future advancements in FedLLM. This survey aims to serve as a foundational resource for researchers and practitioners, offering valuable insights into the rapidly evolving landscape of federated fine-tuning for LLMs. It also establishes a roadmap for future innovations in privacy-preserving AI. We actively maintain a GitHub repo to track cutting-edge advancements in this field.

AAAI 2026 · Conference Paper

Emotion and Intention Guided Multi-Modal Learning for Sticker Response Selection

  • Yuxuan Hu
  • Jian Chen
  • Yuhao Wang
  • Zixuan Li
  • Jing Xiong
  • Pengyue Jia
  • Wei Wang
  • Chengming Li

Stickers are widely used in online communication to convey emotions and implicit intentions. The Sticker Response Selection (SRS) task aims to select the most contextually appropriate sticker based on the dialogue. However, existing methods typically rely on semantic matching and model emotional and intentional cues separately, which can lead to mismatches when emotions and intentions are misaligned. To address this issue, we propose Emotion and Intention Guided Multi-Modal Learning (EIGML). This framework is the first to jointly model emotion and intention, effectively reducing the bias caused by isolated modeling and significantly improving selection accuracy. Specifically, we introduce a Dual-Level Contrastive Framework to perform both intra-modality and inter-modality alignment, ensuring consistent representation of emotional and intentional features within and across modalities. In addition, we design an Intention-Emotion Guided Multi-Modal Fusion module that integrates emotional and intentional information progressively through three components: Emotion-Guided Intention Knowledge Selection, Intention-Emotion Guided Attention Fusion, and a Similarity-Adjusted Matching Mechanism. This design injects rich, effective information into the model and enables a deeper understanding of the dialogue, ultimately enhancing sticker selection performance. Experimental results on two public datasets show that EIGML outperforms state-of-the-art baselines, achieving higher accuracy and a better understanding of emotional and intentional features.

TMLR 2025 · Journal Article

Autoregressive Models in Vision: A Survey

  • Jing Xiong
  • Gongye Liu
  • Lun Huang
  • Chengyue Wu
  • Taiqiang Wu
  • Yao Mu
  • Yuan Yao
  • Hui Shen

Autoregressive modeling has been a huge success in the field of natural language processing (NLP). Recently, autoregressive models have emerged as a significant area of focus in computer vision, where they excel in producing high-quality visual content. Autoregressive models in NLP typically operate on subword tokens. However, the representation strategy in computer vision can vary across different levels, i.e., pixel-level, token-level, or scale-level, reflecting the diverse and hierarchical nature of visual data compared to the sequential structure of language. This survey comprehensively examines the literature on autoregressive models applied to vision. To improve readability for researchers from diverse research backgrounds, we start with preliminary sequence representation and modeling in vision. Next, we divide the fundamental frameworks of visual autoregressive models into three general sub-categories, including pixel-based, token-based, and scale-based models based on the representation strategy. We then explore the interconnections between autoregressive models and other generative models. Furthermore, we present a multifaceted categorization of autoregressive models in computer vision, including image generation, video generation, 3D generation, and multimodal generation. We also elaborate on their applications in diverse domains, including emerging domains such as embodied AI and 3D medical AI, with about 250 related references. Finally, we highlight the current challenges to autoregressive models in vision with suggestions about potential research directions. We have also set up a GitHub repository to organize the papers included in this survey at: https://github.com/ChaofanTao/Autoregressive-Models-in-Vision-Survey.

ICLR 2025 · Conference Paper

D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models

  • Zhongwei Wan
  • Xinjian Wu
  • Yu Zhang 0133
  • Yi Xin 0003
  • Chaofan Tao
  • Zhihong Zhu
  • Xin Wang 0120
  • Siqi Luo

Efficient generative inference in Large Language Models (LLMs) is impeded by the growing memory demands of Key-Value (KV) cache, especially for longer sequences. Traditional KV Cache eviction strategies, which discard less critical KV-pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations. To address this, we introduce **D**ynamic **D**iscriminative **O**perations ($\mathbf{D_2 O}$), a novel method that optimizes KV cache size dynamically and discriminatively at two levels without fine-tuning, while preserving essential context. At **layer-level**, by observing the varying densities of attention weights between shallow and deep layers, we dynamically determine which layers should avoid excessive eviction via our proposed ***dynamic allocation strategy*** to minimize information loss. At **token-level**, for the eviction strategy in each layer, $\mathbf{D_2 O}$ innovatively incorporates a ***compensation mechanism*** that maintains a similarity threshold to re-discriminate the importance of currently discarded tokens, determining whether they should be recalled and merged with similar tokens. Extensive experiments on various benchmarks and LLM architectures have shown that $\mathbf{D_2 O}$ not only achieves significant memory savings and enhances inference throughput by more than 3$\times$ but also maintains high-quality long-text generation.
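
The entry above describes D2O only at a high level. Below is a minimal, illustrative NumPy sketch of the token-level idea it mentions: evict the lowest-scoring KV entries, but merge an evicted entry into its most similar retained entry when their cosine similarity exceeds a threshold. The function name, the importance scores, and the averaging rule are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def evict_and_merge(keys, values, scores, keep, sim_threshold=0.5):
    """Toy token-level KV eviction with a similarity-based merge step.

    keys, values: (seq_len, head_dim) cached K/V for one head.
    scores:       (seq_len,) importance scores (e.g. accumulated attention).
    keep:         number of KV entries to retain.
    """
    order = np.argsort(scores)[::-1]           # most important first
    kept, dropped = order[:keep], order[keep:]

    k_kept, v_kept = keys[kept].copy(), values[kept].copy()
    k_norm = k_kept / np.linalg.norm(k_kept, axis=1, keepdims=True)

    for idx in dropped:
        k = keys[idx] / np.linalg.norm(keys[idx])
        sims = k_norm @ k                       # cosine similarity to the retained keys
        j = int(np.argmax(sims))
        if sims[j] > sim_threshold:             # recall: merge instead of discarding
            k_kept[j] = 0.5 * (k_kept[j] + keys[idx])
            v_kept[j] = 0.5 * (v_kept[j] + values[idx])
    return k_kept, v_kept

# Usage: compress a 16-token cache for one head down to 8 entries.
rng = np.random.default_rng(0)
K, V = rng.normal(size=(16, 64)), rng.normal(size=(16, 64))
importance = rng.random(16)
K_small, V_small = evict_and_merge(K, V, importance, keep=8)
print(K_small.shape, V_small.shape)   # (8, 64) (8, 64)
```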

ICLR 2025 · Conference Paper

FormalAlign: Automated Alignment Evaluation for Autoformalization

  • Jianqiao Lu
  • Yingjia Wan
  • Yinya Huang
  • Jing Xiong
  • Zhengying Liu
  • Zhijiang Guo

Autoformalization aims to convert informal mathematical proofs into machine-verifiable formats, bridging the gap between natural and formal languages. However, ensuring semantic alignment between the informal and formalized statements remains challenging. Existing approaches heavily rely on manual verification, hindering scalability. To address this, we introduce FormalAlign, a framework for automatically evaluating the alignment between natural and formal languages in autoformalization. FormalAlign trains on both the autoformalization sequence generation task and the representational alignment between input and output, employing a dual loss that combines a pair of mutually enhancing autoformalization and alignment tasks. Evaluated across four benchmarks augmented by our proposed misalignment strategies, FormalAlign demonstrates superior performance. In our experiments, FormalAlign outperforms GPT-4, achieving an Alignment-Selection Score 11.58% higher on FormL4-Basic (99.21% vs. 88.91%) and 3.19% higher on MiniF2F-Valid (66.39% vs. 64.34%). This effective alignment evaluation significantly reduces the need for manual verification.
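
As a rough illustration of the dual loss mentioned above, the sketch below combines a standard sequence-generation loss with an InfoNCE-style contrastive alignment loss over pooled statement representations. The weighting, temperature, and pooling are assumptions made for illustration; they are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dual_loss(gen_logits, target_ids, informal_emb, formal_emb,
              alpha=0.5, temperature=0.07):
    """Combine a sequence-generation loss with a contrastive alignment loss.

    gen_logits:   (batch, seq_len, vocab) decoder logits for the formal output.
    target_ids:   (batch, seq_len) gold formal-statement token ids.
    informal_emb: (batch, dim) pooled representation of the informal statement.
    formal_emb:   (batch, dim) pooled representation of the formal statement.
    """
    # Standard autoformalization (next-token prediction) loss.
    gen_loss = F.cross_entropy(gen_logits.reshape(-1, gen_logits.size(-1)),
                               target_ids.reshape(-1))

    # InfoNCE-style alignment: matched informal/formal pairs are positives,
    # all other pairs in the batch are negatives.
    informal = F.normalize(informal_emb, dim=-1)
    formal = F.normalize(formal_emb, dim=-1)
    logits = informal @ formal.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    align_loss = F.cross_entropy(logits, labels)

    return gen_loss + alpha * align_loss

# Usage with random tensors standing in for model outputs.
B, T, V, D = 4, 12, 100, 32
loss = dual_loss(torch.randn(B, T, V), torch.randint(0, V, (B, T)),
                 torch.randn(B, D), torch.randn(B, D))
print(loss.item())
```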

NeurIPS 2025 · Conference Paper

From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes

  • Tianxu Wang
  • Zhuofan Zhang
  • Ziyu Zhu
  • Yue Fan
  • Jing Xiong
  • Pengxiang Li
  • Xiaojian (Shawn) Ma
  • Qing Li

3D visual grounding has made notable progress in localizing objects within complex 3D scenes. However, grounding referring expressions beyond objects in 3D scenes remains unexplored. In this paper, we introduce Anywhere3D-Bench, a holistic 3D visual grounding benchmark consisting of 2,886 referring expression-3D bounding box pairs spanning four different grounding levels: human-activity areas, unoccupied space beyond objects, individual objects in the scene, and fine-grained object parts. We assess a range of state-of-the-art 3D visual grounding methods alongside large language models (LLMs) and multimodal LLMs (MLLMs) on Anywhere3D-Bench. Experimental results reveal that space-level and part-level visual grounding pose the greatest challenges: space-level tasks require a more comprehensive spatial reasoning ability, for example, modeling distances and spatial relations within 3D space, while part-level tasks demand fine-grained perception of object composition. Even the best-performing model, OpenAI o4-mini, achieves only 23.00% accuracy on space-level tasks and 31.46% on part-level tasks, significantly lower than its performance on area-level and object-level tasks. These findings underscore a critical gap in current models' capacity to understand and reason about 3D scenes beyond object-level semantics.

ICML 2025 · Conference Paper

ParallelComp: Parallel Long-Context Compressor for Length Extrapolation

  • Jing Xiong
  • Jianghan Shen
  • Chuanyang Zheng
  • Zhongwei Wan
  • Chenyang Zhao
  • Chiwun Yang
  • Fanghua Ye 0001
  • Hongxia Yang

Extrapolating ultra-long contexts (text length >128K) remains a major challenge for large language models (LLMs), as most training-free extrapolation methods are not only severely limited by memory bottlenecks, but also suffer from the attention sink, which restricts their scalability and effectiveness in practice. In this work, we propose ParallelComp, a parallel long-context compression method that effectively overcomes the memory bottleneck, enabling 8B-parameter LLMs to extrapolate from 8K to 128K tokens on a single A100 80GB GPU in a training-free setting. ParallelComp splits the input into chunks, dynamically evicting redundant chunks and irrelevant tokens, supported by a parallel KV cache eviction mechanism. Importantly, we present a systematic theoretical and empirical analysis of attention biases in parallel attention, including the attention sink, recency bias, and middle bias, and reveal that these biases exhibit distinctive patterns under ultra-long context settings. We further design a KV cache eviction technique to mitigate this phenomenon. Experimental results show that ParallelComp enables an 8B model (trained on 8K context) to achieve 91.17% of GPT-4's performance under ultra-long contexts, outperforming closed-source models such as Claude-2 and Kimi-Chat. We achieve a 1.76x improvement in chunk throughput, thereby achieving a 23.50x acceleration in the prefill stage with negligible performance loss, paving the way for scalable and robust ultra-long context extrapolation in LLMs. We release the code at https://github.com/menik1126/ParallelComp.
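
The chunk-eviction idea can be pictured with a small NumPy sketch: split the long input into chunks, score each chunk against the query, and keep only the most relevant ones. The mean-pooled cosine-similarity scoring and the chunk size here are illustrative assumptions, not the released implementation.

```python
import numpy as np

def select_chunks(token_emb, query_emb, chunk_size=512, keep_chunks=4):
    """Toy chunk-level selection for long-context compression.

    token_emb: (seq_len, dim) embeddings of the long input.
    query_emb: (dim,) embedding of the current query.
    Returns the indices of the retained chunks, in document order.
    """
    n = token_emb.shape[0]
    chunks = [token_emb[i:i + chunk_size] for i in range(0, n, chunk_size)]

    q = query_emb / np.linalg.norm(query_emb)
    scores = []
    for c in chunks:
        mean = c.mean(axis=0)
        scores.append(float(mean @ q / np.linalg.norm(mean)))

    order = np.argsort(scores)[::-1][:keep_chunks]   # evict low-relevance chunks
    return sorted(order.tolist())

# Usage: keep 4 of 16 chunks from an 8K-token input.
rng = np.random.default_rng(1)
emb = rng.normal(size=(8192, 128))
query = rng.normal(size=128)
print(select_chunks(emb, query))
```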

NeurIPS 2025 · Conference Paper

SAS: Simulated Attention Score

  • Chuanyang Zheng
  • Jiankai Sun
  • Yihang Gao
  • Yuehao Wang
  • Peihao Wang
  • Jing Xiong
  • Liliang Ren
  • Hao Cheng

The attention mechanism is a core component of the Transformer architecture. Various methods have been developed to compute attention scores, including multi-head attention (MHA), multi-query attention, group-query attention and so on. We further analyze the MHA and observe that its performance improves as the number of attention heads increases, provided the hidden size per head remains sufficiently large. Therefore, increasing both the head count and hidden size per head with minimal parameter overhead can lead to significant performance gains at a low cost. Motivated by this insight, we introduce Simulated Attention Score (SAS), which maintains a compact model size while simulating a larger number of attention heads and hidden feature dimension per head. This is achieved by projecting a low-dimensional head representation into a higher-dimensional space, effectively increasing attention capacity without increasing parameter count. Beyond the head representations, we further extend the simulation approach to feature dimension of the key and query embeddings, enhancing expressiveness by mimicking the behavior of a larger model while preserving the original model size. To control the parameter cost, we also propose Parameter-Efficient Attention Aggregation (PEAA). Comprehensive experiments on a variety of datasets and tasks demonstrate the effectiveness of the proposed SAS method, achieving significant improvements over different attention variants.
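
A toy PyTorch sketch of the core idea, computing attention scores in a larger simulated space by cheaply up-projecting the per-head query and key representations, is shown below. The module layout and dimensions are assumptions for illustration and do not reproduce SAS or its Parameter-Efficient Attention Aggregation.

```python
import torch
import torch.nn as nn

class SimulatedHeadAttention(nn.Module):
    """Toy attention layer that computes scores in a larger simulated space."""

    def __init__(self, dim=256, n_heads=4, sim_factor=2):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.sim_dim = self.head_dim * sim_factor    # simulated (larger) head dim
        self.qkv = nn.Linear(dim, 3 * dim)
        # Cheap per-head up-projections: scores are computed in sim_dim,
        # while the value/output path stays at the original size.
        self.q_up = nn.Linear(self.head_dim, self.sim_dim, bias=False)
        self.k_up = nn.Linear(self.head_dim, self.sim_dim, bias=False)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, t, self.n_heads, self.head_dim)
        q, k, v = (z.reshape(shape).transpose(1, 2) for z in (q, k, v))

        scores = self.q_up(q) @ self.k_up(k).transpose(-2, -1)
        scores = scores / self.sim_dim ** 0.5
        attn = scores.softmax(dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(y)

x = torch.randn(2, 16, 256)
print(SimulatedHeadAttention()(x).shape)   # torch.Size([2, 16, 256])
```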

ICML 2025 · Conference Paper

SkipGPT: Each Token is One of a Kind

  • Anhao Zhao
  • Fanghua Ye 0001
  • Yingqi Fan
  • Junlong Tong
  • Jing Xiong
  • Zhiwei Fei
  • Hui Su
  • Xiaoyu Shen 0001

Large language models (LLMs) achieve remarkable performance across tasks but incur substantial computational costs due to their deep, multi-layered architectures. Layer pruning has emerged as a strategy to alleviate these inefficiencies, but conventional static pruning methods overlook two critical dynamics inherent to LLM inference: (1) horizontal dynamics, where token-level heterogeneity demands context-aware pruning decisions, and (2) vertical dynamics, where the distinct functional roles of MLP and self-attention layers necessitate component-specific pruning policies. We introduce SkipGPT, a dynamic layer pruning framework designed to optimize computational resource allocation through two core innovations: (1) global token-aware routing to prioritize critical tokens and (2) decoupled pruning policies for MLP and self-attention components. To mitigate training instability, we propose a two-stage optimization paradigm: first, a disentangled training phase that learns routing strategies via soft parameterization to avoid premature pruning decisions, followed by parameter-efficient LoRA fine-tuning to restore performance impacted by layer removal. Extensive experiments demonstrate that SkipGPT prunes over 40% of model parameters while matching or exceeding the performance of the original dense model across benchmarks. By harmonizing dynamic efficiency with preserved expressivity, SkipGPT advances the practical deployment of scalable, resource-aware LLMs. Our code is publicly available at: https://github.com/EIT-NLP/SkipGPT.
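
To make the routing idea concrete, the sketch below shows a toy transformer block with decoupled, token-level soft gates for the attention and MLP sub-layers; a gate near zero corresponds to skipping that component for that token. The gate parameterization and the soft gating are assumptions for illustration, not SkipGPT's router or training recipe.

```python
import torch
import torch.nn as nn

class RoutedBlock(nn.Module):
    """Toy transformer block with separate token-level skip gates for
    the attention and MLP sub-layers."""

    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        # Decoupled routers: one skip decision per token per component.
        self.attn_gate = nn.Linear(dim, 1)
        self.mlp_gate = nn.Linear(dim, 1)

    def forward(self, x):
        g_attn = torch.sigmoid(self.attn_gate(x))   # (b, t, 1) soft gate
        attn_out, _ = self.attn(x, x, x)
        x = x + g_attn * attn_out                    # gate ~ 0: token skips attention

        g_mlp = torch.sigmoid(self.mlp_gate(x))
        x = x + g_mlp * self.mlp(x)                  # gate ~ 0: token skips the MLP
        return x

x = torch.randn(2, 10, 256)
print(RoutedBlock()(x).shape)   # torch.Size([2, 10, 256])
```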

NeurIPS 2025 · Conference Paper

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

  • Zhongwei Wan
  • Zhihao Dou
  • Che Liu
  • Yu Zhang
  • Dongfei Cui
  • Qinjian Zhao
  • Hui Shen
  • Jing Xiong

Multimodal large language models (MLLMs) have shown promising capabilities in reasoning tasks, yet still struggle significantly with complex problems requiring explicit self-reflection and self-correction, especially compared to their unimodal text-based counterparts. Existing reflection methods are simplistic and struggle to generate meaningful, instructive feedback, as the reasoning ability and knowledge limits of pre-trained models are largely fixed during initial training. To overcome these challenges, we propose multimodal Self-Reflection enhanced reasoning with Group Relative Policy Optimization (SRPO), a two-stage reflection-aware reinforcement learning (RL) framework explicitly designed to enhance multimodal LLM reasoning. In the first stage, we construct a high-quality, reflection-focused dataset under the guidance of an advanced MLLM, which generates reflections based on initial responses to help the policy model learn both reasoning and self-reflection. In the second stage, we introduce a novel reward mechanism within the GRPO framework that encourages concise and cognitively meaningful reflection while avoiding redundancy. Extensive experiments across multiple multimodal reasoning benchmarks, including MathVista, MathVision, MathVerse, and MMMU-Pro, using Qwen-2.5-VL-7B and Qwen-2.5-VL-32B demonstrate that SRPO significantly outperforms state-of-the-art models, achieving notable improvements in both reasoning accuracy and reflection quality.
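
A very small sketch of what a reflection-aware reward could look like is given below: correct answers earn a base reward, concise reflections earn a bonus, and repetitive reflections are penalized. The scoring terms are invented for illustration and are not the reward defined in the paper.

```python
def reflection_reward(answer_correct: bool, reflection: str,
                      max_tokens: int = 64, redundancy_penalty: float = 0.5):
    """Toy reflection-aware reward: reward correctness, favor concise
    reflections, and penalize overly long or repetitive ones."""
    tokens = reflection.split()
    reward = 1.0 if answer_correct else 0.0
    if tokens:
        reward += 0.5 if len(tokens) <= max_tokens else -0.2   # concision bonus
        distinct_ratio = len(set(tokens)) / len(tokens)
        if distinct_ratio < 0.5:                               # crude redundancy check
            reward -= redundancy_penalty
    return reward

print(reflection_reward(True, "The earlier step misread the axis labels; recompute the slope."))
```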

NeurIPS 2024 · Conference Paper

Boosting Text-to-Video Generative Model with MLLMs Feedback

  • Xun Wu
  • Shaohan Huang
  • Guolong Wang
  • Jing Xiong
  • Furu Wei

Recent advancements in text-to-video generative models, such as Sora, have showcased impressive capabilities. These models have attracted significant interest for their potential applications. However, they often rely on extensive datasets of variable quality, which can result in generated videos that lack aesthetic appeal and do not accurately reflect the input text prompts. A promising approach to mitigate these issues is to leverage Reinforcement Learning from Human Feedback (RLHF), which aims to align the outputs of text-to-video generative models with human preferences. However, the considerable costs associated with manual annotation have led to a scarcity of comprehensive preference datasets. In response to this challenge, our study begins by investigating the efficacy of annotations generated by Multimodal Large Language Models (MLLMs) in capturing video preferences, discovering a high degree of concordance with human judgments. Building upon this finding, we utilize MLLMs to perform fine-grained video preference annotations across two dimensions, resulting in the creation of VideoPrefer, which includes 135,000 preference annotations. Utilizing this dataset, we introduce VideoRM, the first general-purpose reward model tailored for video preference in the text-to-video domain. Our comprehensive experiments confirm the effectiveness of both VideoPrefer and VideoRM, representing a significant step forward in the field.

NeurIPS 2024 · Conference Paper

DAPE: Data-Adaptive Positional Encoding for Length Extrapolation

  • Chuanyang Zheng
  • Yihang Gao
  • Han Shi
  • Minbin Huang
  • Jingyao Li
  • Jing Xiong
  • Xiaozhe Ren
  • Michael Ng

Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to distinguish token positions in given sequences. However, both APE and RPE remain fixed after model training regardless of input data, limiting their adaptability and flexibility. Hence, we expect that the desired positional encoding should be data-adaptive and can be dynamically adjusted with the given attention. In this paper, we propose a Data-Adaptive Positional Encoding (DAPE) method, which dynamically and semantically adjusts based on input context and learned fixed priors. Experimental validation on real-world datasets (Arxiv, Books3, and CHE) demonstrates that DAPE enhances model performance in terms of trained length and length generalization, where the improvements are statistically significant. The model visualization suggests that our model can keep both local and anti-local information. Finally, we successfully train the model on sequence length 128 and achieve better performance at evaluation sequence length 8192, compared with other static positional encoding methods, revealing the benefit of the adaptive positional encoding method.
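
The sketch below illustrates the general notion of a data-adaptive positional bias: a tiny MLP maps each raw attention logit together with a static relative-position prior to an adjusted logit. The module structure and the ALiBi-like static prior are assumptions for illustration, not the exact DAPE formulation.

```python
import torch
import torch.nn as nn

class AdaptiveBias(nn.Module):
    """Toy data-adaptive positional bias: a tiny MLP maps the raw attention
    logit and a static relative-position bias (per position pair) to an
    adjusted logit."""

    def __init__(self, hidden=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, attn_logits, static_bias):
        # attn_logits: (b, heads, q, k); static_bias: (q, k).
        b, h, q, k = attn_logits.shape
        static = static_bias.expand(b, h, q, k)
        feats = torch.stack([attn_logits, static], dim=-1)   # (b, h, q, k, 2)
        return attn_logits + self.mlp(feats).squeeze(-1)     # adapted logits

q = k = 8
pos = torch.arange(q)
static_bias = -(pos[:, None] - pos[None, :]).abs().float()   # ALiBi-like ramp
logits = torch.randn(2, 4, q, k)
print(AdaptiveBias()(logits, static_bias).shape)   # torch.Size([2, 4, 8, 8])
```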

ICLR 2024 · Conference Paper

DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning

  • Jing Xiong
  • Zixuan Li
  • Chuanyang Zheng
  • Zhijiang Guo
  • Yichun Yin
  • Enze Xie
  • Zhicheng Yang
  • Qingxing Cao

Recent advances in natural language processing, primarily propelled by Large Language Models (LLMs), have showcased their remarkable capabilities grounded in in-context learning. A promising avenue for guiding LLMs in intricate reasoning tasks involves the utilization of intermediate reasoning steps within the Chain-of-Thought (CoT) paradigm. Nevertheless, the central challenge lies in the effective selection of exemplars for facilitating in-context learning. In this study, we introduce a framework that leverages Dual Queries and Low-rank approximation Re-ranking (DQ-LoRe) to automatically select exemplars for in-context learning. Dual Queries first query the LLM to obtain LLM-generated knowledge such as CoT, then query the retriever to obtain the final exemplars using both the question and the knowledge. Moreover, for the second query, LoRe employs dimensionality reduction techniques to refine exemplar selection, ensuring close alignment with the input question's knowledge. Through extensive experiments, we demonstrate that DQ-LoRe significantly outperforms prior state-of-the-art methods in the automatic selection of exemplars for GPT-4, enhancing performance from 92.5% to 94.2%. Our comprehensive analysis further reveals that DQ-LoRe consistently outperforms retrieval-based approaches in terms of both performance and adaptability, especially in scenarios characterized by distribution shifts. DQ-LoRe pushes the boundaries of in-context learning and opens up new avenues for addressing complex reasoning challenges.
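
The low-rank re-ranking step can be sketched in a few lines of NumPy: project the query (question plus LLM-generated CoT) and the candidate exemplars onto the top principal components of the candidate set, then re-rank by similarity in that reduced space. The use of plain SVD and cosine similarity here is an illustrative assumption, not the paper's exact procedure.

```python
import numpy as np

def low_rank_rerank(query_emb, candidate_embs, rank=8, top_k=4):
    """Toy low-rank re-ranking of retrieved exemplars.

    query_emb:      (dim,) embedding of [question + LLM-generated CoT].
    candidate_embs: (n_candidates, dim) embeddings of retrieved exemplars.
    Returns indices of the top_k exemplars, best first.
    """
    mean = candidate_embs.mean(axis=0)
    centered = candidate_embs - mean
    # Top-`rank` principal directions of the candidate embeddings.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = vt[:rank]                                   # (rank, dim)

    q = proj @ (query_emb - mean)
    c = centered @ proj.T                              # (n_candidates, rank)
    sims = c @ q / (np.linalg.norm(c, axis=1) * np.linalg.norm(q) + 1e-8)
    return np.argsort(sims)[::-1][:top_k]

rng = np.random.default_rng(2)
print(low_rank_rerank(rng.normal(size=768), rng.normal(size=(32, 768))))
```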

ICLR 2024 · Conference Paper

LEGO-Prover: Neural Theorem Proving with Growing Libraries

  • Haiming Wang
  • Huajian Xin
  • Chuanyang Zheng
  • Zhengying Liu
  • Qingxing Cao
  • Yinya Huang
  • Jing Xiong
  • Han Shi

Despite the success of large language models (LLMs), the task of theorem proving still remains one of the hardest reasoning tasks that is far from being fully solved. Prior methods using language models have demonstrated promising results, but they still struggle to prove even middle school level theorems. One common limitation of these methods is that they assume a fixed theorem library during the whole theorem proving process. However, creating new useful theorems or even new theories is not only helpful but crucial and necessary for advancing mathematics and proving harder and deeper results. In this work, we present LEGO-Prover, which employs a growing skill library containing verified lemmas as skills to augment the capability of LLMs used in theorem proving. By constructing the proof modularly, LEGO-Prover enables LLMs to utilize existing skills retrieved from the library and to create new skills during the proving process. These skills are further evolved (by prompting an LLM) to enrich the library on another scale. Modular and reusable skills are constantly added to the library to enable tackling increasingly intricate mathematical problems. Moreover, the learned library further bridges the gap between human proofs and formal proofs by making it easier to impute missing steps. LEGO-Prover advances the state-of-the-art pass rate on miniF2F-valid (48.0% to 57.0%) and miniF2F-test (45.5% to 50.0%). During the proving process, LEGO-Prover also generates over 20,000 skills (theorems/lemmas) and adds them to the growing library. Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in a 4.9% improvement in success rate.
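
The growing-library loop can be caricatured as follows: retrieve relevant skills, attempt a modular proof, and push any newly verified lemmas back into the library. Everything here (the retrieval heuristic, the attempt_proof stand-in, the data structures) is a placeholder for illustration; the real system prompts an LLM and checks proofs with a proof assistant.

```python
from dataclasses import dataclass, field

@dataclass
class SkillLibrary:
    """Toy growing library of verified lemmas, keyed by statement text."""
    skills: dict = field(default_factory=dict)

    def retrieve(self, goal: str, k: int = 3):
        # Placeholder retrieval: rank lemmas by crude word overlap with the goal.
        def overlap(stmt):
            return len(set(stmt.lower().split()) & set(goal.lower().split()))
        return sorted(self.skills, key=overlap, reverse=True)[:k]

    def add(self, statement: str, proof: str):
        self.skills[statement] = proof

def attempt_proof(goal, hints):
    # Stand-in for "prompt the LLM, verify with the proof assistant".
    lemma = (f"helper lemma for: {goal}", "placeholder proof")
    return f"proof of {goal} using {len(hints)} hints", [lemma], True

def prove(goal: str, library: SkillLibrary, attempts: int = 3):
    """Sketch of the prove loop: retrieve skills, attempt a modular proof,
    and push newly verified lemmas back into the growing library."""
    for _ in range(attempts):
        hints = library.retrieve(goal)
        proof, new_lemmas, ok = attempt_proof(goal, hints)
        for stmt, prf in new_lemmas:
            library.add(stmt, prf)          # the library keeps growing
        if ok:
            return proof
    return None

lib = SkillLibrary({"add_comm: a + b = b + a": "by ring"})
print(prove("theorem: b + a = a + b", lib))
print(len(lib.skills))   # 2: the library grew by one lemma
```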

NeurIPS 2024 · Conference Paper

Multimodal Large Language Models Make Text-to-Image Generative Models Align Better

  • Xun Wu
  • Shaohan Huang
  • Guolong Wang
  • Jing Xiong
  • Furu Wei

Recent studies have demonstrated the exceptional potential of leveraging human preference datasets to refine text-to-image generative models, enhancing the alignment between generated images and textual prompts. Despite these advances, current human preference datasets are either prohibitively expensive to construct or suffer from a lack of diversity in preference dimensions, resulting in limited applicability for instruction tuning in open-source text-to-image generative models and hindering further exploration. To address these challenges and promote the alignment of generative models through instruction tuning, we leverage multimodal large language models to create VisionPrefer, a high-quality and fine-grained preference dataset that captures multiple preference aspects. We aggregate feedback from AI annotators across four aspects: prompt-following, aesthetic, fidelity, and harmlessness to construct VisionPrefer. To validate the effectiveness of VisionPrefer, we train a reward model VP-Score over VisionPrefer to guide the training of text-to-image generative models, and the preference prediction accuracy of VP-Score is comparable to that of human annotators. Furthermore, we use two reinforcement learning methods to fine-tune generative models and evaluate the performance of VisionPrefer, and extensive experimental results demonstrate that VisionPrefer significantly improves text-image alignment in compositional image generation across diverse aspects, e.g., aesthetics, and generalizes better than previous human-preference metrics across various image distributions. Moreover, VisionPrefer indicates that the integration of AI-generated synthetic data as a supervisory signal is a promising avenue for achieving improved alignment with human preferences in vision generative models.

JBHI 2022 · Journal Article

LSTformer: Long Short-Term Transformer for Real Time Respiratory Prediction

  • Min Tan
  • Huixian Peng
  • Xiaokun Liang
  • Yaoqin Xie
  • Zeyang Xia
  • Jing Xiong

Since the tumor moves with the patient's breathing movement in clinical surgery, the real-time prediction of respiratory movement is required to improve the efficacy of radiotherapy. Some RNN-based respiratory management methods have been proposed for this purpose. However, these existing RNN-based methods often suffer from the degradation of generalization performance for a long-term window (such as 600 ms) because of the structural consistency constraints. In this paper, we propose an innovative Long Short-term Transformer (LSTformer) for long-term real-time accurate respiratory prediction. Specifically, a novel Long-term Information Enhancement module (LIE) is proposed to solve the performance degradation under a long window by increasing the long-term memory of latent variables. A lightweight Transformer Encoder (LTE) is proposed to satisfy the real-time requirement via simplifying the architecture and limiting the number of layers. In addition, we propose an application-oriented data augmentation strategy to generalize our LSTformer to practical application scenarios, especially robotic radiotherapy. Extensive experiments on our augmented dataset and publicly available dataset demonstrate the state-of-the-art performance of our method on the premise of satisfying the real-time demand.

JBHI 2018 · Journal Article

Tooth and Alveolar Bone Segmentation From Dental Computed Tomography Images

  • Yangzhou Gan
  • Zeyang Xia
  • Jing Xiong
  • Guanglin Li
  • Qunfei Zhao

Three-dimensional (3D) models of the tooth-alveolar bone complex are needed for treatment planning and simulation in computer-aided orthodontics. Tooth and alveolar bone segmentation from computed tomography (CT) images is a fundamental step in reconstructing their models. Because the alveolar bone is rarely used in conventional orthodontic treatment, which may cause undesired side effects, previous studies mainly focused on tooth segmentation and reconstruction and did not consider the alveolar bone. In this study, we proposed a method to perform both tooth and alveolar bone segmentation from dental CT images for reconstructing their 3D models. First, the proposed method extracted the connected region of tooth and alveolar bone from CT images using a global convex level set model. Then, individual teeth and the alveolar bone were separated from the connected region based on the Radon transform and a local level set model. The experimental results showed that the proposed method could successfully complete both tooth and alveolar bone segmentation from CT images, and outperformed state-of-the-art tooth segmentation methods in terms of accuracy. This suggests that the proposed method can be used to reconstruct 3D models of the tooth-alveolar bone complex for precise treatment.

ICRA 2016 · Conference Paper

Development of a robotic system for orthodontic archwire bending

  • Zeyang Xia
  • Hao Deng 0005
  • Shaokui Weng
  • Yangzhou Gan
  • Jing Xiong
  • Hesheng Wang 0001

Customized archwires are demanded in lingual orthodontic treatment for patients suffering from malocclusion. Traditionally, these archwires could only be bent manually by experienced orthodontists. This pattern requires specialized skills training and occupies long chairside time, yet still cannot ensure the accuracy of the appliances. Therefore, a novel robotic system was developed in our study for automatic and accurate archwire preparation. First, the hardware system was designed and implemented. Second, a modular and ROS-integrated control system was developed to control automatic bending. Third, an adaptive sampling-based bending planner with a collision checker in a time-varying environment was established and realized in the control system architecture. Preliminary validation of the developed robotic system and its control system has been conducted both in simulation and on the physical robotic system. Experimental results have shown that the developed robotic system, with its ROS-integrated control system, was able to accomplish automatic and accurate orthodontic archwire preparation.

IROS 2015 · Conference Paper

Motion planning and control of a robotic system for orthodontic archwire bending

  • Hao Deng 0005
  • Zeyang Xia
  • Shaokui Weng
  • Yangzhou Gan
  • Jing Xiong
  • Yongsheng Ou
  • Jianwei Zhang 0001

In clinics, customized archwires are demanded for lingual orthodontic treatment. However, only very experienced orthodontists can handle the manual appliance preparation. This pattern not only occupies much of the orthodontist's labor time, but also cannot ensure the accuracy of the appliances. Therefore, a robotic system was developed in our study for automatic and accurate orthodontic archwire bending. First, a method for customized archwire parameterization was developed. Second, an adaptive sampling-based bending planner with a collision checker in a time-varying environment was designed. Finally, a bending control strategy was used to eliminate the springback effect of the archwires and the bending point shift during the bending process. A self-developed simulation platform based on the Robot Operating System with MoveIt was used for preliminary validation of the proposed method. Physical experiments for multi-functional orthodontic bends on the robotic system were conducted as well. The results have shown that the developed robotic system using the proposed planning and control method was able to accomplish automatic and accurate orthodontic archwire bending.