Arrow Research search

Author name cluster

Yang Jiao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
2 author rows

Possible papers

15

AAAI Conference 2026 Conference Paper

Identity-Aware Vision-Language Model for Explainable Face Forgery Detection

  • Junhao Xu
  • Jingjing Chen
  • Yang Jiao
  • Jiacheng Zhang
  • Zhiyu Tan
  • Hao Li
  • Yu-Gang Jiang

Recent advances in generative artificial intelligence have enabled the creation of highly realistic image forgeries, raising significant concerns about digital media authenticity. While existing detection methods demonstrate promising results on benchmark datasets, they face critical limitations in real-world applications. First, existing detectors typically fail to detect semantic inconsistencies with the person’s identity, such as implausible behaviors or incompatible environmental contexts in given images. Second, these methods rely heavily on low-level visual cues, making them effective for known forgeries but less reliable against new or unseen manipulation techniques. To address these challenges, we present a novel personalized vision-language model (VLM) that integrates low-level visual artifact analysis and high-level semantic inconsistency detection. Unlike previous VLM-based methods, our approach avoids resource-intensive supervised fine-tuning that often struggles to preserve distinct identity characteristics. Instead, we employ a lightweight method that dynamically encodes identity-specific information into specialized identifier tokens. This design enables the model to learn distinct identity characteristics while maintaining robust generalization capabilities. We further enhance detection capabilities through a lightweight detection adapter that extracts fine-grained information from shallow features of the vision encoder, preserving critical low-level evidence. Comprehensive experiments demonstrate that our approach achieves 94.25% accuracy and 94.08% F1 score, outperforming both traditional forgery detectors and general VLMs while requiring only 10 extra tokens.

NeurIPS Conference 2025 Conference Paper

ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data

  • Xiaoyang Liu
  • Kangjie Bao
  • Jiashuo Zhang
  • Yunqi Liu
  • Yu Chen
  • Yuntian Liu
  • Yang Jiao
  • Tao Luo

Autoformalization, the automatic translation of mathematical content from natural language into machine-verifiable formal languages, has seen significant progress driven by advances in large language models (LLMs). Nonetheless, a primary barrier to further improvements is the limited availability of parallel corpora that map informal mathematical text to its formal counterpart. To address this limitation, we propose ATLAS (Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data), a novel data generation framework designed to produce large-scale, high-quality parallel corpora of theorem statements. Distinct from prior approaches, ATLAS begins with a concept repository, accelerates the improvement of the student model through expert iteration combined with knowledge distillation, and introduces two novel augmentation strategies that exploit the structural characteristics of formal languages. Running the proposed ATLAS framework for 10 iterations, we construct an undergraduate-level dataset of 117k theorem statements and develop the ATLAS Translator by fine-tuning Llama3.1-8B-Instruct with LoRA. This model establishes a new state of the art, demonstrating statistically significant improvements over both the Herald Translator and the Kimina-Autoformalizer across all benchmarks (p<0.05, two-sided t-test). Furthermore, we demonstrate that the full-parameter fine-tuning of a stronger base model on the ATLAS dataset leads to superior performance. The datasets, model, and code are available at https://github.com/XiaoyangLiu-sjtu/ATLAS.

UAI Conference 2025 Conference Paper

Divide and Orthogonalize: Efficient Continual Learning with Local Model Space Projection

  • Jin Shang
  • Simone Shao
  • Tian Tong
  • Fan Yang 0084
  • Yetian Chen
  • Yang Jiao
  • Jia Liu 0002
  • Yan Gao 0029

Continual learning (CL) has gained increasing interest in recent years due to the need for models that can continuously learn new tasks while retaining knowledge from previous ones. However, existing CL methods often require either computationally expensive layer-wise gradient projections or large-scale storage of past task data, making them impractical for resource-constrained scenarios. To address these challenges, we propose a local model space projection (LMSP)-based continual learning framework that significantly reduces computational complexity from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2)$ while preserving both forward and backward knowledge transfer with minimal performance trade-offs. We establish a theoretical analysis of the error and convergence properties of LMSP compared to conventional global approaches. Extensive experiments on multiple public datasets demonstrate that our method achieves competitive performance while offering substantial efficiency gains, making it a promising solution for scalable continual learning.
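The abstract's central mechanism is updating a model only in directions orthogonal to subspaces used by earlier tasks. A minimal sketch of that generic gradient-projection idea (not the paper's local model space construction or its O(n²) machinery; the names here are illustrative):

```python
import numpy as np

def project_out(grad, basis):
    """Remove from `grad` its component lying in span of the basis columns.

    Restricting updates to the orthogonal complement of earlier tasks'
    subspace is how gradient-projection CL methods avoid overwriting
    previously learned directions.
    """
    if basis.size == 0:
        return grad
    q, _ = np.linalg.qr(basis)        # orthonormalize the stored basis
    return grad - q @ (q.T @ grad)    # orthogonal-complement projection

# Toy check: the projected update has no overlap with the stored subspace.
basis = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # spans first two axes
grad = np.array([3.0, -1.0, 2.0])
g_proj = project_out(grad, basis)
```

Here `g_proj` keeps only the third component of `grad`, so a step along it cannot interfere with anything represented in the first two axes.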

ICML Conference 2025 Conference Paper

DTZO: Distributed Trilevel Zeroth Order Learning with Provable Non-Asymptotic Convergence

  • Yang Jiao
  • Kai Yang 0001
  • Chengtao Jian

Trilevel learning (TLL) with zeroth order constraints is a fundamental problem in machine learning, arising in scenarios where gradient information is inaccessible due to data privacy or model opacity, such as in federated learning, healthcare, and financial systems. These problems are notoriously difficult to solve due to their inherent complexity and the lack of first order information. Moreover, in many practical scenarios, data may be distributed across various nodes, necessitating strategies to address trilevel learning problems without centralizing data on servers to uphold data privacy. To this end, an effective distributed trilevel zeroth order learning framework, DTZO, is proposed in this work to address trilevel learning problems with level-wise zeroth order constraints in a distributed manner. The proposed DTZO is versatile and can be adapted to a wide range of (grey-box) trilevel learning problems with partial zeroth order constraints. In DTZO, the cascaded polynomial approximation can be constructed without relying on gradients or sub-gradients, leveraging a novel cut, i.e., the zeroth order cut. Furthermore, we theoretically carry out the non-asymptotic convergence rate analysis for the proposed DTZO in achieving an $\epsilon$-stationary point. Extensive experiments have been conducted to demonstrate and validate the superior performance of the proposed DTZO.
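Zeroth order methods like the one this abstract describes replace gradients with estimates built from function evaluations alone. A standard two-point finite-difference estimator (the common primitive in this literature, not necessarily the paper's exact construction) can be sketched as:

```python
import numpy as np

def zeroth_order_grad(f, x, mu=1e-4, num_samples=2000, rng=None):
    """Two-point zeroth-order gradient estimate of f at x.

    Averages directional finite differences along random Gaussian
    directions; only evaluations of f are required, no gradients.
    """
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(num_samples):
        u = rng.standard_normal(d)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_samples

# Sanity check on a quadratic, where the true gradient is 2*x.
f = lambda x: float(x @ x)
x = np.array([1.0, -2.0, 0.5])
est = zeroth_order_grad(f, x, rng=0)
```

With enough random directions the estimate concentrates around the true gradient, which is what lets gradient-free frameworks run standard descent-style updates.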

IROS Conference 2025 Conference Paper

Simultaneous 6-DOF localization and scanning angle detection of magnetic ultrasound capsule endoscope (MUSCE) with internal sensors

  • Zhengxin Yang
  • Lihao Liu
  • Yang Jiao
  • Yaoyao Cui

Localization of the magnetically actuated capsule endoscope (MCE) is essential for accurate actuation. Despite extensive progress in pose estimation using internal magnetic field sensors and external magnetic sources, it remains challenging to achieve localization when a time-varying internal magnetic field (IMF) exists. This study presents a compound sensing method for the magnetic ultrasound capsule endoscope (MUSCE) based on an internal magnetic field sensor array and an external permanent magnet source, achieving simultaneous 6-degree-of-freedom (DOF) localization for magnetic navigation and real-time ultrasound (US) beam scanning angle detection for distortion-free US imaging reconstruction. First, a MUSCE consisting of an internal magnet, a US transducer, and Hall sensors is designed, enabling simultaneous spiral structure-based locomotion and high-quality endoluminal US imaging. Then, a compound sensing strategy is presented, realizing the separation of the time-varying IMF and the external magnetic field (EMF) and allowing synchronous 6-DOF MUSCE localization and US beam scanning angle detection. Finally, the effectiveness of the presented method is validated by tests. The demonstrated static localization error is 4.08 ± 1.91 mm in position norm and 2.46 ± 1.31° in orientation, in a workspace shared with the robotic manipulator. Also, scanning angle detection can rectify distortion in US images, showing potential clinical applications.

AAAI Conference 2024 Conference Paper

Instance-Aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning

  • Yang Jiao
  • Zequn Jie
  • Shaoxiang Chen
  • Lechao Cheng
  • Jingjing Chen
  • Lin Ma
  • Yu-Gang Jiang

The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field. Under such a paradigm, accurate BEV representation construction relies on reliable depth estimation for multi-camera images. However, existing approaches exhaustively predict depths for every pixel without prioritizing objects, which are precisely the entities requiring detection in 3D space. To this end, we propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector. First, a category-specific structural priors mining approach is proposed to enhance the efficacy of monocular depth generation. Besides, a self-boosting learning strategy is further proposed to encourage the model to place more emphasis on challenging objects in computation-expensive temporal stereo matching. Together they provide advanced depth estimation results for high-quality BEV feature construction, benefiting the ultimate 3D detection. The proposed method achieves state-of-the-art performance on the challenging nuScenes benchmark, and extensive experimental results demonstrate the effectiveness of our designs.

NeurIPS Conference 2024 Conference Paper

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models

  • Yang Jiao
  • Shaoxiang Chen
  • Zequn Jie
  • Jingjing Chen
  • Lin Ma
  • Yu-Gang Jiang

Large Multimodal Models (LMMs) are a hot research topic in the computer vision area and have also demonstrated remarkable potential across multiple disciplinary fields. A recent trend is to further extend and enhance the perception capabilities of LMMs. The current methods follow the paradigm of adapting the visual task outputs to the format of the language model, which is the main component of an LMM. This adaptation leads to convenient development of such LMMs with minimal modifications; however, it overlooks the intrinsic characteristics of diverse visual tasks and hinders the learning of perception capabilities. To address this issue, we propose a novel LMM architecture named Lumen, a Large multimodal model with versatile vision-centric capability enhancement. We decouple the LMM's learning of perception capabilities into task-agnostic and task-specific stages. Lumen first promotes fine-grained vision-language concept alignment, which is the fundamental capability for various visual tasks. Thus the output of the task-agnostic stage is a shared representation for all the tasks we address in this paper. Then the task-specific decoding is carried out by flexibly routing the shared representation to lightweight task decoders with negligible training efforts. Comprehensive experimental results on a series of vision-centric and VQA benchmarks indicate that our Lumen model not only achieves or surpasses the performance of existing LMM-based approaches in a range of vision-centric tasks but also maintains general visual understanding and instruction-following capabilities.

AAAI Conference 2024 Conference Paper

NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario

  • Tianwen Qian
  • Jingjing Chen
  • Linhai Zhuo
  • Yang Jiao
  • Yu-Gang Jiang

We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues. Compared to traditional VQA tasks, VQA in autonomous driving scenario presents more challenges. Firstly, the raw visual data are multi-modal, including images and point clouds captured by camera and LiDAR, respectively. Secondly, the data are multi-frame due to the continuous, real-time acquisition. Thirdly, the outdoor scenes exhibit both moving foreground and static background. Existing VQA benchmarks fail to adequately address these complexities. To bridge this gap, we propose NuScenes-QA, the first benchmark for VQA in the autonomous driving scenario, encompassing 34K visual scenes and 460K question-answer pairs. Specifically, we leverage existing 3D detection annotations to generate scene graphs and design question templates manually. Subsequently, the question-answer pairs are generated programmatically based on these templates. Comprehensive statistics prove that our NuScenes-QA is a balanced large-scale benchmark with diverse question formats. Built upon it, we develop a series of baselines that employ advanced 3D detection and VQA techniques. Our extensive experiments highlight the challenges posed by this new task. Codes and dataset are available at https://github.com/qiantianwen/NuScenes-QA.

AAAI Conference 2024 Conference Paper

Provably Convergent Federated Trilevel Learning

  • Yang Jiao
  • Kai Yang
  • Tiancheng Wu
  • Chengtao Jian
  • Jianwei Huang

Trilevel learning, also called trilevel optimization (TLO), has been recognized as a powerful modelling tool for hierarchical decision processes and is widely applied in many machine learning applications, such as robust neural architecture search, hyperparameter optimization, and domain adaptation. Tackling TLO problems has presented a great challenge due to their nested decision-making structure. In addition, existing works on TLO face the following key challenges: 1) they all focus on the non-distributed setting, which may lead to privacy breaches; 2) they do not offer any non-asymptotic convergence analysis, which characterizes how fast an algorithm converges. To address the aforementioned challenges, this paper proposes an asynchronous federated trilevel optimization method to solve TLO problems. The proposed method utilizes u-cuts to construct a hyper-polyhedral approximation for the TLO problem and solves it in an asynchronous manner. We demonstrate that the proposed u-cuts are applicable not only to convex functions but also to a wide range of non-convex functions that meet the u-weakly convex assumption. Furthermore, we theoretically analyze the non-asymptotic convergence rate of the proposed method by showing that its iteration complexity to obtain an ϵ-stationary point is upper bounded by O(1/ϵ²). Extensive experiments on real-world datasets have been conducted to elucidate the superiority of the proposed method, e.g., it has a faster convergence rate with a maximum acceleration of approximately 80%.

NeurIPS Conference 2024 Conference Paper

Tri-Level Navigator: LLM-Empowered Tri-Level Learning for Time Series OOD Generalization

  • Chengtao Jian
  • Kai Yang
  • Yang Jiao

Out-of-Distribution (OOD) generalization in machine learning is a burgeoning area of study. Its primary goal is to enhance the adaptability and resilience of machine learning models when faced with new, unseen, and potentially adversarial data that significantly diverges from their original training datasets. In this paper, we investigate time series OOD generalization via pre-trained Large Language Models (LLMs). We first propose a novel \textbf{T}ri-level learning framework for \textbf{T}ime \textbf{S}eries \textbf{O}OD generalization, termed TTSO, which considers both sample-level and group-level uncertainties. This formulation offers a fresh theoretical perspective for formulating and analyzing the OOD generalization problem. In addition, we provide a theoretical analysis to justify that this method is well motivated. We then develop a stratified localization algorithm tailored for this tri-level optimization problem, theoretically demonstrating the guaranteed convergence of the proposed algorithm. Our analysis also reveals that the iteration complexity to obtain an $\epsilon$-stationary point is bounded by $\mathcal{O}(\frac{1}{\epsilon^{2}})$. Extensive experiments on real-world datasets have been conducted to elucidate the effectiveness of the proposed method.

ICLR Conference 2023 Conference Paper

Asynchronous Distributed Bilevel Optimization

  • Yang Jiao
  • Kai Yang 0001
  • Tiancheng Wu
  • Dongjin Song
  • Chengtao Jian

Bilevel optimization plays an essential role in many machine learning tasks, ranging from hyperparameter optimization to meta-learning. Existing studies on bilevel optimization, however, focus on either the centralized or the synchronous distributed setting. Centralized bilevel optimization approaches require collecting a massive amount of data on a single server, which inevitably incurs significant communication expenses and may give rise to data privacy risks. Synchronous distributed bilevel optimization algorithms, on the other hand, often face the straggler problem and will immediately stop working if a few workers fail to respond. As a remedy, we propose the Asynchronous Distributed Bilevel Optimization (ADBO) algorithm. The proposed ADBO can tackle bilevel optimization problems with both nonconvex upper-level and lower-level objective functions, and its convergence is theoretically guaranteed. Furthermore, it is revealed through theoretical analysis that the iteration complexity of ADBO to obtain an $\epsilon$-stationary point is upper bounded by $\mathcal{O}(\frac{1}{{{\epsilon ^2}}})$. Thorough empirical studies on public datasets have been conducted to elucidate the effectiveness and efficiency of the proposed ADBO.

YNICL Journal 2023 Journal Article

Early detection of acute ischemic stroke using Contrast-enhanced electrical impedance tomography perfusion

  • Weirui Zhang
  • Yang Jiao
  • Tao Zhang
  • Xuechao Liu
  • Jianan Ye
  • Yuyan Zhang
  • Bin Yang
  • Meng Dai

A cerebral contrast-enhanced electrical impedance tomography perfusion method is developed for acute ischemic stroke during intravenous thrombolytic therapy. Several clinical contrast agents with stable impedance characteristics and high-conductivity contrast were screened experimentally as electrical impedance contrast agent candidates. The electrical impedance tomography perfusion method was tested on rabbits with focal cerebral infarction, and its capability for early detection was verified based on perfusion images. The experimental results showed that ioversol 350 performed significantly better as an electrical impedance contrast agent than other contrast agents (p < 0.01). Additionally, perfusion images of focal cerebral infarction in rabbits confirmed that the electrical impedance tomography perfusion method could accurately detect the location and area of different cerebral infarction lesions (p < 0.001). Therefore, the cerebral contrast-enhanced electrical impedance tomography perfusion method proposed herein combines traditional, dynamic continuous imaging with rapid detection and could be applied as an early, rapid-detection, auxiliary, bedside imaging method for patients after a suspected ischemic stroke in both prehospital and in-hospital settings.

NeurIPS Conference 2022 Conference Paper

Distributed Distributionally Robust Optimization with Non-Convex Objectives

  • Yang Jiao
  • Kai Yang
  • Dongjin Song

Distributionally Robust Optimization (DRO), which aims to find an optimal decision that minimizes the worst-case cost over an ambiguity set of probability distributions, has been applied in diverse applications, e.g., network behavior analysis, risk management, etc. However, existing DRO techniques face three key challenges: 1) how to deal with asynchronous updating in a distributed environment; 2) how to leverage the prior distribution effectively; 3) how to properly adjust the degree of robustness according to different scenarios. To this end, we propose an asynchronous distributed algorithm, named the Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE) algorithm with the itErative Active SEt method (EASE), to tackle the distributed distributionally robust optimization (DDRO) problem. Furthermore, a new uncertainty set, i.e., the constrained $D$-norm uncertainty set, is developed to effectively leverage the prior distribution and flexibly control the degree of robustness. Finally, our theoretical analysis elucidates that the proposed algorithm is guaranteed to converge, and the iteration complexity is also analyzed. Extensive empirical studies on real-world datasets demonstrate that the proposed method can not only achieve fast convergence and remain robust against data heterogeneity and malicious attacks, but also trade off robustness against performance.
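The worst-case-over-an-ambiguity-set computation at the heart of DRO is easy to see on a toy discrete problem. This sketch uses a plain L1 ball around a nominal distribution rather than the paper's constrained D-norm set, and all names are illustrative:

```python
import numpy as np

def worst_case_expectation(losses, p0, rho):
    """Inner maximization of a toy discrete DRO problem:

        max_p  p . losses   s.t.  p in simplex, ||p - p0||_1 <= rho

    Optimal play: shift up to rho/2 of probability mass from the
    smallest-loss outcomes onto the single largest-loss outcome.
    """
    losses = np.asarray(losses, dtype=float)
    p = np.asarray(p0, dtype=float).copy()
    worst = int(np.argmax(losses))
    budget = rho / 2.0
    for i in np.argsort(losses):          # take mass from cheap outcomes first
        if i == worst or budget <= 0:
            continue
        shift = min(p[i], budget)         # cannot drive a probability below 0
        p[i] -= shift
        p[worst] += shift
        budget -= shift
    return float(p @ losses)

val = worst_case_expectation([1.0, 2.0, 5.0], [1/3, 1/3, 1/3], rho=0.2)
```

The radius `rho` plays the role the abstract assigns to the "degree of robustness": at `rho = 0` the worst case collapses to the nominal expectation, and growing `rho` trades performance for robustness.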

TCS Journal 2019 Journal Article

Algorithms for automatic ranking of participants and tasks in an anonymized contest

  • Yang Jiao
  • R. Ravi
  • Wolfgang Gatterbauer

We introduce a new set of problems based on the Chain Editing problem. In our version of Chain Editing, we are given a set of participants and a set of tasks that every participant attempts. For each participant-task pair, we know whether the participant has succeeded at the task or not. We assume that participants vary in their ability to solve tasks, and that tasks vary in their difficulty to be solved. In an ideal world, stronger participants should succeed at a superset of tasks that weaker participants succeed at. Similarly, easier tasks should be completed successfully by a superset of participants who succeed at harder tasks. In reality, it can happen that a stronger participant fails at a task that a weaker participant succeeds at. Our goal is to find a perfect nesting of the participant-task relations by flipping a minimum number of participant-task relations, implying such a “nearest perfect ordering” to be the one that is closest to the truth of participant strengths and task difficulties. Many variants of the problem are known to be NP-hard. We propose six natural k-near versions of the Chain Editing problem and classify their complexity. The input to a k-near Chain Editing problem includes an initial ordering of the participants (or tasks) that the final solution is required to be “close” to, by moving each participant (or task) at most k positions from the initial ordering. We obtain surprising results on the complexity of the six k-near problems: Five of the problems are polynomial-time solvable using dynamic programming, but one of them is NP-hard.
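The ideal-world target of Chain Editing, the "perfect nesting" of participant-task relations, is simple to test even though finding the minimum number of flips is the hard part. A small helper (not from the paper) that checks the property on a 0/1 success matrix:

```python
def is_perfect_nesting(matrix):
    """Check the ideal-world property from the abstract: after sorting
    participants by number of successes, each stronger participant's
    success set must contain every weaker participant's success set.

    `matrix[p][t]` is 1 if participant p succeeded at task t.
    """
    rows = sorted((set(t for t, v in enumerate(r) if v) for r in matrix),
                  key=len)
    return all(weaker <= stronger for weaker, stronger in zip(rows, rows[1:]))

# A nested "staircase" instance, and one with a crossing violation.
nested = is_perfect_nesting([[1, 0, 0], [1, 1, 0], [1, 1, 1]])
crossed = is_perfect_nesting([[1, 0], [0, 1]])
```

Chain Editing then asks for the minimum number of entries to flip so this predicate becomes true, and the k-near variants additionally constrain how far each row may move from a given initial ordering.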

TCS Journal 2014 Journal Article

Squares in partial words

  • F. Blanchet-Sadri
  • Yang Jiao
  • John M. Machacek
  • J.D. Quigley
  • Xufan Zhang

We investigate the number of positions that do not start a square, the number of square occurrences, and the number of distinct squares in partial words, i.e., sequences that may have undefined positions called holes. We show that the limit of the ratio of the maximum number of positions not starting a square in a binary partial word with h holes over its length n is 15/31, and the limit of the ratio of the minimum number of square occurrences in a binary partial word with h holes over its length n is 103/187, provided the limit of h/n is 0. Both limits turn out to match the known limits for binary full words (those without holes). We prove another surprising result: the maximal proportion of defined positions that are square-free to the number of defined positions in a binary partial word with h holes of length n is 1/2, provided the limit of h/n is in the interval [1/11, 1). We also give a 2kh tight bound on the number of rightmost occurrences of squares per position in a k-ary partial word with h holes. In addition, we provide a more detailed analysis than earlier ones for the maximum number of distinct squares in a one-hole partial word of length n over an alphabet of size k, a bound that is independent of k.
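The objects the abstract counts are concrete: in a partial word, two symbols are compatible when they are equal or either is a hole, and a square starts at a position if some prefix repeats up to compatibility. A tiny brute-force checker (nothing like the paper's combinatorial arguments, just the definitions) makes the quantities tangible:

```python
HOLE = "?"

def compatible(a, b):
    """Two symbols agree if they are equal or either is a hole (undefined)."""
    return a == b or a == HOLE or b == HOLE

def square_positions(w):
    """Positions where some square starts in the partial word w: a factor
    of the form xy where x and y are letter-for-letter compatible."""
    n = len(w)
    starts = set()
    for i in range(n):
        for period in range(1, (n - i) // 2 + 1):
            if all(compatible(w[i + j], w[i + period + j])
                   for j in range(period)):
                starts.add(i)
                break                 # one square at i is enough
    return starts

# In "ab?b" the hole lets "ab?b" itself be a square (period 2, ? ~ a),
# and "b?" / "?b" are squares of period 1.
starts = square_positions("ab?b")
```

The first quantity in the abstract, the number of positions not starting a square, is then just `len(w) - len(square_positions(w))` for any given word.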