Arrow Research search

Author name cluster

Yue Yao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

7

AAAI Conference 2026 Conference Paper

Bipartite Mode Matching for Vision Training Set Search from a Hierarchical Data Server

  • Yue Yao
  • Ruining Yang
  • Tom Gedeon

We explore a situation in which the target domain is accessible, but real-time data annotation is not feasible. Instead, we would like to construct an alternative training set from a large-scale data server so that a competitive model can be obtained. For this problem, because the target domain usually exhibits distinct modes (i.e., semantic clusters representing the data distribution), if the training set does not contain these target modes, model performance is compromised. While existing works improve algorithms iteratively, our research explores the often-overlooked potential of optimizing the structure of the data server. Inspired by the hierarchical nature of web search engines, we introduce a hierarchical data server, together with a bipartite mode matching (BMM) algorithm to align source and target modes. For each target mode, we search the server's data tree for the best mode match, which may be large or small in size. Through bipartite matching, we aim for all target modes to be optimally matched with source modes in a one-to-one fashion. Compared with existing training set search algorithms, we show that the matched server modes constitute training sets with consistently smaller domain gaps to the target domain across object re-identification (re-ID) and detection tasks. Consequently, models trained on our searched training sets achieve higher accuracy than those trained otherwise. BMM enables data-centric unsupervised domain adaptation (UDA) orthogonal to existing model-centric UDA methods; combining BMM with existing UDA methods such as pseudo-labeling yields further improvement.
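The one-to-one matching described in this abstract can be sketched with a standard assignment formulation. Below is a minimal, illustrative version that assumes each mode is summarized by a feature centroid and uses Euclidean distance as the domain-gap cost; the brute-force search stands in for the paper's BMM solver (a Hungarian-style algorithm would be used at scale), and the centroid representation is an assumption, not the authors' exact formulation.

```python
# Illustrative one-to-one bipartite matching of target modes to source modes.
from itertools import permutations
from math import dist

def match_modes(target_centroids, source_centroids):
    """Return the (target_i, source_j) pairing minimizing total domain gap."""
    n = len(target_centroids)
    best_pairs, best_cost = None, float("inf")
    # Try every injective assignment of target modes to source modes.
    for combo in permutations(range(len(source_centroids)), n):
        cost = sum(dist(target_centroids[i], source_centroids[j])
                   for i, j in enumerate(combo))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(combo))
    return best_pairs

# Toy example: 2 target modes matched against 3 candidate server modes.
targets = [(0.0, 0.0), (5.0, 5.0)]
sources = [(4.9, 5.1), (10.0, 10.0), (0.1, -0.1)]
print(match_modes(targets, sources))  # [(0, 2), (1, 0)]
```

Each target mode is paired with its nearest unused server mode, which is the one-to-one constraint the abstract emphasizes.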

AAAI Conference 2026 Conference Paper

Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents

  • Zhixin Lin
  • Jungang Li
  • Shidong Pan
  • Yibo Shi
  • Yue Yao
  • Dongliang Xu

Smartphones bring significant convenience to users but also enable devices to extensively record various types of personal information. Existing smartphone agents powered by Multimodal Large Language Models (MLLMs) have achieved remarkable performance in automating different tasks. However, as a cost, these agents are granted substantial access to users' sensitive personal information during operation. To gain a thorough understanding of the privacy awareness of these agents, we present, to the best of our knowledge, the first large-scale benchmark of its kind, encompassing 7,138 scenarios. In addition, for the privacy context in each scenario, we annotate its type (e.g., Account Credentials), sensitivity level, and location. We then carefully benchmark seven mainstream smartphone agents. Our results demonstrate that almost all benchmarked agents show unsatisfactory privacy awareness (RA), with performance remaining below 60% even with explicit hints. Overall, closed-source agents show better privacy ability than open-source ones, and Gemini 2.0-flash performs best, achieving an RA of 67%. We also find that the agents' privacy detection capability is highly related to scenario sensitivity level, i.e., scenarios with higher sensitivity levels are typically more identifiable. We hope these findings prompt the research community to rethink the unbalanced utility-privacy tradeoff of smartphone agents.
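A hedged sketch of how a privacy-awareness score like the RA reported above might be computed: the fraction of scenarios in which an agent flags the privacy risk, broken down by the annotated sensitivity level (which the abstract says correlates with identifiability). The field names and this exact aggregation are illustrative assumptions, not the benchmark's published metric definition.

```python
# Illustrative privacy-awareness scoring over annotated benchmark scenarios.
from collections import defaultdict

def risk_awareness(results):
    """results: list of dicts with 'sensitivity' and boolean 'flagged'."""
    by_level = defaultdict(lambda: [0, 0])  # level -> [flagged_count, total]
    for r in results:
        by_level[r["sensitivity"]][0] += r["flagged"]
        by_level[r["sensitivity"]][1] += 1
    overall = sum(v[0] for v in by_level.values()) / len(results)
    per_level = {lvl: hits / total for lvl, (hits, total) in by_level.items()}
    return overall, per_level

results = [
    {"sensitivity": "high", "flagged": True},
    {"sensitivity": "high", "flagged": True},
    {"sensitivity": "low", "flagged": False},
    {"sensitivity": "low", "flagged": True},
]
print(risk_awareness(results))  # (0.75, {'high': 1.0, 'low': 0.5})
```

The per-level breakdown mirrors the paper's observation that higher-sensitivity scenarios are typically easier for agents to identify.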

AAAI Conference 2025 Conference Paper

AD4CD: Causal-Guided Anomaly Detection for Enhancing Cognitive Diagnosis

  • Haiping Ma
  • Yue Yao
  • Changqian Wang
  • Siyu Song
  • Yong Yang

Cognitive diagnosis is a key task in computer-aided education, aimed at assessing a student's proficiency in specific knowledge concepts based on their responses to exercises. However, existing cognitive diagnosis models often overlook anomalies in students and exercises. For instance, some students might answer exercises incorrectly despite having a strong grasp of the knowledge concept, or answer correctly despite a lack of understanding. Such subtle anomalies can adversely affect the diagnostic results of the models. To address these anomalies, we conduct a qualitative analysis of how anomalous student states and exercise properties impact response outcomes using causal diagrams. We propose a framework named Anomaly Detection for Cognitive Diagnosis (AD4CD) to enhance the ability to detect such anomalies. AD4CD approaches the problem from a causal perspective, analyzing the confounding paths that distort the true causal relationship between student ability and response outcomes, and designing an anomaly detection mechanism suited to cognitive diagnosis models. Specifically, we first account for anomalous student behaviors and exercise properties and introduce response times from both students and exercises as modeling factors. By quantifying the response-time distributions in high-dimensional features, we identify anomalies within skewed distributions, including both left-tail and right-tail anomalies. Using the detected anomaly scores, we comprehensively model students' anomalous behaviors and exercise anomalies. Additionally, we reconstruct unbiased true abilities under natural conditions and use the reconstruction loss as an anomaly score to assist in modeling guessing and slipping features. Lastly, AD4CD leverages a general cognitive diagnosis model as its backbone, optimizing the guessing and slipping features to provide unbiased and accurate feedback. Extensive experimental results on three real-world datasets demonstrate that AD4CD effectively captures anomalous data in the diagnostic process, enhancing the accuracy of diagnostic results.
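The left-tail/right-tail idea above can be sketched with quantile thresholds on raw response times: suspiciously fast responses are candidate guesses, suspiciously slow ones candidate slips or distractions. The quantile cutoffs below are illustrative assumptions; AD4CD operates on learned high-dimensional features, not raw times.

```python
# Illustrative tail-anomaly flagging over a skewed response-time distribution.
import math

def tail_anomalies(times, low_q=0.05, high_q=0.95):
    """Return indices of left-tail and right-tail response-time anomalies."""
    ordered = sorted(times)
    lo = ordered[math.ceil(low_q * (len(ordered) - 1))]
    hi = ordered[math.floor(high_q * (len(ordered) - 1))]
    left = [i for i, t in enumerate(times) if t < lo]   # candidate guessing
    right = [i for i, t in enumerate(times) if t > hi]  # candidate slipping
    return left, right

# Seconds spent per exercise: one implausibly fast, one implausibly slow.
times = [12, 14, 13, 1, 15, 14, 13, 90, 12, 14]
print(tail_anomalies(times))  # ([3], [7])
```

The returned indices would then feed anomaly scores into the downstream guessing/slipping modeling the abstract describes.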

ICLR Conference 2024 Conference Paper

Alice Benchmarks: Connecting Real World Re-Identification with the Synthetic

  • Xiaoxiao Sun 0002
  • Yue Yao
  • Shengjin Wang
  • Hongdong Li
  • Liang Zheng 0001

For object re-identification (re-ID), learning from synthetic data has become a promising strategy for cheaply acquiring large-scale annotated datasets and effective models, with few privacy concerns. Many interesting research problems arise from this strategy, e.g., how to reduce the domain gap between the synthetic source and the real-world target. To facilitate the development of new approaches to learning from synthetic data, we introduce the Alice benchmarks: large-scale datasets providing benchmarks as well as evaluation protocols to the research community. Within the Alice benchmarks, two object re-ID tasks are offered: person and vehicle re-ID. We collected and annotated two challenging real-world target datasets, AlicePerson and AliceVehicle, captured under various illuminations, image resolutions, etc. An important feature of our real-world targets is that the clusterability of their training sets is not manually guaranteed, making them closer to a real domain adaptation test scenario. Correspondingly, we reuse the existing PersonX and VehicleX as synthetic source domains. The primary goal is to train models from synthetic data that work effectively in the real world. In this paper, we detail the settings of the Alice benchmarks, provide an analysis of commonly-used domain adaptation methods, and discuss some interesting future directions. An online server has been set up for the community to evaluate methods conveniently and fairly. Datasets and online server details are available at https://sites.google.com/view/alice-benchmarks.

IROS Conference 2024 Conference Paper

Improving Out-of-Distribution Generalization of Trajectory Prediction for Autonomous Driving via Polynomial Representations

  • Yue Yao
  • Shengchao Yan
  • Daniel Goehring
  • Wolfram Burgard
  • Joerg Reichardt

Robustness against Out-of-Distribution (OoD) samples is a key performance indicator for a trajectory prediction model. However, the development and ranking of state-of-the-art (SotA) models are driven by their In-Distribution (ID) performance on individual competition datasets. We present an OoD testing protocol that homogenizes datasets and prediction tasks across two large-scale motion datasets. We introduce a novel prediction algorithm based on polynomial representations of agent trajectories and road geometry on both the input and output sides of the model. With a much smaller model size, training effort, and inference time, we reach near-SotA performance in ID testing and significantly improve robustness in OoD testing. Within our OoD testing protocol, we further study two augmentation strategies for SotA models and their effects on model generalization. Highlighting the contrast between ID and OoD performance, we suggest adding OoD testing to the evaluation criteria of trajectory prediction models.
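The core representational idea, replacing a densely sampled trajectory with a few polynomial coefficients per coordinate, can be sketched as below. Least-squares polynomial fitting is standard; the specific degree and the pure-Python normal-equations solver are illustrative choices, not the paper's model.

```python
# Illustrative compression of a sampled trajectory into polynomial coefficients.

def polyfit(ts, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    n = degree + 1
    # Normal equations A c = b for the Vandermonde least-squares system.
    A = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        coeffs[r] = (b[r] - sum(A[r][c] * coeffs[c]
                                for c in range(r + 1, n))) / A[r][r]
    return coeffs  # coeffs[i] multiplies t**i

def evaluate(coeffs, t):
    """Horner evaluation of the fitted polynomial at time t."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc

# 50 samples of x(t) = 1 + 2t + 0.5t^2 compress to just 3 coefficients.
ts = [i * 0.1 for i in range(50)]
xs = [1 + 2 * t + 0.5 * t * t for t in ts]
cx = polyfit(ts, xs, degree=2)
print([round(c, 3) for c in cx])  # ≈ [1.0, 2.0, 0.5]
```

A compact coefficient vector on both the input and output sides is one plausible source of the smaller model size and inference time the abstract reports.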

ICRA Conference 2024 Conference Paper

Learning-Aided Warmstart of Model Predictive Control in Uncertain Fast-Changing Traffic

  • Mohamed-Khalil Bouzidi
  • Yue Yao
  • Daniel Goehring
  • Joerg Reichardt

Model Predictive Control lacks the ability to escape local minima in nonconvex problems. Furthermore, in fast-changing, uncertain environments, the conventional warmstart, which uses the optimal trajectory from the last timestep, often falls short of providing an adequately close initial guess for the current optimal trajectory. This can result in convergence failures and safety issues. This paper therefore proposes a framework for learning-aided warmstarts of Model Predictive Control algorithms. Our method leverages a neural-network-based multimodal predictor to generate multiple trajectory proposals for the autonomous vehicle, which are further refined by a sampling-based technique. This combined approach enables us to identify multiple distinct local minima and provide an improved initial guess. We validate our approach with Monte Carlo simulations of traffic scenarios.
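The selection step implied above can be sketched simply: given several candidate trajectories (e.g., from a multimodal predictor plus sampling-based refinement), score each under the optimization objective and hand the cheapest one to the solver as its initial guess. The quadratic tracking cost below is an illustrative placeholder for a real MPC objective with dynamics and collision terms.

```python
# Illustrative warmstart selection among multiple trajectory proposals.
def select_warmstart(candidates, reference):
    """candidates: list of trajectories; each trajectory is a list of (x, y)."""
    def cost(traj):
        # Placeholder objective: squared tracking error w.r.t. a reference path.
        return sum((x - rx) ** 2 + (y - ry) ** 2
                   for (x, y), (rx, ry) in zip(traj, reference))
    return min(candidates, key=cost)

reference = [(0, 0), (1, 0), (2, 0)]
candidates = [
    [(0, 1), (1, 1), (2, 1)],    # proposal in an offset lane
    [(0, 0), (1, 0.1), (2, 0)],  # proposal near the reference
]
print(select_warmstart(candidates, reference))  # [(0, 0), (1, 0.1), (2, 0)]
```

Because each proposal may sit in a different basin of the nonconvex problem, scoring several of them is what lets the method escape the local minimum the previous-timestep warmstart would be stuck in.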

AAAI Conference 2024 Conference Paper

Open-Set Facial Expression Recognition

  • Yuhang Zhang
  • Yue Yao
  • Xuannan Liu
  • Lixiong Qin
  • Wenjing Wang
  • Weihong Deng

Facial expression recognition (FER) models are typically trained on datasets with a fixed set of seven basic classes. However, recent research (Cowen et al. 2021; Bryant et al. 2022; Kollias 2023) points out that there are far more expressions than the basic ones. Thus, when these models are deployed in the real world, they may encounter unknown classes, such as compound expressions that cannot be classified into the existing basic classes. To address this issue, we propose the open-set FER task for the first time. Though many open-set recognition methods exist, we argue that they do not work well for open-set FER because FER data are all human faces with very small inter-class distances, which makes open-set samples very similar to closed-set samples. In this paper, we are the first to turn the disadvantage of small inter-class distance into an advantage by proposing a new approach to open-set FER. Specifically, we find that the small inter-class distance leads to sparsely distributed pseudo labels for open-set samples, which can be viewed as symmetric noisy labels. Based on this novel observation, we convert open-set FER into a noisy label detection problem. We further propose a novel method that incorporates attention map consistency and cycle training to detect open-set samples. Extensive experiments on various FER datasets demonstrate that our method clearly outperforms state-of-the-art open-set recognition methods by large margins. Code is available at https://github.com/zyh-uaiaaaa.
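A hedged sketch of the reduction described above: once open-set samples receive scattered, noise-like pseudo labels, a generic noisy-label detector can flag them. Here a simple high-loss rule stands in for the paper's attention-map-consistency and cycle-training method; the threshold and the cross-entropy criterion are illustrative assumptions.

```python
# Illustrative noisy-label-style flagging of likely open-set samples.
import math

def flag_open_set(probs, pseudo_labels, threshold=1.0):
    """Flag samples whose cross-entropy w.r.t. the pseudo label is high.

    probs: per-sample predicted class distributions; pseudo_labels: class ids.
    """
    flagged = []
    for i, (p, y) in enumerate(zip(probs, pseudo_labels)):
        loss = -math.log(max(p[y], 1e-12))  # per-sample cross-entropy
        if loss > threshold:
            flagged.append(i)
    return flagged

probs = [
    [0.9, 0.05, 0.05],  # confident prediction agreeing with pseudo label 0
    [0.1, 0.2, 0.7],    # prediction disagrees with pseudo label 0: noise-like
]
print(flag_open_set(probs, [0, 0]))  # [1]
```

Samples whose pseudo labels behave like symmetric label noise incur high loss and get flagged as open-set candidates, which is the intuition the abstract builds on.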