Arrow Research search

Author name cluster

Yu Yin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
1 author row

Possible papers

11

AAAI Conference 2026 Conference Paper

Assessing LLMs for Serendipity Discovery in Knowledge Graphs: A Case for Drug Repurposing

  • Mengying Wang
  • Chenhui Ma
  • Ao Jiao
  • Tuo Liang
  • Pengjun Lu
  • Shrinidhi Hegde
  • Yu Yin
  • Evren Gurkan-Cavusoglu

Large Language Models (LLMs) have greatly advanced knowledge graph question answering (KGQA), yet existing systems are typically optimized for returning highly relevant but predictable answers. A missing yet desired capability is to exploit LLMs to suggest surprising and novel ("serendipitous") answers. In this paper, we formally define the serendipity-aware KGQA task and propose the SerenQA framework to evaluate LLMs' ability to uncover unexpected insights in scientific KGQA tasks. SerenQA includes a rigorous serendipity metric based on relevance, novelty, and surprise, along with an expert-annotated benchmark derived from the Clinical Knowledge Graph for drug repurposing. Additionally, it features a structured evaluation pipeline encompassing three subtasks: knowledge retrieval, subgraph reasoning, and serendipity exploration. Our experiments reveal that while state-of-the-art LLMs perform well on retrieval, they still struggle to identify genuinely surprising and valuable discoveries, underscoring significant room for future research.
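The abstract names relevance, novelty, and surprise as the ingredients of its serendipity metric without giving the combination; the sketch below is purely illustrative (the harmonic mean and all numbers are assumptions, not SerenQA's formula), chosen only to show why all three ingredients must be nonzero for a high score.

```python
import numpy as np

def serendipity_score(relevance, novelty, surprise):
    """Illustrative combination of the three ingredients named in the
    abstract; SerenQA's actual metric is not specified here. A harmonic
    mean is used so that any near-zero ingredient drags the score down."""
    parts = np.array([relevance, novelty, surprise])
    if np.any(parts == 0):
        return 0.0
    return len(parts) / np.sum(1.0 / parts)

# A highly relevant but unsurprising answer scores low:
predictable = serendipity_score(0.9, 0.2, 0.05)
serendipitous = serendipity_score(0.7, 0.8, 0.9)
```

Under this hypothetical combination, `serendipitous` comfortably exceeds `predictable`, mirroring the task's goal of rewarding answers that are relevant *and* unexpected.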

AAAI Conference 2025 Conference Paper

Certified Causal Defense with Generalizable Robustness

  • Yiran Qiao
  • Yu Yin
  • Chen Chen
  • Jing Ma

While machine learning models have proven effective across various scenarios, it is widely acknowledged that many models are vulnerable to adversarial attacks. Recently, numerous efforts have emerged in adversarial defense. Among them, certified defense is well known for its theoretical guarantees against arbitrary adversarial perturbations on input within a certain range. However, most existing works in this line struggle to generalize their certified robustness to other data domains with distribution shifts. This issue is rooted in the difficulty of eliminating the negative impact of spurious correlations on robustness in different domains. To address this problem, in this work, we propose a novel certified defense framework GLEAN, which incorporates a causal perspective into the generalization problem in certified defense. More specifically, our framework integrates a certifiable causal factor learning component to disentangle the causal relations and spurious correlations between input and label, thereby excluding the negative effect of spurious correlations on defense. On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors. In this way, our framework is not only robust against malicious noise on data in the training distribution but can also generalize its robustness across domains with distribution shifts. Extensive experiments on benchmark datasets validate the superiority of our framework in certified robustness generalization across different data domains.
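For readers unfamiliar with certified defense, one standard certification mechanism is randomized smoothing, whose L2 certificate can be sketched as below. This is the standard Cohen-style simplification, not code from the paper; GLEAN's causal components are a layer on top and are not reproduced here.

```python
from statistics import NormalDist

def certified_radius(p_a, sigma):
    """Certified L2 radius from randomized smoothing: if the smoothed
    classifier picks its top class with probability at least p_a under
    N(0, sigma^2) input noise, the prediction provably cannot change
    for any perturbation within this radius. (Standard certification
    machinery, not GLEAN's method.)"""
    return sigma * NormalDist().inv_cdf(p_a)

r = certified_radius(0.99, 0.5)  # a high-confidence smoothed prediction
```

A barely-winning class (`p_a = 0.5`) certifies a radius of zero, while higher confidence under noise buys a larger guaranteed radius.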

NeurIPS Conference 2025 Conference Paper

Fix False Transparency by Noise Guided Splatting

  • Aly El Hakie
  • Yiren Lu
  • Yu Yin
  • Michael Jenkins
  • Yehe Liu

Opaque objects reconstructed by 3D Gaussian Splatting (3DGS) often exhibit a falsely transparent surface, leading to inconsistent background and internal patterns under camera motion in interactive viewing. This issue stems from the ill-posed optimization in 3DGS. During training, background and foreground Gaussians are blended via $\alpha$-compositing and optimized solely against the input RGB images using a photometric loss. As this process lacks an explicit constraint on surface opacity, the optimization may incorrectly assign transparency to opaque regions, resulting in view-inconsistent and falsely transparent output. This issue is difficult to detect in standard evaluation settings (i.e., rendering static images), but becomes particularly evident in object-centric reconstructions under interactive viewing. Although other causes of view-inconsistency, such as popping artifacts, have been explored previously, false transparency has not been explicitly identified. To the best of our knowledge, we are the first to quantify, characterize, and develop solutions for this "false transparency" artifact, an under-reported artifact in 3DGS. Our strategy, Noise Guided Splatting (NGS), encourages surface Gaussians to adopt higher opacity by injecting opaque noise Gaussians in the object volume during training, requiring only minimal modifications to the existing splatting process. To quantitatively evaluate false transparency in static renderings, we propose a novel transmittance-based metric that measures the severity of this artifact. In addition, we introduce a customized, high-quality object-centric scan dataset exhibiting pronounced transparency issues, and we augment popular existing datasets (e.g., DTU) with complementary infill noise specifically designed to assess the robustness of 3D reconstruction methods to false transparency. Experiments across multiple datasets show that NGS substantially reduces false transparency while maintaining competitive performance on standard rendering metrics (e.g., PSNR), demonstrating its overall effectiveness.
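The $\alpha$-compositing step the abstract identifies as the root cause can be illustrated with a minimal NumPy sketch (variable names and values are illustrative, not the authors' code); the residual transmittance `T` is also the quantity a transmittance-based metric would inspect.

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha compositing as used in 3DGS rendering.

    colors: (N, 3) RGB of depth-sorted Gaussians (front first)
    alphas: (N,) per-Gaussian opacity after 2D projection
    Returns the rendered pixel color and the residual transmittance."""
    out = np.zeros(3)
    T = 1.0  # transmittance: fraction of light still unblocked
    for c, a in zip(colors, alphas):
        out += T * a * c
        T *= 1.0 - a
    return out, T

# A nominally opaque red surface whose Gaussians were optimized to low
# opacity lets a blue background bleed through (false transparency):
colors = np.array([[1.0, 0.0, 0.0],   # surface Gaussian, alpha = 0.6
                   [0.0, 0.0, 1.0]])  # background Gaussian, alpha = 1.0
alphas = np.array([0.6, 1.0])
pixel, T = composite(colors, alphas)
# 40% of the background color shows through the "opaque" surface
```

Because the photometric loss only constrains `pixel`, nothing forces the surface's alpha toward 1; injecting opaque noise Gaussians inside the object, as NGS does, makes low surface opacity costly.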

NeurIPS Conference 2025 Conference Paper

Praxis-VLM: Vision-Grounded Decision Making via Text-Driven Reinforcement Learning

  • Zhe Hu
  • Jing Li
  • Zhongzhu Pu
  • Hou Pong (Ken) Chan
  • Yu Yin

Vision Language Models exhibit impressive performance for various tasks, yet they often lack the sophisticated situational reasoning required for complex decision-making. This paper shows that VLMs can achieve surprisingly strong decision-making performance when visual scenes are replaced by textual descriptions, suggesting foundational reasoning can be effectively learned from language. Motivated by this insight, we propose Praxis-VLM, a reasoning VLM for vision-grounded decision-making. Praxis-VLM employs the GRPO algorithm on textual scenarios to instill robust reasoning capabilities, where models learn to evaluate actions and their consequences. These reasoning skills, acquired purely from text, successfully transfer to multimodal inference with visual inputs, significantly reducing reliance on scarce paired image-text training data. Experiments across diverse decision-making benchmarks demonstrate that Praxis-VLM substantially outperforms standard supervised fine-tuning, exhibiting superior performance and generalizability. Further analysis confirms that our models engage in explicit and effective reasoning, underpinning their enhanced performance and adaptability.
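The GRPO algorithm the abstract relies on scores each sampled response relative to its own group, removing the need for a learned value function; below is a minimal sketch of that advantage computation (illustrative, not the authors' training code).

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: each sampled response for
    the same prompt is normalized against the mean and std of its own
    group, so no separate value model is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Four sampled responses to one textual scenario: two correct, two wrong.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
# Correct responses receive positive advantage, wrong ones negative.
```

These advantages then weight the policy-gradient update, reinforcing the reasoning traces that led to better decisions.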

NeurIPS Conference 2025 Conference Paper

Segment then Splat: Unified 3D Open-Vocabulary Segmentation via Gaussian Splatting

  • Yiren Lu
  • Yunlai Zhou
  • Yiran Qiao
  • Chaoda Song
  • Tuo Liang
  • Jing Ma
  • Huan Wang
  • Yu Yin

Open-vocabulary querying in 3D space is crucial for enabling more intelligent perception in applications such as robotics, autonomous systems, and augmented reality. However, most existing methods rely on 2D pixel-level parsing, leading to multi-view inconsistencies and poor 3D object retrieval. Moreover, they are limited to static scenes and struggle with dynamic scenes due to the complexities of motion modeling. In this paper, we propose Segment then Splat, a 3D-aware open-vocabulary segmentation approach for both static and dynamic scenes based on Gaussian Splatting. Segment then Splat reverses the long-established approach of "segmentation after reconstruction" by dividing Gaussians into distinct object sets before reconstruction. Once reconstruction is complete, the scene is naturally segmented into individual objects, achieving true 3D segmentation. This design eliminates both geometric and semantic ambiguities, as well as Gaussian–object misalignment issues in dynamic scenes. It also accelerates the optimization process, as it eliminates the need for learning a separate language field. After optimization, a CLIP embedding is assigned to each object to enable open-vocabulary querying. Extensive experiments on various datasets demonstrate the effectiveness of our proposed method in both static and dynamic scenarios.
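The final open-vocabulary querying step, matching one embedding per object against a text embedding, can be sketched as follows (illustrative NumPy; a real system would use CLIP features on both sides, and all names and shapes here are assumptions).

```python
import numpy as np

def query_objects(obj_embeds, text_embed):
    """Open-vocabulary query: rank per-object embeddings by cosine
    similarity to a text query embedding, best match first."""
    A = obj_embeds / np.linalg.norm(obj_embeds, axis=1, keepdims=True)
    t = text_embed / np.linalg.norm(text_embed)
    return np.argsort(-(A @ t))  # indices of objects, best match first

rng = np.random.default_rng(4)
objs = rng.normal(size=(5, 32))  # one embedding per segmented object
# A query nearly identical to object 3's embedding should rank it first:
ranking = query_objects(objs, objs[3] + 0.01 * rng.normal(size=32))
```

Because each object owns exactly one embedding, retrieval returns whole 3D objects rather than per-pixel masks that must be reconciled across views.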

NeurIPS Conference 2024 Conference Paper

Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions

  • Zhe Hu
  • Tuo Liang
  • Jing Li
  • Yiren Lu
  • Yunlai Zhou
  • Yiran Qiao
  • Jing Ma
  • Yu Yin

Recent advancements in large vision language models have demonstrated remarkable proficiency across a wide range of tasks. Yet, these models still struggle with understanding the nuances of human humor through juxtaposition, particularly when it involves nonlinear narratives that underpin many jokes and humor cues. This paper investigates this challenge by focusing on comics with contradictory narratives, where each comic consists of two panels that create a humorous contradiction. We introduce the YesBut benchmark, which comprises tasks of varying difficulty aimed at assessing AI's capabilities in recognizing and interpreting these comics, ranging from literal content comprehension to deep narrative reasoning. Through extensive experimentation and analysis of recent commercial or open-sourced large vision language models, we assess their capability to comprehend the complex interplay of the narrative humor inherent in these comics. Our results show that even the state-of-the-art models still struggle with this task. Our findings offer insights into the current limitations and potential improvements for AI in understanding human creative expressions.

TIST Journal 2024 Journal Article

Model-Agnostic Adaptive Testing for Intelligent Education Systems via Meta-learned Gradient Embeddings

  • Haoyang Bi
  • Qi Liu
  • Han Wu
  • Weidong He
  • Zhenya Huang
  • Yu Yin
  • Haiping Ma
  • Yu Su

The field of education has undergone a significant revolution with the advent of intelligent systems and technology, which aim to personalize the learning experience, catering to the unique needs and abilities of individual learners. In this pursuit, a fundamental challenge is designing proper tests that assess students' cognitive status on knowledge and skills accurately and efficiently. One promising approach, referred to as Computerized Adaptive Testing (CAT), is to administer computer-automated tests that alternately select the next item for each examinee and estimate their cognitive states given their responses to the selected items. Nevertheless, existing CAT systems suffer from inflexibility in item selection and ineffectiveness in cognitive state estimation, respectively. In this article, we propose a Model-Agnostic adaptive testing framework via Meta-learned Gradient Embeddings, MAMGE for short, which improves item selection and cognitive state estimation simultaneously. For item selection, we design a Gradient Embedding-based Item Selector (GEIS), which incorporates the concept of gradient embeddings to represent items and selects the best ones that are both informative and representative. For cognitive state estimation, we propose a Meta-learned Cognitive State Estimator (MCSE) to automatically control the estimation process by learning to learn a proper initialization and dynamically inferred updates. Both MCSE and GEIS are inherently model-agnostic, and the two modules are elegantly connected via meta-learned gradient embeddings. Finally, extensive experiments demonstrate the effectiveness and flexibility of MAMGE.
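One common way to build gradient embeddings for item selection is to represent an item by the gradient of the loss at the model's own pseudo-label (the BADGE style); the sketch below illustrates that generic idea for a logistic model and is an assumption-laden illustration, not MAMGE's code.

```python
import numpy as np

def gradient_embedding(x, theta):
    """Represent an item by the gradient of the cross-entropy loss
    w.r.t. the output logit, taken at the model's own pseudo-label.
    For a logistic model this reduces to (p - y_hat) * x, whose norm
    is large when the model is uncertain about the item."""
    p = 1.0 / (1.0 + np.exp(-x @ theta))
    y_hat = float(p >= 0.5)        # pseudo-label from the model itself
    return (p - y_hat) * x         # chain rule through the logit

rng = np.random.default_rng(3)
theta = rng.normal(size=6)         # current examinee/model parameters
items = rng.normal(size=(20, 6))   # candidate item features
G = np.stack([gradient_embedding(x, theta) for x in items])
# Select the item with the largest gradient norm (most informative):
best = int(np.argmax(np.linalg.norm(G, axis=1)))
```

Selecting by gradient norm favors informative items; MAMGE additionally meta-learns these embeddings and balances informativeness with representativeness.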

AAAI Conference 2020 Conference Paper

Joint Super-Resolution and Alignment of Tiny Faces

  • Yu Yin
  • Joseph Robinson
  • Yulun Zhang
  • Yun Fu

Super-resolution (SR) and landmark localization of tiny faces are highly correlated tasks. On the one hand, landmark localization could obtain higher accuracy with faces of high resolution (HR). On the other hand, face SR would benefit from prior knowledge of facial attributes such as landmarks. Thus, we propose a joint alignment and SR network to simultaneously detect facial landmarks and super-resolve tiny faces. More specifically, a shared deep encoder is applied to extract features for both tasks by leveraging complementary information. To exploit the representative power of the hierarchical encoder, intermediate layers of a shared feature extraction module are fused to form efficient feature representations. The fused features are then fed to task-specific modules to detect landmarks and super-resolve face images in parallel. Extensive experiments demonstrate that the proposed model significantly outperforms the state-of-the-art in both landmark localization and SR of faces. We show a large improvement for landmark localization of tiny faces (i.e., 16 × 16). Furthermore, the proposed framework yields comparable results for landmark localization on low-resolution (LR) faces (i.e., 64 × 64) to existing methods on HR faces (i.e., 256 × 256). As for SR, the proposed method recovers sharper edges and more details from LR face images than other state-of-the-art methods, which we demonstrate qualitatively and quantitatively.
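The shared-encoder-with-fused-intermediate-features design described above can be sketched in a few lines; the shapes, layer count, and head sizes are all illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def shared_encoder(img, Ws):
    """Run a small stack of layers and keep every intermediate
    activation so they can be fused, mirroring the hierarchical
    feature-fusion idea in the abstract."""
    feats, h = [], img
    for W in Ws:
        h = relu(h @ W)
        feats.append(h)
    return np.concatenate(feats)   # fused multi-level features

rng = np.random.default_rng(2)
Ws = [rng.normal(size=(32, 32)) * 0.1 for _ in range(3)]
fused = shared_encoder(rng.normal(size=32), Ws)   # (96,) fused features
# The fused features feed two task-specific heads in parallel:
landmarks = fused @ rng.normal(size=(96, 10))     # landmark head
sr_patch = fused @ rng.normal(size=(96, 64))      # super-resolution head
```

Because both heads read the same fused representation, gradients from each task shape the shared encoder, which is how the two correlated tasks reinforce each other.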

AAAI Conference 2020 Conference Paper

Neural Cognitive Diagnosis for Intelligent Education Systems

  • Fei Wang
  • Qi Liu
  • Enhong Chen
  • Zhenya Huang
  • Yuying Chen
  • Yu Yin
  • Zai Huang
  • Shijin Wang

Cognitive diagnosis is a fundamental issue in intelligent education, which aims to discover the proficiency level of students on specific knowledge concepts. Existing approaches usually mine linear interactions of the student exercising process with manually designed functions (e.g., the logistic function), which is not sufficient for capturing complex relations between students and exercises. In this paper, we propose a general Neural Cognitive Diagnosis (NeuralCD) framework, which incorporates neural networks to learn the complex exercising interactions, to obtain both accurate and interpretable diagnosis results. Specifically, we project students and exercises to factor vectors and leverage multiple neural layers for modeling their interactions, where the monotonicity assumption is applied to ensure the interpretability of both factors. Furthermore, we propose two implementations of NeuralCD by specializing the required concepts of each exercise, i.e., the NeuralCDM with a traditional Q-matrix and the improved NeuralCDM+ exploring rich text content. Extensive experimental results on real-world datasets show the effectiveness of the NeuralCD framework in both accuracy and interpretability.
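One common way to realize the monotonicity assumption mentioned above is to clamp interaction weights to be nonnegative, so predicted correctness can never decrease as any proficiency grows; below is a minimal sketch under that assumption (illustrative, not the authors' code).

```python
import numpy as np

def monotone_layer(x, w, b):
    """Interaction layer with weights clamped nonnegative via abs(),
    making the sigmoid output monotonically nondecreasing in every
    proficiency dimension -- the property that keeps the learned
    student factors interpretable as proficiencies."""
    return 1.0 / (1.0 + np.exp(-(x @ np.abs(w) + b)))

rng = np.random.default_rng(0)
w = rng.normal(size=4)        # one weight per knowledge concept
weak = np.full(4, 0.2)        # low proficiency on four concepts
strong = np.full(4, 0.8)      # higher proficiency on the same concepts
# Higher proficiency can never lower the predicted correctness:
assert monotone_layer(strong, w, 0.0) >= monotone_layer(weak, w, 0.0)
```

Without such a constraint, a free neural network could fit the data while assigning factors with no proficiency interpretation, which is precisely what the monotonicity assumption rules out.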

AAAI Conference 2018 Conference Paper

Exercise-Enhanced Sequential Modeling for Student Performance Prediction

  • Yu Su
  • Qingwen Liu
  • Qi Liu
  • Zhenya Huang
  • Yu Yin
  • Enhong Chen
  • Chris Ding
  • Si Wei

In online education systems, for offering proactive services to students (e.g., personalized exercise recommendation), a crucial demand is to predict student performance (e.g., scores) on future exercising activities. Existing prediction methods mainly exploit the historical exercising records of students, where each exercise is usually represented by manually labeled knowledge concepts, and the richer information contained in the text descriptions of exercises is still underexplored. In this paper, we propose a novel Exercise-Enhanced Recurrent Neural Network (EERNN) framework for student performance prediction that takes full advantage of both student exercising records and the text of each exercise. Specifically, for modeling the student exercising process, we first design a bidirectional LSTM to learn each exercise representation from its text description, without requiring expert labeling and without information loss. Then, we propose a new LSTM architecture to trace student states (i.e., knowledge states) through their sequential exercising process in combination with the exercise representations. To make final predictions, we design two strategies under EERNN, i.e., EERNNM with the Markov property and EERNNA with an attention mechanism. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of the EERNN framework. Moreover, by incorporating exercise correlations, EERNN can well handle the cold-start problem from both the student and the exercise perspectives.
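The attention-based strategy (EERNNA) can be sketched as weighting past hidden student states by the similarity of their exercises to the upcoming one; everything below (shapes, the cosine/softmax choices) is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def eernna_predict(H, X, x_next):
    """Attention-style state aggregation in the spirit of EERNNA:
    weight past hidden student states by how similar their exercises
    are to the next exercise, then use the attended state to predict.

    H: (T, d) hidden student states after each past exercise
    X: (T, k) text-derived embeddings of the past exercises
    x_next: (k,) embedding of the exercise to predict performance on"""
    sims = X @ x_next / (
        np.linalg.norm(X, axis=1) * np.linalg.norm(x_next) + 1e-8)
    w = np.exp(sims) / np.exp(sims).sum()  # softmax attention weights
    return w @ H  # attended state fed to the score-prediction layer

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 8))    # five past steps, 8-dim student state
X = rng.normal(size=(5, 16))   # embeddings of the five past exercises
state = eernna_predict(H, X, X[2])  # next exercise resembles exercise 2
```

By contrast, the Markov variant (EERNNM) would predict from the latest hidden state alone; attention lets similar past exercises contribute regardless of recency.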