Arrow Research search

Author name cluster

Yi Lu

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches; it is not a full identity-disambiguation profile.

16 papers
2 author rows

Possible papers

TMLR Journal 2026 Journal Article

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

  • Jialin Yang
  • Dongfu Jiang
  • Tony He
  • Sherman Siu
  • Yuxuan Zhang
  • Disen Liao
  • Zhuofeng Li
  • Huaye Zeng

As Large Language Models (LLMs) become integral to software development workflows, their ability to generate structured outputs has become critically important. We introduce StructEval, a comprehensive benchmark for evaluating LLMs' capabilities in producing both non-renderable (JSON, YAML, CSV) and renderable (HTML, React, SVG) structured formats. Unlike prior benchmarks, StructEval systematically evaluates structural fidelity across diverse formats through two paradigms: 1) generation tasks, producing structured output from natural language prompts, and 2) conversion tasks, translating between structured formats. Our benchmark encompasses 18 formats and 44 task types, with novel metrics for format adherence and structural correctness. Results reveal significant performance gaps: even state-of-the-art models like o1-mini achieve an average score of only 75.58, with open-source alternatives lagging approximately 10 points behind. We find generation tasks more challenging than conversion tasks, and producing correct visual content more difficult than generating text-only structures.
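
Format adherence for the non-renderable formats can be approximated with simple parse checks. A minimal sketch, where the function name and the CSV column-consistency rule are my own illustration rather than StructEval's actual metric:

```python
import csv
import io
import json


def format_adheres(output: str, fmt: str) -> bool:
    """Return True if `output` parses as the requested structured format."""
    try:
        if fmt == "json":
            json.loads(output)
        elif fmt == "csv":
            rows = list(csv.reader(io.StringIO(output)))
            # Illustrative extra rule: require a consistent column count.
            widths = {len(r) for r in rows if r}
            if len(widths) > 1:
                return False
        else:
            return False  # formats like YAML would need extra dependencies
        return True
    except (json.JSONDecodeError, csv.Error):
        return False
```

Renderable formats (HTML, React, SVG) would need a renderer rather than a parser, which is where the benchmark's visual-content tasks get harder.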

AAAI Conference 2026 Conference Paper

Unlearning in Cross-Modal Retrieval via Prior-Prototype Guided Partitioned Dampening

  • Yi Lu
  • Shu Li
  • Yurong Qian

Selective deletion of data from deep models, known as unlearning, has become crucial for enforcing the right to be forgotten, while also mitigating the negative impact of flawed training data. Retraining deep models is often impractical due to data access restrictions and computational overhead. Existing retraining-free methods are typically based on the Fisher Information Matrix (FIM), which quantifies the importance of model parameters with respect to forgetting classes, applying equal dampening to these parameters. This approach implicitly assumes a semantically uniform representation space, where all retained classes are equidistant from the forgetting classes. However, this assumption often fails in real-world cross-modal retrieval scenarios characterized by multi-label and non-orthogonal semantics. To overcome this limitation, we propose Prior-Prototype guided Partitioned dampening (PPP), an effective strategy for selective forgetting in cross-modal retrieval. First, PPP defines prior-prototypes, which are semantic centers derived from well-trained models, to identify neighbor classes semantically close to the forgetting set. Then, PPP uses Fisher information to identify parameters sensitive to forgetting and partitions them into buffer and core regions based on their relative importance to the neighbor and retained sets. Finally, PPP applies a hierarchical dampening strategy, where core parameters receive stronger suppression guided by prototype-based semantic disparities. Comprehensive evaluations on four large-scale benchmarks show that PPP performs competitively with retraining-based baselines, highlighting its effectiveness and generalizability in selective unlearning for cross-modal retrieval.
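
The partitioned dampening step can be sketched as follows; the top-fraction selection, the median split, and the dampening factors are illustrative placeholders, not PPP's actual formulation:

```python
def partitioned_dampening(weights, fisher_forget, fisher_neighbor,
                          top_frac=0.2, buffer_factor=0.5, core_factor=0.1):
    """Dampen the parameters most important to the forgetting classes,
    split into a buffer region (also important to semantically close
    neighbor classes, dampened gently) and a core region (suppressed
    more strongly)."""
    w = list(weights)
    k = max(1, int(top_frac * len(w)))
    # Indices of the k parameters most sensitive to the forgetting classes.
    sensitive = sorted(range(len(w)), key=lambda i: fisher_forget[i])[-k:]
    vals = sorted(fisher_neighbor[i] for i in sensitive)
    threshold = vals[len(vals) // 2]  # median-ish buffer/core split
    for i in sensitive:
        if fisher_neighbor[i] >= threshold:
            w[i] *= buffer_factor  # buffer: neighbor classes still rely on it
        else:
            w[i] *= core_factor    # core: safe to suppress harder
    return w
```

The point of the partition is precisely the abstract's observation: equal dampening treats all retained classes as equidistant from the forgetting set, while splitting by neighbor-class importance does not.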

NeurIPS Conference 2025 Conference Paper

Understanding Parametric and Contextual Knowledge Reconciliation within Large Language Models

  • Jun Zhao
  • Yongzhuo Yang
  • Xiang Hu
  • Jingqi Tong
  • Yi Lu
  • Wei Wu
  • Tao Gui
  • Qi Zhang

Retrieval-Augmented Generation (RAG) provides additional contextual knowledge to complement the parametric knowledge in Large Language Models (LLMs). These two knowledge sources interweave to enhance the accuracy and timeliness of LLM responses. However, the internal mechanisms by which LLMs utilize this knowledge remain unclear. We propose modeling the forward propagation of knowledge as an entity flow, employing this framework to trace LLMs' internal behaviors when processing mixed-source knowledge. Linear probing utilizes a trainable linear classifier to detect specific attributes in hidden layers. However, once trained, a probe cannot adapt to dynamically specified entities. To address this challenge, we construct an entity-aware probe, which introduces special tokens to mark probing targets and employs a small trainable rank-8 LoRA update to process these special markers. We first verify this approach through an attribution experiment, demonstrating that it can accurately detect information about ad-hoc entities from complex hidden states. Next, we trace entity flows across layers to understand how LLMs reconcile conflicting knowledge internally. Our probing results reveal that contextual and parametric knowledge are routed between tokens through distinct sets of attention heads, supporting attention competition only within knowledge types. While conflicting knowledge maintains a residual presence across layers, aligned knowledge from multiple sources gradually accumulates, with the magnitude of this accumulation directly determining its influence on final outputs.
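
The rank-8 low-rank probe update can be sketched as follows; names and shapes are my own illustration, and the special marker tokens and training loop from the paper are omitted:

```python
def lora_probe_score(h, w, A, B):
    """Score a hidden state `h` (length d) with a frozen probe weight `w`
    plus a trainable low-rank correction: w' = w + B @ A, where A is
    r x d and B has length r (rank r, e.g. r = 8 as in the abstract)."""
    d, r = len(w), len(B)
    # Effective weight: frozen probe plus the low-rank delta B @ A.
    w_eff = [w[j] + sum(B[i] * A[i][j] for i in range(r)) for j in range(d)]
    # Probe logit: dot product of the hidden state with the adapted weight.
    return sum(h[j] * w_eff[j] for j in range(d))
```

Only A and B are trained, so the probe stays cheap while the low-rank delta lets it react to the dynamically specified entity markers.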

AAAI Conference 2025 Conference Paper

Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark

  • Yongliang Wu
  • Wenbo Zhu
  • Jiawang Cao
  • Yi Lu
  • Bozheng Li
  • Weiheng Chi
  • Zihan Qiu
  • Lirian Su

The demand for producing short-form videos for sharing on social media platforms has experienced significant growth in recent times. Despite notable advancements in the fields of video summarization and highlight detection, which can create partially usable short films from raw videos, these approaches are often domain-specific and require an in-depth understanding of real-world video content. To tackle this predicament, we propose Repurpose-10K, an extensive dataset comprising over 10,000 videos with more than 120,000 annotated clips aimed at resolving the video long-to-short task. Recognizing the inherent constraints posed by untrained human annotators, which can result in inaccurate annotations for repurposed videos, we propose a two-stage solution to obtain annotations from real-world user-generated content. Furthermore, we offer a baseline model to address this challenging task by integrating audio, visual, and caption aspects through a cross-modal fusion and alignment framework. We aspire for our work to ignite groundbreaking research in the lesser-explored realms of video repurposing.

NeurIPS Conference 2025 Conference Paper

VisualLens: Personalization through Task-Agnostic Visual History

  • Wang Bill Zhu
  • Deqing Fu
  • Kai Sun
  • Yi Lu
  • Zhaojiang Lin
  • Seungwhan Moon
  • Kanika Narang
  • Mustafa Canim

Existing recommendation systems either rely on user interaction logs, such as online shopping history for shopping recommendations, or focus on text signals. However, item-based histories are not always accessible and generalizable for multimodal recommendation. We hypothesize that a user's visual history, comprising images from daily life, can offer rich, task-agnostic insights into their interests and preferences, and thus be leveraged for effective personalization. To this end, we propose VisualLens, a novel framework that leverages multimodal large language models (MLLMs) to enable personalization using task-agnostic visual history. VisualLens extracts, filters, and refines a spectrum of user profiles from the visual history to support personalized recommendation. We created two new benchmarks, Google-Review-V and Yelp-V, with task-agnostic visual histories, and show that VisualLens improves over state-of-the-art item-based multimodal recommendations by 5-10% on Hit@3, and outperforms GPT-4o by 2-5%. Further analysis shows that VisualLens is robust across varying history lengths and excels at adapting to both longer histories and unseen content categories.
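
Hit@3, the metric reported above, has a standard definition, sketched here with function names of my own:

```python
def hit_at_k(ranked, relevant, k=3):
    """Hit@k for one user: 1.0 if any relevant item is in the top-k."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def mean_hit_at_k(rankings, relevants, k=3):
    """Average Hit@k across users, the form typically reported."""
    pairs = list(zip(rankings, relevants))
    return sum(hit_at_k(r, rel, k) for r, rel in pairs) / len(pairs)
```

A 5-10% improvement on Hit@3 therefore means the relevant item lands in the top three recommendations for noticeably more users.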

NeurIPS Conference 2024 Conference Paper

APDDv2: Aesthetics of Paintings and Drawings Dataset with Artist Labeled Scores and Comments

  • Xin Jin
  • Qianqian Qiao
  • Yi Lu
  • Huaye Wang
  • Heng Huang
  • Shan Gao
  • Jianfei Liu
  • Rui Li

Datasets play a pivotal role in training visual models, facilitating the development of abstract understandings of visual features through diverse image samples and multidimensional attributes. However, in the realm of aesthetic evaluation of artistic images, datasets remain relatively scarce. Existing painting datasets are often characterized by limited scoring dimensions and insufficient annotations, thereby constraining the advancement and application of automatic aesthetic evaluation methods in the domain of painting. To bridge this gap, we introduce the Aesthetics of Paintings and Drawings Dataset (APDD), the first comprehensive collection of paintings encompassing 24 distinct artistic categories and 10 aesthetic attributes. Building upon the initial release of APDDv1, our ongoing research has identified opportunities for enhancement in data scale and annotation precision. Consequently, APDDv2 boasts an expanded image corpus and improved annotation quality, featuring detailed language comments to better cater to the needs of both researchers and practitioners seeking high-quality painting datasets. Furthermore, we present an updated version of the Art Assessment Network for Specific Painting Styles, denoted as ArtCLIP. Experimental validation demonstrates the superior performance of this revised model in the realm of aesthetic evaluation, surpassing its predecessor in accuracy and efficacy. The dataset and model are available at https://github.com/BestiVictory/APDDv2.git.

IJCAI Conference 2024 Conference Paper

Paintings and Drawings Aesthetics Assessment with Rich Attributes for Various Artistic Categories

  • Xin Jin
  • Qianqian Qiao
  • Yi Lu
  • Huaye Wang
  • Shan Gao
  • Heng Huang
  • Guangdong Li

Image aesthetic evaluation is a highly prominent research domain in the field of computer vision. In recent years, there has been a proliferation of datasets and corresponding evaluation methodologies for assessing the aesthetic quality of photographic works, leading to the establishment of a relatively mature research environment. However, in contrast to the extensive research in photographic aesthetics, the field of aesthetic evaluation for paintings and drawings has seen limited attention until the introduction of the BAID dataset in March 2023. This dataset solely comprises overall scores for high-quality artistic images. Our research marks the pioneering introduction of a multi-attribute, multi-category dataset specifically tailored to the field of painting: Aesthetics of Paintings and Drawings Dataset (APDD). The construction of APDD received active participation from 28 professional artists worldwide, along with dozens of students specializing in the field of art. This dataset encompasses 24 distinct artistic categories and 10 different aesthetic attributes. Each image in APDD has been evaluated by six professionally trained experts in the field of art, including assessments for both total aesthetic scores and aesthetic attribute scores. The final APDD dataset comprises a total of 4985 images, with an annotation count exceeding 31100 entries. Concurrently, we propose an innovative approach: Art Assessment Network for Specific Painting Styles (AANSPS), designed for the assessment of aesthetic attributes in mixed-attribute art datasets. Through this research, our goal is to catalyze advancements in the field of aesthetic evaluation for paintings and drawings, while enriching the available resources and methodologies for its further development and application. Dataset is available at https://github.com/BestiVictory/APDD.git

JBHI Journal 2021 Journal Article

Multiple Embeddings Enhanced Multi-Graph Neural Networks for Chinese Healthcare Named Entity Recognition

  • Lung-Hao Lee
  • Yi Lu

Named Entity Recognition (NER) is a natural language processing task for recognizing named entities in a given sentence. Chinese NER is difficult due to the lack of delimited spaces and conventional features for determining named entity boundaries and categories. This study proposes the ME-MGNN (Multiple Embeddings enhanced Multi-Graph Neural Networks) model for Chinese NER in the healthcare domain. We integrate multiple embeddings at different granularities, from the radical and character to the word level, for an extended character representation, which is fed into multiple gated graph sequence neural networks to identify named entities and classify their types. The experimental datasets were collected from health-related news, digital health magazines and medical question/answer forums. Manual annotation was conducted for a total of 68,460 named entities across 10 entity types (body, symptom, instrument, examination, chemical, disease, drug, supplement, treatment and time) in 30,692 sentences. Experimental results indicated our ME-MGNN model achieved an F1-score of 75.69, outperforming previous methods. In practice, a series of model analyses indicated that our method is effective and efficient for Chinese healthcare NER.
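
The multi-granularity input step can be sketched as a per-character concatenation; this illustrates only the embedding-integration idea, and the gated graph sequence networks are omitted:

```python
def extended_char_representations(radicals, chars, words):
    """Build one extended representation per character by concatenating
    its radical-, character-, and word-level embeddings (the
    multiple-embedding input idea; downstream graph networks omitted)."""
    return [list(r) + list(c) + list(w)
            for r, c, w in zip(radicals, chars, words)]
```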

IROS Conference 2017 Conference Paper

Preliminary study on magnetic tracking based navigation for wire-driven flexible robot

  • Changchun Zhang
  • Yi Lu
  • Xiaoxiao Qiu
  • Shuang Song 0002
  • Li Liu 0017
  • Max Q.-H. Meng

Flexible manipulators enable curvilinear access through small incisions or natural orifices, making them a good choice for minimally invasive surgery and diagnosis. To control such a robot precisely and safely, its real-time position and shape must be measured accurately. In this paper, we propose a magnetic-tracking-based tip pose and shape detection method for wire-driven flexible robots. A permanent magnet is mounted at the distal end of the robot, and its magnetic field is sensed by a sensor array, so the position and orientation of the tip can be estimated with the tracking method. A shape sensing algorithm then estimates the real-time shape from the tip pose. With the tip pose and shape displayed in the reconstructed visual environment, navigation can be achieved. This method requires no sensors to be mounted on the robot and has no line-of-sight problem. Experimental results verified the feasibility of the proposed method, achieving a navigation error of 1.9 mm.
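
A magnetic tracker of this kind typically inverts the standard point-dipole forward model, sketched below; the inverse least-squares solve over the sensor array is omitted:

```python
MU0_OVER_4PI = 1e-7  # magnetic constant / (4*pi), in T*m/A


def dipole_field(sensor, magnet, moment):
    """Flux density of a point magnetic dipole at a sensor position:
    B = (mu0 / 4pi) * (3 (m . r_hat) r_hat - m) / |r|^3."""
    r = [s - m for s, m in zip(sensor, magnet)]
    d = sum(c * c for c in r) ** 0.5
    r_hat = [c / d for c in r]
    m_dot_r = sum(mc * rc for mc, rc in zip(moment, r_hat))
    return [MU0_OVER_4PI * (3.0 * m_dot_r * rh - mc) / d ** 3
            for rh, mc in zip(r_hat, moment)]
```

Given readings from several sensors, the magnet's position and orientation are the pose that best reproduces the measured fields under this model.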

JBHI Journal 2014 Journal Article

The Sensitive and Efficient Detection of Quadriceps Muscle Thickness Changes in Cross-Sectional Plane Using Ultrasonography: A Feasibility Investigation

  • Jizhou Li
  • Yongjin Zhou
  • Yi Lu
  • Guangquan Zhou
  • Lei Wang
  • Yong-Ping Zheng

As a direct determinant parameter to quantify muscle activity, the muscle thickness (MT) has been investigated in many aspects and for various purposes. Ultrasonography (US) is a promising modality to detect muscle morphological changes during contractions since it is portable, noninvasive, and real time. However, there are few reports on sensitive and efficient estimation of changes of MT in a cross-sectional plane. In this feasibility investigation, we proposed a coarse-to-fine method based on a compressive-tracking algorithm for estimation of MT changes during an example task of isometric knee extension using ultrasound images. The sensitivity and efficiency are evaluated with 1920 US images of the quadriceps muscle (QM) in eight subjects. The detection results were compared with those obtained from both traditional manual measurement and the well-known normalized cross-correlation method, and the effect of the size of the tracking window on detection performance was evaluated as well. It is demonstrated that the proposed method agrees well with the manual measurement. Meanwhile, it is not only sensitive to relatively small changes of MT but also computationally efficient.
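
The normalized cross-correlation baseline mentioned above has a standard definition, sketched here for flattened, equal-length patches (the function name is my own):

```python
def ncc(patch, template):
    """Normalized cross-correlation between two equal-length pixel lists:
    +1 for a perfect match, -1 for a perfectly inverted one."""
    n = len(patch)
    mp = sum(patch) / n
    mt = sum(template) / n
    p = [x - mp for x in patch]      # zero-mean patch
    t = [x - mt for x in template]   # zero-mean template
    denom = (sum(x * x for x in p) * sum(x * x for x in t)) ** 0.5
    return sum(a * b for a, b in zip(p, t)) / denom if denom else 0.0
```

Scanning the template across candidate positions and taking the NCC maximum yields the matched displacement from which a thickness change is read off.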

IJCAI Conference 2013 Conference Paper

Fault-Tolerant Planning under Uncertainty

  • Luis Pineda
  • Yi Lu
  • Shlomo Zilberstein
  • Claudia V. Goldman

A fault represents some erroneous operation of a system that could result from an action selection error or some abnormal condition. We formally define error models that characterize the likelihood of various faults and consider the problem of fault-tolerant planning, which optimizes performance given an error model. We show that factoring in the possibility of errors significantly degrades the performance of stochastic planning algorithms such as LAO*, because the number of reachable states grows dramatically. We introduce an approach to plan for a bounded number of faults and analyze its theoretical properties. When combined with a continual planning paradigm, the k-fault-tolerant planning method can produce near-optimal performance, even when the number of faults exceeds the bound. Empirical results in two challenging domains confirm the effectiveness of the approach in handling different types of runtime errors.
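
Bounding the number of faults can be modeled by folding a fault budget into the state, roughly as below; the flat fault probability and the fault-outcome interface are simplified placeholders for the paper's error models:

```python
def k_fault_successors(state, action, budget, intended, faults, p_fault):
    """Successor distribution in a fault-augmented MDP. The remaining
    fault budget is part of the state, so a planner such as LAO*
    explores at most (k + 1) copies of the base state space rather than
    all error interleavings. With the budget exhausted, faults are no
    longer modeled, matching the bounded-fault assumption."""
    if budget == 0:
        return [((intended(state, action), 0), 1.0)]
    wrong = faults(state, action)
    succ = [((intended(state, action), budget), 1.0 - p_fault)]
    succ += [((s2, budget - 1), p_fault / len(wrong)) for s2 in wrong]
    return succ
```

Replanning whenever an actual fault drains the budget recovers the continual-planning behavior the abstract describes.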