Author name cluster

Ruifan Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Diffusion-Assisted Progressive Learning for Weakly Supervised Phrase Localization

  • Pengyue Lin
  • Yanyang Hu
  • Xinjing Liu
  • Wenqi Jia
  • Fangxiang Feng
  • Ruifan Li

Weakly supervised phrase localization (WSPL) aims to localize visual objects mentioned by given phrases, but it learns without human-annotated bounding boxes. Previous works struggle in multi-object scenarios where objects in the background often appear simultaneously with the target objects. To this end, we propose a Diffusion-Assisted PrOgressive learning framework (i.e., DAPO) for the WSPL task. Specifically, we score the difficulty of training samples based on the quantity of objects and the level of semantic alignment. These samples are then used progressively during training, in order of their difficulty scores. To address the sample imbalance problem, we propose a Generation-Assisted Tuning (GAT) method for the grounding network. First, to enrich the samples from few-object scenarios, we leverage Stable Diffusion (SD) to generate images with phrases. Second, we introduce an attention-driven scheme to direct SD's attention to the mentioned objects. Finally, we design a diffusion-guided loss, which helps the grounding network learn the objects' layouts. Extensive experiments show that our DAPO framework outperforms strong baselines on benchmark datasets.
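
The progressive, difficulty-ordered training described in this abstract might be sketched roughly as follows. This is only an illustration under stated assumptions: the `Sample` fields, the weighted `difficulty` formula, and the stage schedule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    phrase: str
    num_objects: int   # hypothetical field: number of objects in the image
    alignment: float   # hypothetical field: phrase-image alignment in [0, 1]


def difficulty(sample: Sample, weight: float = 0.5) -> float:
    """Hypothetical score: more objects and weaker alignment mean harder."""
    return weight * sample.num_objects + (1.0 - weight) * (1.0 - sample.alignment)


def curriculum_stages(samples: list[Sample], num_stages: int) -> list[list[Sample]]:
    """Sort samples from easy to hard, then grow the training pool stage by stage."""
    ordered = sorted(samples, key=difficulty)
    stages = []
    for stage in range(1, num_stages + 1):
        # Each stage keeps the easiest `cutoff` samples, so later stages add harder ones.
        cutoff = max(1, round(len(ordered) * stage / num_stages))
        stages.append(ordered[:cutoff])
    return stages


samples = [
    Sample("a red kite", num_objects=1, alignment=0.9),
    Sample("the man on the left", num_objects=6, alignment=0.4),
    Sample("a dog on grass", num_objects=2, alignment=0.8),
]
for i, pool in enumerate(curriculum_stages(samples, num_stages=3), start=1):
    print(i, [s.phrase for s in pool])
```

In this toy setup the single-object, well-aligned sample enters training first, and the crowded, weakly aligned one is only added in the final stage.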

AAAI Conference 2026 Conference Paper

OX-MABSR: A Benchmark for Open-domain Explainable Multimodal Aspect-Based Sentiment Reasoning

  • Xinjing Liu
  • ZiXin Xue
  • Pengyue Lin
  • Xinyu Tu
  • Siwei Xu
  • Ruifan Li

Multimodal Aspect-Based Sentiment Analysis (MABSA) involves extracting aspect terms from text-image pairs and identifying their sentiments. Most existing tasks consider one fixed sentiment category with explicitly mentioned aspects. However, these tasks seldom consider expressive sentiment categories, implicit aspects, and explainability. To this end, we introduce a novel task of Open-domain Explainable Multimodal Aspect-Based Sentiment Reasoning (OX-MABSR). This task enables the prediction of open-vocabulary aspect-sentiment pairs, together with the generation of sentiment explanations and reasoning paths. To benchmark the OX-MABSR task, we construct OX-MABSR-Bench, a dataset annotated with explicit and implicit aspects, expressive sentiment categories, as well as perceptual and cognitive two-level explanations. The explanations capture visual and textual cues, including aesthetics, facial expressions, scenes, and textual semantics, together with background and situational knowledge. In addition, we annotate the reasoning paths that trace how the sentiment evolves from surface cues to a deeper contextual understanding. To address the OX-MABSR task, we propose MABSR-LLM. Extensive experimental results show that our MABSR-LLM outperforms strong baselines. To the best of our knowledge, we are the first to provide a unified framework for open-domain and explainable MABSR.

AAAI Conference 2025 Conference Paper

Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis

  • Zebin Yao
  • Fangxiang Feng
  • Ruifan Li
  • Xiaojie Wang

The customization of text-to-image models has seen significant advancements, yet generating multiple personalized concepts remains a challenging task. Current methods struggle with attribute leakage and layout confusion when handling multiple concepts, leading to reduced concept fidelity and semantic consistency. In this work, we introduce a novel training-free framework, Concept Conductor, designed to ensure visual fidelity and correct layout in multi-concept customization. Concept Conductor isolates the sampling processes of multiple customized models to prevent attribute leakage between different concepts and corrects erroneous layouts through self-attention-based spatial guidance. Additionally, we present a concept injection technique that employs shape-aware masks to specify the generation area for each concept. This technique injects the structure and appearance of personalized concepts through feature fusion in the attention layers, ensuring harmony in the final image. Extensive qualitative and quantitative experiments demonstrate that Concept Conductor can consistently generate composite images with accurate layouts while preserving the visual details of each concept. Compared to existing baselines, Concept Conductor shows significant performance improvements. Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts. The code and trained models will be made publicly available.

ECAI Conference 2023 Conference Paper

Enhanced Machine Reading Comprehension Method for Aspect Sentiment Quadruplet Extraction

  • Shuqin Ye
  • Zepeng Zhai
  • Ruifan Li

In the NLP domain, Aspect-Based Sentiment Analysis (ABSA) has gained significant attention in recent years due to its ability to perform fine-grained sentiment analysis. A challenging task in ABSA is Aspect Sentiment Quadruplet Extraction (ASQE), which involves the extraction of aspect terms and their associated opinion terms, sentiment polarities, and categories in the form of quadruplets. However, existing studies have ignored the strong dependence among the multiple subtasks involved in ASQE. In this paper, we propose a novel Enhanced Machine Reading Comprehension (EMRC) method and formalize the ASQE task as a multi-turn MRC task. Our EMRC effectively learns and utilizes the relationships among different subtasks by incorporating previously generated query answers into the current queries. We design a hierarchical category classification strategy to perform the category prediction in a structured manner, enabling the model to tackle intricate categories with ease. Furthermore, we employ the bi-directional attention mechanism, i.e., context-to-query and query-to-context attentions, to map the context into a task-aware representation. We conduct extensive experiments on two benchmark datasets. The results demonstrate that EMRC outperforms the state-of-the-art baselines. The source code is publicly available at https://github.com/Little-Yeah/EMCR.
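
The idea of feeding earlier answers into later queries can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the turn order, the query wording in `TURNS`, and the rule-based `toy_answer` stand-in for a reading-comprehension model are hypothetical and do not come from the paper or its repository.

```python
from typing import Callable

# Hypothetical query templates, one per turn. Later templates reference earlier slots.
TURNS = [
    ("aspect", "What aspects are mentioned?"),
    ("opinion", "What opinions are expressed about the aspect '{aspect}'?"),
    ("category", "Which category fits the aspect '{aspect}' with opinion '{opinion}'?"),
    ("sentiment", "What is the sentiment of '{aspect}' ({category}) given opinion '{opinion}'?"),
]


def run_turns(context: str, answer_fn: Callable[[str, str], str]):
    """Ask one query per turn, filling each template with the answers so far."""
    answers: dict[str, str] = {}
    queries: list[str] = []
    for slot, template in TURNS:
        query = template.format(**answers)  # earlier answers become part of the query
        queries.append(query)
        answers[slot] = answer_fn(query, context)
    return answers, queries


def toy_answer(query: str, context: str) -> str:
    """Rule-based stand-in for an MRC model, used only to make the sketch runnable."""
    if query.startswith("What aspects"):
        return "battery"
    if query.startswith("What opinions"):
        return "lasts long"
    if query.startswith("Which category"):
        return "LAPTOP#BATTERY"
    return "positive"


answers, queries = run_turns("The battery lasts long.", toy_answer)
for q in queries:
    print(q)
print(answers)
```

For simplicity the sketch extracts a single quadruplet; a real system would need to handle several aspects and opinions per sentence.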

IJCAI Conference 2020 Conference Paper

Multi-scale Two-way Deep Neural Network for Stock Trend Prediction

  • Guang Liu
  • Yuzhao Mao
  • Qi Sun
  • Hailong Huang
  • Weiguo Gao
  • Xuan Li
  • Jianping Shen
  • Ruifan Li

Stock Trend Prediction (STP) has drawn wide attention from various fields, especially Artificial Intelligence. Most previous studies are single-scale oriented, which results in information loss from a multi-scale perspective. In fact, multi-scale behavior is vital for making intelligent investment decisions. A mature investor will thoroughly investigate the state of a stock market at various time scales. To automatically learn the multi-scale information in stock data, we propose a Multi-scale Two-way Deep Neural Network. It learns multi-scale patterns from two types of scale information, wavelet-based and downsampling-based, by eXtreme Gradient Boosting and Recurrent Convolutional Neural Network, respectively. After combining the learned patterns from the two ways, our model achieves state-of-the-art performance on FI-2010 and CSI-2016, where the latter is our published long-range stock dataset to help future studies on the STP task. Extensive experimental results on the two datasets indicate that multi-scale information can significantly improve STP performance and that our model is superior in capturing such information.
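
As a rough illustration of what downsampling-based multi-scale views of a price series could look like, the sketch below averages non-overlapping windows at several scales. The window-averaging rule, the choice of scales, and the function names are assumptions made here; the abstract does not specify how the paper downsamples, and the wavelet branch is not shown.

```python
def downsample(series: list[float], scale: int) -> list[float]:
    """Average non-overlapping windows of length `scale`; drop an incomplete tail."""
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    usable = len(series) - len(series) % scale
    return [sum(series[i:i + scale]) / scale for i in range(0, usable, scale)]


def multi_scale_views(series: list[float], scales: tuple[int, ...] = (1, 2, 4)) -> dict[int, list[float]]:
    """Return one coarser copy of the series per scale (hypothetical helper)."""
    return {scale: downsample(series, scale) for scale in scales}


prices = [10.0, 12.0, 11.0, 13.0, 14.0, 16.0, 15.0, 17.0]
views = multi_scale_views(prices)
for scale, view in views.items():
    print(scale, view)
```

Coarser scales smooth out short-term fluctuations, so a downstream model sees both fine-grained movement and the longer trend.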

AAAI Conference 2019 Conference Paper

Differential Networks for Visual Question Answering

  • Chenfei Wu
  • Jinlai Liu
  • Xiaojie Wang
  • Ruifan Li

The task of Visual Question Answering (VQA) has emerged in recent years for its potential applications. To address the VQA task, the model should fuse feature elements from both images and questions efficiently. Existing models fuse image feature element v_i and question feature element q_i directly, for example through the element-wise product v_i q_i. Those solutions largely ignore the following two key points: 1) whether v_i and q_i are in the same space; 2) how to reduce the observation noise in v_i and q_i. We argue that differences between the feature elements themselves, such as (v_i − v_j) and (q_i − q_j), are more likely to lie in the same space, and that the difference operation helps reduce observation noise. To achieve this, we first propose Differential Networks (DN), a novel plug-and-play module which enables differences between pair-wise feature elements. With the tool of DN, we then propose DN-based Fusion (DF), a novel model for the VQA task. We achieve state-of-the-art results on four publicly available datasets. Ablation studies also show the effectiveness of difference operations in the DF model.
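
The pair-wise difference operation mentioned in this abstract can be sketched in a few lines. This is a simplified illustration, not the paper's DN module: the function names are hypothetical, there are no learned parameters, and the product-based `fuse_differences` step is an assumption about how difference features might be combined.

```python
def pairwise_differences(x: list[float]) -> list[list[float]]:
    """Return the matrix D with D[i][j] = x[i] - x[j]."""
    return [[xi - xj for xj in x] for xi in x]


def fuse_differences(v: list[float], q: list[float]) -> list[list[float]]:
    """Hypothetical fusion: multiply matching difference entries of v and q."""
    if len(v) != len(q):
        raise ValueError("v and q must have the same length")
    dv = pairwise_differences(v)
    dq = pairwise_differences(q)
    return [[a * b for a, b in zip(row_v, row_q)] for row_v, row_q in zip(dv, dq)]


v = [1.0, 2.0, 4.0]   # toy image feature elements
q = [2.0, 0.0, 1.0]   # toy question feature elements

# A constant offset shared by all elements cancels out in the differences.
shifted_v = [value + 5.0 for value in v]
print(pairwise_differences(v) == pairwise_differences(shifted_v))
print(fuse_differences(v, q))
```

The offset check shows one simple sense in which differencing can suppress noise: any disturbance common to all elements of a feature vector disappears from the difference matrix.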

IJCAI Conference 2018 Conference Paper

Show and Tell More: Topic-Oriented Multi-Sentence Image Captioning

  • Yuzhao Mao
  • Chang Zhou
  • Xiaojie Wang
  • Ruifan Li

Image captioning aims to generate textual descriptions for images. Most previous work generates a single-sentence description for each image. However, a picture is worth a thousand words. A single sentence can hardly give a complete view of an image, even when written by a human. In this paper, we propose a novel Topic-Oriented Multi-Sentence (TOMS) captioning model, which can generate multiple topic-oriented sentences to describe an image. Different from object instances or attributes, topics mined by latent Dirichlet allocation reflect hidden thematic structures in the reference sentences of an image. In our model, each topic is integrated into a caption generator with a Fusion Gate Unit (FGU) to guide the generation of a sentence towards a certain topic perspective. With multiple sentences from different topics, our TOMS provides a complete description of an image. Experimental results on both sentence and paragraph datasets demonstrate the effectiveness of our TOMS in terms of topical consistency and descriptive completeness.