Arrow Research search

Author name cluster

Yi Cai

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers (36)

AAAI Conference 2026 Conference Paper

Hybrid-DMKG: A Hybrid Reasoning Framework over Dynamic Multimodal Knowledge Graphs for Multimodal Multihop QA with Knowledge Editing

  • Li Yuan
  • Qingfei Huang
  • Bingshan Zhu
  • Yi Cai
  • Qingbao Huang
  • Changmeng Zheng
  • Zikun Deng
  • Tao Wang

Multimodal Knowledge Editing (MKE) extends traditional knowledge editing to settings involving both textual and visual modalities. However, existing MKE benchmarks primarily assess final answer correctness, neglecting the quality of intermediate reasoning and robustness to visually rephrased inputs. To address this limitation, we introduce MMQAKE, the first benchmark for multimodal multihop question answering with knowledge editing. MMQAKE evaluates: (1) a model’s ability to reason over 2–5-hop factual chains that span both text and images, including performance at each intermediate step; (2) robustness to visually rephrased inputs in multihop questions. Our evaluation shows that current MKE methods often struggle to consistently update and reason over multimodal reasoning chains following knowledge edits. To overcome these challenges, we propose Hybrid-DMKG, a hybrid reasoning framework built on a dynamic multimodal knowledge graph (DMKG) to enable accurate multihop reasoning over updated multimodal knowledge. Hybrid-DMKG first uses a large language model to decompose multimodal multihop questions into sequential sub-questions, then applies a multimodal retrieval model to locate updated facts by jointly encoding each sub-question with candidate entities and their associated images. For answer inference, a hybrid reasoning module operates over the DMKG via two parallel paths: (1) relation-linking prediction; (2) RAG Reasoning with large vision-language models. A background-reflective decision module then aggregates evidence from both paths to select the most credible answer. Experimental results on MMQAKE show that Hybrid-DMKG significantly outperforms existing MKE approaches, achieving higher accuracy and improved robustness to knowledge updates.

AAAI Conference 2026 Conference Paper

Rethinking Explanation Evaluation Under the Retraining Scheme

  • Yi Cai
  • Thibaud Ardoin
  • Mayank Gulati
  • Gerhard Wunder

Feature attribution has gained prominence as a tool for explaining model decisions, yet evaluating explanation quality remains challenging due to the absence of ground-truth explanations. To circumvent this, explanation-guided input manipulation has emerged as an indirect evaluation strategy, measuring explanation effectiveness through the impact of input modifications on model outcomes during inference. Despite their widespread use, a major concern with inference-based schemes is the distribution shift caused by such manipulations, which undermines the reliability of their assessments. The retraining-based scheme ROAR overcomes this issue by adapting the model to the altered data distribution. However, its evaluation results often contradict the theoretical foundations of widely accepted explainers. This work investigates this misalignment between empirical observations and theoretical expectations. In particular, we identify the Sign issue as a key factor responsible for residual information that ultimately distorts retraining-based evaluation. Based on the analysis, we show that a straightforward reframing of the evaluation process can effectively resolve the identified issue. Building on the existing framework, we further propose novel variants that together form a comprehensive perspective on explanation evaluation. These variants largely improve evaluation efficiency over the standard retraining protocol, thereby enhancing practical applicability for explainer selection and benchmarking. Following our proposed schemes, empirical results across various data scales provide deeper insights into the performance of carefully selected explainers, revealing open challenges and future directions in explainability research.

AAAI Conference 2026 Conference Paper

SRACG: A Code Generation Framework with Selective Retrieval Augmentation

  • Mengzhen Wang
  • Shukai Ma
  • Songwen Gong
  • Jiexin Wang
  • Ruolin Chen
  • Liuwen Cao
  • Yi Cai

Large Language Models (LLMs) have demonstrated remarkable performance in code generation, offering new possibilities for translating natural language into executable programs. To further enhance LLMs’ code generation capabilities, Retrieval-Augmented Generation (RAG) has emerged as a promising strategy by retrieving code examples aligned with the generation intent to guide the process. However, existing RAG-based methods often suffer from unnecessary augmentation, preference misalignment, and surface-level mimicry, which undermine the effectiveness of retrieved examples in guiding LLMs toward accurate code generation. To address these challenges, we propose SRACG, a Selective Retrieval-Augmented Code Generation framework. SRACG begins with a necessity-aware selection mechanism to identify generation intents that genuinely require retrieval support, thereby avoiding degradation from indiscriminate augmentation. For intents identified as needing enhancement, it first employs a multi-objective retrieval strategy to select examples that are semantically aligned with the intent. These candidates are then further filtered by assessing their consistency with the LLM’s inherent generation preferences, ensuring alignment in both style and structure. Finally, it extracts execution plans from the filtered examples to uncover their underlying logic, guiding the LLM to better comprehend the examples instead of merely mimicking surface-level content. Experimental results on widely used benchmarks show that SRACG significantly improves the success rate of LLM-generated code and outperforms existing approaches.

IJCAI Conference 2025 Conference Paper

Collaborative Multi-LoRA Experts with Achievement-based Multi-Tasks Loss for Unified Multimodal Information Extraction

  • Li Yuan
  • Yi Cai
  • Xudong Shen
  • Qing Li
  • Qingbao Huang
  • Zikun Deng
  • Tao Wang

Multimodal Information Extraction (MIE) has gained attention for extracting structured information from multimedia sources. Traditional methods tackle MIE tasks separately, missing opportunities to share knowledge across tasks. Recent approaches unify these tasks into a generation problem using instruction-based T5 models with visual adaptors, optimized through full-parameter fine-tuning. However, this method is computationally intensive, and multi-task fine-tuning often faces gradient conflicts, limiting performance. To address these challenges, we propose collaborative multi-LoRA experts with achievement-based multi-task loss (C-LoRAE) for MIE tasks. C-LoRAE extends the low-rank adaptation (LoRA) method by incorporating a universal expert to learn shared multimodal knowledge from cross-MIE tasks and task-specific experts to learn specialized instructional task features. This configuration enhances the model’s generalization ability across multiple tasks while maintaining the independence of various instruction tasks and mitigating gradient conflicts. Additionally, we propose an achievement-based multi-task loss to balance training progress across tasks, addressing the imbalance caused by varying numbers of training samples in MIE tasks. Experimental results on seven benchmark datasets across three key MIE tasks demonstrate that C-LoRAE achieves superior overall performance compared to traditional fine-tuning methods and LoRA methods while utilizing a comparable number of training parameters to LoRA.

AAAI Conference 2025 Conference Paper

Content-free Logical Modification of Large Language Model by Disentangling and Modifying Logic Representation

  • Xin Wu
  • Yuqi Bu
  • Yifei Chen
  • Yi Cai

Despite extensive training on diverse datasets and alignment with human values, large language models (LLMs) can still generate fallacious outputs. Additionally, the validity of LLMs' outputs varies significantly depending on the content. It is crucial to ensure LLMs' logical consistency across different contexts. Drawing inspiration from cognitive psychology studies, we propose a Logic Control Framework (LCF) that disentangles LLMs' hidden representations into separate content and logic spaces. Within the logic space, we use logically valid and invalid samples to construct distinct regions through contrastive learning. By moving logic representations to logically valid regions and fusing them with unchanged content representations, we significantly reduce logical fallacies in LLM outputs while maintaining content coherence. We demonstrate the effectiveness of LCF through experiments on conclusion generation and fallacy identification tasks, showing a significant improvement in logical validity and a reduction in fallacious outputs.

AAAI Conference 2025 Conference Paper

Explicitly Guided Difficulty-Controllable Visual Question Generation

  • Jiayuan Xie
  • Mengqiu Cheng
  • Xinting Zhang
  • Yi Cai
  • Guimin Hu
  • Mengying Xie
  • Qing Li

Visual question generation (VQG) aims to generate questions from images automatically. While existing studies primarily focus on the quality of generated questions, such as fluency and relevance, the difficulty of the questions is also a crucial factor in assessing their quality. Question difficulty directly impacts the effectiveness of VQG systems in applications like education and human-computer interaction, where appropriately challenging questions can stimulate learning interest and improve interaction experiences. However, accurately defining and controlling question difficulty is a challenging task due to its multidimensional and subjective nature. In this paper, we propose a new definition of question difficulty: a question's difficulty is positively correlated with the number of reasoning steps required to answer it. For our definition, we construct a corresponding dataset and propose a benchmark as a foundation for future research. Our benchmark is designed to progressively increase the reasoning steps involved in generating questions. Specifically, we first extract the relationships among objects in the image to form a reasoning chain, then gradually increase the difficulty by rewriting the generated question to include more reasoning sub-chains. Experimental results on our constructed dataset show that our benchmark significantly outperforms existing baselines in controlling the reasoning chains of generated questions, producing questions with varying difficulty levels.

AAAI Conference 2025 Conference Paper

Look Around Before Locating: Considering Content and Structure Information for Visual Grounding

  • Shiyi Zheng
  • Peizhi Zhao
  • Zhilong Zheng
  • Peihang He
  • Haonan Cheng
  • Yi Cai
  • Qingbao Huang

As a long-term challenge and fundamental requirement in vision and language tasks, visual grounding aims to localize a target referred by a natural language query. The regional annotations form a superficial correlation between the subject of expression and some common visual entities, which hinders models from comprehending the linguistic content and structure. However, current one-stage methods struggle to uniformly model the visual and linguistic structure due to the structural gap between continuous image patches and discrete text tokens. In this paper, we propose a semi-structured reasoning framework for visual grounding to gradually comprehend the linguistic content and structure. Specifically, we devise a cross-modal content alignment module to effectively align unlabeled contextual information into a stable semantic space corrected by token-level prior knowledge obtained with CLIP. A multi-branch modulated localization module is also established to obtain modulation grounding by linguistic structure. Through a soft split mechanism, our method can destructure the expression into a fixed semi-structure (i.e., subject and context) while ensuring the completeness of linguistic content. Our method is thus capable of building a semi-structured reasoning system to effectively comprehend the linguistic content and structure by content alignment and structure modulated grounding. Experimental results on five widely-used datasets validate the performance improvements of our proposed method.

NeurIPS Conference 2025 Conference Paper

MTRec: Learning to Align with User Preferences via Mental Reward Models

  • Mengchen Zhao
  • Yifan Gao
  • Yaqing Hou
  • Xiangyang Li
  • Pengjie Gu
  • Zhenhua Dong
  • Ruiming Tang
  • Yi Cai

Recommendation models are predominantly trained using implicit user feedback, since explicit feedback is often costly to obtain. However, implicit feedback, such as clicks, does not always reflect users' real preferences. For example, a user might click on a news article because of its attractive headline, but end up feeling uncomfortable after reading the content. In the absence of explicit feedback, such erroneous implicit signals may severely mislead recommender systems. In this paper, we propose MTRec, a novel sequential recommendation framework designed to align with real user preferences by uncovering their internal satisfaction on recommended items. Specifically, we introduce a mental reward model to quantify user satisfaction and propose a distributional inverse reinforcement learning approach to learn it. The learned mental reward model is then used to guide recommendation models to better align with users’ real preferences. Our experiments show that MTRec brings significant improvements to a variety of recommendation models. We also deploy MTRec on an industrial short video platform and observe a 7% increase in average user viewing time.

ECAI Conference 2025 Conference Paper

On Perturbed Natural Adaptive Gradient Descent and Its Application in Portfolio Optimization

  • Yi Cai
  • Huili Liang
  • Yue Qiu
  • Xiao Wang
  • Tian Xie
  • Zixuan Zhao

In this paper, we introduce the Perturbed Natural Adaptive Gradient Descent (PN-AdaGrad) method, a novel optimization algorithm that combines the principles of natural gradient descent and adaptive gradient descent on Riemannian manifolds. We provide a rigorous theoretical analysis of the PN-AdaGrad method, proving its convergence to a critical point of the objective function under mild assumptions. To validate the practical effectiveness of the PN-AdaGrad method, we verify our algorithm on real-world datasets in the context of portfolio optimization. Portfolio optimization involves selecting the optimal allocation of assets to maximize returns while minimizing risk. Our experiments show that the PN-AdaGrad method outperforms traditional gradient descent and other state-of-the-art optimization algorithms.
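The abstract does not spell out the perturbation or the natural-gradient preconditioning, so those are not reproduced here. As a reference point only, a minimal sketch of the plain AdaGrad update that adaptive-gradient methods of this family build on (the toy objective and step size are illustrative assumptions):

```python
import math

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update: each coordinate's step size shrinks as its
    squared gradients accumulate, giving per-coordinate adaptivity."""
    new_accum = [a + g * g for a, g in zip(accum, grad)]
    new_w = [wi - lr * g / (math.sqrt(a) + eps)
             for wi, g, a in zip(w, grad, new_accum)]
    return new_w, new_accum

# Toy objective f(w) = w1^2 + w2^2, with gradient 2w.
w, accum = [1.0, -2.0], [0.0, 0.0]
for _ in range(200):
    grad = [2 * wi for wi in w]
    w, accum = adagrad_step(w, grad, accum)
```

PN-AdaGrad additionally operates on a Riemannian manifold and injects perturbations; this sketch shows only the Euclidean adaptive-gradient baseline.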

IROS Conference 2025 Conference Paper

RMG: Real-Time Expressive Motion Generation with Self-collision Avoidance for 6-DOF Companion Robotic Arms

  • Jiansheng Li
  • Haotian Song
  • Haoang Li
  • Jinni Zhou
  • Qiang Nie
  • Yi Cai

The six-degree-of-freedom (6-DOF) robotic arm has gained widespread application in human-coexisting environments. While previous research has predominantly focused on functional motion generation, the critical aspect of expressive motion in human-robot interaction remains largely unexplored. This paper presents a novel real-time motion generation planner that enhances interactivity by creating expressive robotic motions between arbitrary start and end states within predefined time constraints. Our approach involves three key contributions: first, we develop a mapping algorithm to construct an expressive motion dataset derived from human dance movements; second, we train motion generation models in both Cartesian and joint spaces using this dataset; third, we introduce an optimization algorithm that guarantees smooth, collision-free motion while maintaining the intended expressive style. Experimental results demonstrate the effectiveness of our method, which can generate expressive and generalized motions in under 0.5 seconds while satisfying all specified constraints.

AAAI Conference 2024 Conference Paper

Automated Defect Report Generation for Enhanced Industrial Quality Control

  • Jiayuan Xie
  • Zhiping Zhou
  • Zihan Wu
  • Xinting Zhang
  • Jiexin Wang
  • Yi Cai
  • Qing Li

Defect detection is a pivotal aspect ensuring product quality and production efficiency in industrial manufacturing. Existing studies on defect detection predominantly focus on locating defects through bounding boxes and classifying defect types. However, their methods can only provide limited information and fail to meet the requirements for further processing after detecting defects. To this end, we propose a novel task called defect detection report generation, which aims to provide more comprehensive and informative insights into detected defects in the form of text reports. For this task, we construct new datasets covering 16 different materials, in which each defect is paired with a detailed human-written report. In addition, we propose a knowledge-aware report generation model as a baseline for future research, which aims to incorporate additional knowledge to generate detailed analysis and subsequent processing related to defects in images. By constructing defect report datasets and proposing corresponding baselines, we chart new directions for future research and practical applications of this task.

IJCAI Conference 2024 Conference Paper

PoRank: A Practical Framework for Learning to Rank Policies

  • Pengjie Gu
  • Mengchen Zhao
  • Xu He
  • Yi Cai
  • Bo An

In many real-world scenarios, we need to select from a set of candidate policies before online deployment. Although existing off-policy evaluation (OPE) methods can be used to estimate the online performance, they suffer from high variance. Fortunately, we care only about the ranking of the candidate policies, rather than their exact online rewards. Based on this, we propose a novel framework PoRank for learning to rank policies. In practice, learning to rank policies faces two main challenges: 1) generalization over the huge policy space and 2) lack of supervision signals. To overcome the first challenge, PoRank uses a Policy Comparison Transformer (PCT) for learning cross-policy representations, which capture the core discrepancies between policies and generalize well across the whole policy space. The second challenge arises because learning to rank requires online comparisons of policies as ground-truth labels, whereas deploying policies online might be highly expensive. To overcome this, PoRank adopts a crowdsourcing based learning-to-rank (LTR) framework, where a set of OPE algorithms are employed to provide weak comparison labels. Experimental results show that PoRank not only outperforms baselines when the ground-truth labels are provided, but also achieves competitive performance when the ground-truth labels are unavailable.
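Reduced to its simplest form, the crowdsourcing idea treats each OPE estimator as a noisy annotator and majority-votes its pairwise verdicts into a weak comparison label. A minimal illustrative sketch, where the estimator scores and the majority-vote aggregation rule are assumptions, not the paper's exact scheme:

```python
def weak_pairwise_label(scores_a, scores_b):
    """Aggregate per-estimator OPE scores for two policies into a weak
    label: 1 if a majority of estimators rank policy A above B, else 0."""
    votes = sum(1 if a > b else -1 for a, b in zip(scores_a, scores_b))
    return 1 if votes > 0 else 0

# Three hypothetical OPE estimators score policies A and B;
# two of the three favor A, so the weak label says A > B.
label = weak_pairwise_label([0.9, 0.7, 0.4], [0.8, 0.6, 0.5])
```

Such labels are cheap but noisy, which is why they serve only as weak supervision for the ranking model rather than as ground truth.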

AAAI Conference 2024 Conference Paper

Rethinking Two-Stage Referring Expression Comprehension: A Novel Grounding and Segmentation Method Modulated by Point

  • Peizhi Zhao
  • Shiyi Zheng
  • Wenye Zhao
  • Dongsheng Xu
  • Pijian Li
  • Yi Cai
  • Qingbao Huang

As a fundamental and challenging task in the vision and language domain, Referring Expression Comprehension (REC) has shown impressive improvements recently. However, for a complex task that couples the comprehension of abstract concepts and the localization of concrete instances, one-stage approaches are bottlenecked by computing and data resources. To obtain a low-cost solution, the prevailing two-stage approaches decouple REC into localization (region proposal) and comprehension (region-expression matching) at region-level, but the solution based on isolated regions cannot sufficiently utilize the context and is usually limited by the quality of proposals. Therefore, it is necessary to rebuild an efficient two-stage solution system. In this paper, we propose a point-based two-stage framework for REC, in which the two stages are redefined as point-based cross-modal comprehension and point-based instance localization. Specifically, we reconstruct the raw bounding box and segmentation mask into center and mass scores as soft ground-truth for measuring point-level cross-modal correlations. With the soft ground-truth, REC can be approximated as a binary classification problem, which fundamentally avoids the impact of isolated regions on the optimization process. Remarkably, the consistent metrics between center and mass scores allow our system to directly optimize grounding and segmentation by utilizing the same architecture. Experiments on multiple benchmarks show the feasibility and potential of our point-based paradigm. Our code is available at https://github.com/VILAN-Lab/PBREC-MT.

AAAI Conference 2023 Short Paper

Category-Guided Visual Question Generation (Student Abstract)

  • Hongfei Liu
  • Jiali Chen
  • Wenhao Fang
  • Jiayuan Xie
  • Yi Cai

Visual question generation aims to generate high-quality questions related to images. Generating questions from images alone reduces labor costs and is thus easy to apply. However, existing methods tend to generate similar, generic questions that fail to ask about the specific content of each image scene. In this paper, we propose a category-guided visual question generation model that can generate questions with multiple categories that focus on different objects in an image. Specifically, our model first selects the appropriate question category based on the objects in the image and the relationships among objects. Then, we generate corresponding questions based on the selected question categories. Experiments conducted on the TDIUC dataset show that our proposed model outperforms existing models in terms of diversity and quality.

AAAI Conference 2023 Conference Paper

Ensemble-in-One: Ensemble Learning within Random Gated Networks for Enhanced Adversarial Robustness

  • Yi Cai
  • Xuefei Ning
  • Huazhong Yang
  • Yu Wang

Adversarial attacks have threatened modern deep learning systems by crafting adversarial examples with small perturbations to fool the convolutional neural networks (CNNs). To alleviate that, ensemble training methods are proposed to facilitate better adversarial robustness by diversifying the vulnerabilities among the sub-models, simultaneously maintaining comparable natural accuracy as standard training. Previous practices also demonstrate that enlarging the ensemble can improve the robustness. However, conventional ensemble methods scale poorly, owing to the rapidly increasing complexity as more sub-models are added to the ensemble. Moreover, it is usually infeasible to train or deploy an ensemble with many sub-models, owing to the tight hardware resource budget and latency requirement. In this work, we propose Ensemble-in-One (EIO), a simple but effective method to efficiently enlarge the ensemble with a random gated network (RGN). EIO augments a candidate model by replacing the parametrized layers with multi-path random gated blocks (RGBs) to construct an RGN. The scalability is significantly boosted because the number of paths exponentially increases with the RGN depth. Then by learning from the vulnerabilities of numerous other paths within the RGN, every path obtains better adversarial robustness. Our experiments demonstrate that EIO consistently outperforms previous ensemble training methods with smaller computational overheads, simultaneously achieving better accuracy-robustness trade-offs than adversarial training methods under black-box transfer attacks. Code is available at https://github.com/cai-y13/Ensemble-in-One.git

AAAI Conference 2023 Conference Paper

Joint Multimodal Entity-Relation Extraction Based on Edge-Enhanced Graph Alignment Network and Word-Pair Relation Tagging

  • Li Yuan
  • Yi Cai
  • Jin Wang
  • Qing Li

Multimodal named entity recognition (MNER) and multimodal relation extraction (MRE) are two fundamental subtasks in the multimodal knowledge graph construction task. However, the existing methods usually handle two tasks independently, which ignores the bidirectional interaction between them. This paper is the first to propose jointly performing MNER and MRE as a joint multimodal entity-relation extraction (JMERE) task. Besides, the current MNER and MRE models only consider aligning the visual objects with textual entities in visual and textual graphs but ignore the entity-entity relationships and object-object relationships. To address the above challenges, we propose an edge-enhanced graph alignment network and a word-pair relation tagging (EEGA) for the JMERE task. Specifically, we first design a word-pair relation tagging to exploit the bidirectional interaction between MNER and MRE and avoid error propagation. Then, we propose an edge-enhanced graph alignment network to enhance the JMERE task by aligning nodes and edges in the cross-graph. Compared with previous methods, the proposed method can leverage edge information to aid the alignment between objects and entities and find the correlations between entity-entity relationships and object-object relationships. Experiments are conducted to show the effectiveness of our model.

AAAI Conference 2023 Conference Paper

Linking People across Text and Images Based on Social Relation Reasoning

  • Yang Lei
  • Peizhi Zhao
  • Pijian Li
  • Yi Cai
  • Qingbao Huang

As a sub-task of visual grounding, linking people across text and images aims to localize target people in images with corresponding sentences. Existing approaches tend to capture superficial features of people (e.g., dress and location) that suffer from the incompleteness of information across text and images. We observe that humans are adept at exploring social relations to assist identifying people. Therefore, we propose a Social Relation Reasoning (SRR) model to address the aforementioned issues. Firstly, we design a Social Relation Extraction (SRE) module to extract social relations between people in the input sentence. Notably, the SRE module, based on zero-shot learning, is able to extract social relations even though they are not defined in the existing datasets. A Reasoning based Cross-modal Matching (RCM) module is further used to generate matching matrices by reasoning on the social relations and visual features. Experimental results show that the accuracy of our proposed SRR model outperforms the state-of-the-art models on the challenging datasets Who's Waldo and FL: MSRE, by more than 5% and 7%, respectively. Our source code is available at https://github.com/VILAN-Lab/SRR.

AAAI Conference 2023 Conference Paper

Memory-Oriented Structural Pruning for Efficient Image Restoration

  • Xiangsheng Shi
  • Xuefei Ning
  • Lidong Guo
  • Tianchen Zhao
  • Enshu Liu
  • Yi Cai
  • Yuhan Dong
  • Huazhong Yang

Deep learning (DL) based methods have significantly pushed forward the state-of-the-art for image restoration (IR) task. Nevertheless, DL-based IR models are highly computation- and memory-intensive. The surging demands for processing higher-resolution images and multi-task paralleling in practical mobile usage further add to their computation and memory burdens. In this paper, we reveal the overlooked memory redundancy of the IR models and propose a Memory-Oriented Structural Pruning (MOSP) method. To properly compress the long-range skip connections (a major source of the memory burden), we introduce a compactor module onto each skip connection to decouple the pruning of the skip connections and the main branch. MOSP progressively prunes the original model layers and the compactors to cut down the peak memory while maintaining high IR quality. Experiments on real image denoising, image super-resolution and low-light image enhancement show that MOSP can yield models with higher memory efficiency while better preserving performance compared with baseline pruning methods.

AAAI Conference 2022 Short Paper

Aspect-Opinion Sentiment Alignment for Cross-Domain Sentiment Analysis (Student Abstract)

  • Haopeng Ren
  • Yi Cai
  • Yushi Zeng

Cross-domain sentiment analysis (SA) has recently attracted significant attention, as it can effectively alleviate the lack of large-scale labeled data for deep-neural-network-based methods. However, existing unsupervised cross-domain SA models ignore the relation between aspect and opinion, and thus suffer from the sentiment transfer error problem. To solve this problem, we propose an aspect-opinion sentiment alignment SA model, and extensive experiments are conducted to evaluate the effectiveness of our model.

AAAI Conference 2022 Short Paper

Bridging the Gap between Expression and Scene Text for Referring Expression Comprehension (Student Abstract)

  • Yuqi Bu
  • Jiayuan Xie
  • Liuwu Li
  • Qiong Liu
  • Yi Cai

Referring expression comprehension aims at grounding the object in an image referred to by the expression. Scene text that serves as an identifier has a natural advantage in referring to objects. However, existing methods only consider the text in the expression, but ignore the text in the image, leading to a mismatch. In this paper, we propose a novel model that can recognize the scene text. We assign the extracted scene text to its corresponding visual region and ground the target object guided by expression. Experimental results on two benchmarks demonstrate the effectiveness of our model.

AAAI Conference 2022 Short Paper

Enhance Cross-Domain Aspect-Based Sentiment Analysis by Incorporating Commonsense Relational Structure (Student Abstract)

  • Yushi Zeng
  • Guohua Wang
  • Haopeng Ren
  • Yi Cai

Aspect Based Sentiment Analysis (ABSA) aims to extract aspect terms and identify the sentiment polarities towards each extracted aspect term. Currently, syntactic information is seen as the bridge for domain adaptation and achieves remarkable performance. However, the transferable syntactic knowledge is complex and diverse, which causes the transfer error problem in domain adaptation. In our paper, we propose a cross-domain ABSA model that incorporates a domain-shared relational structure. The experimental results show the effectiveness of our model.

AAAI Conference 2022 Short Paper

Enhance Weakly-Supervised Aspect Detection with External Knowledge (Student Abstract)

  • Zhuoming Zheng
  • Yi Cai
  • Liuwu Li

Aspect detection aims to identify aspects of reviews and is an essential upstream task for applications such as opinion mining. However, existing weakly-supervised methods lack the ability to identify implicit aspects with infrequent aspect terms and “Misc” aspects. To tackle these problems, we propose to enhance the segment representation with external knowledge by a weakly-supervised method. Experiments demonstrate the effectiveness of our model and the improvement by incorporating external knowledge.

AAAI Conference 2022 Short Paper

Knowledge-Enhanced Scene Graph Generation with Multimodal Relation Alignment (Student Abstract)

  • Ze Fu
  • Junhao Feng
  • Changmeng Zheng
  • Yi Cai

Existing scene graph generation methods struggle when the image lacks sufficient visual context. To address this limitation, we propose a knowledge-enhanced scene graph generation model with multimodal relation alignment, which supplements the missing visual contexts by well-aligned textual knowledge. First, we represent the textual information into contextualized knowledge which is guided by the visual objects to enhance the contexts. Furthermore, we align the multimodal relation triplets by a co-attention module for better semantics fusion. The experimental results show the effectiveness of our method.

JAAMAS Journal 2022 Journal Article

Learning by reusing previous advice: a memory-based teacher–student framework

  • Changxi Zhu
  • Yi Cai
  • Dickson K. W. Chiu

Reinforcement Learning (RL) has been widely used to solve sequential decision-making problems. However, it often suffers from slow learning speed in complex scenarios. Teacher–student frameworks address this issue by enabling agents to ask for and give advice so that a student agent can leverage the knowledge of a teacher agent to facilitate its learning. In this paper, we consider the effect of reusing previous advice, and propose a novel memory-based teacher–student framework such that student agents can memorize and reuse the previous advice from teacher agents. In particular, we propose two methods to decide whether previous advice should be reused: Q-Change per Step that reuses the advice if it leads to an increase in Q-values, and Decay Reusing Probability that reuses the advice with a decaying probability. The experiments on diverse RL tasks (Mario, Predator–Prey and Half Field Offense) confirm that our proposed framework significantly outperforms the existing frameworks in which previous advice is not reused.
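
The two reuse criteria named in the abstract can be sketched as simple decision rules. This is an illustrative reading of the abstract only, not the paper's implementation; all function names and the exact decay form are assumptions.

```python
import random

def q_change_per_step(q_before, q_after):
    """Q-Change per Step (as described in the abstract): keep reusing a
    memorized piece of advice only if following it increased the Q-value."""
    return q_after > q_before

def decay_reusing_probability(initial_p, decay, n_reuses, rng=random.random):
    """Decay Reusing Probability: reuse memorized advice with a probability
    that decays each time the advice has already been reused.
    (Geometric decay is an assumption for illustration.)"""
    p = initial_p * (decay ** n_reuses)
    return rng() < p
```

For example, advice whose application raised a Q-value from 0.4 to 0.6 would be kept under the first rule, while under the second rule advice reused many times becomes increasingly unlikely to be reused again.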

AAAI Conference 2021 Short Paper

A Double Phases Generation Network for Yes or No Question Generation (Student Abstract)

  • Jiayuan Xie
  • Feng Chen
  • Yi Cai
  • Zehang Lin

This paper addresses the task of yes/no question generation, which produces yes/no questions from given passages; such questions can be used for automatic evaluation. We propose a double-phase generation network that identifies fact-related phrases in the input passage and uses them as auxiliary information for generation. Specifically, the first-phase prediction uses the extracted phrases as assistance to generate an initial question. The second-phase prediction then employs an attention network to focus on the passage phrases relevant to the initial question, generating questions that are more faithful to the specific facts the initial question contains. Extensive experiments on the BoolQ dataset demonstrate the effectiveness of our framework.

AAAI Conference 2021 Short Paper

A Nested Named Entity Recognition Model Based on Multi-agent Communication Mechanism (Student Abstract)

  • Canguang Li
  • Guohua Wang
  • Jin Cao
  • Yi Cai

Traditional sequence tagging methods for named entity recognition (NER) face challenges when handling nested entities, where an entity is nested in another. Most previous methods for nested NER ignore the effect of entity boundary information or type information. Considering that entity boundary information and type information can be utilized to improve the performance of boundary detection, we propose a nested NER model with a multi-agent communication module. The type tagger and boundary tagger in the multi-agent communication module iteratively utilize the information from each other, which improves the boundary detection and the final performance of nested NER. Empirical experiments conducted on two nested NER datasets show the effectiveness of our model.

AAAI Conference 2021 Short Paper

An Entity-Aware Adversarial Domain Adaptation Network for Cross-Domain Named Entity Recognition (Student Abstract)

  • Qi Peng
  • Changmeng Zheng
  • Yi Cai
  • Tao Wang
  • Haoran Xie
  • Qing Li

Existing methods for named entity recognition rely critically on labeled data. To handle the setting where target-domain data is fully unlabeled, we propose an entity-aware adversarial domain adaptation network, which trains on labeled source data and then adapts to the unlabeled target domain. We first apply adversarial training to reduce the distribution gap between domains. Furthermore, we introduce an entity-aware attention mechanism to guide the adversarial process and align entity features. Experiments show that our model outperforms state-of-the-art approaches.

AAAI Conference 2021 Conference Paper

Entity Guided Question Generation with Contextual Structure and Sequence Information Capturing

  • Qingbao Huang
  • Mingyi Fu
  • Linzhang Mo
  • Yi Cai
  • Jingyun Xu
  • Pijian Li
  • Qing Li
  • Ho-fung Leung

Question generation is a challenging task that has attracted widespread attention in recent years. Although previous studies have made great progress, two main shortcomings remain. First, previous work did not simultaneously capture the sequence information and the structure information hidden in the context, which degrades the quality of the generated questions. Second, the generated questions often cannot be answered from the given context. To tackle these issues, we propose an entity-guided question generation model that captures both contextual structure information and sequence information, using a Graph Convolutional Network and a Bidirectional Long Short-Term Memory network simultaneously. In addition, to improve the answerability of the generated questions, we use an entity-guided approach to derive the question type from the answer and jointly encode the answer and question type. Both automatic and manual metrics show that our model generates questions comparable with those of state-of-the-art models. Our code is available at https://github.com/VISLANG-Lab/EGSS.
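
The structure-capturing step the abstract attributes to the Graph Convolutional Network can be sketched as a single propagation over a dependency graph. This is a generic mean-aggregation GCN step for illustration, not the paper's architecture; the adjacency format and aggregation rule are assumptions.

```python
def gcn_layer(adj, feats):
    """One mean-aggregation graph-convolution step: each node averages its
    own feature vector and those of its neighbours (self-loop included),
    so structural context flows along dependency edges."""
    n = len(feats)
    dim = len(feats[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or i == j]
        out.append([sum(feats[j][d] for j in nbrs) / len(nbrs) for d in range(dim)])
    return out
```

Stacking such layers, and concatenating their output with a BiLSTM's sequential encoding, is one common way to combine structure and sequence information.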

AAAI Conference 2021 Short Paper

Incorporating Bidirection-Interactive Information and Semantic Features for Relational Facts Extraction (Student Abstract)

  • Yang Yu
  • Guohua Wang
  • Haopeng Ren
  • Yi Cai

The interaction between named entity recognition and relation classification is essential for extracting relational triplets. However, most joint extraction works consider only unidirectional interaction between the two subtasks, or neglect the interactive information entirely. To tackle these problems, we propose a novel unified joint extraction model that captures bidirection-interactive information between the two subtasks. Our model consists of two modules. The first utilizes a Bi-LSTM and a GCN to capture the sequential and structure-semantic features of a sentence. The second utilizes two layers to capture bidirection-interactive information between the two subtasks and generates relational triplets accordingly. The experimental results show that our proposed model outperforms state-of-the-art models on two public datasets.

AAAI Conference 2021 Short Paper

Incorporating Curiosity into Personalized Ranking for Collaborative Filtering (Student Abstract)

  • Qiqi Ding
  • Yi Cai
  • Ke Xu
  • Huakui Zhang

Curiosity affects users’ selection of items, motivating them to explore items regardless of their preferences. This phenomenon is particularly common in social networks. However, existing social-based recommendation methods neglect users’ curiosity in social networks, which may decrease recommendation accuracy. Moreover, focusing only on simulating users’ preferences can lead to information cocoons. To tackle these problems, we propose a Curiosity Enhanced Bayesian Personalized Ranking (CBPR) model. Our model draws on psychological theory to model the curiosity aroused in users when they face differing opinions. The experimental results on two public datasets demonstrate the advantages of our CBPR model over existing models.
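
One way to read the abstract is that the pairwise BPR ranking score is blended with a curiosity term before the usual sigmoid comparison. The sketch below is a hypothetical illustration of that idea; the blending weight `alpha` and the linear combination are assumptions, not the paper's formulation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cbpr_pairwise_prob(pref_i, pref_j, cur_i, cur_j, alpha=0.3):
    """Probability that a user ranks item i above item j when each item's
    score blends a preference term with a curiosity term."""
    score_i = (1 - alpha) * pref_i + alpha * cur_i
    score_j = (1 - alpha) * pref_j + alpha * cur_j
    return sigmoid(score_i - score_j)
```

With `alpha = 0`, this reduces to standard BPR; a positive `alpha` lets a high-curiosity item outrank a mildly preferred one, which is the exploration effect the abstract describes.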

AAAI Conference 2021 Short Paper

Information Block Detection in Infographic Based on Spatial Proximity and Structural Similarity (Student Abstract)

  • Jie Lin
  • Xin Wu
  • Jianwei Lu
  • Yi Cai

An infographic is a type of visualization chart used to display information. Existing infographic understanding works use spatial proximity to group elements into information blocks. However, these works ignore structural features such as background color and boundaries, which results in poor performance on complex infographics. We propose a Spatial and Structural Feature Extraction model that groups elements based on spatial proximity and structural similarity, and we introduce a new dataset for information block detection. Experiments show that our model can effectively identify the information blocks in an infographic.

AAAI Conference 2021 Conference Paper

Story Ending Generation with Multi-Level Graph Convolutional Networks over Dependency Trees

  • Qingbao Huang
  • Linzhang Mo
  • Pijian Li
  • Yi Cai
  • Qingguang Liu
  • Jielong Wei
  • Qing Li
  • Ho-fung Leung

As an interesting and challenging task, story ending generation aims at generating a reasonable and coherent ending for a given story context. The key challenge is to comprehend the context sufficiently and capture the hidden logical information effectively, which has not been well explored by most existing generative models. To tackle this issue, we propose context-aware Multi-level Graph Convolutional Networks over Dependency Parse (MGCN-DP) trees to capture dependency relations and context clues more effectively. We use dependency parse trees to implicitly capture relations and events in the context, and multi-level Graph Convolutional Networks to update and deliver representations across levels to obtain richer contextual information. Both automatic and manual evaluations show that MGCN-DP achieves performance comparable with state-of-the-art models. Our source code is available at https://github.com/VISLANG-Lab/MLGCN-DP.

TAAS Journal 2020 Journal Article

A Q-values Sharing Framework for Multi-agent Reinforcement Learning under Budget Constraint

  • Changxi Zhu
  • Ho-fung Leung
  • Shuyue Hu
  • Yi Cai

In a teacher-student framework, a more experienced agent (teacher) helps accelerate the learning of another agent (student) by suggesting actions to take in certain states. In cooperative multi-agent reinforcement learning (MARL), where agents must cooperate with one another, a student could fail to cooperate effectively with others even by following a teacher’s suggested actions, as the policies of all agents can change before convergence. When the number of times that agents communicate with one another is limited (i.e., there are budget constraints), an advising strategy that uses actions as advice could be less effective. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning with budget constraints. In PSAF, each Q-learner can decide when to ask for and share its Q-values. We perform experiments in three typical multi-agent learning problems. The evaluation results indicate that the proposed PSAF approach outperforms existing advising methods under both constrained and unconstrained budgets. Moreover, we analyse the influence of advising actions and sharing Q-values on agent learning.
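
The budget accounting the abstract describes, where each Q-learner decides when to ask for and when to share Q-values, can be sketched as follows. The class name, the uncertainty heuristic (small gap between the two best Q-values), and sharing the maximum Q-value are illustrative assumptions, not the paper's exact PSAF rules.

```python
class PSAFAgent:
    """Sketch of budget-constrained asking/sharing for one Q-learner."""

    def __init__(self, ask_budget, share_budget, ask_threshold=0.1):
        self.ask_budget = ask_budget
        self.share_budget = share_budget
        self.ask_threshold = ask_threshold

    def maybe_ask(self, q_values):
        """Ask for advice in uncertain states: when the gap between the two
        best Q-values is small, spend one unit of asking budget."""
        if self.ask_budget <= 0 or len(q_values) < 2:
            return False
        top, second = sorted(q_values, reverse=True)[:2]
        if top - second < self.ask_threshold:
            self.ask_budget -= 1
            return True
        return False

    def maybe_share(self, q_values):
        """Share the maximum Q-value while sharing budget remains."""
        if self.share_budget <= 0:
            return None
        self.share_budget -= 1
        return max(q_values)
```

Once either budget is exhausted, the agent falls back to learning on its own, which is the constrained-budget regime the experiments evaluate.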

AAMAS Conference 2019 Conference Paper

A Q-values Sharing Framework for Multiple Independent Q-learners

  • Changxi Zhu
  • Ho-fung Leung
  • Shuyue Hu
  • Yi Cai

By using a multiagent reinforcement learning (MARL) framework, cooperative agents can communicate with one another to accelerate the joint learning. In the teacher-student paradigm applied in MARL, a more experienced agent (advisor) can advise another agent (advisee) which action to take in a state. However, when agents need to cooperate with one another, the advisee may fail to cooperate well with others since their policies may have changed. It requires a long period for an advisee to learn the same best actions as an advisor has learned, especially when the amount of advice is limited. We propose a partaker-sharer advising framework (PSAF) for independent Q-learners with limited communication in cooperative MARL. In PSAF, the overall learning process is shown to accelerate by multiple independent Q-learners’ sharing their maximum Q-values with one another at every time step. We perform experiments in the Predator-Prey domain and HFO game. The results show that our approach significantly outperforms existing advising methods.

AAAI Conference 2019 Short Paper

Incorporating Context-Relevant Knowledge into Convolutional Neural Networks for Short Text Classification

  • Jingyun Xu
  • Yi Cai

Some text classification methods do not work well on short texts due to data sparsity; moreover, they do not fully exploit context-relevant knowledge. To tackle these problems, we propose a neural network that incorporates context-relevant knowledge into a convolutional neural network for short text classification. Our model consists of two modules. The first utilizes two layers to extract concept and context features respectively, then employs an attention layer to select context-relevant concepts. The second utilizes a convolutional neural network to extract high-level features from the word and context-relevant concept features. The experimental results on three datasets show that our proposed model outperforms state-of-the-art models.
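
The attention step the abstract describes, selecting concepts relevant to the surrounding context, can be sketched as a softmax over context-concept similarities. The dot-product scoring and the weighted-sum fusion below are generic assumptions for illustration, not the paper's exact attention layer.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def context_attention(context, concepts):
    """Weight candidate concept vectors by softmax of their dot product
    with a context vector, then return the weights and the weighted sum
    (a single context-relevant concept representation)."""
    scores = [dot(context, c) for c in concepts]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(concepts[0])
    fused = [sum(w * c[d] for w, c in zip(weights, concepts)) for d in range(dim)]
    return weights, fused
```

A concept aligned with the context vector receives a larger weight, so irrelevant senses of an ambiguous term are down-weighted before the convolutional module sees them.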