Author name cluster

Rui Mao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Are Language Models Any Good at Density Modeling?

  • Sriram Ranga
  • Sai Shashank Bedampeta
  • Rui Mao
  • Anupam Chattopadhyay

Large Language Models (LLMs) surprised the world with their ability to mimic humans in writing and are starting to be used as simulations of human writers for various kinds of linguistic analyses. However, these analyses rest on the belief that LLMs are good density models that accurately capture the underlying probability distribution of the language. In this paper, we question this basic assumption and try to evaluate language models on their density modelling capabilities. Since a ground truth does not exist for the probability distribution of any natural language, we come up with a synthetic language made up of decimal numbers written in words in English. We train language models from scratch on various probability distributions over this synthetic language and compare the distributions learned by the models with the original distributions. Experiments show that language models can learn underlying probability distributions across a wide range of cases, but they fail when those distributions depend on deep semantic properties of numbers that cannot be inferred from syntactic patterns. Additionally, we observed a strong bias in the models towards numbers that frequently occur as substrings within other numbers. This suggests that such a bias possibly exists in real-world natural language models as well, and negatively impacts downstream tasks and analyses that rely on model-generated probabilities.

AAAI Conference 2026 Conference Paper

CLER: Improving Multimodal Financial Reasoning by Cross-MLLM Error Reflection

  • Shuangyan Deng
  • Zhongsheng Wang
  • Rui Mao
  • Ciprian Doru Giurcăneanu
  • Jiamou Liu

Recent advances in Multimodal Large Language Models (MLLMs) have enabled joint reasoning over financial textual and visual inputs. However, they still struggle with financial terminology, logical consistency, and numerical computations. Moreover, while commercial large models perform well on reasoning tasks, their high inference costs limit their scalable usage in real-world financial applications. We thus propose a cost-effective framework, CLER, that combines contrastive retrieval with step-wise reflection to improve reasoning performance. Also, when commercial large models are used, the reasoning cost is incurred only in the test stage. CLER leverages FinErrorSet, a dataset of 8,000+ mistake correction pairs from diverse open-source MLLMs. A fine-grained retriever is trained to identify structurally relevant errors for self-correction through individual reflection. Experiments on three benchmarks show that CLER consistently outperforms other baselines. To our knowledge, CLER is the first framework to use cross-model errors for financial reasoning.

AAAI Conference 2026 Conference Paper

LLMdoctor: Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment of Large Language Models

  • Tiesunlong Shen
  • Rui Mao
  • Jin Wang
  • Heming Sun
  • Jian Zhang
  • Xuejie Zhang
  • Erik Cambria

Aligning Large Language Models (LLMs) with human preferences is critical, yet traditional fine-tuning methods are computationally expensive and inflexible. While test-time alignment offers a promising alternative, existing approaches often rely on distorted trajectory-level signals or inefficient sampling, fundamentally capping performance and failing to preserve the generative diversity of the base model. This paper introduces LLMdoctor, a novel framework for efficient test-time alignment that operates via a patient-doctor paradigm. It integrates token-level reward acquisition with token-level flow-guided preference optimization (TFPO) to steer a large, frozen patient LLM with a smaller, specialized doctor model. Unlike conventional methods that rely on trajectory-level rewards, LLMdoctor first extracts fine-grained, token-level preference signals from the patient model's behavioral variations. These signals then guide the training of the doctor model via TFPO, which establishes flow consistency across all subtrajectories, enabling precise token-by-token alignment while inherently preserving generation diversity. Extensive experiments demonstrate that LLMdoctor significantly outperforms existing test-time alignment methods and even surpasses the performance of full fine-tuning approaches like DPO.

AAAI Conference 2026 Conference Paper

MAPS: Multi-Agent Personality Shaping for Collaborative Reasoning

  • Jian Zhang
  • Zhiyuan Wang
  • Zhangqi Wang
  • Fangzhi Xu
  • Qika Lin
  • Lingling Zhang
  • Rui Mao
  • Erik Cambria

Collaborative reasoning with multiple agents offers the potential for more robust and diverse problem-solving. However, existing approaches often suffer from homogeneous agent behaviors and lack of reflective and rethinking capabilities. We propose Multi-Agent Personality Shaping (MAPS), a novel framework that enhances reasoning through agent diversity and internal critique. Inspired by the Big Five personality theory, MAPS assigns distinct personality traits to individual agents, shaping their reasoning styles and promoting heterogeneous collaboration. To enable deeper and more adaptive reasoning, MAPS introduces a Critic agent that reflects on intermediate outputs, revisits flawed steps, and guides iterative refinement. This integration of personality-driven agent design and structured collaboration improves both reasoning depth and flexibility. Empirical evaluations across three benchmarks demonstrate the strong performance of MAPS, with further analysis confirming its generalizability across different large language models and validating the benefits of multi-agent collaboration.

AAAI Conference 2026 Conference Paper

RefleXNet: Targeted Self-Reflection for Accurate Chest X-ray Reporting

  • Xin Mei
  • Rui Mao
  • Xiaoyan Cai
  • Libin Yang
  • Erik Cambria

Automated interpretation and reporting of chest X-rays (CXRs) hold significant promise in reducing diagnostic errors and supporting radiologists under heavy clinical workloads. However, existing methods typically rely on global visual features and token-level supervision, limiting their sensitivity to subtle abnormalities and reducing their clinical reliability. To address these challenges, we present Reflective X-ray Network (RefleXNet), which systematically integrates multi-scale visual feature fusion and anatomical relational reasoning with a targeted self-reflective learning strategy. RefleXNet first constructs multi-scale visual representations and captures anatomical context through graph-based relational modeling. Building upon these representations, we introduce a targeted self-reflection strategy that uses clinically guided feedback from generated reports to selectively refine abnormality predictions and their associated region-level visual features. Extensive experiments on MIMIC-CXR demonstrate that RefleXNet consistently outperforms state-of-the-art baselines across clinical factual correctness metrics. Notably, our compact 3B-parameter model surpasses several recent models with over twice the parameter count. Additionally, RefleXNet exhibits strong generalization performance in zero-shot evaluations on IU-Xray compared with leading multimodal language models, highlighting its robustness and clinical effectiveness.

EAAI Journal 2026 Journal Article

Towards smart city supervision: A detection pipeline for illegal buildings

  • Wenjin Liu
  • Yuheng Li
  • Shudong Zhang
  • Rui Mao

Illegal buildings pose a significant threat to urban planning and public safety. However, current satellite remote sensing-based detection methods not only require significant human and financial resources but also suffer from prolonged detection cycles. To address these challenges, a novel pipeline utilizing high-tower cameras for monitoring building-related objects to promptly discover illegal construction is proposed. First, a camera pose error self-correction algorithm is introduced to address the installation errors in camera installations, ensuring camera stability during patrol operations. Secondly, a mapping model between physical and image space is developed to locate objects. Furthermore, a previous illegal building object dataset is improved, and a new illegal building object detection network (IBDNet) is proposed. The proposed IBDNet introduces a Visual State Space block, a frequency-aware feature fusion module, and a feature fusion enhancement module, which are used for spatiotemporal modeling and multi-level feature fusion to enhance detection accuracy and robustness. Experimental results demonstrate that the proposed detection algorithm accurately detects illegal building-related objects, achieving a mean average precision (mAP) of 86.1% and outperforming the existing state-of-the-art models. By utilizing the proposed detection pipeline, illegal buildings can be precisely detected and located, allowing for timely identification and prevention of illegal construction activities. Furthermore, this method can potentially be integrated into broader applications, such as real-time urban planning systems and smart urban governance.

IS Journal 2025 Journal Article

A Retrieval-Augmented Multiagent System for Financial Sentiment Analysis

  • Kelvin Du
  • Yazhi Zhao
  • Rui Mao
  • Frank Xing
  • Erik Cambria

Financial sentiment analysis (FSA) has seen substantial advancements with the use of large language models (LLMs). Previous research highlighted the effectiveness of retrieval-augmented generation (RAG) and multiagent LLMs for FSA as these approaches alleviate the problems of hallucination, a lack of factual knowledge, and limited complex problem-solving capability. Despite this, the interplay and potential synergies between these two methods remain largely unexplored. This study presents a notable leap forward by introducing a retrieval-augmented multiagent system (RAMAS) to enhance LLM-based FSA performance. An RAMAS is specifically designed to deepen understanding of the critical factors that are inherent in FSA and mimic human-like consensus-making processes by adaptively learning from semantically similar few-shot samples and engaging in conversations among the generator, discriminator, and arbitrator agents. Our evaluation of RAMASs demonstrates improved accuracy and F1-score across multiple established FSA benchmark datasets.

JBHI Journal 2025 Journal Article

Advanced Heterogeneous Network-Based Graph Neural Network Framework for Predicting Anti-CRISPR Protein Sequences

  • Yeqiang Wang
  • Wenxiao Zhao
  • Yijun He
  • Jiale Li
  • Rui Mao

Anti-CRISPR proteins play a crucial role in bacterial-phage interactions by inhibiting the CRISPR/Cas system and thus enhancing phage survival. Accurately predicting these proteins is essential for understanding phage-host immune interactions and progressing CRISPR/Cas-based technologies. Current approaches primarily analyze proteins individually, which may overlook the intrinsic similarities and potential connections among protein sequences. This study introduces PACRGNN, a graph neural network framework that creates a heterogeneous protein network by integrating sequence and structural similarities, wherein nodes represent proteins and edges signify their relationships. By combining Graph Attention (GAT) and Graph Sample and Aggregation (GraphSAGE) layers, PACRGNN captures both local and global topological dependencies, while incorporating six protein feature categories to enrich node representations. PACRGNN achieves an accuracy of 0.9577, an F1-Score of 0.9572, and a PRAUC of 0.9876 on the validation set. The model demonstrated superior performance to existing methods on the independent test set derived from the NCBI database (Jan.–Oct. 2024).

TIST Journal 2025 Journal Article

Aspect-Enhanced Explainable Recommendation with Multi-modal Contrastive Learning

  • Hao Liao
  • Shuo Wang
  • Hao Cheng
  • Wei Zhang
  • Jiwei Zhang
  • Mingyang Zhou
  • Kezhong Lu
  • Rui Mao

Explainable recommender systems (ERS) aim to enhance users’ trust in the systems by offering personalized recommendations with transparent explanations. This transparency provides users with a clear understanding of the rationale behind the recommendations, fostering a sense of confidence and reliability in the system’s outputs. Generally, the explanations are presented in a familiar and intuitive way, which is in the form of natural language, thus enhancing their accessibility to users. Recently, there has been an increasing focus on leveraging reviews as a valuable source of rich information in both modeling user-item preferences and generating textual interpretations, which can be performed simultaneously in a multi-task framework. Despite the progress made in these review-based recommendation systems, the integration of implicit feedback derived from user-item interactions and user-written text reviews has yet to be fully explored. To fill this gap, we propose a model named SERMON (Aspect-enhanced Explainable Recommendation with Multi-modal Contrast Learning). Our model explores the application of multimodal contrastive learning to facilitate reciprocal learning across two modalities, thereby enhancing the modeling of user preferences. Moreover, our model incorporates the aspect information extracted from the review, which provides two significant enhancements to our tasks. Firstly, the quality of the generated explanations is improved by incorporating the aspect characteristics into the explanations generated by a pre-trained model with controlled textual generation ability. Secondly, the commonly used user-item interactions are transformed into user-item-aspect interactions, which we refer to as interaction triple, resulting in a more nuanced representation of user preference. To validate the effectiveness of our model, we conduct extensive experiments on three real-world datasets. The experimental results show that our model outperforms state-of-the-art baselines, with a 2.0% improvement in prediction accuracy and a substantial 24.5% enhancement in explanation quality for the TripAdvisor dataset.

NeurIPS Conference 2025 Conference Paper

Deciphering the Extremes: A Novel Approach for Pathological Long-tailed Recognition in Scientific Discovery

  • Zhe Zhao
  • Haibin Wen
  • Xianfu Liu
  • Rui Mao
  • Pengkun Wang
  • Liheng Yu
  • Linjiang Chen
  • Bo An

Scientific discovery across diverse fields increasingly grapples with datasets exhibiting pathological long-tailed distributions: a few common phenomena overshadow a multitude of rare yet scientifically critical instances. Unlike standard benchmarks, these scientific datasets often feature extreme imbalance coupled with a modest number of classes and limited overall sample volume, rendering existing long-tailed recognition (LTR) techniques ineffective. Such methods, biased by majority classes or prone to overfitting on scarce tail data, frequently fail to identify the very instances (novel materials, rare disease biomarkers, faint astronomical signals) that drive scientific breakthroughs. This paper introduces a novel, end-to-end framework explicitly designed to address pathological long-tailed recognition in scientific contexts. Our approach synergizes a Balanced Supervised Contrastive Learning (B-SCL) mechanism, which enhances the representation of tail classes by dynamically re-weighting their contributions, with a Smooth Objective Regularization (SOR) strategy that manages the inherent tension between tail-class focus and overall classification performance. We introduce and analyze the real-world ZincFluor chemical dataset ($\mathcal{T}=137.54$) and synthetic benchmarks with controllable extreme imbalances (CIFAR-LT variants). Extensive evaluations demonstrate our method's superior ability to decipher these extremes. Notably, on ZincFluor, our approach achieves a Tail Top-2 accuracy of $66.84\%$, significantly outperforming existing techniques. On CIFAR-10-LT with an imbalance ratio of $1000$ ($\mathcal{T}=100$), our method achieves a tail-class accuracy of $38.99\%$, substantially leading the next best. These results underscore our framework's potential to unlock novel insights from complex, imbalanced scientific datasets, thereby accelerating discovery.

IJCAI Conference 2025 Conference Paper

Deduction with Induction: Combining Knowledge Discovery and Reasoning for Interpretable Deep Reinforcement Learning

  • Haodi Zhang
  • Xiangyu Zeng
  • Junyang Chen
  • Yuanfeng Song
  • Rui Mao
  • Fangzhen Lin

Deep reinforcement learning (DRL) has achieved remarkable success in dynamic decision-making tasks. However, its inherent opacity and cold start problem hinder transparency and training efficiency. To address these challenges, we propose HRL-ID, a neural-symbolic framework that combines automated rule discovery with logical reasoning within a hierarchical DRL structure. HRL-ID dynamically extracts first-order logic rules from environmental interactions, iteratively refines them through success-based updates, and leverages these rules to guide action execution during training. Extensive experiments on Atari benchmarks demonstrate that HRL-ID outperforms state-of-the-art methods in training efficiency and interpretability, achieving higher reward rates and successful knowledge transfer between domains.

ICRA Conference 2025 Conference Paper

Effective Heterogeneous Point Cloud-Based Place Recognition and Relative Localization for Ground and Aerial Vehicles

  • Rui Mao
  • Hui Cheng

Place recognition and relative localization are crucial for realizing the potential of collaboration in ground and aerial robot teams. Many existing works focus only on ground robots and are not well-suited for heterogeneous robot systems in large-scale environments. In this paper, we propose a novel pipeline based on BEV density images, combined with an enhanced data structure, for place recognition in air-ground robotic collaboration systems. An efficient height alignment algorithm is proposed for relative localization. Extensive experiments on various types of public datasets validate the efficacy of our method compared to other SOTA works. We also show that our method is capable of detecting inter- and intra-robot loop closures in a ground and aerial multi-session SLAM system.

IS Journal 2025 Journal Article

Explicable Artificial Intelligence for Affective Computing

  • Rui Mao
  • Erik Cambria
  • Yang Li
  • Newton Howard

Artificial intelligence (AI) is increasingly tasked with recognizing and responding to human emotions, making affective computing one of its most consequential frontiers. As AI spreads into finance, policymaking, and mental health, the opacity of deep learning models raises urgent challenges for trust, accountability, and ethics. This special issue addresses explicability not just as algorithmic transparency, but as a paradigm integrating cognitive science, the humanities, and ethical foresight with technical innovation. Guided by the “Seven Pillars for the Future of AI”— multidisciplinarity, task decomposition, parallel analogy, symbol grounding, similarity measure, intention awareness, and trustworthiness—it envisions affective AI as a partner in meaning-making rather than a mere inference engine. The six featured articles span topics from depression detection and sentiment analysis to hate speech moderation and interpretable driving behaviors, advancing affective AI that is accurate, interpretable, and aligned with human dignity.

NeurIPS Conference 2025 Conference Paper

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

  • Yunlong Tang
  • Pinxin Liu
  • Mingqian Feng
  • Zhangyun Tan
  • Rui Mao
  • Chao Huang
  • Jing Bi
  • Yunzhong Xiao

Understanding perspective is fundamental to human visual perception, yet the extent to which multimodal large language models (MLLMs) internalize perspective geometry remains unclear. We introduce MMPerspective, the first benchmark specifically designed to systematically evaluate MLLMs' understanding of perspective through 10 carefully crafted tasks across three complementary dimensions: Perspective Perception, Reasoning, and Robustness. Our benchmark comprises 2,711 real-world and synthetic image instances with 5,083 question-answer pairs that probe key capabilities, such as vanishing point perception and counting, perspective type reasoning, line relationship understanding in 3D space, invariance to perspective-preserving transformations, etc. Through a comprehensive evaluation of 43 state-of-the-art MLLMs, we uncover significant limitations: while models demonstrate competence on surface-level perceptual tasks, they struggle with compositional reasoning and maintaining spatial consistency under perturbations. Our analysis further reveals intriguing patterns between model architecture, scale, and perspective capabilities, highlighting both robustness bottlenecks and the benefits of chain-of-thought prompting. MMPerspective establishes a valuable testbed for diagnosing and advancing spatial understanding in vision-language systems. Resources are available at https://yunlong10.github.io/MMPerspective/

IROS Conference 2025 Conference Paper

Seamless Transition Control in Spring-Legged Quadrotors: A Hybrid Dynamics Perspective with Guaranteed Feasibility

  • Hongli Li
  • Botao Zhang
  • Rui Mao
  • Tao Wang
  • Hui Cheng

Legged aerial-terrestrial robots have garnered significant research attention in recent years due to their enhanced environmental adaptability through combined aerial and terrestrial locomotion. However, existing passive spring-legged aerial robots exhibit limited motion versatility, demonstrating single stance gait during ground impacts, which constrains their task adaptability and creates substantial challenges in hybrid trajectory optimization and switching control. To address these difficulties, this work presents a systematic solution to achieve diverse hybrid locomotion. We innovatively establish the differential flatness property for spring-legged quadrotors in both aerial and terrestrial domains, and propose a unified hybrid trajectory optimization framework that generates smooth, agile, and dynamically feasible multi-modal trajectories incorporating diverse stance gait patterns. Furthermore, a hybrid nonlinear model predictive controller with a trajectory extension strategy is developed to enhance hybrid tracking precision and mode transition execution. Compared to existing methods, we achieve a 27% reduction in tracking error during hybrid locomotion while maintaining high-precision foot placement. The source code will be released to benefit the community.

NeurIPS Conference 2025 Conference Paper

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

  • Qi Wang
  • Yanrui Yu
  • Ye Yuan
  • Rui Mao
  • Tianfei Zhou

Reinforcement fine-tuning (RFT) has shown great promise in achieving human-level reasoning capabilities of Large Language Models (LLMs), and has recently been extended to MLLMs. Nevertheless, reasoning about videos, which is a fundamental aspect of human intelligence, remains a persistent challenge due to the complex logic, temporal and causal structures inherent in video data. To fill this gap, we propose VideoRFT, a novel approach that extends the RFT paradigm to cultivate human-like video reasoning capabilities in MLLMs. VideoRFT follows the standard two-stage scheme in RFT: supervised fine-tuning (SFT) with chain-of-thought (CoT) annotations, followed by reinforcement learning (RL) to improve generalization. A central challenge to achieve this in the video domain lies in the scarcity of large-scale, high-quality video CoT datasets. We address this by building a multi-expert-driven, cognition-inspired CoT curation pipeline. First, we devise a cognition-inspired prompting strategy to elicit a reasoning LLM to generate preliminary CoTs based solely on rich, structured, and literal representations of video content. Subsequently, these CoTs are revised by an MLLM conditioned on the actual video, ensuring visual consistency and reducing visual hallucinations. This pipeline results in two new datasets, i.e., VideoRFT-CoT-102K for SFT and VideoRFT-RL-310K for RL. To further strengthen the RL phase, we introduce a novel semantic-consistency reward that explicitly promotes the alignment between textual reasoning and visual evidence. This reward encourages the model to produce coherent, context-aware reasoning outputs grounded in visual input. Extensive experiments show that VideoRFT achieves state-of-the-art performance on six video reasoning benchmarks.

IJCAI Conference 2024 Conference Paper

Modeling Personalized Retweeting Behaviors for Multi-Stage Cascade Popularity Prediction

  • Mingyang Zhou
  • Yanjie Lin
  • Gang Liu
  • Zuwen Li
  • Hao Liao
  • Rui Mao

Predicting the size of message cascades is critical in various applications, such as online advertising and early detection of rumors. However, most existing deep learning approaches rely on cascade observation, which hinders accurate cascade prediction before message posting. Besides, these approaches overlook personalized retweeting behaviors that reflect users' inclination to retweet specific types of information. In this study, we propose a universal cascade prediction framework, namely Cascade prediction regarding Multiple Stage (CasMS), that effectively predicts cascade popularity across the message generation stage as well as short-term and long-term stages. Unlike previous methods, our approach not only captures users' personalized retweeting behaviors but also incorporates temporal cascade features. We perform experiments on datasets we collected ourselves as well as on public datasets. The results show that our method significantly surpasses existing approaches in predicting the cascade during the message generation stage and different time periods in the cascade dynamics.

NeurIPS Conference 2024 Conference Paper

Motif-oriented influence maximization for viral marketing in large-scale social networks

  • Mingyang Zhou
  • Weiji Cao
  • Hao Liao
  • Rui Mao

The influence maximization (IM) problem aims to identify a budgeted set of nodes with the highest potential to influence the largest number of users in a cascade model, a key challenge in viral marketing. Traditional IM approaches consider each user/node independently as a potential target customer. However, in many scenarios, the target customers comprise motifs, where activating only one or a few users within a motif is insufficient for effective viral marketing, which, nevertheless, receives little attention. For instance, if a motif consists of three friends planning to dine together, targeting all three simultaneously is crucial for a restaurant advertisement to succeed. In this paper, we address the motif-oriented influence maximization problem under the linear threshold model. We prove that the motif-oriented IM problem is NP-hard and that the influence function is neither supermodular nor submodular, in contrast to the classical IM setting. To simplify the problem, we establish the submodular upper and lower bounds for the influence function. By leveraging the submodular property, we propose a natural greedy strategy that simultaneously maximizes both bounds. Our algorithm has an approximation ratio of $\tau\cdot (1-1/e-\varepsilon)$ and a near-linear time complexity of $O((k+l)(m+\eta)\log \eta/\varepsilon^2)$. Experimental results on diverse datasets confirm the effectiveness of our approach in motif maximization.

IJCAI Conference 2024 Conference Paper

Understanding Public Perception Towards Weather Disasters Through the Lens of Metaphor

  • Rui Mao
  • Qika Lin
  • Qiawen Liu
  • Gianmarco Mengaldo
  • Erik Cambria

Extreme weather can lead to weather-induced disasters. These have a profound impact on communities worldwide, causing loss of life, damage to properties and infrastructure, and disruption of daily activities. In alignment with the United Nations Sustainable Development Goals, addressing the increasing frequency and severity of these events, exacerbated by climate change, is imperative. Exploring public perception and responses to weather disasters becomes crucial for policymakers to formulate effective strategies that not only mitigate the impacts but also contribute to the goal of ensuring sustainable and resilient communities. Social media, as a pervasive and real-time communication platform, has gathered a large amount of public opinion. In this work, we analyze public perception towards weather disasters based on tweets and metaphors. Metaphor, as a linguistic device, plays a pivotal role in unraveling cognitive processes and understanding how individuals perceive and make sense of concepts. We focus on tweets related to four distinct types of weather disasters, i.e., floods, hurricanes, tornadoes, and wildfires, aiming to extract nuanced insights regarding public perceptions, concerns, and attitudes towards these specific events. We also deliver constructive recommendations based on these insights.

IS Journal 2023 Journal Article

Seven Pillars for the Future of Artificial Intelligence

  • Erik Cambria
  • Rui Mao
  • Melvin Chen
  • Zhaoxia Wang
  • Seng-Beng Ho

In recent years, artificial intelligence (AI) research has showcased tremendous potential to positively impact humanity and society. Although AI frequently outperforms humans in tasks related to classification and pattern recognition, it continues to face challenges when dealing with complex tasks such as intuitive decision making, sense disambiguation, sarcasm detection, and narrative understanding as these require advanced kinds of reasoning, e.g., common-sense reasoning and causal reasoning, which have not been emulated satisfactorily yet. To address these shortcomings, we propose seven pillars that we believe represent the key hallmark features for the future of AI, namely, multidisciplinarity, task decomposition, parallel analogy, symbol grounding, similarity measure, intention awareness, and trustworthiness.

AAAI Conference 2023 Conference Paper

SKIER: A Symbolic Knowledge Integrated Model for Conversational Emotion Recognition

  • Wei Li
  • Luyao Zhu
  • Rui Mao
  • Erik Cambria

Emotion recognition in conversation (ERC) has received increasing attention from the research community. However, the ERC task is challenging, largely due to the complex and unstructured properties of multi-party conversations. Besides, the majority of daily dialogues take place in a specific context or circumstance, which requires rich external knowledge to understand the background of a certain dialogue. In this paper, we address these challenges by explicitly modeling the discourse relations between utterances and incorporating symbolic knowledge into multi-party conversations. We first introduce a dialogue parsing algorithm into ERC and further improve the algorithm through a transfer learning method. Moreover, we leverage different symbolic knowledge graph relations to learn knowledge-enhanced features for the ERC task. Extensive experiments on three benchmarks demonstrate that both dialogue structure graphs and symbolic knowledge are beneficial to the model performance on the task. Additionally, experimental results indicate that the proposed model surpasses baseline models on several indices.

AAAI Conference 2022 Conference Paper

Explainable Metaphor Identification Inspired by Conceptual Metaphor Theory

  • Mengshi Ge
  • Rui Mao
  • Erik Cambria

Metaphor is not only a linguistic phenomenon but also reflects the concept projection between source and target domains in human cognition. Previous sequence tagging-based metaphor identification methods could not model the concept projection, so their predicted metaphoricity labels could not be explained. In this work, we propose the first explainable metaphor identification model, inspired by Conceptual Metaphor Theory. The model is based on statistical learning, a lexical resource, and a novel reward mechanism. Our model can identify the metaphoricity on the word-pair level, and explain the predicted metaphoricity labels via learned concept mappings. The use of the reward mechanism allows the model to learn the optimal concept mappings without knowing their true labels. By using the lexical resource, our method is also applicable to concepts that are outside the training domains. The automatically generated concept mappings demonstrate the implicit human thoughts in metaphoric expressions. Our experiments show the effectiveness of the proposed model in both the metaphor identification and concept mapping tasks.

TIST Journal 2021 Journal Article

A Dynamic Convolutional Neural Network Based Shared-Bike Demand Forecasting Model

  • Shaojie Qiao
  • Nan Han
  • Jianbin Huang
  • Kun Yue
  • Rui Mao
  • Hongping Shu
  • Qiang He
  • Xindong Wu

Bike-sharing systems are becoming popular and generate a large volume of trajectory data. In a bike-sharing system, users can borrow and return bikes at different stations. In particular, a bike-sharing system will be affected by weather, the time period, and other dynamic factors, which challenges the scheduling of shared bikes. In this article, a new shared-bike demand forecasting model based on dynamic convolutional neural networks, called SDF, is proposed to predict the demand of shared bikes. SDF chooses the most relevant weather features from real weather data by using the Pearson correlation coefficient and transforms them into a two-dimensional dynamic feature matrix, taking into account the states of stations from historical data. The feature information in the matrix is extracted, learned, and trained with a newly proposed dynamic convolutional neural network to predict the demand of shared bikes in a dynamical and intelligent fashion. The phase of parameter update is optimized from three aspects: the loss function, optimization algorithm, and learning rate. Then, an accurate shared-bike demand forecasting model is designed based on the basic idea of minimizing the loss value. By comparing with classical machine learning models, the weight sharing strategy employed by SDF reduces the complexity of the network. It allows a high prediction accuracy to be achieved within a relatively short period of time. Extensive experiments are conducted on real-world bike-sharing datasets to evaluate SDF. The results show that SDF significantly outperforms classical machine learning models in prediction accuracy and efficiency.
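
The feature-selection step mentioned in the abstract above (choosing the weather features most relevant to demand via the Pearson correlation coefficient) can be illustrated with a minimal sketch. This is not the SDF implementation: the function names `pearson` and `select_features`, the `top_k` cutoff, the ranking by absolute correlation, and the sample values are all assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant series has no defined correlation; treat it as uninformative.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def select_features(features, demand, top_k):
    """Return the names of the top_k features ranked by |correlation| with demand.

    `features` maps a (hypothetical) weather feature name to its observations.
    """
    ranked = sorted(
        features.items(),
        key=lambda item: abs(pearson(item[1], demand)),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Made-up observations, for illustration only.
weather = {
    "temperature": [10, 15, 20, 25, 30],
    "humidity": [80, 60, 75, 65, 70],
    "wind": [5, 5, 5, 5, 5],
}
demand = [100, 150, 210, 240, 310]
selected = select_features(weather, demand, top_k=2)
```

Ranking by absolute value keeps strongly negative correlations as well as positive ones; whether SDF does the same is not stated in the abstract.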

AAAI Conference 2021 Conference Paper

Bridging Towers of Multi-task Learning with a Gating Mechanism for Aspect-based Sentiment Analysis and Sequential Metaphor Identification

  • Rui Mao
  • Xiao Li

Multi-task learning (MTL) has been widely applied in Natural Language Processing. A major task and its associated auxiliary tasks share the same encoder; hence, an MTL encoder can learn the sharing abstract information between the major and auxiliary tasks. Task-specific towers are then employed upon the sharing encoder to learn task-specific information. Previous works demonstrated that exchanging information between task-specific towers yielded extra gains. This is known as soft-parameter sharing MTL. In this paper, we propose a novel gating mechanism for the bridging of MTL towers. Our method is evaluated based on aspect-based sentiment analysis and sequential metaphor identification tasks. The experiments demonstrate that our method can yield better performance than the baselines on both tasks. Based on the same Transformer backbone, we compare our gating mechanism with other information transformation mechanisms, e.g., cross-stitch, attention and vanilla gating. The experiments show that our method also surpasses these baselines.
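
To make the idea of exchanging information between task-specific towers concrete, here is a minimal sketch of a plain sigmoid gate, i.e. the kind of "vanilla gating" the abstract lists as a baseline. It is not the gating mechanism proposed in the paper, whose details the abstract does not give. The function name `gated_bridge`, the parameter shapes, and the sample vectors are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_bridge(h_own, h_other, weights, bias):
    """Blend one tower's hidden vector with the other tower's, element-wise.

    h_own, h_other: hidden vectors (lists of floats) of equal length d.
    weights: d rows of length 2*d (hypothetical learned parameters).
    bias: list of d floats (hypothetical learned parameters).
    """
    concat = h_own + h_other  # the gate sees both towers' states
    mixed = []
    for i, (own, other) in enumerate(zip(h_own, h_other)):
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)  # g near 1 keeps own state, g near 0 takes the other's
        mixed.append(g * own + (1.0 - g) * other)
    return mixed

# With all-zero parameters the gate is 0.5, so the result is a plain average.
h_a = [1.0, 2.0]
h_b = [3.0, 4.0]
zero_w = [[0.0] * 4 for _ in range(2)]
averaged = gated_bridge(h_a, h_b, zero_w, [0.0, 0.0])
```

In a trained model the weights and bias would be learned jointly with both towers, so the gate decides per dimension how much information to let through from the other task.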