Author name cluster

Rui Mao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers

2 author rows

AAAI Conference 2026 Conference Paper

Are Language Models Any Good at Density Modeling?

Sriram Ranga
Sai Shashank Bedampeta
Rui Mao
Anupam Chattopadhyay

Large Language Models (LLMs) surprised the world with their ability to mimic humans in writing and are starting to be used as simulations of human writers for various kinds of linguistic analyses. However, these analyses rest on the belief that LLMs are good density models that accurately capture the underlying probability distribution of the language. In this paper, we question this basic assumption and try to evaluate language models on their density modelling capabilities. Since a ground truth does not exist for the probability distribution of any natural language, we come up with a synthetic language made up of decimal numbers written in words in English. We train language models from scratch on various probability distributions over this synthetic language and compare the distributions learned by the models with the original distributions. Experiments show that language models can learn underlying probability distributions across a wide range of cases, but they fail when those distributions depend on deep semantic properties of numbers that cannot be inferred from syntactic patterns. Additionally, we observed a strong bias in the models towards numbers that frequently occur as substrings within other numbers. This suggests that such a bias possibly exists in real-world natural language models as well, and negatively impacts downstream tasks and analyses that rely on model-generated probabilities.

AAAI Conference 2026 Conference Paper

CLER: Improving Multimodal Financial Reasoning by Cross-MLLM Error Reflection

Shuangyan Deng
Zhongsheng Wang
Rui Mao
Ciprian Doru Giurcăneanu
Jiamou Liu

Recent advances in Multimodal Large Language Models (MLLMs) have enabled joint reasoning over financial textual and visual inputs. However, they still struggle with financial terminology, logical consistency, and numerical computations. Moreover, while commercial large models perform well on reasoning tasks, their high inference costs limit their scalable usage in real-world financial applications. We thus propose a cost-effective framework, CLER, that combines contrastive retrieval with step-wise reflection to improve reasoning performance. Also, the reasoning cost is only incurred in the test stage when using commercial large models. CLER leverages FinErrorSet, a dataset of 8,000+ mistake-correction pairs from diverse open-source MLLMs. A fine-grained retriever is trained to identify structurally relevant errors for self-correction through individual reflection. Experiments on three benchmarks show that CLER consistently outperforms other baselines. To our knowledge, CLER is the first framework to use cross-model errors for financial reasoning.

AAAI Conference 2026 Conference Paper

LLMdoctor: Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment of Large Language Models

Tiesunlong Shen
Rui Mao
Jin Wang
Heming Sun
Jian Zhang
Xuejie Zhang
Erik Cambria

Aligning Large Language Models (LLMs) with human preferences is critical, yet traditional fine-tuning methods are computationally expensive and inflexible. While test-time alignment offers a promising alternative, existing approaches often rely on distorted trajectory-level signals or inefficient sampling, fundamentally capping performance and failing to preserve the generative diversity of the base model. This paper introduces LLMdoctor, a novel framework for efficient test-time alignment that operates via a patient-doctor paradigm. It integrates token-level reward acquisition with token-level flow-guided preference optimization (TFPO) to steer a large, frozen patient LLM with a smaller, specialized doctor model. Unlike conventional methods that rely on trajectory-level rewards, LLMdoctor first extracts fine-grained, token-level preference signals from the patient model's behavioral variations. These signals then guide the training of the doctor model via TFPO, which establishes flow consistency across all subtrajectories, enabling precise token-by-token alignment while inherently preserving generation diversity. Extensive experiments demonstrate that LLMdoctor significantly outperforms existing test-time alignment methods and even surpasses the performance of full fine-tuning approaches like DPO.

AAAI Conference 2026 Conference Paper

MAPS: Multi-Agent Personality Shaping for Collaborative Reasoning

Jian Zhang
Zhiyuan Wang
Zhangqi Wang
Fangzhi Xu
Qika Lin
Lingling Zhang
Rui Mao
Erik Cambria

Collaborative reasoning with multiple agents offers the potential for more robust and diverse problem-solving. However, existing approaches often suffer from homogeneous agent behaviors and a lack of reflective and rethinking capabilities. We propose Multi-Agent Personality Shaping (MAPS), a novel framework that enhances reasoning through agent diversity and internal critique. Inspired by the Big Five personality theory, MAPS assigns distinct personality traits to individual agents, shaping their reasoning styles and promoting heterogeneous collaboration. To enable deeper and more adaptive reasoning, MAPS introduces a Critic agent that reflects on intermediate outputs, revisits flawed steps, and guides iterative refinement. This integration of personality-driven agent design and structured collaboration improves both reasoning depth and flexibility. Empirical evaluations across three benchmarks demonstrate the strong performance of MAPS, with further analysis confirming its generalizability across different large language models and validating the benefits of multi-agent collaboration.

AAAI Conference 2026 Conference Paper

RefleXNet: Targeted Self-Reflection for Accurate Chest X-ray Reporting

Xin Mei
Rui Mao
Xiaoyan Cai
Libin Yang
Erik Cambria

Automated interpretation and reporting of chest X-rays (CXRs) hold significant promise in reducing diagnostic errors and supporting radiologists under heavy clinical workloads. However, existing methods typically rely on global visual features and token-level supervision, limiting their sensitivity to subtle abnormalities and reducing their clinical reliability. To address these challenges, we present Reflective X-ray Network (RefleXNet), which systematically integrates multi-scale visual feature fusion and anatomical relational reasoning with a targeted self-reflective learning strategy. RefleXNet first constructs multi-scale visual representations and captures anatomical context through graph-based relational modeling. Building upon these representations, we introduce a targeted self-reflection strategy that uses clinically guided feedback from generated reports to selectively refine abnormality predictions and their associated region-level visual features. Extensive experiments on MIMIC-CXR demonstrate that RefleXNet consistently outperforms state-of-the-art baselines across clinical factual correctness metrics. Notably, our compact 3B-parameter model surpasses several recent models with over twice the parameter count. Additionally, RefleXNet exhibits strong generalization performance in zero-shot evaluations on IU-Xray compared with leading multimodal language models, highlighting its robustness and clinical effectiveness.

EAAI Journal 2026 Journal Article

Towards smart city supervision: A detection pipeline for illegal buildings

Wenjin Liu
Yuheng Li
Shudong Zhang
Rui Mao

Illegal buildings pose a significant threat to urban planning and public safety. However, current satellite remote sensing-based detection methods not only require significant human and financial resources but also suffer from prolonged detection cycles. To address these challenges, a novel pipeline utilizing high-tower cameras for monitoring building-related objects to promptly discover illegal construction is proposed. First, a camera pose error self-correction algorithm is introduced to address camera installation errors, ensuring camera stability during patrol operations. Second, a mapping model between physical and image space is developed to locate objects. Furthermore, a previous illegal building object dataset is improved, and a new illegal building object detection network (IBDNet) is proposed. The proposed IBDNet introduces a Visual State Space block, a frequency-aware feature fusion module, and a feature fusion enhancement module, which are used for spatiotemporal modeling and multi-level feature fusion to enhance detection accuracy and robustness. Experimental results demonstrate that the proposed detection algorithm accurately detects illegal building-related objects, achieving a mean average precision (mAP) of 86.1% and outperforming the existing state-of-the-art models. By utilizing the proposed detection pipeline, illegal buildings can be precisely detected and located, allowing for timely identification and prevention of illegal construction activities. Furthermore, this method can potentially be integrated into broader applications, such as real-time urban planning systems and smart urban governance.

IS Journal 2025 Journal Article

A Retrieval-Augmented Multiagent System for Financial Sentiment Analysis

Kelvin Du
Yazhi Zhao
Rui Mao
Frank Xing
Erik Cambria

Financial sentiment analysis (FSA) has seen substantial advancements with the use of large language models (LLMs). Previous research highlighted the effectiveness of retrieval-augmented generation (RAG) and multiagent LLMs for FSA as these approaches alleviate the problems of hallucination, a lack of factual knowledge, and limited complex problem-solving capability. Despite this, the interplay and potential synergies between these two methods remain largely unexplored. This study presents a notable leap forward by introducing a retrieval-augmented multiagent system (RAMAS) to enhance LLM-based FSA performance. An RAMAS is specifically designed to deepen understanding of the critical factors that are inherent in FSA and mimic human-like consensus-making processes by adaptively learning from semantically similar few-shot samples and engaging in conversations among the generator, discriminator, and arbitrator agents. Our evaluation of RAMASs demonstrates improved accuracy and F1-score across multiple established FSA benchmark datasets.

JBHI Journal 2025 Journal Article

Advanced Heterogeneous Network-Based Graph Neural Network Framework for Predicting Anti-CRISPR Protein Sequences

Yeqiang Wang
Wenxiao Zhao
Yijun He
Jiale Li
Rui Mao

Anti-CRISPR proteins play a crucial role in bacterial-phage interactions by inhibiting the CRISPR/Cas system and thus enhancing phage survival. Accurately predicting these proteins is essential for understanding phage-host immune interactions and progressing CRISPR/Cas-based technologies. Current approaches primarily analyze proteins individually, which may overlook the intrinsic similarities and potential connections among protein sequences. This study introduces PACRGNN, a graph neural network framework that creates a heterogeneous protein network by integrating sequence and structural similarities, wherein nodes represent proteins and edges signify their relationships. By combining Graph Attention (GAT) and Graph Sample and Aggregation (GraphSAGE) layers, PACRGNN captures both local and global topological dependencies, while incorporating six protein feature categories to enrich node representations. PACRGNN achieves an accuracy of 0.9577, an F1-Score of 0.9572, and a PRAUC of 0.9876 on the validation set. The model demonstrated superior performance to existing methods on the independent test set derived from the NCBI database (Jan.–Oct. 2024).

TIST Journal 2025 Journal Article

Aspect-Enhanced Explainable Recommendation with Multi-modal Contrastive Learning

Hao Liao
Shuo Wang
Hao Cheng
Wei Zhang
Jiwei Zhang
Mingyang Zhou
Kezhong Lu
Rui Mao

Explainable recommender systems (ERS) aim to enhance users’ trust in the systems by offering personalized recommendations with transparent explanations. This transparency provides users with a clear understanding of the rationale behind the recommendations, fostering a sense of confidence and reliability in the system’s outputs. Generally, the explanations are presented in a familiar and intuitive way, which is in the form of natural language, thus enhancing their accessibility to users. Recently, there has been an increasing focus on leveraging reviews as a valuable source of rich information in both modeling user-item preferences and generating textual interpretations, which can be performed simultaneously in a multi-task framework. Despite the progress made in these review-based recommendation systems, the integration of implicit feedback derived from user-item interactions and user-written text reviews has yet to be fully explored. To fill this gap, we propose a model named SERMON (Aspect-enhanced Explainable Recommendation with Multi-modal Contrast Learning). Our model explores the application of multimodal contrastive learning to facilitate reciprocal learning across two modalities, thereby enhancing the modeling of user preferences. Moreover, our model incorporates the aspect information extracted from the review, which provides two significant enhancements to our tasks. Firstly, the quality of the generated explanations is improved by incorporating the aspect characteristics into the explanations generated by a pre-trained model with controlled textual generation ability. Secondly, the commonly used user-item interactions are transformed into user-item-aspect interactions, which we refer to as interaction triple, resulting in a more nuanced representation of user preference. To validate the effectiveness of our model, we conduct extensive experiments on three real-world datasets. The experimental results show that our model outperforms state-of-the-art baselines, with a 2.0% improvement in prediction accuracy and a substantial 24.5% enhancement in explanation quality for the TripAdvisor dataset.

NeurIPS Conference 2025 Conference Paper

Deciphering the Extremes: A Novel Approach for Pathological Long-tailed Recognition in Scientific Discovery

Zhe Zhao
Haibin Wen
Xianfu Liu
Rui Mao
Pengkun Wang
Liheng Yu
Linjiang Chen
Bo An

Scientific discovery across diverse fields increasingly grapples with datasets exhibiting pathological long-tailed distributions: a few common phenomena overshadow a multitude of rare yet scientifically critical instances. Unlike standard benchmarks, these scientific datasets often feature extreme imbalance coupled with a modest number of classes and limited overall sample volume, rendering existing long-tailed recognition (LTR) techniques ineffective. Such methods, biased by majority classes or prone to overfitting on scarce tail data, frequently fail to identify the very instances (novel materials, rare disease biomarkers, faint astronomical signals) that drive scientific breakthroughs. This paper introduces a novel, end-to-end framework explicitly designed to address pathological long-tailed recognition in scientific contexts. Our approach synergizes a Balanced Supervised Contrastive Learning (B-SCL) mechanism, which enhances the representation of tail classes by dynamically re-weighting their contributions, with a Smooth Objective Regularization (SOR) strategy that manages the inherent tension between tail-class focus and overall classification performance. We introduce and analyze the real-world ZincFluor chemical dataset ($\mathcal{T}=137.54$) and synthetic benchmarks with controllable extreme imbalances (CIFAR-LT variants). Extensive evaluations demonstrate our method's superior ability to decipher these extremes. Notably, on ZincFluor, our approach achieves a Tail Top-2 accuracy of $66.84\%$, significantly outperforming existing techniques. On CIFAR-10-LT with an imbalance ratio of $1000$ ($\mathcal{T}=100$), our method achieves a tail-class accuracy of $38.99\%$, substantially leading the next best. These results underscore our framework's potential to unlock novel insights from complex, imbalanced scientific datasets, thereby accelerating discovery.

IJCAI Conference 2025 Conference Paper

Deduction with Induction: Combining Knowledge Discovery and Reasoning for Interpretable Deep Reinforcement Learning

Haodi Zhang
Xiangyu Zeng
Junyang Chen
Yuanfeng Song
Rui Mao
Fangzhen Lin

Deep reinforcement learning (DRL) has achieved remarkable success in dynamic decision-making tasks. However, its inherent opacity and cold start problem hinder transparency and training efficiency. To address these challenges, we propose HRL-ID, a neural-symbolic framework that combines automated rule discovery with logical reasoning within a hierarchical DRL structure. HRL-ID dynamically extracts first-order logic rules from environmental interactions, iteratively refines them through success-based updates, and leverages these rules to guide action execution during training. Extensive experiments on Atari benchmarks demonstrate that HRL-ID outperforms state-of-the-art methods in training efficiency and interpretability, achieving higher reward rates and successful knowledge transfer between domains.

ICRA Conference 2025 Conference Paper

Effective Heterogeneous Point Cloud-Based Place Recognition and Relative Localization for Ground and Aerial Vehicles

Rui Mao
Hui Cheng

Place recognition and relative localization are crucial for realizing the potential of collaboration in ground and aerial robot teams. Many existing works focus only on ground robots and are not well-suited for heterogeneous robot systems in large-scale environments. In this paper, we propose a novel pipeline based on BEV density images, combined with an enhanced data structure, for place recognition in air-ground robotic collaboration systems. An efficient height alignment algorithm is proposed for relative localization. Extensive experiments on various types of public datasets validate the efficacy of our method compared to other SOTA works. We also show that our method is capable of detecting inter- and intra-robot loop closures in a ground and aerial multi-session SLAM system.

IS Journal 2025 Journal Article

Explicable Artificial Intelligence for Affective Computing

Rui Mao
Erik Cambria
Yang Li
Newton Howard

Artificial intelligence (AI) is increasingly tasked with recognizing and responding to human emotions, making affective computing one of its most consequential frontiers. As AI spreads into finance, policymaking, and mental health, the opacity of deep learning models raises urgent challenges for trust, accountability, and ethics. This special issue addresses explicability not just as algorithmic transparency, but as a paradigm integrating cognitive science, the humanities, and ethical foresight with technical innovation. Guided by the “Seven Pillars for the Future of AI”— multidisciplinarity, task decomposition, parallel analogy, symbol grounding, similarity measure, intention awareness, and trustworthiness—it envisions affective AI as a partner in meaning-making rather than a mere inference engine. The six featured articles span topics from depression detection and sentiment analysis to hate speech moderation and interpretable driving behaviors, advancing affective AI that is accurate, interpretable, and aligned with human dignity.

NeurIPS Conference 2025 Conference Paper

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

Yunlong Tang
Pinxin Liu
Mingqian Feng
Zhangyun Tan
Rui Mao
Chao Huang
Jing Bi
Yunzhong Xiao

Understanding perspective is fundamental to human visual perception, yet the extent to which multimodal large language models (MLLMs) internalize perspective geometry remains unclear. We introduce MMPerspective, the first benchmark specifically designed to systematically evaluate MLLMs' understanding of perspective through 10 carefully crafted tasks across three complementary dimensions: Perspective Perception, Reasoning, and Robustness. Our benchmark comprises 2,711 real-world and synthetic image instances with 5,083 question-answer pairs that probe key capabilities, such as vanishing point perception and counting, perspective type reasoning, line relationship understanding in 3D space, invariance to perspective-preserving transformations, etc. Through a comprehensive evaluation of 43 state-of-the-art MLLMs, we uncover significant limitations: while models demonstrate competence on surface-level perceptual tasks, they struggle with compositional reasoning and maintaining spatial consistency under perturbations. Our analysis further reveals intriguing patterns between model architecture, scale, and perspective capabilities, highlighting both robustness bottlenecks and the benefits of chain-of-thought prompting. MMPerspective establishes a valuable testbed for diagnosing and advancing spatial understanding in vision-language systems. Resources are available at https://yunlong10.github.io/MMPerspective/

IROS Conference 2025 Conference Paper

Seamless Transition Control in Spring-Legged Quadrotors: A Hybrid Dynamics Perspective with Guaranteed Feasibility

Hongli Li
Botao Zhang
Rui Mao
Tao Wang
Hui Cheng

Legged aerial-terrestrial robots have garnered significant research attention in recent years due to their enhanced environmental adaptability through combined aerial and terrestrial locomotion. However, existing passive spring-legged aerial robots exhibit limited motion versatility, demonstrating single stance gait during ground impacts, which constrains their task adaptability and creates substantial challenges in hybrid trajectory optimization and switching control. To address these difficulties, this work presents a systematic solution to achieve diverse hybrid locomotion. We innovatively establish the differential flatness property for spring-legged quadrotors in both aerial and terrestrial domains, and propose a unified hybrid trajectory optimization framework that generates smooth, agile, and dynamically feasible multi-modal trajectories incorporating diverse stance gait patterns. Furthermore, a hybrid nonlinear model predictive controller with a trajectory extension strategy is developed to enhance hybrid tracking precision and mode transition execution. Compared to existing methods, we achieve a 27% reduction in tracking error during hybrid locomotion while maintaining high-precision foot placement. The source code will be released to benefit the community.

NeurIPS Conference 2025 Conference Paper

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Qi Wang
Yanrui Yu
Ye Yuan
Rui Mao
Tianfei Zhou

Reinforcement fine-tuning (RFT) has shown great promise in achieving human-level reasoning capabilities of Large Language Models (LLMs), and has recently been extended to MLLMs. Nevertheless, reasoning about videos, which is a fundamental aspect of human intelligence, remains a persistent challenge due to the complex logic, temporal and causal structures inherent in video data. To fill this gap, we propose VideoRFT, a novel approach that extends the RFT paradigm to cultivate human-like video reasoning capabilities in MLLMs. VideoRFT follows the standard two-stage scheme in RFT: supervised fine-tuning (SFT) with chain-of-thought (CoT) annotations, followed by reinforcement learning (RL) to improve generalization. A central challenge in achieving this in the video domain lies in the scarcity of large-scale, high-quality video CoT datasets. We address this by building a multi-expert-driven, cognition-inspired CoT curation pipeline. First, we devise a cognition-inspired prompting strategy to elicit a reasoning LLM to generate preliminary CoTs based solely on rich, structured, and literal representations of video content. Subsequently, these CoTs are revised by an MLLM conditioned on the actual video, ensuring visual consistency and reducing visual hallucinations. This pipeline results in two new datasets, i.e., VideoRFT-CoT-102K for SFT and VideoRFT-RL-310K for RL. To further strengthen the RL phase, we introduce a novel semantic-consistency reward that explicitly promotes the alignment between textual reasoning and visual evidence. This reward encourages the model to produce coherent, context-aware reasoning outputs grounded in visual input. Extensive experiments show that VideoRFT achieves state-of-the-art performance on six video reasoning benchmarks.

IJCAI Conference 2024 Conference Paper

Modeling Personalized Retweeting Behaviors for Multi-Stage Cascade Popularity Prediction

Mingyang Zhou
Yanjie Lin
Gang Liu
Zuwen Li
Hao Liao
Rui Mao

Predicting the size of message cascades is critical in various applications, such as online advertising and early detection of rumors. However, most existing deep learning approaches rely on cascade observation, which hinders accurate cascade prediction before message posting. Besides, these approaches overlook personalized retweeting behaviors that reflect users' inclination to retweet specific types of information. In this study, we propose a universal cascade prediction framework, namely Cascade prediction regarding Multiple Stage (CasMS), that effectively predicts cascade popularity across the message generation stage as well as short-term and long-term stages. Unlike previous methods, our approach not only captures users' personalized retweeting behaviors but also incorporates temporal cascade features. We perform experiments on datasets we collected ourselves as well as on public datasets. The results show that our method significantly surpasses existing approaches in predicting the cascade during the message generation stage and different time periods in the cascade dynamics.

NeurIPS Conference 2024 Conference Paper

Motif-oriented influence maximization for viral marketing in large-scale social networks

Mingyang Zhou
Weiji Cao
Hao Liao
Rui Mao

The influence maximization (IM) problem aims to identify a budgeted set of nodes with the highest potential to influence the largest number of users in a cascade model, a key challenge in viral marketing. Traditional IM approaches consider each user/node independently as a potential target customer. However, in many scenarios, the target customers comprise motifs, where activating only one or a few users within a motif is insufficient for effective viral marketing, which, nevertheless, receives little attention. For instance, if a motif consists of three friends planning to dine together, targeting all three simultaneously is crucial for a restaurant advertisement to succeed. In this paper, we address the motif-oriented influence maximization problem under the linear threshold model. We prove that the motif-oriented IM problem is NP-hard and that the influence function is neither supermodular nor submodular, in contrast to the classical IM setting. To simplify the problem, we establish the submodular upper and lower bounds for the influence function. By leveraging the submodular property, we propose a natural greedy strategy that simultaneously maximizes both bounds. Our algorithm has an approximation ratio of $\tau\cdot (1-1/e-\varepsilon)$ and a near-linear time complexity of $O((k+l)(m+\eta)\log \eta/\varepsilon^2)$. Experimental results on diverse datasets confirm the effectiveness of our approach in motif maximization.

IJCAI Conference 2024 Conference Paper

Understanding Public Perception Towards Weather Disasters Through the Lens of Metaphor

Rui Mao
Qika Lin
Qiawen Liu
Gianmarco Mengaldo
Erik Cambria

Extreme weather can lead to weather-induced disasters. These have a profound impact on communities worldwide, causing loss of life, damage to properties and infrastructure, and disruption of daily activities. In alignment with the United Nations Sustainable Development Goals, addressing the increasing frequency and severity of these events, exacerbated by climate change, is imperative. Exploring public perception and responses to weather disasters becomes crucial for policymakers to formulate effective strategies that not only mitigate the impacts but also contribute to the goal of ensuring sustainable and resilient communities. Social media, as a pervasive and real-time communication platform, has gathered a large amount of public opinion. In this work, we analyze public perception towards weather disasters based on tweets and metaphors. Metaphor, as a linguistic device, plays a pivotal role in unraveling cognitive processes and understanding how individuals perceive and make sense of concepts. We focus on tweets related to four distinct types of weather disasters, i.e., floods, hurricanes, tornadoes, and wildfires, aiming to extract nuanced insights regarding public perceptions, concerns, and attitudes towards these specific events. We also deliver constructive recommendations based on the insights.

IS Journal 2023 Journal Article

Seven Pillars for the Future of Artificial Intelligence

Erik Cambria
Rui Mao
Melvin Chen
Zhaoxia Wang
Seng-Beng Ho

In recent years, artificial intelligence (AI) research has showcased tremendous potential to positively impact humanity and society. Although AI frequently outperforms humans in tasks related to classification and pattern recognition, it continues to face challenges when dealing with complex tasks such as intuitive decision making, sense disambiguation, sarcasm detection, and narrative understanding as these require advanced kinds of reasoning, e.g., common-sense reasoning and causal reasoning, which have not been emulated satisfactorily yet. To address these shortcomings, we propose seven pillars that we believe represent the key hallmark features for the future of AI, namely, multidisciplinarity, task decomposition, parallel analogy, symbol grounding, similarity measure, intention awareness, and trustworthiness.

AAAI Conference 2023 Conference Paper

SKIER: A Symbolic Knowledge Integrated Model for Conversational Emotion Recognition

Wei Li
Luyao Zhu
Rui Mao
Erik Cambria

Emotion recognition in conversation (ERC) has received increasing attention from the research community. However, the ERC task is challenging, largely due to the complex and unstructured properties of multi-party conversations. Besides, the majority of daily dialogues take place in a specific context or circumstance, which requires rich external knowledge to understand the background of a certain dialogue. In this paper, we address these challenges by explicitly modeling the discourse relations between utterances and incorporating symbolic knowledge into multi-party conversations. We first introduce a dialogue parsing algorithm into ERC and further improve the algorithm through a transfer learning method. Moreover, we leverage different symbolic knowledge graph relations to learn knowledge-enhanced features for the ERC task. Extensive experiments on three benchmarks demonstrate that both dialogue structure graphs and symbolic knowledge are beneficial to the model performance on the task. Additionally, experimental results indicate that the proposed model surpasses baseline models on several indices.

AAAI Conference 2022 Conference Paper

Explainable Metaphor Identification Inspired by Conceptual Metaphor Theory

Mengshi Ge
Rui Mao
Erik Cambria

Metaphor is not only a linguistic phenomenon but also reflects the concept projection between source and target domains in human cognition. Previous sequence tagging-based metaphor identification methods could not model the concept projection, so the metaphoricity labels predicted by these models are unexplainable. In this work, we propose the first explainable metaphor identification model, inspired by Conceptual Metaphor Theory. The model is based on statistical learning, a lexical resource, and a novel reward mechanism. Our model can identify the metaphoricity on the word-pair level, and explain the predicted metaphoricity labels via learned concept mappings. The use of the reward mechanism allows the model to learn the optimal concept mappings without knowing their true labels. Our method is also applicable to concepts that are outside the training domains by using the lexical resource. The automatically generated concept mappings demonstrate the implicit human thoughts in metaphoric expressions. Our experiments show the effectiveness of the proposed model in both the metaphor identification and concept mapping tasks.

TIST Journal 2021 Journal Article

A Dynamic Convolutional Neural Network Based Shared-Bike Demand Forecasting Model

Shaojie Qiao
Nan Han
Jianbin Huang
Kun Yue
Rui Mao
Hongping Shu
Qiang He
Xindong Wu

Bike-sharing systems are becoming popular and generate a large volume of trajectory data. In a bike-sharing system, users can borrow and return bikes at different stations. In particular, a bike-sharing system will be affected by weather, the time period, and other dynamic factors, which challenges the scheduling of shared bikes. In this article, a new shared-bike demand forecasting model based on dynamic convolutional neural networks, called SDF, is proposed to predict the demand of shared bikes. SDF chooses the most relevant weather features from real weather data by using the Pearson correlation coefficient and transforms them into a two-dimensional dynamic feature matrix, taking into account the states of stations from historical data. The feature information in the matrix is extracted, learned, and trained with a newly proposed dynamic convolutional neural network to predict the demand of shared bikes in a dynamic and intelligent fashion. The parameter update phase is optimized from three aspects: the loss function, optimization algorithm, and learning rate. Then, an accurate shared-bike demand forecasting model is designed based on the basic idea of minimizing the loss value. Compared with classical machine learning models, the weight sharing strategy employed by SDF reduces the complexity of the network. It allows a high prediction accuracy to be achieved within a relatively short period of time. Extensive experiments are conducted on real-world bike-sharing datasets to evaluate SDF. The results show that SDF significantly outperforms classical machine learning models in prediction accuracy and efficiency.

AAAI Conference 2021 Conference Paper

Bridging Towers of Multi-task Learning with a Gating Mechanism for Aspect-based Sentiment Analysis and Sequential Metaphor Identification

Rui Mao
Xiao Li

Multi-task learning (MTL) has been widely applied in Natural Language Processing. A major task and its associated auxiliary tasks share the same encoder; hence, an MTL encoder can learn the abstract information shared between the major and auxiliary tasks. Task-specific towers are then employed upon the shared encoder to learn task-specific information. Previous works demonstrated that exchanging information between task-specific towers yielded extra gains. This is known as soft-parameter sharing MTL. In this paper, we propose a novel gating mechanism for the bridging of MTL towers. Our method is evaluated on aspect-based sentiment analysis and sequential metaphor identification tasks. The experiments demonstrate that our method can yield better performance than the baselines on both tasks. Based on the same Transformer backbone, we compare our gating mechanism with other information transformation mechanisms, e.g., cross-stitch, attention and vanilla gating. The experiments show that our method also surpasses these baselines.