Author name cluster

Xiangyu Kong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers

2 author rows

EAAI Journal 2026 Journal Article

Distribution adversarial gating enhanced prediction model for carbon emission with multi-agent automated modeling framework

Qingyang Wang
Piaoyang Zhao
Chengxi She
Yang Chen
Xiangyu Kong
Xuedong Wang
Caihua Chen

To address the difficulty of predicting data under different distributions and the high cost of modeling and coding complex customized prediction models, this paper proposes a unified approach integrating four crucial components: a customized feature processor, a multi-stage carbon measurement model, a distribution adversarial gating (DAG) enhanced prediction model, and a multi-agent automated modeling framework. Firstly, we construct a comprehensive knowledge base containing various carbon emission factors and standards together with external carbon-related data portals for retrieval. Secondly, we propose a multi-stage carbon measurement model based on the constructed knowledge base to generate accurate carbon emission labels for prediction model training. Thirdly, we propose a DAG enhanced Long Short-Term Memory Neural Network (DAG-LSTM), which ensures favorable prediction of pre-trained models on test data under different distributions. Lastly, we design a multi-agent framework leveraging Large Language Models (LLMs) for automated modeling and coding, which significantly reduces the technical barriers to application. We evaluate our approach using real-world power grid datasets from 2021–2024. The results demonstrate that our automated modeling framework implements carbon emission prediction from only a few simple instructions and that DAG-LSTM reduces prediction errors across different data distributions by at least 69.3% and at most 92.6%. Our work provides both a novel prediction architecture and an intelligent modeling paradigm, contributing to scalable, accurate, and accessible carbon emission prediction in diverse industrial scenarios.


AAAI Conference 2026 Conference Paper

Explainable Depression Assessment from Face Videos by Weakly Supervised Learning

Rongfan Liao
Xiangyu Kong
Shiqing Tang
Lang He
Changzeng Fu
Weicheng Xie
Xiaofeng Liu
Lu Liu

Existing video-based automatic depression assessment (ADA) approaches frequently achieve video-level depression assessment by aggregating features or predictions of individual frames or equal-length segments within the given video. While their performances have been largely enhanced by recent advanced deep learning models, they typically fail to explicitly consider the varied importance of depression-related behavioural cues across different video segments, i.e., segments within one video may contain behaviours reflecting varying levels of depression. Underestimating segment-level variations can obscure the detection of facial behaviour cues associated with depression, thereby undermining the accuracy and interpretability of video-based depression detection systems. In this paper, we propose a novel video-based ADA approach that specifically identifies and differentiates video segments that exhibit depression-related facial behaviours across varying temporal durations, providing clear insights into how each segment contributes to the video-level depression prediction. To achieve this, a novel weakly supervised strategy is proposed to compare segment-level behaviours with the video-level depression label, enabling the model to assign depression-relevant scores to video segments at multiple temporal scales and attend selectively to those most indicative of depressive states. Extensive experiments on the AVEC 2013 and AVEC 2014 face video depression datasets demonstrate the effectiveness of our approach.


AAAI Conference 2026 Conference Paper

Learning Personalised Human Internal Cognition from External Expressive Behaviours for Real Personality Recognition

Xiangyu Kong
Hengde Zhu
Haoqin Sun
Zhihao Guo
Jiayan Gu
Xinyi Ni
Wei Zhang
Shizhe Liu

Automatic real personality recognition (RPR) aims to evaluate human real personality traits from their expressive behaviours. However, most existing solutions generally act as external observers to infer observers' personality impressions based on target individuals' expressive behaviours, which significantly deviate from their real personalities and consistently lead to inferior recognition performance. Inspired by the association between real personality and human internal cognition underlying the generation of expressive behaviours, we propose a novel RPR approach that efficiently simulates personalised internal cognition from external short audio-visual behaviours expressed by the target individual. The simulated personalised cognition, represented as a set of network weights that enforce the personalised network to reproduce the individual-specific facial reactions, is further encoded as a graph containing two-dimensional node and edge feature matrices, with a novel 2D Graph Neural Network (2D-GNN) proposed for inferring real personality traits from it. To simulate real personality-related cognition, an end-to-end (E2E) strategy is designed to jointly train our cognition simulation, 2D graph construction, and personality recognition modules. Experiments show our approach’s effectiveness in capturing real personality traits with superior computational efficiency.


EAAI Journal 2026 Journal Article

Lightweight Kolmogorov-Arnold Network with dual-objective optimization for axial capacity prediction of square coal gangue concrete-filled steel tube stub columns based on finite element simulation

Xiangyu Kong
Yaowei Fan
Jinlong Liu
Meng Xi
Yuzhuo Zhang
Yang Yu

Square coal gangue concrete-filled steel tube (CGCFST) stub columns offer both environmental benefits from solid waste utilization and enhanced mechanical performance from steel tube confinement. However, accurate prediction of their axial compressive bearing capacity (N_u) remains challenging due to the absence of dedicated design codes and limitations of existing machine learning approaches. Specifically, tree-based models yield piecewise constant predictions incapable of capturing the smooth continuous parameter relationship, while conventional neural networks struggle to balance model compactness with prediction accuracy. To address these gaps, this study proposes a dual-objective optimized Kolmogorov-Arnold Network (KAN) framework that simultaneously minimizes root mean square error (RMSE) and model complexity (edge function count). A parametric database of 1470 samples was generated via ABAQUS batch simulations, covering five key design parameters: cross-section side length (D), steel tube thickness (t), coal gangue replacement ratio (r), concrete compressive strength (f_c'), and steel yield strength (f_y). Six existing design codes were systematically evaluated, revealing significant errors when applied to CGCFST. The optimized KAN achieves a coefficient of determination (R^2) of 0.9993, outperforming Multilayer Perceptron (MLP) variants and all other Bayesian-optimized tree models except Extreme Gradient Boosting (XGBoost), while providing inherently continuous predictions. After two pruning iterations, KAN retains high accuracy (R^2 = 0.9713) with only 2.8% degradation, confirming its lightweight potential. Sensitivity and parametric analyses quantify the contribution of each design parameter to N_u. Finally, a Graphical User Interface (GUI) integrating ABAQUS batch simulation with KAN prediction is developed to facilitate rapid and accurate capacity assessment for engineering applications.
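
As a rough illustration of what "dual-objective" means in the abstract above, the sketch below keeps only the candidate models that are not dominated on the two objectives named there (RMSE and edge function count, both minimized). The abstract does not say which optimizer the authors use, so the function name `pareto_front` and the candidate values are assumptions made for illustration only and are not results from the paper.

```python
from typing import List, Tuple

# Each candidate is (rmse, edge_count); both objectives are minimized.
Candidate = Tuple[float, int]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    A candidate is dominated if another one is no worse on both
    objectives and strictly better on at least one of them.
    """
    front = []
    for c in candidates:
        dominated = any(
            o[0] <= c[0] and o[1] <= c[1] and (o[0] < c[0] or o[1] < c[1])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


# Hypothetical (rmse, edge_count) pairs, purely illustrative.
models = [(1.0, 10), (2.0, 5), (2.5, 8), (3.0, 3)]
front = pareto_front(models)
```

Here `(2.5, 8)` drops out because `(2.0, 5)` is better on both objectives; the remaining candidates each represent a different accuracy and compactness trade-off.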


EAAI Journal 2025 Journal Article

An interpretable privacy-preserving real-time carbon emission estimation approach for heterogeneous industrial enterprises

Xiangyu Kong
Riwei Zhang
Bixuan Gao
Gaohua Liu
Kaijie Fang
Meimei Duan

Accurate monitoring of carbon emissions from high energy-consuming enterprises is foundational to the low-carbon development of the power system. The volume trend and energy consumption structure of diverse industrial enterprises are different. Most current carbon emission accounting relies on annual energy consumption statistics, with a lag of one year or more. Additionally, there are severe data barriers between various enterprises, considering the consequences of protecting their carbon-related data. This paper proposes an innovative carbon emission monitoring method to address the aforementioned issues. The first step of this method is to construct an accurate and high-frequency carbon emission measurement architecture from the multi-dimensional perspective of data flow and carbon emission flow. Next, three pivotal techniques are combined in a hybrid model: variational mode decomposition (VMD), temporal convolutional network (TCN), and long short-term memory network with multi-head attention (LSTMMA), collectively referred to as the VMD-TCN-LSTMMA model. The proposed model can decompose data, increase dimension, and extract features of power time series to precisely estimate the direct carbon emissions of industrial enterprises. Moreover, a model training and information sharing framework based on a differential privacy federated score weight algorithm (DP-FedSW) is developed to improve monitoring accuracy for enterprises without divulging raw data. A dataset from 142 users in 6 different types of high energy-consuming industries is collected to comprehensively evaluate the proposed monitoring method's performance. Experimental results demonstrate that the proposed method outperforms the conventional strategy, improving the stability and effectiveness of the carbon emission estimation model while ensuring the security of raw data.


EAAI Journal 2025 Journal Article

Integrating cumulative binomial probability into artificial bee colony algorithm for global optimization in mechanical engineering design

Xiangyu Kong
Pengpeng Shang
Chunfeng Wang
Lixia Liu

The artificial bee colony (ABC) algorithm has gained significant attention in engineering design optimization for its simplicity and robustness. However, it suffers from low convergence accuracy and imbalanced exploration and exploitation, particularly in non-convex and multi-modal problems. To alleviate these shortcomings, we present a novel ABC algorithm that incorporates cumulative binomial probability (CBABC), which comes in two versions, single-dimensional evolution (CBABC_S) and multi-dimensional evolution (CBABC_M). According to the success and failure evolution experience, we first introduce a scaling factor based on cumulative binomial probability. Subsequently, new movement equations with different characteristics are designed in the onlooker bee phase to balance local and global search capabilities. Then, a novel abandoned solution update mechanism is defined during the scout bee phase to partly improve the solution accuracy. Finally, our approaches achieve a minimum winning rate of 68.19% and a maximum winning rate of 95.45% against 9 outstanding ABC variants across the 22 functions. In addition, the proposed algorithms maintain a winning rate within [57.89%, 100%] compared to several state-of-the-art evolutionary algorithms for tackling 19 real-world mechanical engineering problems from the Congress on Evolutionary Computation 2020 (CEC2020) suite.
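
For readers unfamiliar with the quantity named in this abstract, the sketch below computes a cumulative binomial probability, P(X <= k) for X ~ Binomial(n, p), from a count of successful updates. The abstract does not give the formula that turns this probability into the scaling factor, so `scaling_factor` and its default `p = 0.5` are hypothetical stand-ins, not the authors' definition.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Cumulative binomial probability P(X <= k) for X ~ Binomial(n, p)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def scaling_factor(successes: int, trials: int, p: float = 0.5) -> float:
    """Hypothetical mapping: use the CDF at the observed success count.

    This is an assumption for illustration; the paper's actual scaling
    factor may be defined differently.
    """
    return binomial_cdf(successes, trials, p)
```

Under this assumed mapping, a bee whose recent updates succeeded more often receives a factor closer to 1, and one that mostly failed receives a factor closer to 0.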


AAAI Conference 2025 Conference Paper

PerReactor: Offline Personalised Multiple Appropriate Facial Reaction Generation

Hengde Zhu
Xiangyu Kong
Weicheng Xie
Xin Huang
Xilin He
Lu Liu
Linlin Shen
Wei Zhang

In dyadic human-human interactions, individuals may express multiple different facial reactions in response to the same/similar behaviours expressed by their conversational partners depending on their personalised behaviour patterns. As a result, frequently-employed reconstruction loss-based strategies lead the training of previous automatic facial reaction generation (FRG) models to not only suffer from the 'one-to-many mapping' problem, but also fail to comprehensively consider the quality of the generated facial reactions. Besides, none of them considered such personalised behaviour patterns in generating facial reactions. In this paper, we propose the first adversarial FRG model training strategy which jointly learns appropriateness and realism discriminators to provide comprehensive task-specific supervision for training the target facial reaction generators, and reformulates the 'one-to-many (facial reactions) mapping' training problem as a 'one-to-one (distribution) mapping' training task, i.e., the FRG model is trained to output a distribution representing multiple appropriate/plausible facial reactions from each input human behaviour. In addition, our approach also serves as the first offline FRG approach that considers personalised behaviour patterns in generating target individuals' facial reactions. Experiments show that our PerReactor not only largely outperformed all existing offline solutions for generating more appropriate, diverse and realistic facial reactions, but also is the first approach that can effectively generate personalised appropriate facial reactions.


ICLR Conference 2024 Conference Paper

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents

Siyuan Qi
Shuo Chen 0006
Yexin Li
Xiangyu Kong
Junqi Wang
Bangcheng Yang
Pring Wong
Yifan Zhong

The generalization of decision-making agents encompasses two fundamental elements: learning from past experiences and reasoning in novel contexts. However, the predominant emphasis in most interactive environments is on learning, often at the expense of complexity in reasoning. In this paper, we introduce CivRealm, an environment inspired by the Civilization game. Civilization’s profound alignment with human society requires sophisticated learning and prior knowledge, while its ever-changing space and action space demand robust reasoning for generalization. Particularly, CivRealm sets up an imperfect-information general-sum game with a changing number of players; it presents a plethora of complex features, challenging the agent to deal with open-ended stochastic environments that require diplomacy and negotiation skills. Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning. To catalyze further research, we present initial results for both paradigms. The canonical RL-based agents exhibit reasonable performance in mini-games, whereas both RL- and LLM-based agents struggle to make substantial progress in the full game. Overall, CivRealm stands as a unique learning and reasoning challenge for decision-making agents. The code is available at https://github.com/bigai-ai/civrealm.


NeurIPS Conference 2024 Conference Paper

Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy

Zhenyu Guan
Xiangyu Kong
Fangwei Zhong
Yizhou Wang

Diplomacy is one of the most sophisticated activities in human society, involving complex interactions among multiple parties that require skills in social reasoning, negotiation, and long-term strategic planning. Previous AI agents have demonstrated their ability to handle multi-step games and large action spaces in multi-agent tasks. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage required. While recent agents based on large language models (LLMs) have shown potential in various applications, they still struggle with extended planning periods in complex multi-agent settings. Leveraging recent technologies for LLM-based agents, we aim to explore AI's potential to create a human-like agent capable of executing comprehensive multi-agent missions by integrating three fundamental capabilities: 1) strategic planning with memory and reflection; 2) goal-oriented negotiation with social reasoning; and 3) augmenting memory through self-play games for self-evolution without human in the loop. Project page: https://sites.google.com/view/richelieu-diplomacy.


AAAI Conference 2020 Conference Paper

Pose-Assisted Multi-Camera Collaboration for Active Object Tracking

Jing Li
Jing Xu
Fangwei Zhong
Xiangyu Kong
Yu Qiao
Yizhou Wang

Active Object Tracking (AOT) is crucial to many vision-based applications, e.g., mobile robots and intelligent surveillance. However, there are a number of challenges when deploying active tracking in complex scenarios, e.g., the target is frequently occluded by obstacles. In this paper, we extend the single-camera AOT to a multi-camera setting, where cameras track a target in a collaborative fashion. To achieve effective collaboration among cameras, we propose a novel Pose-Assisted Multi-Camera Collaboration System, which enables a camera to cooperate with the others by sharing camera poses for active object tracking. In the system, each camera is equipped with two controllers and a switcher: The vision-based controller tracks targets based on observed images. The pose-based controller moves the camera in accordance with the poses of the other cameras. At each step, the switcher decides which action to take from the two controllers according to the visibility of the target. The experimental results demonstrate that our system outperforms all the baselines and is capable of generalizing to unseen environments. The code and demo videos are available on our website https://sites.google.com/view/pose-assistedcollaboration.
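
The two-controller arrangement described in this abstract can be pictured with the minimal sketch below. It uses a fixed rule (take the vision-based action when the target is visible, otherwise the pose-based one) as a stand-in for the switcher; the abstract only says the decision depends on target visibility, so the rule, the `Action` type, and the function name `switch` are assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical camera command: pan and tilt deltas in degrees."""
    pan: float
    tilt: float


def switch(target_visible: bool, vision_action: Action, pose_action: Action) -> Action:
    """Pick one controller's action for the current step.

    Rule-based stand-in: trust the vision-based controller while the
    target is in view, and fall back to the pose-based controller
    (driven by the other cameras' poses) when it is not.
    """
    return vision_action if target_visible else pose_action


vision_cmd = Action(pan=2.0, tilt=-1.0)   # proposed from the camera's own image
pose_cmd = Action(pan=-15.0, tilt=0.0)    # proposed from the other cameras' poses
chosen_when_occluded = switch(False, vision_cmd, pose_cmd)
```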


AAAI Conference 2016 Conference Paper

Robust Complex Behaviour Modeling at 90Hz

Xiangyu Kong
Yizhou Wang
Tao Xiang

Modeling complex crowd behaviour for tasks such as rare event detection has received increasing interest. However, existing methods are limited because (1) they are sensitive to noise, often resulting in a large number of false alarms; and (2) they rely on elaborate models leading to high computational cost and are thus unsuitable for processing a large number of video inputs in real-time. In this paper, we overcome these limitations by introducing a novel complex behaviour modeling framework, which consists of a Binarized Cumulative Directional (BCD) feature as representation, novel spatial and temporal context modeling via an iterative correlation maximization, and a set of behaviour models, each being a simple Bernoulli distribution. Despite its simplicity, our experiments on three benchmark datasets show that it significantly outperforms the state-of-the-art for both temporal video segmentation and rare event detection. Importantly, it is extremely efficient, reaching 90Hz on a normal PC platform using MATLAB.
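
To make the phrase "each being a simple Bernoulli distribution" concrete, the sketch below fits one Bernoulli parameter per binary feature dimension and scores a new observation by its negative log-likelihood, so unusual bit patterns score higher. It does not reproduce the BCD feature or the correlation-maximization step; the function names, the Laplace smoothing, and the toy data are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def fit_bernoulli(samples: Sequence[Sequence[int]], smoothing: float = 1.0) -> List[float]:
    """Estimate P(bit = 1) per dimension from binary samples.

    Laplace smoothing (an assumption here) keeps every probability
    strictly between 0 and 1 so the log-likelihood stays finite.
    """
    n = len(samples)
    dims = len(samples[0])
    return [
        (sum(s[d] for s in samples) + smoothing) / (n + 2.0 * smoothing)
        for d in range(dims)
    ]


def negative_log_likelihood(x: Sequence[int], probs: Sequence[float]) -> float:
    """Score an observation; larger values mean it is less expected."""
    return -sum(math.log(p if bit else 1.0 - p) for bit, p in zip(x, probs))


# Toy binary features, purely illustrative.
normal_clips = [[1, 0], [1, 0], [1, 0], [1, 0]]
probs = fit_bernoulli(normal_clips)
typical_score = negative_log_likelihood([1, 0], probs)
rare_score = negative_log_likelihood([0, 1], probs)
```

A detector built this way would flag an observation when its score exceeds a chosen threshold; how the paper sets that threshold is not stated in the abstract.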
