Author name cluster

Hao Guo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers

2 author rows

AAAI Conference 2026 Conference Paper

Adaptive Theory of Mind for LLM-based Multi-Agent Coordination

Chunjiang Mu
Ya Zeng
Qiaosheng Zhang
Kun Shao
Chen Chu
Hao Guo
Danyang Jia
Zhen Wang

Theory of Mind (ToM) refers to the ability to reason about others’ mental states, and higher-order ToM involves considering that others also possess their own ToM. Equipping large language model (LLM)-driven agents with ToM has long been considered to improve their coordination in multi-agent collaborative tasks. However, we find that misaligned ToM orders (mismatches in the depth of ToM reasoning between agents) can lead to insufficient or excessive reasoning about others, thereby impairing their coordination. To address this issue, we design an adaptive ToM (A-ToM) agent, which can align in ToM orders with its partner. Based on prior interactions, the agent estimates the partner’s likely ToM order and leverages this estimation to predict the partner’s action, thereby facilitating behavioral coordination. We conduct empirical evaluations on four multi-agent coordination tasks: a repeated matrix game, two grid navigation tasks and an Overcooked task. The results validate our findings on ToM alignment and demonstrate the effectiveness of our A-ToM agent. Furthermore, we discuss the generalizability of our A-ToM to non-LLM-based agents, as well as what would diminish the importance of ToM alignment.

JBHI Journal 2026 Journal Article

CFRAFN: A Cross-Feature Residual Attention Fusion Network for Major Depressive Disorder Prediction Using Clinical Voice Recordings

Rumo Pan
Sidu Feng
Yi Sun
Jinqiu Xu
Tianzhang Zhai
Xiaochun Wu
Liangliang Tan
Yonggui Yuan

Major depressive disorder (MDD) is a prevalent mental disorder with a significant burden on individuals and society, and timely identification and intervention are essential for effective management. Voice data have been used as behavioral indicators of MDD, offering valuable insights into an individual's mental state. In this study, we collected voice data from 221 patients diagnosed with MDD at the inpatient ward of the Department of Psychiatry and Psychosomatics, Zhongda Hospital, Southeast University, alongside 113 healthy controls, to construct the Chinese depressive voice dataset. We proposed the cross-feature residual attention fusion network (CFRAFN), which leverages extended Geneva minimalistic acoustic parameter set features along with high-dimensional embeddings extracted from the pretrained VGGish model to effectively capture MDD-associated phonetic patterns. Specifically, CFRAFN utilizes differentiated residual blocks to maintain training stability in deep hierarchical structures. Furthermore, the self-attention fusion strategy dynamically weights the significance of each feature modality, ensuring effective feature integration and consequently improving MDD prediction accuracy. Experimental results demonstrated that CFRAFN achieved excellent predictive performance, with an area under the receiver operating characteristic curve of 0.924 on an independent test set, and significantly outperformed 11 baseline models across 5-fold cross-validation.

AAAI Conference 2025 Conference Paper

Each Fake News Is Fake in Its Own Way: An Attribution Multi-Granularity Benchmark for Multimodal Fake News Detection

Hao Guo
Zihan Ma
Zhi Zeng
Minnan Luo
Weixin Zeng
Jiuyang Tang
Xiang Zhao

Social platforms, while facilitating access to information, have also become saturated with a plethora of fake news, resulting in negative consequences. Automatic multimodal fake news detection is a worthwhile pursuit. Existing multimodal fake news datasets only provide binary labels of real or fake. However, real news is alike, while each fake news is fake in its own way. These datasets fail to reflect the mixed nature of various types of multimodal fake news. To bridge the gap, we construct an attributing multi-granularity multimodal fake news detection dataset AMG, revealing the inherent fake pattern. Furthermore, we propose a multi-granularity clue alignment model MGCA to achieve multimodal fake news detection and attribution. Experimental results demonstrate that AMG is a challenging dataset, and its attribution setting opens up new avenues for future research.

IJCAI Conference 2025 Conference Paper

HARMONY: A Privacy-preserving and Sensor-agnostic Tele-monitoring system

Qipeng Xie
Hao Guo
Weizheng Wang
Yongzhi Huang
Linshan Jiang
Jiafei Wu
Shuxin Zhong
Lu Wang

Global aging necessitates tele-monitoring systems to provide real-time tracking and timely assistance for older adults living independently. While pervasive wireless devices (e.g., CSI, IMU, UWB) enable cost-effective, non-intrusive monitoring, existing systems lack flexibility, limiting their adaptability to different environments. In this work, we posit that the motion dynamics of human movement are invariant across sensing modalities, inspiring the design of HARMONY, a privacy-preserving, sensor-agnostic system that supports multi-modal inputs and diverse tele-monitoring tasks. HARMONY incorporates Modality-agnostic Data Processing to uniformly encrypt multi-modal signals and Task-specific Activity Recognition for seamless task adaptation. A novel Encrypted-processing Engine then significantly accelerates computations on encrypted data by optimizing matrix and convolution operations. Evaluations across five different sensing modalities show that HARMONY consistently achieves high accuracy while delivering 3.5× to 130× speedups over state-of-the-art baselines. Our results demonstrate that HARMONY is a practical, scalable, and privacy-centric prototype for next-generation remote healthcare.

YNIMG Journal 2025 Journal Article

Self-organizing dynamic research based on phase coherence graph autoencoders: Analysis of brain metastable states across the lifespan

Hao Guo
Yu-Xuan Liu
Yao Li
Qi-Li Guo
Zhi-Peng Hao
Yan-Li Yang
Jing Wei

The development of the human brain is a complex, lifelong process during which collective behaviors of neurons exhibit self-organizing dynamics. Metastable states play a crucial role in understanding the complex dynamical mechanisms of the brain, and analyzing them helps to reveal the mechanisms of functional changes in the brain throughout development and aging. Specifically, the global metastable state provides an overall perspective on the flexibility of brain reorganization, while the evolution trajectories of transient functional patterns capture detailed changes in brain activity. The leading eigenvector dynamics analysis (LEiDA) method significantly reduces the dimensionality of data and is widely used to capture the temporal trajectory characteristics of transient functional patterns, i.e., metastable brain states. However, LEiDA's linear dimensionality reduction of high-dimensional raw brain data may overlook non-linear information and lose some relationships between features. We developed a framework based on the Phase Coherence Graph Autoencoder (PCGAE) that employs graph autoencoders (GAE) for non-linear dimensionality reduction of phase coherence matrices. The reduced representations are then clustered to identify more distinct metastable brain states, and the approach is applied to the analysis of resting-state functional magnetic resonance imaging (rs-fMRI) data across the human lifespan. This paper investigates age-related differences and continuity changes from different perspectives: metastable state indicators and state trajectory indicators (occurrence probability, lifetime, and state transition metrics). The global metastable state shows a linear decline with age, while both linear and quadratic effects of age are observed in the detailed metastable state and state trajectory indicators. Finally, the proposed feature extraction scheme demonstrates good classification performance for categorizing brain age groups. These findings can help us understand the self-organizing reorganization characteristics associated with aging and their complex dynamic changes, providing new insights into brain development throughout the entire lifespan.
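
As a rough illustration of the linear LEiDA step that the abstract contrasts with its graph-autoencoder approach, the sketch below builds a phase coherence matrix for one time point and approximates its leading eigenvector. It assumes the usual LEiDA convention that coherence between regions i and j is cos(θ_i − θ_j); the function names, the toy phase values, and the use of power iteration are illustrative assumptions and are not taken from the paper.

```python
import math

def phase_coherence(phases):
    """Return the N x N matrix with entries cos(theta_i - theta_j)."""
    return [[math.cos(a - b) for b in phases] for a in phases]

def leading_eigenvector(matrix, iterations=200):
    """Approximate the dominant eigenvector by power iteration."""
    n = len(matrix)
    vec = [1.0] + [0.0] * (n - 1)  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical instantaneous phases (radians) for four brain regions.
phases = [0.0, 0.1, 3.0, 3.1]
coherence = phase_coherence(phases)
v1 = leading_eigenvector(coherence)

# Regions whose entries share a sign are treated as one phase-aligned community.
communities = [x > 0 for x in v1]
```

In this toy case the sign pattern of `v1` separates the two nearly in-phase pairs of regions. A full pipeline would repeat the step for every time point and cluster the resulting vectors, which is the stage the abstract proposes replacing with a non-linear embedding.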

IROS Conference 2025 Conference Paper

TBAP: Tapping-Based Auditory Perception for Identifying Container Materials

Zehao Li
Shoujie Li
Hao Guo
Wenbo Ding

In this study, to address the robotic auditory perception problem, we propose a novel framework for object material recognition of common containers, which combines deep learning with active auditory perception. We developed a modular robotic system for acoustic data acquisition that employs a hybrid mechanism of vertical translation and horizontal rotation and is capable of performing full-scale tapping in three dimensions. The system is capable of creating an acoustic dataset consisting of 50 containers made of five materials, and it improves data acquisition efficiency by 93.9% compared to manual operations. In addition, we propose an end-to-end transfer learning model, TBAP, which is trained on a crawler-generated pre-training dataset and 50 real scene samples, and achieves a recognition accuracy of 91.0% for unseen materials. To improve reliability, we design a dynamic confidence assessment mechanism that generates confidence indices through probability distribution analysis and feature stability assessment to support robust robot decision-making. Experimental results show that the framework greatly improves data acquisition efficiency while maintaining high recognition accuracy, providing a valuable tool for advancing acoustic perception research.

ICML Conference 2025 Conference Paper

TUMTraf VideoQA: Dataset and Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes

Xingcheng Zhou
Konstantinos Larintzakis
Hao Guo
Walter Zimmer
Mingyu Liu
Hu Cao
Jiajie Zhang
Venkatnarayanan Lakshminarasimhan

We present TUMTraf VideoQA, a novel dataset and benchmark designed for spatio-temporal video understanding in complex roadside traffic scenarios. The dataset comprises 1,000 videos, featuring 85,000 multiple-choice QA pairs, 2,300 object captioning, and 5,700 object grounding annotations, encompassing diverse real-world conditions such as adverse weather and traffic anomalies. By incorporating tuple-based spatio-temporal object expressions, TUMTraf VideoQA unifies three essential tasks (multiple-choice video question answering, referred object captioning, and spatio-temporal object grounding) within a cohesive evaluation framework. We further introduce the TraffiX-Qwen baseline model, enhanced with visual token sampling strategies, providing valuable insights into the challenges of fine-grained spatio-temporal reasoning. Extensive experiments demonstrate the dataset’s complexity, highlight the limitations of existing models, and position TUMTraf VideoQA as a robust foundation for advancing research in intelligent transportation systems. The dataset and benchmark are publicly available to facilitate further exploration.

EAAI Journal 2024 Journal Article

An adaptive hybrid surrogate model for FEA of telescopic boom of rock drilling jumbo

Yancheng Lv
Lin Lin
Hao Guo
Changsheng Tong
Yikun Liu
Sihao Zhang
Shiwei Suo

Rapid collaborative optimization (CO) has an increasing demand for high-fidelity surrogate models. However, a traditional surrogate model cannot be applied to all working conditions due to the limitations of model applicability. A hybrid surrogate model is proposed, which uses the attention mechanism to automatically decide the weights of the sub-models according to the working conditions, to ensure its approximation ability under all working conditions. First, according to the characteristics of the finite element analysis (FEA) parameters, a comprehensive design of experiment (DOE) is proposed, which ensures the space-filling property of the samples. Second, a self-attention artificial neural network (ANN) is proposed to automatically adjust the weights of the sub-surrogate models, improving the attention to the working condition-related features. The proposed self-attention ANN is a general framework that can provide support for the adaptive weight decision in other equipment simulation hybrid surrogate models. The experiment on the database shows that the errors of the two hybrid surrogate models established by the proposed method are 36.04% and 33.31% lower than that of the advanced model, respectively, and that the method is significantly superior to other methods. This achievement not only combines the spatial approximation ability of the sub-models to establish a nonlinear model, achieving high-fidelity simulation of FEA systems, but also enables the surrogate model to cover the complete space using limited samples, making the model suitable for various practical engineering problems. In summary, the proposed method has clear advantages in solving existing problems and provides strong support for research and practical applications.

AAMAS Conference 2024 Conference Paper

Cooperation and Coordination in Heterogeneous Populations with Interaction Diversity

Hao Guo
Zhen Wang
Junliang Xing
Pin Tao
Yuanchun Shi

Cooperation, a prosocial behavior enhancing collective rewards in multi-agent games, intricately intertwines with coordination. This study explores how interaction diversity and zero-sum gifting influence cooperation and coordination in heterogeneous populations, where agents engage in threshold public goods games with multiple equilibria. Our model accommodates two sources of inequality: variations in agents’ capabilities to provide public goods and differences in the rewards they receive upon successful public good provision. In the absence of gifting, we demonstrate the inevitability of intermediate interaction intensity in fostering global cooperation, elucidating conditions for co-dominance, coexistence, and the polarized state of cooperation. While gifting introduces reciprocity opportunities, our findings highlight the importance of maintaining moderate levels of gifting, as excessive gifting can paradoxically undermine global cooperation. This research contributes valuable insights into the emergence of cooperation and coordination dynamics.

AAAI Conference 2022 Conference Paper

Fully Attentional Network for Semantic Segmentation

Qi Song
Jie Li
Chenghong Li
Hao Guo
Rui Huang

Recent non-local self-attention methods have proven to be effective in capturing long-range dependencies for semantic segmentation. These methods usually form a similarity map of $\mathbb{R}^{C \times C}$ (by compressing spatial dimensions) or $\mathbb{R}^{HW \times HW}$ (by compressing channels) to describe the feature relations along either channel or spatial dimensions, where $C$ is the number of channels, and $H$ and $W$ are the spatial dimensions of the input feature map. However, such practices tend to condense feature dependencies along the other dimensions, hence causing attention missing, which might lead to inferior results for small/thin categories or inconsistent segmentation inside large objects. To address this problem, we propose a new approach, namely Fully Attentional Network (FLANet), to encode both spatial and channel attentions in a single similarity map while maintaining high computational efficiency. Specifically, for each channel map, our FLANet can harvest feature responses from all other channel maps, and the associated spatial positions as well, through a novel fully attentional module. Our new method has achieved state-of-the-art performance on three challenging semantic segmentation datasets, i.e., 83.6%, 46.99%, and 88.5% on the Cityscapes test set, the ADE20K validation set, and the PASCAL VOC test set, respectively.
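
To make the two similarity-map shapes mentioned in the abstract concrete, here is a minimal sketch that flattens a C × H × W feature map and forms the unnormalized channel (C × C) and spatial (HW × HW) similarity maps by plain dot products. It is an illustration of the conventional non-local formulations only, not of FLANet's fully attentional module; the helper names and the toy feature values are assumptions, and the softmax normalization usually applied to such maps is omitted.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def similarity_maps(features):
    """features: nested list of shape C x H x W.

    Returns (channel_map, spatial_map) with shapes C x C and HW x HW.
    """
    # Flatten each channel's H x W grid into one row: C x HW.
    flat = [[v for row in channel for v in row] for channel in features]
    channel_map = matmul(flat, transpose(flat))  # spatial axis summed out
    spatial_map = matmul(transpose(flat), flat)  # channel axis summed out
    return channel_map, spatial_map

# Toy feature map with C=3 channels and a 2 x 4 spatial grid.
C, H, W = 3, 2, 4
features = [[[float(c + h + w) for w in range(W)] for h in range(H)] for c in range(C)]
channel_map, spatial_map = similarity_maps(features)
```

Each map sums over the axis it does not index, which is the "condensing" of dependencies along the other dimension that the abstract identifies as the limitation of these methods.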

AAAI Conference 2018 Conference Paper

Co-Saliency Detection Within a Single Image

Hongkai Yu
Kang Zheng
Jianwu Fang
Hao Guo
Wei Feng
Song Wang

Recently, saliency detection in a single image and co-saliency detection in multiple images have drawn extensive research interest in the vision community. In this paper, we investigate a new problem of co-saliency detection within a single image, i.e., detecting within-image co-saliency. By identifying common saliency within an image, e.g., highlighting multiple occurrences of an object class with similar appearance, this work can benefit many important applications, such as the detection of objects of interest, more robust object recognition, reduction of information redundancy, and animation synthesis. We propose a new bottom-up method to address this problem. Specifically, a large number of object proposals are first detected from the image. Then we develop an optimization algorithm to derive a set of proposal groups, each of which contains multiple proposals showing good common saliency in the original image. For each proposal group, we calculate a co-saliency map and then use a low-rank based algorithm to fuse the maps calculated from all the proposal groups for the final co-saliency map in the image. In the experiment, we collect a new dataset of 364 color images with within-image co-saliency. Experiment results show that the proposed method can better detect the within-image co-saliency than existing algorithms.
