Arrow Research search

Author name cluster

Kai Zhao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

37 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

Beyond Predictive Resampling: Learning Input-Agnostic Downsampling for Efficient Aligned Vision Recognition

  • Kai Zhao
  • Liting Ruan
  • Haoran Jiang
  • Xiaoqiang Zhu
  • Xianchao Zhang
  • Dan Zeng

Images are typically sampled on a uniform grid, despite their non-uniform information distribution: some regions are rich in content while others are not. This mismatch leads to inefficient allocation of computation in deep learning models. To address this, recent studies have proposed predictive downsampling methods that adaptively downsample images based on predicted per-pixel importance, allocating more pixels to informative areas. However, these methods require high-resolution processing to estimate importance accurately, which undermines their efficiency: the prediction itself must process the full-resolution image, consuming most of the computational budget. This high-resolution importance prediction is necessary because each input may differ significantly in structure and content. In this paper, we take a different approach and introduce a learn-to-downsample paradigm tailored for aligned vision recognition tasks, such as face recognition and palmprint recognition, where input alignment ensures a consistent spatial structure across images. This consistency allows a single, shared, input-agnostic downsampling template to be applied to all inputs. Furthermore, instead of relying on implicit importance maps, we introduce a flow-based representation that explicitly models the spatial warping from the original image to the downsampled version. The flow representation is not only more efficient but also more controllable: we regularize the flow using its Jacobian determinant to precisely control the sampling density and coverage, enabling interpretable and tunable sampling patterns. Extensive experiments on two aligned recognition tasks, face and palmprint recognition, demonstrate that our method substantially reduces computational cost with minimal accuracy degradation, achieving a significantly better performance-efficiency trade-off than existing predictive downsampling methods.
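A minimal NumPy sketch of the Jacobian-determinant idea, not the authors' implementation. It assumes the flow is stored as a sampling grid (`grid_x`, `grid_y`) that gives, for each output pixel, the input-image coordinate it reads from. Under that assumption |det J| is the input area covered by one output pixel, so smaller values mean denser sampling and values at or below zero indicate folding. The function names and the squared-error `density_penalty` are hypothetical.

```python
import numpy as np

def jacobian_determinant(grid_x: np.ndarray, grid_y: np.ndarray) -> np.ndarray:
    """Per-pixel Jacobian determinant of a 2D sampling map.

    grid_x[i, j], grid_y[i, j] are the input coordinates sampled by output
    pixel (i, j); rows index v and columns index u.
    """
    # np.gradient returns the derivative along axis 0 (v) first, then axis 1 (u)
    dx_dv, dx_du = np.gradient(grid_x)
    dy_dv, dy_du = np.gradient(grid_y)
    # det [[dx/du, dx/dv], [dy/du, dy/dv]]
    return dx_du * dy_dv - dx_dv * dy_du

def density_penalty(det: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical regularizer: mean squared deviation from a target density map."""
    return float(np.mean((det - target) ** 2))
```

An identity grid has determinant 1 everywhere; scaling both coordinates by 2 (each output pixel spans a 2 x 2 input patch) gives 4.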

AAAI Conference 2026 Conference Paper

CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards

  • Zhiming Lin
  • Kai Zhao
  • Sophie Zhang
  • Peilai Yu
  • Canran Xiao

Large-scale Chinese spelling correction (CSC) remains critical for real-world text processing, yet existing LLMs and supervised methods lack robustness to novel errors and rely on costly annotations. We introduce CEC-Zero, a zero-supervision reinforcement learning framework that addresses this by enabling LLMs to correct their own mistakes. CEC-Zero synthesizes errorful inputs from clean text, computes cluster-consensus rewards via semantic similarity and candidate agreement, and optimizes the policy with PPO. It outperforms supervised baselines by 10–13 F1 points and strong LLM fine-tunes by 5–8 points across 9 benchmarks, with theoretical guarantees of unbiased rewards and convergence. CEC-Zero establishes a label-free paradigm for robust, scalable CSC, unlocking LLM potential in noisy text pipelines.
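The abstract does not give the reward formula, so the following is only a hypothetical sketch of a cluster-consensus reward over a group of sampled corrections: each candidate is scored by the share of other samples that produce the identical correction (agreement) plus its mean similarity to them. `difflib.SequenceMatcher` is a character-level stand-in for the semantic similarity the paper uses, and `alpha` is an assumed mixing weight.

```python
from difflib import SequenceMatcher

def consensus_rewards(candidates: list, alpha: float = 0.5) -> list:
    """Hypothetical cluster-consensus reward for a group of sampled corrections."""
    rewards = []
    for i, cand in enumerate(candidates):
        others = [o for j, o in enumerate(candidates) if j != i]
        if not others:
            rewards.append(1.0)  # a lone sample trivially agrees with itself
            continue
        # Agreement: share of the other samples producing the identical correction
        agreement = sum(o == cand for o in others) / len(others)
        # Similarity: mean character-level similarity (stand-in for semantic similarity)
        similarity = sum(SequenceMatcher(None, cand, o).ratio() for o in others) / len(others)
        rewards.append(alpha * agreement + (1 - alpha) * similarity)
    return rewards
```

For example, with the samples `["我爱北京", "我爱北京", "我爱背景"]`, the two matching corrections each receive 0.625 and the outlier receives 0.25.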

EAAI Journal 2026 Journal Article

External grey information model and its performance

  • Kai Zhao
  • Yaoguo Dang
  • Shan Huang
  • Junjie Wang

Traditional grey prediction models are limited by an insufficient ability to capture nonlinear trends, inefficient use of new information, and insufficient attenuation of noise from old information. This makes it difficult for them to meet the requirements of complex scenarios. To address these limitations, this study proposes a high-order Logistic grey model with external grey information. The core innovations of the model are: (1) the system's inherent internal grey information and the external grey information supplied by the environment are integrated synergistically; (2) the model adopts a high-order Logistic accumulation operator whose parameter range is significantly expanded to [−1, 1]; (3) the model dynamically suppresses long-term noise interference by introducing a time decay factor. We then derive the proposed accumulation operator and the model's properties mathematically, and theoretically prove the rationality of incorporating external grey information. Finally, in tests on eight typical scenarios in China and abroad (China's air quality, carbon emissions, electricity consumption, foundation settlement, renewable energy consumption, and hydropower installed capacity; publication output in the United States; and renewable energy consumption in Poland), the model demonstrated strong robustness (for example, a prediction error as low as 0.12% in renewable energy consumption forecasting). The model thus provides an effective tool for small-sample prediction in complex scenarios and greatly facilitates the engineering application of grey prediction theory.
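The abstract does not define the high-order Logistic accumulation operator, so as background this sketch shows the classical GM(1,1) model that such operators generalize. It uses first-order accumulation (1-AGO), a least-squares fit of the development coefficient `a` and grey input `b`, and inverse accumulation to recover forecasts. It is the traditional baseline, not the proposed model; the external-information and time-decay components are omitted.

```python
import numpy as np

def gm11_fit_predict(x0: np.ndarray, horizon: int) -> np.ndarray:
    """Classical GM(1,1): return fitted values for x0 followed by `horizon` forecasts."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                       # first-order accumulation (1-AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])            # background values: mean of neighbours
    B = np.column_stack([-z1, np.ones(n - 1)])
    Y = x0[1:]
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)  # x0(k) + a * z1(k) = b
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a  # time-response function
    # Inverse accumulation (1-IAGO) restores the original series
    return np.concatenate([[x0[0]], np.diff(x1_hat)])
```

On a smooth geometric series such as 1.05**k, fitting the first six points and forecasting two more stays within about 0.1% of the true values.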

EAAI Journal 2026 Journal Article

Flame detection technology for buildings safety based on wireless sensing

  • Yike Wang
  • Jiaqian Bao
  • Kai Zhao
  • Yong Xiong
  • Shiqing Zhang
  • Liangliang Lou

The sensing range of flame detection technologies based on smoke, temperature, and cameras covers only the line-of-sight area, making it difficult for high-rise residents to detect fires in adjacent apartments before they spread, thus wasting rescue time. This paper proposes a novel flame detection method based on Channel State Information (CSI) features extracted from Wi-Fi signals, leveraging the ubiquity of Wi-Fi networks in modern infrastructure. By using machine learning to exploit the CSI features that reflect environmental changes during fire events, the proposed method offers a robust and responsive engineering alternative to conventional detection methods. A machine learning-based classification algorithm is designed to distinguish four critical scenarios: fire with human (FWH), fire only (FNH), human only (NFH), and neither fire nor human (NFN). The method is validated through extensive experiments under various fire conditions and diverse apartment layouts, using a dataset of 2100 samples: 700 for NFN, 350 for FWH, 350 for FNH, and 700 for NFH. The method achieves a classification accuracy of 96.81% and an end-to-end response time of 13 s, improving on existing approaches in both response time and detection accuracy. Moreover, its reliance on pre-existing Wi-Fi infrastructure allows seamless integration into current smart building environments, offering a scalable and efficient solution for enhanced fire safety. The dataset used for this study is publicly available at: https://github.com/T-bjq/WCFD-CSI-dataset.

AIIM Journal 2026 Journal Article

Multi-annotation agreement and prediction consistency networks: Improving semi-supervised segmentation of medical images with ambiguous boundaries

  • Shuai Wang
  • Tengjin Weng
  • Jingyi Wang
  • Kai Zhao
  • Yang Shen
  • Zhidong Zhao
  • Yixiu Liu
  • Pengfei Jiao

Medical image segmentation annotations vary among experts because of the ambiguous boundaries between segmented objects and backgrounds in medical images. Although using multiple annotations per image in the fully-supervised setting has been studied extensively for training deep models, obtaining a large amount of multi-annotated data is challenging because of the substantial time and labor costs of segmentation annotation, so most images lack any annotation. To address this, we propose Multi-annotated Semi-supervised Ensemble Networks (MSE-Nets) for learning segmentation from limited multi-annotated data and abundant unannotated data. Specifically, we introduce the Network Pairwise Consistency Enhancement (NPCE) module and the Multi-Network Pseudo Supervised (MNPS) module to enhance MSE-Nets for the segmentation task by considering two major factors: (1) to make the best use of all available multi-annotated data, NPCE separates agreement and disagreement annotations of multi-annotated data at the pixel level and handles them in different ways; (2) to mitigate the introduction of imprecise pseudo-labels, MNPS extends the training data by leveraging consistent pseudo-labels from unannotated data. Finally, we improve confidence calibration by averaging the predictions of the base networks. Experiments on the ISIC dataset show that our method reduces the demand for multi-annotated data by 97.75% and narrows the gap with the best fully-supervised baseline to just 3.7% in Jaccard index. Furthermore, compared with other semi-supervised methods that rely only on a single annotation or on a combined fusion approach, comprehensive experimental results on the ISIC and RIGA datasets demonstrate the superior performance of our method for medical image segmentation with ambiguous boundaries.
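A hypothetical NumPy sketch of two ideas named in the abstract, not the MSE-Nets code: splitting the pixels of R binary annotations into agreement regions (all annotators give the same label) and disagreement regions, and averaging base-network probability maps. How NPCE actually treats each region is not specified here.

```python
import numpy as np

def split_agreement(annotations: np.ndarray):
    """annotations: (R, H, W) binary masks, one per annotator.

    Returns (agree_mask, disagree_mask, agreed_labels); agreed_labels is
    only meaningful where agree_mask is True.
    """
    ann = annotations.astype(bool)
    all_fg = ann.all(axis=0)         # every annotator marked foreground
    all_bg = (~ann).all(axis=0)      # every annotator marked background
    agree = all_fg | all_bg
    return agree, ~agree, all_fg.astype(np.uint8)

def ensemble_probability(prob_maps) -> np.ndarray:
    """Average the foreground probabilities predicted by several base networks."""
    return np.mean(np.stack(prob_maps), axis=0)
```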

AAAI Conference 2026 Conference Paper

PCFormer: Accelerating Privacy-preserving Transformer Inference by Partition and Combination

  • Bo Zeng
  • Zhi Pang
  • Yuyang Zhang
  • Kai Zhao
  • Tian Wu
  • Geying Yang
  • Lina Wang
  • Run Wang

In recent years, transformer-based models have achieved remarkable success in sensitive domains, including healthcare, finance and personalized services, but their deployment raises significant privacy concerns. Existing secure inference studies have introduced cryptographic techniques such as Homomorphic Encryption (HE) and Secure Multi-Party Computation (MPC). However, these approaches either target isolated model components or incur prohibitive computational and communication overheads, failing to support latency-sensitive or resource-limited environments. In our investigation, we identify substantial redundancy in the nonlinear operations and their alternation with linear layers in deep learning. Motivated by this observation, we propose PCFormer, a universal optimization methodology tailored for sequences of linear and nonlinear computations in the Transformer. PCFormer introduces structure-aware partition and combination techniques specially designed for Multi-Head Attention (MHA) and the Feed-Forward Network (FFN). Specifically, we reveal the distinct sources of redundancy in the Softmax and GeLU functions during inference, implementing partitions at the token and channel levels, respectively. These reductions are then combined with the preceding and succeeding linear operations, thereby enhancing both computational and communication efficiency. Experimental results on GLUE benchmarks demonstrate that PCFormer achieves a 1.9× speedup in both computation and communication without compromising accuracy, compared to existing privacy-preserving Transformer frameworks. Furthermore, we demonstrate that PCFormer generalizes effectively to other deep learning architectures involving structured linear-nonlinear compositions under cryptographic constraints.

AAAI Conference 2026 Conference Paper

SABER: Switchable and Balanced Training for Efficient LLM Reasoning

  • Kai Zhao
  • Yanjun Zhao
  • Jiaming Song
  • Shien He
  • Lusheng Zhang
  • Qiang Zhang
  • Tianjiao Li

Large language models (LLMs) empowered by chain-of-thought reasoning have achieved impressive accuracy on complex tasks but suffer from excessive inference costs and latency when applied uniformly to all problems. We propose SABER (Switchable and Balanced Training for Efficient LLM Reasoning), a reinforcement learning framework that endows LLMs with user-controllable, token-budgeted reasoning. SABER first profiles each training example's base-model thinking token usage and assigns it to one of the predefined budget tiers. During fine-tuning, the model is guided by system prompts and length-aware rewards to respect its assigned budget. In parallel, we incorporate no-think examples to ensure the model remains reliable even when explicit reasoning is turned off. SABER further supports four discrete inference modes (NoThink, FastThink, CoreThink, and DeepThink), enabling flexible trade-offs between latency and reasoning depth. Extensive evaluations on math reasoning (MATH, GSM8K), code generation (MBPP), and logical reasoning (LiveBench-Reasoning) demonstrate that SABER achieves high accuracy under tight budgets, graceful degradation, and effective cross-scale and cross-domain generalization. In particular, SABER-FastThink cuts reasoning length by 65.4% and yields a 3.6% accuracy gain compared with the base model on the MATH benchmark.
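The budget values, tier boundaries, and reward shape below are assumptions for illustration; the abstract only says that examples are assigned to predefined budget tiers from base-model token usage and that rewards are length-aware. This hypothetical sketch assigns each example the smallest tier whose budget covers its profiled usage and penalizes overshooting the budget linearly. NoThink examples, which have no reasoning budget, are left out.

```python
# Hypothetical (tier name, thinking-token budget) pairs; not SABER's actual values.
TIERS = [("FastThink", 512), ("CoreThink", 2048), ("DeepThink", 8192)]

def assign_tier(base_tokens: int) -> tuple:
    """Smallest tier whose budget covers the base model's thinking-token usage."""
    for name, budget in TIERS:
        if base_tokens <= budget:
            return name, budget
    return TIERS[-1]  # longer traces are capped at the largest tier

def length_aware_reward(correct: bool, used_tokens: int, budget: int,
                        penalty: float = 1.0) -> float:
    """1 for a correct answer within budget, decaying linearly with relative overshoot."""
    if not correct:
        return 0.0
    if used_tokens <= budget:
        return 1.0
    overshoot = (used_tokens - budget) / budget
    return max(0.0, 1.0 - penalty * overshoot)
```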

AAAI Conference 2026 Conference Paper

Temporal Dynamics Enhancer for Directly Trained Spiking Object Detectors

  • Fan Luo
  • Zeyu Gao
  • Xinhao Luo
  • Kai Zhao
  • Yanfeng Lu

Spiking Neural Networks (SNNs), with their brain-inspired spatiotemporal dynamics and spike-driven computation, have emerged as promising energy-efficient alternatives to Artificial Neural Networks (ANNs). However, existing SNNs typically replicate inputs directly or aggregate them into frames at fixed intervals. Such strategies lead to neurons receiving nearly identical stimuli across time steps, severely limiting the model's expressive power—particularly in complex tasks like object detection. In this work, we propose the Temporal Dynamics Enhancer (TDE) to strengthen SNNs' capacity for temporal information modeling. TDE consists of two modules: a Spiking Encoder (SE) that generates diverse input stimuli across time steps, and an Attention Gating Module (AGM) that guides the SE generation based on inter-temporal dependencies. Moreover, to eliminate the high-energy multiplication operations introduced by the AGM, we propose a Spike-Driven Attention (SDA) to reduce attention-related energy consumption. Extensive experiments demonstrate that TDE can be seamlessly integrated into existing SNN-based detectors and consistently outperforms state-of-the-art methods, achieving mAP@50-95 scores of 57.7% on the static PASCAL VOC dataset and 47.6% on the neuromorphic EvDET200K dataset. In terms of energy consumption, the SDA consumes only 0.240× the energy of conventional attention modules.

AAAI Conference 2025 Conference Paper

Efficient Training of Neural Fractional-Order Differential Equation via Adjoint Backpropagation

  • Qiyu Kang
  • Xuhao Li
  • Kai Zhao
  • Wenjun Cui
  • Yanan Zhao
  • Weihua Deng
  • Wee Peng Tay

Fractional-order differential equations (FDEs) enhance traditional differential equations by extending the order of differential operators from integers to real numbers, offering greater flexibility in modeling complex dynamic systems with nonlocal characteristics. Recent progress at the intersection of FDEs and deep learning has catalyzed a new wave of innovative models, demonstrating the potential to address challenges such as graph representation learning. However, training neural FDEs has primarily relied on direct differentiation through forward-pass operations in FDE numerical solvers, leading to increased memory usage and computational complexity, particularly in large-scale applications. To address these challenges, we propose a scalable adjoint backpropagation method for training neural FDEs by solving an augmented FDE backward in time, which substantially reduces memory requirements. This approach provides a practical neural FDE toolbox and holds considerable promise for diverse applications. We demonstrate the effectiveness of our method in several tasks, achieving performance comparable to baseline models while significantly reducing computational overhead.

AAAI Conference 2025 Conference Paper

Neural Variable-Order Fractional Differential Equation Networks

  • Wenjun Cui
  • Qiyu Kang
  • Xuhao Li
  • Kai Zhao
  • Wee Peng Tay
  • Weihua Deng
  • Yidong Li

The use of neural differential equation models in machine learning applications has gained significant traction in recent years. In particular, fractional differential equations (FDEs) have emerged as a powerful tool for capturing complex dynamics in various domains. While existing models have primarily focused on constant-order fractional derivatives, variable-order fractional operators offer a more flexible and expressive framework for modeling complex memory patterns. In this work, we introduce the Neural Variable-Order Fractional Differential Equation network (NvoFDE), a novel neural network framework that integrates variable-order fractional derivatives with learnable neural networks. Our framework allows for the modeling of adaptive derivative orders dependent on hidden features, capturing more complex feature-updating dynamics and providing enhanced flexibility. We conduct extensive experiments across multiple graph datasets to validate the effectiveness of our approach. Our results demonstrate that NvoFDE outperforms traditional constant-order fractional and integer models across a range of tasks, showcasing its superior adaptability and performance.

EAAI Journal 2025 Journal Article

Partial convolution-simple attention mechanism-SegFormer: An accurate and robust model for landslide identification

  • Shuang Yang
  • Yuzhu Wang
  • Kai Zhao
  • Xiaocai Liu
  • Jingqin Mu
  • Xupeng Zhao

To achieve accurate and robust landslide identification, this study presents an advanced model based on the SegFormer architecture, named Partial Convolution-Simple Attention Mechanism-SegFormer (PConv-simAM-SegFormer). The model integrates partial convolution (PConv) to enhance spatial feature extraction and employs a simple attention mechanism (simAM), inspired by neuroscience, to optimize attention allocation. Additionally, a novel reverse transfer learning strategy is introduced to leverage features from older landslides, thereby improving the detection of new landslides. Experimental results on three large-scale publicly available datasets validate the model's effectiveness: a mean Intersection over Union (mIoU) of 76.05% on the Landslide Research for Sichuan–Tibet Transportation Corridor dataset (LRSTTC), a 3.04% improvement over the baseline; an mIoU of 90.35% on the Bijie Landslide dataset (Bijie), a 2.88% improvement; and an mIoU of 97.34% on the Tibetan Plateau Lakes dataset (TPL), a 0.81% improvement. These results demonstrate the model's high accuracy and robustness across diverse datasets and application scenarios, showing its significant potential for practical landslide detection.
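simAM is a published, parameter-free attention module. The following NumPy sketch follows our understanding of its energy-based formulation for a single (C, H, W) feature map: each activation is scaled by a sigmoid of its inverse energy, which grows with the activation's squared deviation from its channel mean. It is an illustration, not the PConv-simAM-SegFormer code, and the default `e_lambda` should be treated as an assumption.

```python
import numpy as np

def simam(x: np.ndarray, e_lambda: float = 1e-4) -> np.ndarray:
    """Parameter-free simAM-style attention on a (C, H, W) feature map."""
    _, h, w = x.shape
    n = h * w - 1
    # Squared deviation of each activation from its channel mean
    d = (x - x.mean(axis=(1, 2), keepdims=True)) ** 2
    # Inverse energy: distinctive activations get larger values
    energy_inv = d / (4 * (d.sum(axis=(1, 2), keepdims=True) / n + e_lambda)) + 0.5
    return x * (1.0 / (1.0 + np.exp(-energy_inv)))  # sigmoid gating
```

A uniform map is scaled everywhere by sigmoid(0.5), while an isolated outlier receives a larger gate than its neighbours.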

AAMAS Conference 2024 Conference Paper

ANOTO: Improving Automated Negotiation via Offline-to-Online Reinforcement Learning

  • Siqi Chen
  • Jianing Zhao
  • Kai Zhao
  • Gerhard Weiss
  • Fengyun Zhang
  • Ran Su
  • Yang Dong
  • Daqian Li

Automated negotiation is a crucial component for establishing cooperation and collaboration within multi-agent systems. While reinforcement learning (RL)-based negotiating agents have achieved remarkable success in various scenarios, they still face limitations due to certain assumptions on which they are based. In this work, we propose a novel approach called ANOTO that improves negotiating agents' abilities via offline-to-online RL. ANOTO enables a negotiating agent (1) to communicate with opponents using an end-to-end strategy that covers all negotiation actions, (2) to learn negotiation strategies from historical offline data without requiring active interactions, and (3) to enhance the optimization process during the online phase, facilitating rapid and stable performance improvements for the learned offline strategies. Experimental results are provided for a number of negotiation scenarios involving recent winning agents from the Automated Negotiating Agents Competitions (ANAC).

AAAI Conference 2024 Conference Paper

Chronic Poisoning: Backdoor Attack against Split Learning

  • Fangchao Yu
  • Bo Zeng
  • Kai Zhao
  • Zhi Pang
  • Lina Wang

Split learning is a computing resource-friendly distributed learning framework that protects client training data by splitting the model between the client and server. Previous work has shown that split learning faces a severe risk of privacy leakage, as a malicious server can recover the client's private data by hijacking the training process. In this paper, we first explore the vulnerability of split learning to server-side backdoor attacks, where our goal is to compromise the model's integrity. Since the server-side attacker cannot access the training data and client model in split learning, traditional poisoning-based backdoor attack methods are no longer applicable. Therefore, constructing backdoor attacks in split learning poses significant challenges. Our strategy involves the attacker establishing a shadow model on the server side that can encode backdoor samples and guiding the client model to learn from this model during the training process, thereby enabling the client to acquire the same capability. Based on these insights, we propose a three-stage backdoor attack framework named SFI. Our attack framework minimizes assumptions about the attacker's background knowledge and ensures that the attack process remains imperceptible to the client. We implement SFI on various benchmark datasets, and extensive experimental results demonstrate its effectiveness and generality. For example, success rates of our attack on MNIST, Fashion, and CIFAR10 datasets all exceed 90%, with limited impact on the main task.

AAAI Conference 2024 Conference Paper

Coupling Graph Neural Networks with Fractional Order Continuous Dynamics: A Robustness Study

  • Qiyu Kang
  • Kai Zhao
  • Yang Song
  • Yihang Xie
  • Yanan Zhao
  • Sijie Wang
  • Rui She
  • Wee Peng Tay

In this work, we rigorously investigate the robustness of graph neural fractional-order differential equation (FDE) models. This framework extends beyond traditional graph neural (integer-order) ordinary differential equation (ODE) models by implementing the time-fractional Caputo derivative. Utilizing fractional calculus allows our model to consider long-term memory during the feature updating process, diverging from the memoryless Markovian updates seen in traditional graph neural ODE models. The superiority of graph neural FDE models over graph neural ODE models has been established in environments free from attacks or perturbations. While traditional graph neural ODE models have been verified to possess a degree of stability and resilience in the presence of adversarial attacks in existing literature, the robustness of graph neural FDE models, especially under adversarial conditions, remains largely unexplored. This paper undertakes a detailed assessment of the robustness of graph neural FDE models. We establish a theoretical foundation outlining the robustness characteristics of graph neural FDE models, highlighting that they maintain more stringent output perturbation bounds in the face of input and graph topology disturbances, compared to their integer-order counterparts. Our empirical evaluations further confirm the enhanced robustness of graph neural FDE models, highlighting their potential in adversarially robust applications.
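To make the long-term memory point concrete, here is a minimal sketch, not the paper's solver, of an explicit Grünwald–Letnikov scheme for the Caputo problem D^α y = f(y), y(0) = y0. Each step sums over all previous states with weights w_j; for α = 1 the weights vanish beyond j = 1 and the scheme reduces to the memoryless forward Euler update of an integer-order ODE. In a graph neural FDE, `f` would be a learned graph diffusion function; here it is any callable, and the function names are hypothetical.

```python
import numpy as np

def gl_weights(alpha: float, n: int) -> np.ndarray:
    """Grünwald–Letnikov coefficients w_j = (-1)^j * binom(alpha, j), j = 0..n."""
    w = np.empty(n + 1)
    w[0] = 1.0
    for j in range(1, n + 1):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return w

def solve_fractional(f, y0, alpha: float, h: float, steps: int) -> np.ndarray:
    """Explicit GL scheme for D^alpha y = f(y) (Caputo), y(0) = y0; returns y_0..y_steps."""
    w = gl_weights(alpha, steps)
    u = np.zeros((steps + 1,) + np.shape(y0))   # u = y - y0 turns Caputo into GL form
    for k in range(1, steps + 1):
        # Memory term: weighted sum over every earlier state, not just the last one
        history = sum(w[j] * u[k - j] for j in range(1, k + 1))
        u[k] = h ** alpha * f(u[k - 1] + y0) - history
    return u + y0
```

With f(y) = −y, α = 1, and h = 0.1 the iterates are exactly 0.9^k (forward Euler); with α = 0.5 the same dynamics decay differently because every past state contributes.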

AAAI Conference 2024 Conference Paper

DistilVPR: Cross-Modal Knowledge Distillation for Visual Place Recognition

  • Sijie Wang
  • Rui She
  • Qiyu Kang
  • Xingchao Jian
  • Kai Zhao
  • Yang Song
  • Wee Peng Tay

The use of multi-modal sensor data in visual place recognition (VPR) has demonstrated better performance than single-modal counterparts. Nonetheless, integrating additional sensors raises costs and may not be feasible for systems that demand lightweight operation, which hinders the practical deployment of VPR. To address this issue, we resort to knowledge distillation, which enables single-modal students to learn from cross-modal teachers without introducing additional sensors during inference. Despite the notable advances of current distillation approaches, feature relationships remain under-explored. To tackle the challenge of cross-modal distillation in VPR, we present DistilVPR, a novel distillation pipeline for VPR. We propose leveraging feature relationships from multiple agents, including self-agents and cross-agents, for the teacher and student neural networks. Furthermore, we integrate various manifolds, characterized by different space curvatures, to explore feature relationships. Through Euclidean, spherical, and hyperbolic relationship modules, this approach increases the diversity of feature relationships and thereby enhances the overall representational capacity. The experiments demonstrate that our proposed pipeline achieves state-of-the-art performance compared to other distillation baselines. We also conduct ablation studies to show the effectiveness of the design. The code is released at: https://github.com/sijieaaa/DistilVPR
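A hypothetical sketch of relational distillation across geometries, not the DistilVPR code. For a batch of N teacher and N student embeddings it builds N x N relation matrices from Euclidean distances, great-circle angles on the unit sphere, and Poincaré-ball distances (after mapping embeddings into the ball with the exponential map at the origin, curvature −1), then penalizes teacher-student mismatch. Because only N x N matrices are compared, teacher and student embedding widths may differ. The equal weighting of the three terms is an assumption.

```python
import numpy as np

def euclidean_rel(z: np.ndarray) -> np.ndarray:
    return np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)

def spherical_rel(z: np.ndarray) -> np.ndarray:
    u = z / np.linalg.norm(z, axis=1, keepdims=True)
    return np.arccos(np.clip(u @ u.T, -1.0, 1.0))        # great-circle angle

def hyperbolic_rel(z: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    p = np.tanh(norms) * z / np.maximum(norms, eps)       # exp map into the unit ball
    sq = np.sum((p[:, None, :] - p[None, :, :]) ** 2, axis=-1)
    k = 1.0 - np.sum(p ** 2, axis=1)
    denom = np.maximum(k[:, None] * k[None, :], eps)
    return np.arccosh(1.0 + 2.0 * sq / denom)              # Poincaré distance

def relation_distill_loss(teacher: np.ndarray, student: np.ndarray) -> float:
    """Sum of mean squared relation mismatches over the three geometries."""
    loss = 0.0
    for rel in (euclidean_rel, spherical_rel, hyperbolic_rel):
        loss += np.mean((rel(teacher) - rel(student)) ** 2)
    return float(loss)
```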

NeurIPS Conference 2024 Conference Paper

Distributed-Order Fractional Graph Operating Network

  • Kai Zhao
  • Xuhao Li
  • Qiyu Kang
  • Feng Ji
  • Qinxu Ding
  • Yanan Zhao
  • Wenfei Liang
  • Wee Peng Tay

We introduce the Distributed-order fRActional Graph Operating Network (DRAGON), a novel continuous Graph Neural Network (GNN) framework that incorporates distributed-order fractional calculus. Unlike traditional continuous GNNs that utilize integer-order or single fractional-order differential equations, DRAGON uses a learnable probability distribution over a range of real numbers for the derivative orders. By allowing a flexible and learnable superposition of multiple derivative orders, our framework captures complex graph feature updating dynamics beyond the reach of conventional models. We provide a comprehensive interpretation of our framework's capability to capture intricate dynamics through the lens of a non-Markovian graph random walk with node feature updating driven by an anomalous diffusion process over the graph. Furthermore, to highlight the versatility of the DRAGON framework, we conduct empirical evaluations across a range of graph learning tasks. The results consistently demonstrate superior performance when compared to traditional continuous GNN models. The implementation code is available at https://github.com/zknus/NeurIPS-2024-DRAGON.
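A minimal sketch, under stated assumptions, of what a distributed-order derivative computes. The order distribution is discretized on a grid of orders `alphas` with softmax weights (which a model could learn), and each fractional derivative of a uniformly sampled signal is approximated with Grünwald–Letnikov weights. This illustrates distributed-order calculus only; it is not DRAGON's graph operator, and the function names are hypothetical.

```python
import numpy as np

def gl_weights(alpha: float, n: int) -> np.ndarray:
    """Grünwald–Letnikov coefficients w_j = (-1)^j * binom(alpha, j), j = 0..n."""
    w = np.empty(n + 1)
    w[0] = 1.0
    for j in range(1, n + 1):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return w

def gl_derivative(x: np.ndarray, alpha: float, h: float) -> float:
    """GL approximation of D^alpha x at the last sample of a uniformly sampled signal."""
    w = gl_weights(alpha, len(x) - 1)
    return float(h ** (-alpha) * np.dot(w, x[::-1]))  # w_0 x_n + w_1 x_{n-1} + ...

def distributed_order_derivative(x, alphas, logits, h: float) -> float:
    """Sum_i mu_i * D^{alpha_i} x, with mu a softmax over the order grid."""
    logits = np.asarray(logits, dtype=float)
    mu = np.exp(logits - logits.max())
    mu /= mu.sum()
    return sum(m * gl_derivative(x, a, h) for m, a in zip(mu, alphas))
```

Putting almost all mass on order 1 recovers the ordinary backward difference, while equal logits average the two fractional derivatives.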

AAMAS Conference 2024 Conference Paper

ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles

  • Kai Zhao
  • Jianye Hao
  • Yi Ma
  • Jinyi Liu
  • Yan Zheng
  • Zhaopeng Meng

Offline reinforcement learning (RL) is a learning paradigm where an agent learns from a fixed dataset of experience. However, learning solely from a static dataset can limit the performance due to the lack of exploration. To overcome this, offline-to-online RL combines offline pre-training with online fine-tuning, which enables the agent to further refine its policy by interacting with the environment in real-time. Despite its benefits, existing offline-to-online RL methods suffer from performance degradation and slow improvement during the online phase. To tackle these challenges, we propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL. By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance. Moreover, to expedite online performance enhancement, we appropriately loosen the pessimism of Q-value estimation and incorporate ensemble-based exploration mechanisms into our framework. Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods during online fine-tuning on a range of locomotion tasks, significantly outperforming existing offline-to-online RL methods.

IJCAI Conference 2024 Conference Paper

ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles

  • Kai Zhao
  • Jianye Hao
  • Yi Ma
  • Jinyi Liu
  • Yan Zheng
  • Zhaopeng Meng

Offline reinforcement learning (RL) is a learning paradigm where an agent learns from a fixed dataset of experience. However, learning solely from a static dataset can limit the performance due to the lack of exploration. To overcome this, offline-to-online RL combines offline pre-training with online fine-tuning, which enables the agent to further refine its policy by interacting with the environment in real-time. Despite its benefits, existing offline-to-online RL methods suffer from performance degradation and slow improvement during the online phase. To tackle these challenges, we propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL. By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance. Moreover, to expedite online performance enhancement, we appropriately loosen the pessimism of Q-value estimation and incorporate ensemble-based exploration mechanisms into our framework. Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods during online fine-tuning on a range of locomotion and navigation tasks, significantly outperforming existing offline-to-online RL methods.
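The abstract does not give ENOTO's exact target or exploration rule, so this is a hypothetical sketch of the two knobs it describes. The Bellman target is computed from an ensemble of Q-estimates, where the minimum over members is a conservative choice and `mean - beta * std` is one way to loosen pessimism; exploration uses an optimistic (UCB-style) action choice driven by ensemble disagreement. `beta` and `kappa` are assumed hyperparameters.

```python
import numpy as np

def min_target(q_next: np.ndarray, reward: float, gamma: float) -> float:
    """Conservative target: minimum over the ensemble's next-state Q-values."""
    return float(reward + gamma * q_next.min())

def loosened_target(q_next: np.ndarray, reward: float, gamma: float, beta: float) -> float:
    """Less pessimistic target: ensemble mean minus beta standard deviations."""
    return float(reward + gamma * (q_next.mean() - beta * q_next.std()))

def ucb_action(q_values: np.ndarray, kappa: float) -> int:
    """q_values: (N members, A actions); pick argmax of mean + kappa * std."""
    return int(np.argmax(q_values.mean(axis=0) + kappa * q_values.std(axis=0)))
```

With `kappa = 0` the agent acts greedily on the ensemble mean; raising `kappa` favours actions the members disagree about.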

IJCAI Conference 2024 Conference Paper

Exploring Urban Semantics: A Multimodal Model for POI Semantic Annotation with Street View Images and Place Names

  • Dabin Zhang
  • Meng Chen
  • Weiming Huang
  • Yongshun Gong
  • Kai Zhao

Semantic annotation for points of interest (POIs) is the process of annotating a POI with a category label, which facilitates many POI-related services, such as POI search and recommendation. Most existing solutions extract POI features from abundant user-generated content (e.g., check-ins and user comments). However, such data are often difficult to obtain, especially for newly created POIs. In this paper, we explore semantic annotation for POIs with limited information such as POI (place) names and geographic locations. Additionally, we have found that street view images provide extensive visual clues about POI attributes and can be an essential supplement to the limited information about POIs for semantic annotation. To this end, we propose a novel multimodal model for POI semantic annotation, namely M3PA, which achieves enhanced semantic annotation by fusing a POI's textual and visual representations. Specifically, M3PA extracts visual features from street view images using a pre-trained image encoder and integrates these features to generate the visual representation of a targeted POI based on a geographic attention mechanism. Furthermore, M3PA utilizes the contextual information of neighboring POIs to extract textual features and captures their spatial relationships through geographical encoding to generate the textual representation of a targeted POI. Finally, the visual and textual representations of a POI are fused for semantic annotation. Extensive experiments with POI data from Amap validate the effectiveness of M3PA for POI semantic annotation, compared with several competitive baselines.
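A hypothetical sketch of distance-based attention pooling over street view image features; it is not M3PA's geographic attention, which the abstract does not specify and which may well be learned. Images taken closer to the POI get larger softmax weights, and `scale_m` is an assumed length scale in metres.

```python
import numpy as np

def geo_attention_pool(features: np.ndarray, distances_m, scale_m: float = 50.0):
    """features: (K, D) image embeddings; distances_m: (K,) distances to the POI in metres.

    Returns (pooled (D,) vector, attention weights (K,)).
    """
    logits = -np.asarray(distances_m, dtype=float) / scale_m  # closer -> larger logit
    weights = np.exp(logits - logits.max())                   # numerically stable softmax
    weights /= weights.sum()
    return weights @ features, weights
```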

IJCAI Conference 2024 Conference Paper

Learning Hierarchy-Enhanced POI Category Representations Using Disentangled Mobility Sequences

  • Hongwei Jia
  • Meng Chen
  • Weiming Huang
  • Kai Zhao
  • Yongshun Gong

Points of interest (POIs) carry a wealth of semantic information of varying locations in cities and thus have been widely used to enable various location-based services. To understand POI semantics, existing methods usually model contextual correlations of POI categories in users' check-in sequences and embed categories into a latent space based on the word2vec framework. However, such an approach does not fully capture the underlying hierarchical relationship between POI categories and can hardly integrate the category hierarchy into various deep sequential models. To overcome this shortcoming, we propose a Semantically Disentangled POI Category Embedding Model (SD-CEM) to generate hierarchy-enhanced category representations using disentangled mobility sequences. Specifically, first, we construct disentangled mobility sequences using human mobility data based on the semantics of POIs. Then we utilize the POI category hierarchy to initialize a hierarchy-enhanced representation for each category in the disentangled sequences, employing an attention mechanism. Finally, we optimize these category representations by incorporating both the masked category prediction task and the next category prediction task. To evaluate the effectiveness of SD-CEM, we conduct comprehensive experiments using two check-in datasets covering three tasks. Experimental results demonstrate that SD-CEM outperforms several competitive baselines, highlighting its substantial improvement in performance as well as the understanding of learned category representations.

AAAI Conference 2024 Conference Paper

Learning Visual Abstract Reasoning through Dual-Stream Networks

  • Kai Zhao
  • Chang Xu
  • Bailu Si

Visual abstract reasoning tasks present challenges for deep neural networks, exposing limitations in their capabilities. In this work, we present a neural network model that addresses the challenges posed by Raven’s Progressive Matrices (RPM). Inspired by the two-stream hypothesis of visual processing, we introduce the Dual-stream Reasoning Network (DRNet), which uses two parallel branches to capture image features. On top of the two streams, a reasoning module first learns to merge the high-level features of the same image. It then employs a rule extractor to handle the combinations of the eight context images with each candidate image, extracting discrete abstract rules and using a multilayer perceptron (MLP) to make predictions. Empirical results demonstrate that DRNet achieves state-of-the-art average performance across multiple RPM benchmarks. Furthermore, DRNet demonstrates robust generalization, even to various out-of-distribution scenarios. The two streams within DRNet serve distinct functions, one addressing local information and the other spatial information, and are then integrated in the reasoning module, which leverages abstract rules to carry out visual reasoning tasks. These findings indicate that the dual-stream architecture could play a crucial role in visual abstract reasoning.

AAAI Conference 2024 Conference Paper

PosDiffNet: Positional Neural Diffusion for Point Cloud Registration in a Large Field of View with Perturbations

  • Rui She
  • Sijie Wang
  • Qiyu Kang
  • Kai Zhao
  • Yang Song
  • Wee Peng Tay
  • Tianyu Geng
  • Xingchao Jian

Point cloud registration is a crucial technique in 3D computer vision with a wide range of applications. However, this task can be challenging, particularly in large fields of view with dynamic objects, environmental noise, or other perturbations. To address this challenge, we propose a model called PosDiffNet. Our approach performs hierarchical registration based on window-level, patch-level, and point-level correspondence. We leverage a graph neural partial differential equation (PDE) based on Beltrami flow to obtain high-dimensional features and position embeddings for point clouds. We incorporate position embeddings into a Transformer module based on a neural ordinary differential equation (ODE) to efficiently represent patches within points. We employ the multi-level correspondence derived from the high feature similarity scores to facilitate alignment between point clouds. Subsequently, we use registration methods such as SVD-based algorithms to predict the transformation using corresponding point pairs. We evaluate PosDiffNet on several 3D point cloud datasets, verifying that it achieves state-of-the-art (SOTA) performance for point cloud registration in large fields of view with perturbations. The implementation code of experiments is available at https://github.com/AI-IT-AVs/PosDiffNet.

IJCAI Conference 2024 Conference Paper

Self-Promoted Clustering-based Contrastive Learning for Brain Networks Pretraining

  • Junbo Ma
  • Caixuan Luo
  • Jia Hou
  • Kai Zhao

Rapid advancements in neuroimaging techniques, such as magnetic resonance imaging (MRI), have facilitated the acquisition of the structural and functional characteristics of the brain. Brain network analysis is one of the essential tools for exploring brain mechanisms from MRI, providing valuable insights into the brain's organization and advancing the understanding of brain cognition and the pathology of neurodegenerative diseases. Graph Neural Networks (GNNs) are commonly used for brain network analysis, but they are limited by the scarcity of medical data. Although Graph Contrastive Learning methods have been developed to address this, they often rely on graph augmentations that distort the anatomical brain structure. To address these challenges, this paper proposes an augmentation-free contrastive learning method named Self-Promoted Clustering-based Contrastive Learning (SPCCL). Specifically, by introducing a clustering-based contrastive learning loss and a self-promoted scheme for creating contrastive pairs, SPCCL can be pre-trained on additional data from healthy subjects, which are relatively easier to acquire than data from patients with disorders. SPCCL leverages these additional data while preserving the integrity of the original brain structure, making it a promising approach for effective brain network analysis. Comprehensive experiments on an open-access schizophrenia dataset demonstrate the effectiveness of the proposed method.

ICLR Conference 2024 Conference Paper

Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback

  • Yifu Yuan
  • Jianye Hao
  • Yi Ma 0005
  • Zibin Dong
  • Hebin Liang
  • Jinyi Liu 0002
  • Zhixin Feng
  • Kai Zhao

Reinforcement Learning with Human Feedback (RLHF) has received significant attention for performing tasks without the need for costly manual reward design, by aligning with human preferences. It is crucial to consider diverse human feedback types and various learning methods in different environments. However, quantifying progress in RLHF with diverse feedback is challenging due to the lack of standardized annotation platforms and widely used unified benchmarks. To bridge this gap, we introduce **Uni-RLHF**, a comprehensive system implementation tailored for RLHF. It aims to provide a complete workflow from *real human feedback*, fostering progress on practical problems. Uni-RLHF contains three packages: 1) a universal multi-feedback annotation platform, 2) large-scale crowdsourced feedback datasets, and 3) modular offline RLHF baseline implementations. Uni-RLHF provides a user-friendly annotation interface tailored to various feedback types and compatible with a wide range of mainstream RL environments. We then establish a systematic pipeline for crowdsourced annotation, resulting in large-scale annotated datasets comprising more than 15 million steps across 30 popular tasks. Extensive experiments show that, on the collected datasets, RLHF methods achieve performance competitive with that obtained from well-designed manual rewards. We evaluate various design choices and offer insights into their strengths and potential areas of improvement. We aim to build valuable open-source platforms, datasets, and baselines to facilitate the development of more robust and reliable RLHF solutions based on realistic human feedback. The website is available at https://uni-rlhf.github.io/.
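Pairwise comparison is one widely used form of human feedback in RLHF. The sketch below is a generic illustration, not Uni-RLHF's API: it assumes the standard Bradley-Terry model, in which a reward model's summed predictions over two trajectory segments give the probability that an annotator prefers one segment, trained with a cross-entropy loss. The function names and example numbers are hypothetical.

```python
import math


def preference_prob(return_a: float, return_b: float) -> float:
    """Bradley-Terry probability that segment A is preferred over segment B.

    Equivalent to exp(a) / (exp(a) + exp(b)) = sigmoid(a - b), written in a
    numerically stable form.
    """
    diff = return_a - return_b
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)
    return z / (1.0 + z)


def preference_loss(rewards_a, rewards_b, label: float) -> float:
    """Cross-entropy between a human label and the Bradley-Terry prediction.

    label = 1.0 means A was preferred, 0.0 means B was preferred, and 0.5
    marks the two segments as equally preferred.
    """
    p = preference_prob(sum(rewards_a), sum(rewards_b))
    eps = 1e-12  # guards against log(0) for extreme reward gaps
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


# Example: per-step rewards predicted for two 3-step segments (made-up numbers).
print(preference_loss([0.5, 0.7, 0.2], [0.1, 0.0, 0.3], label=1.0))
```

Minimizing this loss over labeled pairs pushes the reward model to assign higher returns to preferred segments; the learned reward can then be handed to an offline RL algorithm.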

AAAI Conference 2024 Conference Paper

Urban Region Embedding via Multi-View Contrastive Prediction

  • Zechen Li
  • Weiming Huang
  • Kai Zhao
  • Min Yang
  • Yongshun Gong
  • Meng Chen

Recently, learning urban region representations from multi-modal data (information views) has become increasingly popular for gaining a deep understanding of the distributions of various socioeconomic features in cities. However, previous methods usually blend multi-view information at a late (posterior) stage, falling short in learning coherent and consistent representations across different views. In this paper, we form a new pipeline to learn consistent representations across varying views and propose the multi-view Contrastive Prediction model for urban Region embedding (ReCP), which leverages multiple information views from point-of-interest (POI) and human mobility data. Specifically, ReCP comprises two major modules: an intra-view learning module that uses contrastive learning and feature reconstruction to capture the unique information in each single view, and an inter-view learning module that perceives the consistency between the two views using a contrastive prediction learning scheme. We conduct thorough experiments on two downstream tasks, i.e., land use clustering and region popularity prediction, to assess the proposed model. The experimental results demonstrate that our model significantly outperforms state-of-the-art baseline methods in urban region representation learning.

NeurIPS Conference 2023 Conference Paper

Adversarial Robustness in Graph Neural Networks: A Hamiltonian Approach

  • Kai Zhao
  • Qiyu Kang
  • Yang Song
  • Rui She
  • Sijie Wang
  • Wee Peng Tay

Graph neural networks (GNNs) are vulnerable to adversarial perturbations, including those that affect both node features and graph topology. This paper investigates GNNs derived from diverse neural flows, concentrating on their connection to various stability notions such as BIBO stability, Lyapunov stability, structural stability, and conservative stability. We argue that Lyapunov stability, despite its common use, does not necessarily ensure adversarial robustness. Inspired by physics principles, we advocate the use of conservative Hamiltonian neural flows to construct GNNs that are robust to adversarial attacks. The adversarial robustness of different neural flow GNNs is empirically compared on several benchmark datasets under a variety of adversarial attacks. Extensive numerical experiments demonstrate that GNNs leveraging conservative Hamiltonian flows with Lyapunov stability substantially improve robustness against adversarial perturbations. The implementation code of experiments is available at https://github.com/zknus/NeurIPS-2023-HANG-Robustness.
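To illustrate what "conservative" means for a flow on a graph, here is a toy sketch under explicit assumptions: instead of a learned Hamiltonian as in the paper, it fixes the hand-chosen quadratic energy H(q, p) = 0.5 * ||p||^2 + 0.5 * q^T L q on a small graph with Laplacian L, and integrates Hamilton's equations dq/dt = p, dp/dt = -L q with the symplectic leapfrog scheme. Node states keep oscillating, but the total energy stays nearly constant. This is not the authors' model, and the function names are hypothetical.

```python
def laplacian_matvec(edges, n, x):
    """Return L @ x for the combinatorial Laplacian L = D - A of an
    undirected, unweighted graph given as a list of (i, j) edges."""
    out = [0.0] * n
    for i, j in edges:
        diff = x[i] - x[j]
        out[i] += diff
        out[j] -= diff
    return out


def energy(edges, n, q, p):
    """Hand-chosen Hamiltonian H(q, p) = 0.5 * ||p||^2 + 0.5 * q^T L q."""
    lq = laplacian_matvec(edges, n, q)
    kinetic = 0.5 * sum(pi * pi for pi in p)
    potential = 0.5 * sum(qi * li for qi, li in zip(q, lq))
    return kinetic + potential


def leapfrog(edges, n, q, p, dt, steps):
    """Integrate dq/dt = p, dp/dt = -L q with the symplectic leapfrog scheme."""
    q, p = list(q), list(p)
    for _ in range(steps):
        lq = laplacian_matvec(edges, n, q)
        p = [pi - 0.5 * dt * li for pi, li in zip(p, lq)]  # half kick
        q = [qi + dt * pi for qi, pi in zip(q, p)]          # full drift
        lq = laplacian_matvec(edges, n, q)
        p = [pi - 0.5 * dt * li for pi, li in zip(p, lq)]  # half kick
    return q, p


edges = [(0, 1), (1, 2)]  # path graph 0 - 1 - 2
q0, p0 = [1.0, 0.0, -0.5], [0.0, 0.0, 0.0]
e0 = energy(edges, 3, q0, p0)
q1, p1 = leapfrog(edges, 3, q0, p0, dt=0.01, steps=1000)
print("relative energy drift:", abs(energy(edges, 3, q1, p1) - e0) / e0)
```

A symplectic integrator is chosen here because, unlike explicit Euler, it does not systematically inject or remove energy over long horizons, which is the property a conservative flow is meant to preserve.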

IJCAI Conference 2023 Conference Paper

Graph Neural Convection-Diffusion with Heterophily

  • Kai Zhao
  • Qiyu Kang
  • Yang Song
  • Rui She
  • Sijie Wang
  • Wee Peng Tay

Graph neural networks (GNNs) have shown promising results across various graph learning tasks, but they often assume homophily, which can result in poor performance on heterophilic graphs, where connected nodes are likely to be from different classes or have dissimilar features. In this paper, we propose a novel GNN that incorporates the principle of heterophily by modeling the flow of information on nodes using the convection-diffusion equation (CDE). This allows the CDE to account for both the diffusion of information due to homophily and the "convection" of information due to heterophily. We conduct extensive experiments, which suggest that our framework can achieve competitive performance on node classification tasks for heterophilic graphs compared to state-of-the-art methods. The code is available at https://github.com/zknus/Graph-Diffusion-CDE.

EAAI Journal 2023 Journal Article

Robust and fuzzy ensemble framework via spectral learning for random projection-based fuzzy-c-means clustering

  • Zhaoyin Shi
  • Long Chen
  • Junwei Duan
  • Guangyong Chen
  • Kai Zhao

Ensembles of random projection-based fuzzy-c-means (RP-FCM) can handle high-dimensional data efficiently. However, the performance of these ensemble frameworks is still hindered by issues such as misaligned membership matrices, information loss in co-similar matrices, large storage requirements, and unstable ensemble results due to additional re-clustering. To address these issues, we propose a robust and fuzzy ensemble framework via spectral learning for RP-FCM clustering. After using random projection to generate datasets of different dimensions and obtaining their membership matrices via fuzzy-c-means, we first convert these membership matrices into regularized graphs and approximate the affinity matrices of these graphs by spectral matrices. This step not only avoids the alignment problem of membership matrices but also eliminates the need to store large-scale graphs. The spectral matrices, all of the same size, are used as the features of the membership matrices for the ensemble, avoiding the information loss that co-similar matrix transformations may cause. More importantly, an optimization model is designed in our framework to learn the fusion of spectral features. In this model, the proportion of each base clustering is adjusted adaptively through a fuzzification exponent, and the effect of outliers is suppressed by a robust norm. Finally, the Laplacian rank constraint in the model guarantees that the ensemble achieves the exact final partition. An efficient algorithm for this model is derived, and its time complexity and convergence are analyzed. Competitive experimental results on benchmark data demonstrate the effectiveness of the proposed ensemble framework in comparison to state-of-the-art methods.
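The base clusterings in this framework come from running fuzzy-c-means on randomly projected copies of the data. A minimal sketch of those two textbook ingredients follows, assuming a Gaussian random projection scaled by 1/sqrt(out_dim) and the standard FCM membership update u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)); the spectral ensemble step is not shown, and the function names are hypothetical.

```python
import math
import random


def random_projection(points, out_dim, seed=0):
    """Project points onto out_dim random Gaussian directions.

    Scaling by 1/sqrt(out_dim) approximately preserves pairwise distances
    (Johnson-Lindenstrauss style).
    """
    rng = random.Random(seed)
    in_dim = len(points[0])
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(out_dim) for _ in range(out_dim)]
         for _ in range(in_dim)]
    return [[sum(x[a] * R[a][b] for a in range(in_dim)) for b in range(out_dim)]
            for x in points]


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy-c-means membership update.

    Returns U where U[j][i] is the membership of point j in cluster i; each
    row sums to 1. m > 1 is the fuzzification exponent.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, v) for v in centers]
        if any(d == 0.0 for d in dists):
            # Point coincides with a center: split membership among exact hits.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists])
    return memberships


data = [[0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.2, 0.1], [5.0, 5.1, 4.9, 5.2]]
projected = random_projection(data, out_dim=2, seed=1)
print(fcm_memberships(projected, centers=[projected[0], projected[2]], m=2.0))
```

In a full RP-FCM ensemble, several projections with different seeds and dimensions would each yield one membership matrix; this sketch stops at producing a single one.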

IJCAI Conference 2023 Conference Paper

Towards an Integrated View of Semantic Annotation for POIs with Spatial and Textual Information

  • Dabin Zhang
  • Ronghui Xu
  • Weiming Huang
  • Kai Zhao
  • Meng Chen

Point of Interest (POI) categories facilitate location-based services in many respects, such as location search and POI recommendation. However, POI categories are often incomplete, and new POIs are constantly being created, which raises the demand for semantic annotation of POIs, i.e., labeling each POI with a semantic category. Previous methods usually model users' sequential check-in information to learn POI features for annotation. However, users' check-ins are hard to obtain in practice, especially for newly created POIs. In this context, we present a Spatial-Textual POI Annotation (STPA) model for static POIs, which derives POI categories using only the geographic locations and names of POIs. Specifically, we design a GCN-based spatial encoder that models spatial correlations among POIs to generate POI spatial embeddings, and an attention-based text encoder that models the semantic contexts of POIs to generate POI textual embeddings. We finally fuse the two embeddings and preserve multi-view correlations for semantic annotation. We conduct comprehensive experiments with POI data from AMap to validate the effectiveness of STPA. Experimental results demonstrate that STPA substantially outperforms several competitive baselines, indicating that it is a promising approach for annotating static POIs in map services.

IJCAI Conference 2022 Conference Paper

A Closed-Loop Perception, Decision-Making and Reasoning Mechanism for Human-Like Navigation

  • Wenqi Zhang
  • Kai Zhao
  • Peng Li
  • Xiao Zhu
  • Yongliang Shen
  • Yanna Ma
  • Yingfeng Chen
  • Weiming Lu

Reliable navigation systems have a wide range of applications in robotics and autonomous driving. Current approaches employ an open-loop process that converts sensor inputs directly into actions. However, these open-loop schemes struggle to handle complex and dynamic real-world scenarios because of their poor generalization. Imitating human navigation, we add a reasoning process that converts actions back into internal latent states, forming a two-stage closed loop of perception, decision-making, and reasoning. First, VAE-Enhanced Demonstration Learning endows the model with an understanding of basic navigation rules. Then, two dual processes in RL-Enhanced Interaction Learning generate reward feedback for each other and collectively enhance obstacle avoidance capability. The reasoning model can substantially improve generalization and robustness and facilitate the deployment of the algorithm on real-world robots without elaborate transfer. Experiments show that our method adapts better to novel scenarios than state-of-the-art approaches.

NeurIPS Conference 2022 Conference Paper

On the Robustness of Graph Neural Diffusion to Topology Perturbations

  • Yang Song
  • Qiyu Kang
  • Sijie Wang
  • Kai Zhao
  • Wee Peng Tay

Neural diffusion on graphs is a novel class of graph neural networks that has attracted increasing attention recently. The capability of graph neural partial differential equations (PDEs) to address common hurdles of graph neural networks (GNNs), such as over-smoothing and bottlenecks, has been investigated, but their robustness to adversarial attacks has not. In this work, we explore the robustness properties of graph neural PDEs. We empirically demonstrate that graph neural PDEs are intrinsically more robust against topology perturbations than other GNNs. We provide insights into this phenomenon by exploiting the stability of the heat semigroup under graph topology perturbations. We discuss various graph diffusion operators and relate them to existing graph neural PDEs. Furthermore, we propose a general graph neural PDE framework based on which a new class of robust GNNs can be defined. We verify that the new model achieves performance comparable to the state of the art on several benchmark datasets.

IROS Conference 2021 Conference Paper

Learning to Navigate in a VUCA Environment: Hierarchical Multi-expert Approach

  • Wenqi Zhang 0001
  • Kai Zhao
  • Peng Li 0031
  • Xiao Zhu
  • Faping Ye
  • Weijie Jiang 0003
  • Huiqiao Fu
  • Tao Wang 0004

Despite decades of effort, robot navigation in real scenarios with volatility, uncertainty, complexity, and ambiguity (VUCA for short) remains a challenging topic. Inspired by the central nervous system (CNS), we propose a hierarchical multi-expert learning framework for autonomous navigation in a VUCA environment. With a heuristic exploration mechanism that considers target location, path cost, and safety level, the upper layer performs simultaneous map exploration and route planning to avoid getting trapped in a blind alley, similar to the cerebrum in the CNS. Using a local adaptive model that fuses multiple discrepant strategies, the lower layer pursues a balance between collision-avoidance and go-straight strategies, acting as the cerebellum in the CNS. We conduct simulation and real-world experiments on multiple platforms, including legged and wheeled robots. Experimental results demonstrate that our algorithm outperforms existing methods in terms of task achievement, time efficiency, and security. A video of our results is available at https://youtu.be/lAnW4QIWDoU.

JBHI Journal 2019 Journal Article

Drug Repositioning for Schizophrenia and Depression/Anxiety Disorders: A Machine Learning Approach Leveraging Expression Data

  • Kai Zhao
  • Hon-Cheong So

Development of new medications is a lengthy and costly process, and drug repositioning might help to shorten the development cycle. We present a machine learning (ML) workflow for drug discovery or repositioning that predicts indications for a particular disease based on drug expression profiles, with a focus on applications in psychiatry. Drugs that are not originally indicated for the disease but have high predicted probabilities serve as candidates for repurposing. This approach is widely applicable to any chemical or drug with measured expression profiles, even if its targets are unknown. It is also highly flexible, as virtually any supervised learning algorithm can be used. We employed the ML approach to identify repositioning opportunities for schizophrenia as well as depression and anxiety disorders. We applied various state-of-the-art ML approaches, including deep neural networks (DNNs), support vector machines (SVMs), elastic net regression, random forest, and gradient boosted trees. The predictive performance of the five approaches in cross-validation did not differ substantially, with SVM slightly outperforming the others. However, the other methods also revealed literature-supported repositioning candidates with different mechanisms of action. As further validation, we showed that the repositioning hits are enriched for psychiatric medications considered in clinical trials. We also examined the correlation between predicted probabilities of treatment potential and the number of related research articles, and found significant correlations for all methods, especially DNN. Finally, we propose that ML may provide a new avenue for exploring drug mechanisms by examining the variable importance of gene features.

IJCAI Conference 2018 Conference Paper

Hi-Fi: Hierarchical Feature Integration for Skeleton Detection

  • Kai Zhao
  • Wei Shen
  • Shanghua Gao
  • Dandan Li
  • Ming-Ming Cheng

In natural images, the scales (thickness) of object skeletons may vary dramatically among objects and object parts. Thus, robust skeleton detection requires a powerful multi-scale feature integration ability. To address this issue, we present a new convolutional neural network (CNN) architecture with a novel hierarchical feature integration mechanism, named Hi-Fi, for object skeleton detection. The proposed CNN-based approach intrinsically captures high-level semantics from deeper layers as well as low-level details from shallower layers. By hierarchically integrating different CNN feature levels with bidirectional guidance, our approach (1) enables mutual refinement across features of different levels, and (2) has a strong ability to capture both rich object context and high-resolution details. Experimental results show that our method significantly outperforms state-of-the-art methods in effectively fusing features from very different scales, as evidenced by considerable performance improvements on several benchmarks.

NeurIPS Conference 2017 Conference Paper

Label Distribution Learning Forests

  • Wei Shen
  • Kai Zhao
  • Yilu Guo
  • Alan Yuille

Label distribution learning (LDL) is a general learning framework that assigns to an instance a distribution over a set of labels rather than a single label or multiple labels. Current LDL methods either make restrictive assumptions about the expression form of the label distribution or have limitations in representation learning, e.g., in learning deep features in an end-to-end manner. This paper presents label distribution learning forests (LDLFs), a novel label distribution learning algorithm based on differentiable decision trees, which have several advantages: 1) decision trees have the potential to model any general form of label distribution by a mixture of leaf node predictions; 2) the learning of differentiable decision trees can be combined with representation learning. We define a distribution-based loss function for a forest, enabling all the trees to be learned jointly, and show that an update function for leaf node predictions, which guarantees a strict decrease of the loss function, can be derived by variational bounding. The effectiveness of the proposed LDLFs is verified on several LDL tasks and a computer vision application, showing significant improvements over state-of-the-art LDL methods.
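The "mixture of leaf node predictions" can be made concrete with a small sketch. It assumes a complete binary tree whose split nodes route an input left with probability sigmoid(score), where the scores are passed in directly (in LDLFs they would come from a learned representation), and whose leaves each hold a label distribution. KL divergence is used here as one natural distribution-based loss; the leaf update by variational bounding is not shown, and all names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def routing_probs(split_scores):
    """Probability of reaching each leaf of a complete binary tree.

    split_scores holds one score per split node in breadth-first order
    (2**depth - 1 entries); sigmoid(score) is the probability of going left.
    Returns 2**depth leaf probabilities that sum to 1.
    """
    n_split = len(split_scores)
    depth = (n_split + 1).bit_length() - 1
    if 2 ** depth - 1 != n_split:
        raise ValueError("need 2**depth - 1 split scores")
    probs, node = [1.0], 0  # probability of reaching each node on the current level
    for _ in range(depth):
        nxt = []
        for p in probs:
            d = sigmoid(split_scores[node])
            nxt.extend([p * d, p * (1.0 - d)])  # left child, right child
            node += 1
        probs = nxt
    return probs


def tree_predict(split_scores, leaf_dists):
    """Label distribution of one tree: leaf distributions weighted by routing."""
    mu = routing_probs(split_scores)
    n_labels = len(leaf_dists[0])
    return [sum(m * q[k] for m, q in zip(mu, leaf_dists)) for k in range(n_labels)]


def forest_predict(trees):
    """Average per-tree predictions; trees is a list of (split_scores, leaf_dists)."""
    preds = [tree_predict(s, q) for s, q in trees]
    return [sum(col) / len(preds) for col in zip(*preds)]


def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred), skipping labels with zero target mass."""
    return sum(t * math.log((t + eps) / (p + eps)) for t, p in zip(target, pred) if t > 0)


leaves = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.0, 0.5, 0.5]]
pred = forest_predict([([2.0, -1.0, 0.5], leaves), ([-0.3, 1.0, 0.0], leaves)])
print(pred, kl_divergence([0.2, 0.5, 0.3], pred))
```

Because every operation is a product or sum of sigmoids, the prediction is differentiable in the split scores, which is what lets such trees be trained jointly with a feature extractor.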

JBHI Journal 2016 Journal Article

Analysis of the Chaotic Characteristics of Human Colonic Activities and Comparison of Healthy Participants to Costive Subjects

  • Li Lu
  • Guozheng Yan
  • Kai Zhao
  • Fei Xu

Constipation is a common yet distressing disease that has high rates of morbidity and impacts patients' quality of life. However, there is no perfect method for distinguishing costive patients from healthy subjects. Is there chaos in human colonic activities? Do the chaos indicators of colonic activities differ between healthy and costive subjects? Can these indicators distinguish patients with constipation from healthy subjects? To answer these questions, colonic pressure data from 16 healthy subjects and 48 patients with constipation were analyzed using chaos theory. Three chaotic indicators [i.e., the largest Lyapunov exponent (LyE), correlation dimension (CorDim), and Kolmogorov entropy (KoEn)] were calculated and compared between groups with the Wilcoxon rank sum test. The LyE was greater than zero and the CorDim was fractional, which showed that human colonic activities have clear chaotic characteristics. Statistically significant differences were observed between groups for CorDim (p < 0.05), whereas LyE did not show statistically significant differences between groups. The chaotic indicator CorDim was able to differentiate between patients with constipation and healthy subjects. Chaos theory provides a new method for studying the nonlinear dynamics of human gastrointestinal activities.
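The abstract does not state the embedding parameters or the exact estimator used for CorDim. A minimal sketch of the standard Grassberger-Procaccia approach follows, assuming a time-delay embedding with hypothetical dimension `dim` and lag `tau`, and estimating the correlation dimension as the slope of log C(r) against log r between two radii. A real analysis would fit the slope over a verified scaling region and check several embedding dimensions; the Lyapunov exponent, Kolmogorov entropy, and Wilcoxon group comparison are not shown.

```python
import math
import random


def delay_embed(series, dim, tau):
    """Time-delay embedding: y_i = (x_i, x_{i+tau}, ..., x_{i+(dim-1)tau})."""
    n = len(series) - (dim - 1) * tau
    if n <= 1:
        raise ValueError("series too short for this dim/tau")
    return [tuple(series[i + k * tau] for k in range(dim)) for i in range(n)]


def correlation_sum(vectors, r):
    """C(r): fraction of distinct pairs closer than r (Euclidean distance)."""
    n = len(vectors)
    close = sum(1 for i in range(n) for j in range(i + 1, n)
                if math.dist(vectors[i], vectors[j]) < r)
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(series, dim, tau, r1, r2):
    """Two-point slope of log C(r) versus log r between radii r1 < r2."""
    y = delay_embed(series, dim, tau)
    c1, c2 = correlation_sum(y, r1), correlation_sum(y, r2)
    if c1 == 0.0 or c2 == 0.0:
        raise ValueError("no pairs within radius; choose larger radii")
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))


# Synthetic check: i.i.d. uniform noise embedded in 2-D fills the plane,
# so the estimate should come out close to 2 (not a chaotic signal).
rng = random.Random(0)
noise = [rng.random() for _ in range(400)]
print(correlation_dimension(noise, dim=2, tau=1, r1=0.05, r2=0.1))
```

For noise, the estimate keeps growing with the embedding dimension; a low-dimensional chaotic signal instead saturates at a (possibly fractional) value, which is the behavior the abstract reports for colonic pressure data.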

IJCAI Conference 2011 Conference Paper

Incorporating Reviewer and Product Information for Review Rating Prediction

  • Fangtao Li
  • Nathan Liu
  • Hongwei Jin
  • Kai Zhao
  • Qiang Yang
  • Xiaoyan Zhu

Among sentiment analysis tasks, review rating prediction is more helpful than binary (positive and negative) classification, especially when consumers want to compare two good products. Previous work has addressed this problem by extracting various features from the review text to learn a predictor. Since the same word may have different sentiment effects when used by different reviewers on different products, we argue that it is necessary to model such reviewer- and product-dependent effects in order to predict review ratings more accurately. In this paper, we propose a novel learning framework that incorporates reviewer and product information into a text-based learner for rating prediction. The reviewer, product, and text features are modeled as a three-dimensional tensor, and tensor factorization is employed to alleviate the sparsity and complexity problems. The experimental results demonstrate the effectiveness of our model: we achieve significant improvements compared with state-of-the-art methods, especially for reviews of unpopular products and by inactive reviewers.
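The abstract does not spell out the model's exact objective. As a generic illustration only, not the authors' formulation, the sketch below fits a CP (CANDECOMP/PARAFAC) factorization to the observed cells of a small, sparse reviewer × product × feature tensor by stochastic gradient descent. Because every reviewer, product, and feature gets a low-rank factor shared across all cells it appears in, unobserved cells can still be estimated. The toy tensor, learning rate, and function names are hypothetical.

```python
import random


def cp_value(U, P, F, u, p, f):
    """CP model: T[u, p, f] is approximated by sum_k U[u][k] * P[p][k] * F[f][k]."""
    return sum(U[u][k] * P[p][k] * F[f][k] for k in range(len(U[u])))


def fit_cp(entries, shape, rank, lr=0.05, epochs=300, seed=0):
    """Fit a rank-`rank` CP factorization by SGD on observed entries only.

    entries: list of ((u, p, f), value); unobserved cells are simply skipped.
    """
    rng = random.Random(seed)
    U, P, F = ([[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n)]
               for n in shape)
    for _ in range(epochs):
        for (u, p, f), value in entries:
            err = cp_value(U, P, F, u, p, f) - value
            for k in range(rank):
                # Gradients of 0.5 * err**2, computed before any factor changes
                gu = err * P[p][k] * F[f][k]
                gp = err * U[u][k] * F[f][k]
                gf = err * U[u][k] * P[p][k]
                U[u][k] -= lr * gu
                P[p][k] -= lr * gp
                F[f][k] -= lr * gf
    return U, P, F


def mse(entries, U, P, F):
    """Mean squared error of the CP model on the given entries."""
    return sum((cp_value(U, P, F, *idx) - v) ** 2 for idx, v in entries) / len(entries)


# Toy 2 reviewers x 2 products x 3 features tensor with one cell left unobserved.
a, b, c = [1.0, 0.5], [1.0, 0.8], [1.0, 0.6, 0.9]
observed = [((u, p, f), a[u] * b[p] * c[f])
            for u in range(2) for p in range(2) for f in range(3)
            if (u, p, f) != (1, 1, 2)]
U, P, F = fit_cp(observed, shape=(2, 2, 3), rank=2)
print(mse(observed, U, P, F), cp_value(U, P, F, 1, 1, 2))  # fit error, completed cell
```

Fitting only the observed cells is the usual way factorization sidesteps sparsity; the rank controls model complexity.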