Arrow Research search

Author name cluster

Zhonglong Zheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers (12)

AAAI Conference 2026 Conference Paper

AdaDepth: Exploiting Inherent Scene Information for Self-Supervised Depth Estimation in Dynamic Scenes

  • Xuanang Gao
  • Xiongbin Wu
  • Zhiwei Ning
  • Runze Yang
  • Zhonglong Zheng
  • Jie Yang
  • Wei Liu

Self-supervised monocular depth estimation methods severely compromise accuracy on dynamic objects due to their static-scene assumption. Existing approaches for dynamic scenes suffer from two critical shortcomings: 1) reliance on supervised segmentation models (requiring costly annotations) or computationally intensive multi-branch models to isolate moving objects, and 2) simple integration of 2D/3D motion flow without reliable supervision for dynamic objects. We propose AdaDepth, a two-stage framework that jointly performs unsupervised scene decomposition and dynamic-aware depth learning. In the initial structural stage, our geometry-motion joint scene decomposition (GMoDecomp) module ensures the robust generation of a depth prior and simultaneously partitions the scene into multiple regions through the fusion of geometric and motion cues. In the region-adaptive refinement stage, we exploit the depth prior and decomposed regions to introduce motion-aware and geometry-consistent constraints, effectively improving depth estimation in dynamic scenes. AdaDepth achieves accurate depth prediction in highly dynamic scenes without relying on external labels or specialized segmentation models. Extensive experiments on KITTI, Cityscapes, and Waymo Open demonstrate its superiority over state-of-the-art approaches.

AAAI Conference 2026 Conference Paper

Exploiting All Mamba Fusion for Efficient RGB-D Tracking

  • Ge Ying
  • Dawei Zhang
  • Chengzhuan Yang
  • Wei Liu
  • Sang-Woon Jeon
  • Hua Wang
  • Changqin Huang
  • Zhonglong Zheng

Despite the progress made through deep learning, existing Visual Object Tracking (VOT) frameworks struggle with real-world challenges. Recent approaches incorporate additional modalities such as depth, thermal infrared, and language to enhance the robustness of VOT; improvements in depth sensor precision in particular have facilitated RGB-D tracking. However, current RGB-D trackers often copy RGB tracking paradigms, leading to inefficiency due to two-stream architectures that fail to exploit heterogeneous features, and reliance on simplistic or large-parameter fusion methods. To address these challenges, we propose AMTrack, a one-stream RGB-D tracker leveraging Mamba's linear complexity for simultaneous feature extraction and two-stage cross-modal feature fusion. Our innovation also includes a low-parameter Multimodal Mix Mamba (3M) module, which optimizes deep feature fusion and reduces computational overhead. The advantage of the 3M module stems from our Multimodal State Space Model (MSSM), a multimodal feature interaction component reconstructed based on the SSM. Experiments across multiple RGB-D tracking datasets indicate that AMTrack achieves superior performance with fewer parameters and lower memory demands compared to state-of-the-art trackers.
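Mamba-style modules such as the MSSM described above build on the discrete state space model recurrence. A minimal sketch of that core recurrence, with illustrative names and dimensions rather than AMTrack's actual components:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Sequential scan of a discrete state space model:
    h_t = A @ h_{t-1} + B @ x_t,  y_t = C @ h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                    # x: (seq_len, d_in)
        h = A @ h + B @ x_t          # update the hidden state
        ys.append(C @ h)             # read out
    return np.stack(ys)              # (seq_len, d_out)

rng = np.random.default_rng(0)
seq = rng.normal(size=(16, 4))       # toy token sequence
A = 0.9 * np.eye(8)                  # stable state transition
B = 0.1 * rng.normal(size=(8, 4))
C = 0.1 * rng.normal(size=(2, 8))
y = ssm_scan(seq, A, B, C)
```

The scan is linear in sequence length, which is the complexity advantage the abstract refers to; Mamba additionally makes A, B, C input-dependent and uses a parallelizable scan.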

AAAI Conference 2026 Conference Paper

HyperGOOD: Towards Out-of-Distribution Detection in Hypergraphs

  • Tingyi Cai
  • Yunliang Jiang
  • Ming Li
  • Changqin Huang
  • Yujie Fang
  • Chengling Gao
  • Zhonglong Zheng

Out-of-distribution (OOD) detection plays a critical role in ensuring the robustness of machine learning models in open-world settings. While extensive efforts have been made in vision, language, and graph domains, the challenge of OOD detection in hypergraph-structured data remains unexplored. In this work, we formalize the problem of hypergraph out-of-distribution (HOOD) detection, which aims to identify nodes or hyperedges whose high-order relational contexts differ significantly from those seen during training. We propose HyperGOOD, a unified energy-based detection framework that integrates multi-scale spectral decomposition with structure-aware uncertainty propagation. By preserving both low- and high-frequency signals and diffusing uncertainty across the hypergraph, HyperGOOD effectively captures subtle and relationally entangled anomalies. Experimental results on nine hypergraph datasets demonstrate the effectiveness of our approach, establishing a new foundation for robust hypergraph learning under distributional shifts.
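Energy-based detectors of this kind typically threshold a free-energy score computed from a model's logits. A generic sketch of that score (not HyperGOOD's exact scoring head):

```python
import numpy as np

def energy_score(logits, T=1.0):
    # Free-energy OOD score: E(x) = -T * logsumexp(logits / T).
    # Higher energy suggests the input lies off the training distribution.
    z = logits / T
    m = z.max(axis=-1, keepdims=True)
    return -T * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))

confident = np.array([[9.0, 0.5, 0.3]])   # peaked logits -> low energy
ambiguous = np.array([[0.4, 0.5, 0.3]])   # flat logits   -> high energy
```

Propagating such per-node scores along hyperedges (as the uncertainty diffusion above suggests) would then smooth the scores over each node's high-order neighborhood.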

AAAI Conference 2026 Conference Paper

IGIANet: Illumination Guided Implicit Alignment Network for Infrared–Visible UAV Detection

  • Xiangqi Chen
  • Dawei Zhang
  • Li Zhao
  • Chengzhuan Yang
  • Zhongyu Chen
  • Jungang Lou
  • Zhonglong Zheng
  • Sang-Woon Jeon

Visible-Infrared (RGB-IR) Unmanned Aerial Vehicle (UAV) object detection integrates complementary cues from visible and infrared sensors, offering broad application potential. However, due to sensor parallax, it still faces the challenge of weak spatial misalignment, which significantly limits its performance in UAV-based object detection. Existing methods emphasize strict alignment, overlooking spectral heterogeneity under varying illumination. To address these issues, we propose the Illumination Guided Implicit Alignment Network (IGIANet) to mitigate modality heterogeneity without explicit alignment. Specifically, we integrate three novel modules. First, we propose an illumination-guided frequency modulation module that adaptively allocates fusion weights to visible and infrared features based on global illumination estimation, effectively alleviating modality imbalance under varying lighting conditions. Second, we introduce a frequency-guided cross-modality differential enhancement module, which computes differential cues across frequency domains to enhance complementary information and highlight weakly aligned and low-contrast regions. Finally, we introduce an implicit alignment-driven dynamic fusion module that actively estimates offsets and generates dynamic, position-adaptive fusion kernels to align and fuse modalities. Extensive experiments demonstrate that IGIANet outperforms state-of-the-art models on various benchmarks, achieving 80.9% mAP on DroneVehicle, 57.1% mAP on VEDAI, and 49.4% mAP on FLIR.

AAAI Conference 2026 Conference Paper

Neural Outline Cache for Real-time Anti-aliasing Font Rendering

  • Jiashuaizi Mo
  • Sang-Woon Jeon
  • Hua Wang
  • Xiangqi Chen
  • Yanchao Wang
  • Minglu Li
  • Zhonglong Zheng

Neural textures have emerged as pivotal assets in next-generation neural rendering pipelines. However, hardware limitations and programming interface constraints lead to suboptimal performance in multi-instance real-time rendering scenarios. This bottleneck becomes particularly acute for texture-intensive tasks such as font rendering. To address this, we propose Neural Outline Cache (NOC), a novel neural font texture supporting real-time anti-aliased rendering and procedural editing within modern neural graphics pipelines. NOC's lightweight network leverages multi-resolution hash encoding to cache spline-derived SDFs, delivering anti-aliased rendering via standard graphics pipelines. For massive-instance scalability, our cache buffer layout (CBL) and batch-fused inference (BFI), tailored for NOC, mitigate neural texture streaming bottlenecks. We constructed an evaluation dataset using five font styles. In offline rendering, our proposed method achieves overall average results of 57.35 dB PSNR, 0.998 SSIM, and 1.1584e-3 pixel RMSE, while maintaining approximately 0.5 ms frame latency with 500 real-time instances. To demonstrate its versatility, we integrated a procedural editor for visual effects editing of NOC textures. These results indicate that NOC is a reliable, production-ready neural asset.
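Anti-aliased rendering from a cached SDF conventionally maps signed distance to pixel coverage over a roughly one-pixel transition band at the glyph edge. A minimal sketch of that mapping (in a real shader the blend width would come from screen-space derivatives, and the distances from the neural cache):

```python
import numpy as np

def sdf_coverage(d, pixel_width=1.0):
    # Map signed distance (negative inside the glyph, in pixels) to an
    # alpha value, with a linear ramp over one pixel around the edge.
    return np.clip(0.5 - d / pixel_width, 0.0, 1.0)

d = np.array([-1.0, 0.0, 1.0])   # inside, exactly on the edge, outside
alpha = sdf_coverage(d)
```

Because coverage is derived analytically from distance, the edge stays smooth at any magnification, which is the usual motivation for SDF-based font rendering.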

TIST Journal 2026 Journal Article

Towards Evolutionary Differential Privacy in Cross-Platform Spatial Crowdsourcing

  • Yong-Feng Ge
  • Hua Wang
  • Elisa Bertino
  • Jinli Cao
  • Yanchun Zhang
  • Zhonglong Zheng

The development of mobile web services has brought significant attention to spatial crowdsourcing. The uneven distribution of tasks and workers has led to recent research on Cross-Platform Spatial Crowdsourcing (CPSC), aiming for a multi-win situation for platforms, workers, and task requesters. Previous studies on CPSC problems focused on task assignment and worker selection performance, overlooking the importance of privacy preservation. This paper addresses the existing challenges of privacy preservation and service quality by formulating a Privacy-Preserving Cross-Platform Spatial Crowdsourcing (PP-CPSC) problem and proving it to be NP-hard. We propose an Evolutionary Differential Privacy (Evo-DP) approach to optimize PP-CPSC. Evo-DP's evolutionary framework enables efficient and flexible optimization of privacy budget allocation. Within Evo-DP, each solution to the privacy budget allocation is represented as an individual in the population. To approximate the optimal solution, three evolutionary operations (mutation, crossover, and scaling) are employed for population updates, along with a selection process. A hybrid population model is introduced to balance exploration and exploitation abilities. Experimental results demonstrate Evo-DP's superiority over previous strategies in terms of solution quality, convergence speed, and scalability.
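The population-based search described above can be sketched with a differential-evolution-style loop over candidate budget allocations. The fitness function here is a toy stand-in (favoring an even split of a fixed total budget), not the paper's PP-CPSC objective:

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(budget):
    # Toy stand-in for the PP-CPSC objective (assumption): prefer an
    # even allocation of a fixed total privacy budget across tasks.
    return -np.var(budget)

def evolve(pop, F=0.5, CR=0.9, generations=100):
    """Evolutionary loop: mutation (scaled difference vector),
    binomial crossover, and greedy selection."""
    n, d = pop.shape
    for _ in range(generations):
        for i in range(n):
            idx = [j for j in range(n) if j != i]
            a, b, c = pop[rng.choice(idx, size=3, replace=False)]
            mutant = a + F * (b - c)               # mutation, scaling factor F
            trial = np.where(rng.random(d) < CR, mutant, pop[i])
            trial = np.clip(trial, 1e-3, None)     # budgets stay positive
            trial *= pop[i].sum() / trial.sum()    # preserve the total budget
            if fitness(trial) > fitness(pop[i]):   # greedy selection
                pop[i] = trial
    return pop

pop = rng.uniform(0.1, 1.0, size=(12, 5))          # 12 candidates, 5 tasks
pop /= pop.sum(axis=1, keepdims=True)              # each sums to budget 1.0
before = max(fitness(p) for p in pop)
pop = evolve(pop)
after = max(fitness(p) for p in pop)
```

Greedy selection makes each individual's fitness non-decreasing, so the best allocation can only improve across generations; Evo-DP's hybrid population model and its actual operators are more elaborate.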

IJCAI Conference 2025 Conference Paper

All Roads Lead to Rome: Exploring Edge Distribution Shifts for Heterophilic Graph Learning

  • Yi Wang
  • Changqin Huang
  • Ming Li
  • Tingyi Cai
  • Zhonglong Zheng
  • Xiaodi Huang

Heterophilic graph neural networks (GNNs) have gained prominence for their ability to learn effective representations in graphs with diverse, attribute-aware relationships. While existing methods leverage attribute inference during message passing to improve performance, they often struggle with challenging heterophilic graphs. This is due to edge distribution shifts introduced by diverse connection patterns, which blur attribute distinctions and undermine message-passing stability. This paper introduces H₂OGNN, a novel framework that reframes edge attribute inference as an out-of-distribution (OOD) detection problem. H₂OGNN introduces a simple yet effective symbolic energy regularization approach for OOD learning, ensuring robust classification boundaries between homophilic and heterophilic edge attributes. This design significantly improves the stability and reliability of GNNs across diverse connectivity patterns. Through theoretical analysis, we show that H₂OGNN addresses the graph denoising problem by going beyond feature smoothing, offering deeper insights into how precise edge attribute identification boosts model performance. Extensive experiments on nine benchmark datasets demonstrate that H₂OGNN not only achieves state-of-the-art performance but also consistently outperforms other heterophilic GNN frameworks, particularly on datasets with high heterophily.

EAAI Journal 2024 Journal Article

CSPNeXt: A new efficient token hybrid backbone

  • Xiangqi Chen
  • Chengzhuan Yang
  • Jiashuaizi Mo
  • Yaxin Sun
  • Hicham Karmouni
  • Yunliang Jiang
  • Zhonglong Zheng

The cross-stage partial network (CSPNet) enhances the learning ability and reduces the computational cost of convolutional neural networks while offering high flexibility and efficiency. However, the model suffers from a limited receptive field and weak mixing of high-frequency and low-frequency features, which significantly hurts its recognition performance. To alleviate this problem, we propose a "modernized" CSPNeXt model that effectively learns the feature maps' high- and low-frequency information and enlarges the receptive field to improve recognition performance, while retaining the corresponding advantages of the CSPNet model. Specifically, we introduce parallel large-kernel convolution and a simple average pooling method to capture different frequency information in the image. Unlike the original CSPNet channel splitting mechanism, the CSPNeXt mixer fuses features more effectively by introducing a new channel splitting mechanism. To obtain more high-frequency signals in the shallow layers and more low-frequency signals in the deep layers, we increase the dimension fed to the high-frequency mixer in the shallow layers and expand the dimension fed to the low-frequency mixer in the deep layers. This mechanism efficiently captures high- and low-frequency signals at different levels. We extensively test the CSPNeXt model on various vision tasks, including image classification, object detection, and instance segmentation, where it demonstrates excellent performance, outperforming the previous CSPNet method. Our method achieves 81.6% top-1 accuracy on ImageNet-1K, 1.8% better than DeiT-S and slightly better than Swin-T (81.3%), while using fewer parameters and GFLOPs.
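The role of average pooling in the design above can be illustrated directly: pooling retains the low-frequency component of a feature map, and the residual carries the high frequencies. This is a simplified view of what the two mixer branches consume, not the paper's actual module:

```python
import numpy as np

def split_frequencies(x, k=3):
    """Decompose a 2D feature map into a low-frequency part (k x k
    average pooling with edge padding) and a high-frequency residual."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    low = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            low[i, j] = xp[i:i + k, j:j + k].mean()
    high = x - low              # residual = high-frequency detail
    return low, high

x = np.arange(25, dtype=float).reshape(5, 5)
low, high = split_frequencies(x)
```

By construction `low + high` reproduces the input exactly, so nothing is lost by routing the two components through different branches.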

ICML Conference 2022 Conference Paper

UAST: Uncertainty-Aware Siamese Tracking

  • Dawei Zhang 0002
  • Yanwei Fu 0001
  • Zhonglong Zheng

Visual object tracking is basically formulated as target classification and bounding box estimation. Recent anchor-free Siamese trackers rely on predicting the distances to four sides for efficient regression but fail to estimate accurate bounding boxes in complex scenes. We argue that these approaches lack a clear probabilistic explanation, so it is desirable to model the uncertainty and ambiguity representation of target estimation. To address this issue, this paper presents an Uncertainty-Aware Siamese Tracker (UAST) by developing a novel distribution-based regression formulation with localization uncertainty. We exploit regression vectors to directly represent the discretized probability distribution for four offsets of boxes, which is general, flexible, and informative. Based on the resulting distributed representation, our method is able to provide a probabilistic value of uncertainty. Furthermore, considering the high correlation between the uncertainty and regression accuracy, we propose to learn a joint representation head of classification and localization quality for reliable tracking, which also avoids the inconsistency of classification and quality estimation between training and inference. Extensive experiments on several challenging tracking benchmarks demonstrate the effectiveness of UAST and its superiority over other Siamese trackers.
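Distribution-based box regression of this kind decodes each side's offset as the expectation of a discretized distribution over candidate distances, and the spread of that distribution yields an uncertainty value. A minimal sketch with entropy as the uncertainty measure (an illustrative choice; UAST's exact formulation may differ):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def decode_offset(logits, bins):
    """Decode one box side: the offset is the expected value of a
    discretized distribution over candidate distances, and the
    distribution's entropy measures localization uncertainty."""
    p = softmax(logits)
    offset = float(p @ bins)                              # expected distance
    uncertainty = float(-(p * np.log(p + 1e-12)).sum())   # entropy
    return offset, uncertainty

bins = np.linspace(0, 16, 17)          # candidate offsets in pixels
peaked = np.zeros(17); peaked[8] = 10.0  # confident prediction near 8 px
flat = np.zeros(17)                      # ambiguous prediction
o1, u1 = decode_offset(peaked, bins)
o2, u2 = decode_offset(flat, bins)
```

Both inputs decode to roughly the same offset (about 8 px), but the flat distribution yields much higher uncertainty, which is the signal a quality-aware head can exploit.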

AAAI Conference 2021 Conference Paper

Visual Tracking via Hierarchical Deep Reinforcement Learning

  • Dawei Zhang
  • Zhonglong Zheng
  • Riheng Jia
  • Minglu Li

Visual tracking has achieved great progress due to numerous different algorithms. However, deep trackers based on classification or Siamese networks still have their specific limitations. In this work, we show how to teach machines to track a generic object in videos like humans, who can use a few search steps to perform tracking. By constructing a Markov decision process in Deep Reinforcement Learning (DRL), our agents can learn to determine hierarchical decisions on tracking mode and motion estimation. To be specific, our Hierarchical DRL framework is composed of a Siamese-based observation network which models the motion information of an arbitrary target, a policy network for mode switch, and an actor-critic network for box regression. This tracking strategy is more in line with the human behavior paradigm, and is effective and efficient at coping with fast motion, background clutter, and large deformations. Extensive experiments on the GOT-10k, OTB-100, UAV-123, VOT, and LaSOT tracking benchmarks demonstrate that the proposed tracker achieves state-of-the-art performance while running in real time.

EAAI Journal 2007 Journal Article

Initialization enhancer for non-negative matrix factorization

  • Zhonglong Zheng
  • Jie Yang
  • Yitan Zhu

Non-negative matrix factorization (NMF), proposed recently by Lee and Seung, has been applied to many areas such as dimensionality reduction, image classification, image compression, and so on. Based on traditional NMF, researchers have put forward several new algorithms to improve its performance. However, particular emphasis must be placed on the initialization of NMF because of its local convergence, although this is often ignored in the literature. In this paper, we explore three initialization methods based on principal component analysis (PCA), fuzzy clustering, and Gabor wavelets, chosen either for computational efficiency or for structure preservation. In addition, the three methods provide an efficient way of selecting the rank of the NMF in low-dimensional space.
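The sensitivity to initialization comes from the multiplicative updates converging only to a local minimum. A minimal sketch of the standard Lee-Seung updates with an SVD/PCA-derived starting point (taking magnitudes of the leading components to satisfy non-negativity is a common heuristic, assumed here rather than taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, W, H, iters=200, eps=1e-9):
    """Lee-Seung multiplicative updates for V ~ W @ H with W, H >= 0;
    each step is non-increasing in the Frobenius reconstruction error."""
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.abs(rng.normal(size=(30, 20)))        # toy non-negative data matrix
r = 5                                        # target rank
U, s, Vt = np.linalg.svd(V, full_matrices=False)
W0 = np.abs(U[:, :r] * s[:r])                # magnitudes of leading components
H0 = np.abs(Vt[:r, :])
W, H = nmf(V, W0.copy(), H0.copy())
err = np.linalg.norm(V - W @ H)
```

Because the updates are multiplicative, non-negativity of the initial factors is preserved automatically, and a structured starting point like this typically converges faster than a random one.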

EAAI Journal 2006 Journal Article

Multi-view based face chin contour extraction

  • Xinliang Ge
  • Jie Yang
  • Zhonglong Zheng
  • Feng Li

The chin contour is an important facial feature for building a 3D morphable model, the core step of which is to establish feature-point correspondence between each face in the training set and the reference face. In this paper, robust face detection is first implemented using a probabilistic method: a detection probability is obtained for each image at different positions and at several scales and poses. The chin contours are then extracted accurately using the active shape model (ASM), which depends on the parameters obtained from the face detection. Face poses from frontal (0°) to profile (90°) are equally divided into 10 parts, and we train 10 flexible models accordingly; the flexible model corresponding to the detected face pose is then used to extract the chin contour. Experimental results show that the proposed approach can extract the chin contours of different people across different poses with good accuracy.