
Author name cluster

Honggang Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers (6)

ICRA Conference 2025 Conference Paper

Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving

  • Yunshen Wang
  • Yicheng Liu
  • Tianyuan Yuan
  • Yucheng Mao
  • Yingshi Liang
  • Xiuyu Yang
  • Honggang Zhang
  • Hang Zhao 0021

Accurately predicting 3D occupancy grids from visual inputs is critical for autonomous driving, but current discriminative methods struggle with noisy data, incomplete observations, and the complex structures inherent in 3D scenes. In this work, we reframe 3D occupancy prediction as a generative modeling task using diffusion models, which learn the underlying data distribution and incorporate 3D scene priors. This approach enhances prediction consistency and robustness to noise, and better handles the intricacies of 3D spatial structures. Our extensive experiments show that diffusion-based generative models outperform state-of-the-art discriminative approaches, delivering more realistic and accurate occupancy predictions, especially in occluded or low-visibility regions. Moreover, the improved predictions significantly benefit downstream planning tasks, highlighting the practical advantages of our method for real-world autonomous driving applications.
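
The abstract casts occupancy prediction as denoising diffusion over voxel grids conditioned on camera input. The exact architecture and noise schedule are not given here, so the following is only a minimal sketch of a generic DDPM-style training step on an occupancy grid; `denoiser`, `camera_features`, and the cosine schedule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def diffusion_occupancy_training_step(denoiser, occupancy_gt, camera_features, num_steps=1000):
    """One DDPM-style training step on a 3D occupancy grid (illustrative only).

    occupancy_gt:    (B, C, X, Y, Z) voxel grid, e.g. one-hot semantic occupancy
    camera_features: conditioning features extracted from surround-view images
    denoiser:        network predicting the injected noise, conditioned on features
    """
    b = occupancy_gt.shape[0]
    # Sample a diffusion timestep and a simple cosine noise level per sample.
    t = torch.randint(0, num_steps, (b,), device=occupancy_gt.device)
    alpha_bar = torch.cos(0.5 * torch.pi * t.float() / num_steps) ** 2
    alpha_bar = alpha_bar.view(b, 1, 1, 1, 1)

    # Forward process: corrupt the ground-truth grid with Gaussian noise.
    noise = torch.randn_like(occupancy_gt)
    noisy_grid = alpha_bar.sqrt() * occupancy_gt + (1 - alpha_bar).sqrt() * noise

    # Reverse process: the denoiser is conditioned on the visual input.
    pred_noise = denoiser(noisy_grid, t, camera_features)
    return F.mse_loss(pred_noise, noise)
```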

AAAI Conference 2025 Conference Paper

VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis

  • Zhipeng Chen
  • Lan Yang
  • Yonggang Qi
  • Honggang Zhang
  • Kaiyue Pang
  • Ke Li
  • Yi-Zhe Song

Despite the rapid advancements in text-to-image (T2I) synthesis, enabling precise visual control remains a significant challenge. Existing works have attempted to incorporate multi-facet controls (text and sketch), aiming to enhance the creative control over generated images. However, our pilot study reveals that the expressive power of humans far surpasses the capabilities of current methods. Users desire a more versatile approach that can accommodate their diverse creative intents, ranging from controlling individual subjects to manipulating the entire scene composition. We present VersaGen, a generative AI agent that enables versatile visual control in T2I synthesis. VersaGen admits four types of visual controls: i) single visual subject; ii) multiple visual subjects; iii) scene background; iv) any combination of the three above, or no control at all. We train an adaptor upon a frozen T2I model to incorporate the visual information into the text-dominated diffusion process. We introduce three optimization strategies during the inference phase of VersaGen to improve generation results and enhance user experience. Comprehensive experiments on COCO and Sketchy validate the effectiveness and flexibility of VersaGen, as evidenced by both qualitative and quantitative results.
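
The abstract mentions training an adaptor on top of a frozen T2I model to fold visual controls into the text-dominated diffusion process. As a rough illustration only (the class name, dimensions, and the token-concatenation scheme are assumptions, not VersaGen's actual design), an adaptor of this general kind might look like:

```python
import torch
import torch.nn as nn

class VisualControlAdaptor(nn.Module):
    """Hypothetical adaptor mapping visual-control features (subject, sketch,
    or background) into extra conditioning tokens for a frozen T2I model."""

    def __init__(self, visual_dim=768, text_dim=768, num_tokens=8):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(visual_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, num_tokens * text_dim),
        )
        self.num_tokens = num_tokens
        self.text_dim = text_dim

    def forward(self, visual_features, text_embeddings):
        # visual_features: (B, visual_dim) pooled features of the control input
        # text_embeddings: (B, L, text_dim) from the frozen text encoder
        ctrl = self.proj(visual_features).view(-1, self.num_tokens, self.text_dim)
        # The frozen diffusion model just sees a longer conditioning sequence;
        # only the adaptor's parameters are trained.
        return torch.cat([text_embeddings, ctrl], dim=1)
```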

IJCAI Conference 2022 Conference Paper

RePre: Improving Self-Supervised Vision Transformer with Reconstructive Pre-training

  • Luya Wang
  • Feng Liang
  • Yangguang Li
  • Honggang Zhang
  • Wanli Ouyang
  • Jing Shao

Recently, self-supervised vision transformers have attracted unprecedented attention for their impressive representation learning ability. However, the dominant method, contrastive learning, mainly relies on an instance discrimination pretext task, which learns a global understanding of the image. This paper incorporates local feature learning into self-supervised vision transformers via Reconstructive Pre-training (RePre). RePre extends contrastive frameworks by adding a branch for reconstructing raw image pixels in parallel with the existing contrastive objective. RePre is equipped with a lightweight convolution-based decoder that fuses the multi-hierarchy features from the transformer encoder. These multi-hierarchy features provide rich supervision ranging from low-level to high-level semantics, which is crucial to RePre. RePre brings decent improvements to various contrastive frameworks with different vision transformer architectures. Transfer performance on downstream tasks outperforms supervised pre-training and state-of-the-art (SOTA) self-supervised counterparts.
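
The abstract outlines the RePre recipe: keep the contrastive objective and add a reconstruction branch whose lightweight convolutional decoder fuses multi-hierarchy encoder features. Below is only a minimal sketch of that idea, with made-up feature dimensions and an L1 pixel loss standing in for whatever reconstruction loss the paper actually uses.

```python
import torch.nn as nn
import torch.nn.functional as F

class LightweightDecoder(nn.Module):
    """Fuses multi-level transformer feature maps and reconstructs raw pixels."""

    def __init__(self, dims=(96, 192, 384, 768), out_size=224):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(d, 64, 1) for d in dims)
        self.head = nn.Conv2d(64, 3, 3, padding=1)
        self.out_size = out_size

    def forward(self, feature_maps):
        fused = 0
        for proj, feat in zip(self.proj, feature_maps):
            # Project each hierarchy to a common width and spatial size, then sum.
            fused = fused + F.interpolate(proj(feat), size=self.out_size,
                                          mode="bilinear", align_corners=False)
        return self.head(fused)

def repre_style_loss(contrastive_loss, feature_maps, images, decoder, weight=1.0):
    # Total objective = instance-discrimination loss + pixel reconstruction loss.
    recon = decoder(feature_maps)
    return contrastive_loss + weight * F.l1_loss(recon, images)
```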

NeurIPS Conference 2018 Conference Paper

Deep Attentive Tracking via Reciprocative Learning

  • Shi Pu
  • Yibing Song
  • Chao Ma
  • Honggang Zhang
  • Ming-Hsuan Yang

Visual attention, derived from cognitive neuroscience, facilitates human perception of the most pertinent subset of the sensory data. Recently, significant efforts have been made to exploit attention schemes to advance computer vision systems. For visual tracking, it is often challenging to track target objects undergoing large appearance changes. Attention maps facilitate visual tracking by selectively attending to temporally robust features. Existing tracking-by-detection approaches mainly use additional attention modules to generate feature weights, as the classifiers are not equipped with such mechanisms. In this paper, we propose a reciprocative learning algorithm to exploit visual attention for training deep classifiers. The proposed algorithm consists of feed-forward and backward operations to generate attention maps, which serve as regularization terms coupled with the original classification loss function for training. The deep classifier learns to attend to the regions of target objects that are robust to appearance changes. Extensive experiments on large-scale benchmark datasets show that the proposed attentive tracking method performs favorably against state-of-the-art approaches.
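
The abstract describes attention maps produced by feed-forward and backward operations and used as regularization terms alongside the classification loss. The sketch below follows that general recipe with a gradient-derived attention map; the entropy penalty is an illustrative stand-in, not the paper's exact regularizer.

```python
import torch
import torch.nn.functional as F

def reciprocative_step(classifier, patch, label, reg_weight=1.0):
    """One training step in the spirit of gradient-derived attention regularization.

    A forward pass scores the patch, a backward pass of the target score with
    respect to the input yields an attention map, and that map is folded back
    into the loss so the classifier learns to attend to the target region.
    """
    patch = patch.clone().requires_grad_(True)
    logits = classifier(patch)            # e.g. (1, 2): target vs. background
    score = logits[0, label]

    # Backward operation: gradient magnitude over channels acts as attention.
    attn = torch.autograd.grad(score, patch, create_graph=True)[0].abs().sum(dim=1)
    attn = attn / (attn.sum() + 1e-8)

    # Encourage concentrated attention (entropy penalty chosen for illustration).
    entropy = -(attn * (attn + 1e-8).log()).sum()
    cls_loss = F.cross_entropy(logits, torch.tensor([label], device=patch.device))
    return cls_loss + reg_weight * entropy
```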

AAAI Conference 2015 Conference Paper

VecLP: A Realtime Video Recommendation System for Live TV Programs

  • Sheng Gao
  • Dai Zhang
  • Honggang Zhang
  • Chao Huang
  • Yongsheng Zhang
  • Jianxin Liao
  • Jun Guo

In this paper, we propose VecLP, a novel Internet video recommendation system for live TV programs. Given little information about the live TV programs, the proposed VecLP system can effectively collect the necessary information on both the programs and the subscribers, as well as a large volume of related online videos, and then recommend relevant Internet videos to the subscribers. To that end, key frames are first detected from the live TV programs, and visual and textual features are extracted from these frames to enhance the understanding of the TV broadcasts. Furthermore, by utilizing the subscribers' profiles and their social relationships, a user preference model is constructed, which greatly improves the diversity of the recommendations in our system. Each subscriber's browsing history is also recorded and used to further personalize the recommendations. This work also illustrates how the proposed VecLP system achieves this in practice. Finally, we deploy several new recommendation strategies in the system to meet the special needs of diverse live TV programs and shed light on how to fuse these strategies.
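
The abstract mentions fusing several recommendation strategies tuned to different kinds of live TV programs. A toy illustration of weighted score fusion follows; all names and the linear-blend choice are assumptions, not VecLP's actual design.

```python
def fuse_recommendation_scores(candidate_videos, strategies, weights):
    """Blend several recommendation strategies into one ranking (toy example).

    strategies: dict name -> callable(video) returning a relevance score,
                e.g. key-frame content similarity, user-preference-model score,
                or browsing-history affinity (all hypothetical).
    weights:    dict name -> float, tuned per type of live TV program.
    """
    def total(video):
        return sum(weights[name] * score_fn(video)
                   for name, score_fn in strategies.items())
    return sorted(candidate_videos, key=total, reverse=True)
```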

AAAI Conference 2013 Conference Paper

A Maximum K-Min Approach for Classification

  • Mingzhi Dong
  • Liang Yin
  • Weihong Deng
  • Li Shang
  • Jun Guo
  • Honggang Zhang

In this paper, a general Maximum K-Min approach for classification is proposed. With the physical meaning of optimizing the classification confidence of the K worst instances, the Maximum K-Min Gain / Minimum K-Max Loss (MKM) criterion is introduced. To make the original optimization problem, which has a combinatorial number of constraints, computationally tractable, optimization techniques are adopted and a general compact-representation lemma for the MKM criterion is established. Based on the lemma, a Nonlinear Maximum K-Min (NMKM) classifier and a Semi-supervised Maximum K-Min (SMKM) classifier are presented for the traditional classification task and the semi-supervised classification task, respectively. Experimental results on publicly available datasets show that our Maximum K-Min methods achieve competitive performance compared with hinge-loss classifiers.
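
The abstract refers to a compact-representation lemma that collapses the combinatorial number of top-K constraints. One standard identity of this kind (whether it matches the paper's lemma exactly is an assumption) rewrites the sum of the K largest losses with a single auxiliary variable:

```latex
% Sum of the K largest losses \ell_1,\dots,\ell_n expressed with one auxiliary
% variable \lambda instead of a combinatorial set of constraints:
\max_{|S| = K} \sum_{i \in S} \ell_i
  \;=\;
\min_{\lambda \in \mathbb{R}} \Big\{ K\lambda \;+\; \sum_{i=1}^{n} \max(\ell_i - \lambda,\, 0) \Big\}
```

The minimum is attained when λ equals the K-th largest loss, so jointly minimizing the right-hand side over λ and the classifier parameters avoids enumerating all C(n, K) instance subsets.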