Arrow Research search

Author name cluster

Hongan Wang

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches and is not a full author-identity disambiguation profile.

9 papers
1 author row

Possible papers (9)

AAAI Conference 2025 · Conference Paper

DiffGrasp: Whole-Body Grasping Synthesis Guided by Object Motion Using a Diffusion Model

  • Yonghao Zhang
  • Qiang He
  • Yanguang Wan
  • Yinda Zhang
  • Xiaoming Deng
  • Cuixia Ma
  • Hongan Wang

Generating high-quality whole-body human-object interaction motion sequences is becoming increasingly important in fields such as animation, VR/AR, and robotics. The main challenge of this task lies in determining the level of involvement of each hand given the complex shapes and differing sizes of objects and their varying motion trajectories, while ensuring strong grasping realism and coordinated movement across all body parts. In contrast to existing work, which either generates human interaction motion sequences without detailed hand grasping poses or only models a static grasping pose, we propose a simple yet effective framework that jointly models the relationship between the body, the hands, and the given object motion sequences within a single diffusion model. To guide our network in perceiving the object's spatial position and learning more natural grasping poses, we introduce novel contact-aware losses and incorporate carefully designed, data-driven guidance. Experimental results demonstrate that our approach outperforms the state-of-the-art method and generates plausible results.
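
To make the described setup concrete, below is a minimal, hypothetical sketch of a conditional denoiser of the kind the abstract suggests: the noisy whole-body pose sequence and the object motion sequence are processed in a single diffusion model, and a contact-aware loss penalizes hand-to-object distance on frames labeled as in contact. All module names, dimensions, and the loss form are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class JointGraspDenoiser(nn.Module):
    """Toy denoiser: predicts clean body+hand poses from noisy poses,
    a diffusion timestep, and the object motion used as conditioning."""
    def __init__(self, pose_dim=165, obj_dim=12, hidden=256):
        super().__init__()
        self.pose_proj = nn.Linear(pose_dim, hidden)   # encode noisy body+hand pose
        self.cond_proj = nn.Linear(obj_dim, hidden)    # encode object motion condition
        self.time_emb = nn.Embedding(1000, hidden)     # diffusion timestep embedding
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True), num_layers=2)
        self.head = nn.Linear(hidden, pose_dim)

    def forward(self, noisy_pose, t, obj_motion):
        # noisy_pose: (B, T, pose_dim), obj_motion: (B, T, obj_dim), t: (B,) long
        h = self.pose_proj(noisy_pose) + self.cond_proj(obj_motion)
        h = h + self.time_emb(t)[:, None, :]
        return self.head(self.backbone(h))

def contact_aware_loss(hand_joints, obj_points, contact_mask):
    """Penalize hand-joint-to-object distance only where contact is expected.
    hand_joints: (B, T, J, 3), obj_points: (B, T, P, 3), contact_mask: (B, T, J) 0/1."""
    d = torch.cdist(hand_joints.flatten(0, 1), obj_points.flatten(0, 1))  # (B*T, J, P)
    nearest = d.min(dim=-1).values.view_as(contact_mask)
    return (nearest * contact_mask).sum() / contact_mask.sum().clamp(min=1)
```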

AAAI Conference 2025 · Conference Paper

HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation

  • Wentian Qu
  • Jiahe Li
  • Jian Cheng
  • Jian Shi
  • Chenyu Meng
  • Cuixia Ma
  • Hongan Wang
  • Xiaoming Deng

Understanding bimanual hand-object interaction plays an important role in robotics and virtual reality. However, due to significant occlusions between hands and objects as well as the high degree-of-freedom motions involved, it is challenging to collect and annotate a high-quality, large-scale dataset, which prevents further improvement of baselines for bimanual hand-object interaction. In this work, we propose a new 3D Gaussian Splatting (3DGS) based data augmentation framework for bimanual hand-object interaction, which can augment existing datasets into large-scale photorealistic data with diverse hand-object poses and viewpoints. First, we use mesh-based 3DGS to model objects and hands, and design a super-resolution module to deal with the rendering blur caused by the multi-resolution input images. Second, we extend the single-hand grasping-pose optimization module to the bimanual setting to generate diverse bimanual hand-object interaction poses, which can significantly expand the pose distribution of the dataset. Third, we analyze the impact of different aspects of the proposed data augmentation on the understanding of bimanual hand-object interaction. We perform our data augmentation on two benchmarks, H2O and Arctic, and verify that our method can improve the performance of the baselines.
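
The pose/viewpoint expansion the abstract refers to can be pictured with the toy sampler below: starting from one captured bimanual grasp, new camera viewpoints and perturbed hand poses are enumerated. The rendering (3DGS), super-resolution, and grasp-pose re-optimization steps are omitted, and all names and parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_viewpoint(radius=0.8):
    # Random direction on the upper hemisphere, looking at the object origin.
    v = rng.normal(size=3)
    v[2] = abs(v[2])
    return radius * v / np.linalg.norm(v)

def perturb_hand_pose(joint_angles, sigma=0.05):
    # Small Gaussian perturbation; a real pipeline would re-optimize the grasp
    # so that contacts stay physically plausible.
    return joint_angles + rng.normal(scale=sigma, size=joint_angles.shape)

base_left = np.zeros(45)   # placeholder left-hand pose (15 joints x 3 angles)
base_right = np.zeros(45)  # placeholder right-hand pose

augmented = [
    {"camera": sample_viewpoint(),
     "left": perturb_hand_pose(base_left),
     "right": perturb_hand_pose(base_right)}
    for _ in range(100)
]
print(len(augmented), "augmented (pose, viewpoint) samples")
```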

IJCAI Conference 2025 · Conference Paper

Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks

  • Jiafan Li
  • Jiaqi Zhu
  • Liang Chang
  • Yilin Li
  • Miaomiao Li
  • Yang Wang
  • Yi Yang
  • Hongan Wang

Nowadays, numerous online platforms can be described as multi-modal heterogeneous networks (MMHNs), such as Douban's movie networks and Amazon's product review networks. Accurately categorizing nodes within these networks is crucial for analyzing the corresponding entities, which requires effective representation learning on nodes. However, existing multi-modal fusion methods often adopt either early fusion strategies, which may lose the unique characteristics of individual modalities, or late fusion approaches, which overlook cross-modal guidance during GNN-based information propagation. In this paper, we propose a novel model for node classification in MMHNs, named Heterogeneous Graph Neural Network with Inter-Modal Attention (HGNN-IMA). It learns node representations by capturing the mutual influence of multiple modalities during the information propagation process, within the framework of the heterogeneous graph transformer. Specifically, a nested inter-modal attention mechanism is integrated into the inter-node attention to achieve adaptive multi-modal fusion, and modality alignment is also taken into account to encourage propagation among nodes with consistent similarities across all modalities. Moreover, an attention loss is added to mitigate the impact of missing modalities. Extensive experiments validate the superiority of the model on the node classification task, providing an innovative view of handling multi-modal data, especially when it is accompanied by network structure. The full version, including the appendix, is available at http://arxiv.org/abs/2505.07895.
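
A minimal sketch of the nested attention idea, under the assumption that each node carries one feature vector per modality: an inter-modal attention fuses the modalities of each node, and the fused vectors then pass through an ordinary masked inter-node attention. Shapes, names, and the toy graph are illustrative only, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterModalNodeAttention(nn.Module):
    """Toy layer: inter-modal attention fuses each node's modality features,
    then masked inter-node attention propagates over the graph."""
    def __init__(self, dim=64):
        super().__init__()
        self.modal_score = nn.Linear(dim, 1)   # scores each modality of a node
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (N, M, dim) per-modality node features; adj: (N, N) 0/1 adjacency.
        modal_w = F.softmax(self.modal_score(x), dim=1)        # (N, M, 1) inter-modal attention
        fused = (modal_w * x).sum(dim=1)                       # (N, dim) adaptive fusion
        scores = self.q(fused) @ self.k(fused).t() / fused.size(-1) ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))   # restrict to graph neighbors
        return F.softmax(scores, dim=-1) @ self.v(fused)       # inter-node attention

# Toy usage: 4 nodes, 2 modalities (e.g. text and image), a small chain graph.
layer = InterModalNodeAttention()
x = torch.randn(4, 2, 64)
adj = torch.eye(4) + torch.tensor([[0., 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
out = layer(x, adj)   # (4, 64) fused node representations
```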

AAAI Conference 2025 · Conference Paper

Universal Features Guided Zero-Shot Category-Level Object Pose Estimation

  • Wentian Qu
  • Chenyu Meng
  • Heng Li
  • Jian Cheng
  • Cuixia Ma
  • Hongan Wang
  • Xiao Zhou
  • Xiaoming Deng

Object pose estimation, which is crucial in computer vision and robotics applications, faces challenges with the diversity of unseen categories. We propose a zero-shot method for category-level 6-DOF object pose estimation, which exploits both 2D and 3D universal features of the input RGB-D image to establish semantic-similarity-based correspondences and can be extended to unseen categories without additional model fine-tuning. Our method first combines efficient 2D universal features to find sparse correspondences between intra-category objects and obtain an initial coarse pose. To handle the degradation of 2D universal-feature correspondences when the pose deviates significantly from the target pose, we use an iterative strategy to optimize the pose. Subsequently, to resolve pose ambiguities caused by shape differences between intra-category objects, the coarse pose is refined by optimizing with a dense alignment constraint on 3D universal features. Our method outperforms previous methods on the REAL275 and Wild6D benchmarks for unseen categories.
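
As a generic illustration of the coarse-pose step (not the paper's exact procedure), once sparse 3D correspondences between a reference object and the observed RGB-D points are available, a rigid 6-DOF pose can be recovered with a Kabsch/SVD fit; the toy check below uses synthetic correspondences.

```python
import numpy as np

def rigid_fit(src, dst):
    """Least-squares rigid transform (Kabsch): returns R (3x3), t (3,)
    such that dst ~= src @ R.T + t, given (N, 3) corresponding points."""
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # fix a possible reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy check: recover a known rotation/translation from noiseless correspondences.
rng = np.random.default_rng(1)
src = rng.normal(size=(50, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
dst = src @ R_true.T + np.array([0.1, -0.2, 0.05])
R_est, t_est = rigid_fit(src, dst)
print(np.allclose(R_est, R_true, atol=1e-6))
```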

IJCAI Conference 2024 · Conference Paper

SceneDiff: Generative Scene-Level Image Retrieval with Text and Sketch Using Diffusion Models

  • Ran Zuo
  • Haoxiang Hu
  • Xiaoming Deng
  • Cangjun Gao
  • Zhengming Zhang
  • Yu-Kun Lai
  • Cuixia Ma
  • Yong-Jin Liu

Jointly using text and sketch for scene-level image retrieval exploits the complementarity of the two modalities to describe fine-grained scene content and retrieve the target image, playing a pivotal role in accurate image retrieval. Existing methods directly fuse sketch and text features and thus suffer from limited utilization of crucial semantic and structural information, leading to inaccurate matching with images. In this paper, we propose SceneDiff, a novel retrieval network that leverages a pre-trained diffusion model to establish a shared generative latent space, enabling joint latent representation learning for sketch and text features and precise alignment with the corresponding image. Specifically, we encode text, sketch, and image features and project them into the diffusion-based shared space, conditioning the denoising process on sketch and text features to generate latent fusion features, while employing the pre-trained autoencoder for latent image features. Within this space, we introduce a content-aware feature transformation module to reconcile the encoded sketch and image features with the dimensional requirements of the diffusion latent space while preserving their visual content information. We then augment the representation capability of the generated latent fusion features by integrating multiple samplings with partition attention, and use contrastive learning to align both the direct fusion features and the generated latent fusion features with the corresponding image representations. Extensive experiments show that our method outperforms state-of-the-art works, providing new insight into the related retrieval field.
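
The contrastive-alignment step mentioned above can be sketched as a standard InfoNCE-style loss between fused sketch+text features and their paired image features; the temperature, shapes, and symmetric form here are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def contrastive_align(fusion_feats, image_feats, temperature=0.07):
    """InfoNCE-style alignment: row i of fusion_feats and image_feats (both (B, D))
    form a matching pair; all other rows in the batch act as negatives."""
    f = F.normalize(fusion_feats, dim=-1)
    g = F.normalize(image_feats, dim=-1)
    logits = f @ g.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(f.size(0))
    # Symmetric loss: query-to-image and image-to-query directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = contrastive_align(torch.randn(8, 256), torch.randn(8, 256))
```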

AAAI Conference 2024 · Conference Paper

SpaceGTN: A Time-Agnostic Graph Transformer Network for Handwritten Diagram Recognition and Segmentation

  • Haoxiang Hu
  • Cangjun Gao
  • Yaokun Li
  • Xiaoming Deng
  • Yu-Kun Lai
  • Cuixia Ma
  • Yong-Jin Liu
  • Hongan Wang

Online handwriting recognition is pivotal in domains such as note-taking, education, healthcare, and office tasks. Existing diagram recognition algorithms rely mainly on the temporal information of strokes, which degrades recognition performance on notes that have been modified or that lack temporal information. Current datasets are drawn from templates and cannot reflect real free-drawing conditions. To address these challenges, we present SpaceGTN, a time-agnostic Graph Transformer Network that leverages spatial integration and removes the need for temporal data. Extensive experiments on multiple datasets demonstrate that our method consistently outperforms existing methods and achieves state-of-the-art performance. We also propose a pipeline that seamlessly connects offline and online handwritten diagrams: by integrating a stroke restoration technique with SpaceGTN, it enables intelligent, stroke-level editing of previously uneditable offline diagrams. In addition, we release OHSD, the first online handwritten diagram dataset collected with a free-drawing protocol and annotated with modifications.
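
A toy illustration of what "time-agnostic" graph construction over strokes can look like: strokes are connected by spatial proximity of their bounding boxes rather than by drawing order, so reordered or modified strokes yield the same graph. The distance threshold and data layout are assumptions, not the paper's definition.

```python
import numpy as np

def stroke_bbox(points):
    pts = np.asarray(points)
    return pts.min(0), pts.max(0)

def bbox_gap(b1, b2):
    # Smallest axis-aligned gap between two boxes (0 on axes where they overlap).
    (lo1, hi1), (lo2, hi2) = b1, b2
    gap = np.maximum(0, np.maximum(lo1 - hi2, lo2 - hi1))
    return np.linalg.norm(gap)

def build_spatial_graph(strokes, threshold=15.0):
    # Edge between any two strokes whose bounding boxes are closer than the threshold.
    boxes = [stroke_bbox(s) for s in strokes]
    return [(i, j) for i in range(len(strokes)) for j in range(i + 1, len(strokes))
            if bbox_gap(boxes[i], boxes[j]) < threshold]

strokes = [[(0, 0), (10, 10)], [(12, 10), (30, 12)], [(200, 200), (210, 220)]]
print(build_spatial_graph(strokes))   # -> [(0, 1)]; the far-away stroke stays isolated
```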

AAAI Conference 2022 · Conference Paper

Efficient Virtual View Selection for 3D Hand Pose Estimation

  • Jian Cheng
  • Yanguang Wan
  • Dexin Zuo
  • Cuixia Ma
  • Jian Gu
  • Ping Tan
  • Hongan Wang
  • Xiaoming Deng

3D hand pose estimation from a single depth image is a fundamental problem in computer vision with wide applications. However, existing methods still cannot achieve satisfactory results due to view variation and occlusion of the human hand. In this paper, we propose a new virtual view selection and fusion module for 3D hand pose estimation from a single depth image. We automatically select multiple virtual viewpoints for pose estimation and fuse the results from all of them, which we find empirically delivers accurate and robust pose estimation. To select the most effective virtual views for pose fusion, we evaluate the confidence of each virtual view with a lightweight network trained via network distillation. Experiments on three main benchmark datasets, NYU, ICVL, and Hands2019, demonstrate that our method outperforms the state of the art on NYU and ICVL, achieves very competitive performance on Hands2019-Task1, and confirms that the proposed virtual view selection and fusion module is effective for 3D hand pose estimation.
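
The selection-and-fusion idea can be pictured with the schematic below: each virtual view receives a confidence score from a lightweight network, the top-k views are kept, and their per-view joint predictions (assumed here to be already transformed into a common camera frame) are averaged with confidence weights. Shapes and the value of k are illustrative assumptions.

```python
import torch

def fuse_views(per_view_joints, confidences, k=3):
    """Confidence-weighted fusion of per-view hand pose estimates.
    per_view_joints: (V, J, 3) predictions from V virtual views,
    confidences: (V,) predicted confidence per view."""
    topk = torch.topk(confidences, k).indices          # keep the most reliable views
    w = torch.softmax(confidences[topk], dim=0)        # normalize their weights
    return (w[:, None, None] * per_view_joints[topk]).sum(dim=0)   # (J, 3)

joints = torch.randn(8, 21, 3)        # 8 virtual views, 21 hand joints
conf = torch.rand(8)                  # stand-in for the lightweight network's output
fused = fuse_views(joints, conf)
```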

IJCAI Conference 2020 · Conference Paper

Dataless Short Text Classification Based on Biterm Topic Model and Word Embeddings

  • Yi Yang
  • Hongan Wang
  • Jiaqi Zhu
  • Yunkun Wu
  • Kailong Jiang
  • Wenli Guo
  • Wandong Shi

Dataless text classification has attracted increasing attention recently. It needs only a few seed words per category to classify documents, which is much cheaper than supervised text classification that requires massive labeling effort. However, most existing models focus on long texts and perform unsatisfactorily on short texts, which have become increasingly popular on the Internet. In this paper, we first propose a novel model named Seeded Biterm Topic Model (SeedBTM), which extends BTM to solve the problem of dataless short-text classification with seed words. It takes advantage of both word co-occurrence information in the topic model and category-word similarity from widely used word embeddings as prior topic-in-set knowledge. With the same approach, we also propose Seeded Twitter Biterm Topic Model (SeedTBTM), which extends Twitter-BTM and uses additional user information to achieve higher classification accuracy. Experimental results on five real short-text datasets show that our models outperform state-of-the-art methods and perform especially well when the categories are overlapping and interrelated.
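
A self-contained toy illustration of the prior described above: each category is represented by a few seed words, and category-word similarity from word embeddings acts as the topic-in-set knowledge. The embeddings here are random stand-ins (so any predicted label is arbitrary), and the biterm topic model itself is omitted; with real embeddings such as GloVe the similarities would be meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["goal", "match", "election", "vote", "team", "party"]
emb = {w: rng.normal(size=16) for w in vocab}          # stand-in word embeddings
seeds = {"sports": ["goal", "match"], "politics": ["election", "vote"]}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def category_word_prior(word):
    # Similarity of a word to each category's seed words (the "topic-in-set" prior).
    return {c: max(cosine(emb[word], emb[s]) for s in ws) for c, ws in seeds.items()}

def classify(short_text):
    # Plain cosine vote over the words of a short text; no topic inference here.
    scores = {c: 0.0 for c in seeds}
    for w in short_text.split():
        if w in emb:
            for c, s in category_word_prior(w).items():
                scores[c] += s
    return max(scores, key=scores.get)

print(classify("team wins the match"))   # label is arbitrary with random embeddings
```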

TIST Journal 2011 · Journal Article

Understanding, Manipulating and Searching Hand-Drawn Concept Maps

  • Yingying Jiang
  • Feng Tian
  • Xiaolong (Luke) Zhang
  • Guozhong Dai
  • Hongan Wang

Concept maps are an important tool to organize, represent, and share knowledge. Building a concept map involves creating text-based concepts and specifying their relationships with line-based links. Current concept map tools usually impose specific task structures for text and link construction, and may increase the cognitive burden of generating and interacting with concept maps. While pen-based devices (e.g., tablet PCs) let users draw concept maps more naturally with a pen or stylus, support for hand-drawn concept map creation and manipulation is still limited, largely due to the lack of methods to recognize the components and structures of hand-drawn concept maps. This article proposes a method to understand hand-drawn concept maps. Our algorithm can extract node (concept) blocks and link blocks of a hand-drawn concept map by combining dynamic programming and graph partitioning, recognize the text content of each concept node, and build a concept-map structure by relating concepts and links. We also design an algorithm for concept map retrieval based on hand-drawn queries. With these algorithms, we introduce structure-based intelligent manipulation techniques and ink-based retrieval techniques to support the management and modification of hand-drawn concept maps. Results from our evaluation study show that our method achieves high structure-recognition accuracy in real time, and that the intelligent manipulation and retrieval techniques offer good usability.
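
As a simplified illustration of the final structure-building step (the dynamic-programming and graph-partitioning stages are not shown), each recognized link can be attached to the concept blocks nearest to its endpoints; the block centers and link endpoints below are made up for the example.

```python
import math

concepts = {"A": (0, 0), "B": (10, 0), "C": (5, 8)}     # recognized concept-block centers
links = [((1, 0), (9, 0)), ((1, 1), (4, 7))]            # (start, end) of each link stroke

def nearest_concept(point):
    # Attach a link endpoint to the concept block whose center is closest.
    return min(concepts, key=lambda c: math.dist(point, concepts[c]))

edges = [(nearest_concept(s), nearest_concept(e)) for s, e in links]
print(edges)   # -> [('A', 'B'), ('A', 'C')]
```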