Arrow Research

Author name cluster

Zhigang Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers (22)

AAAI 2026 · Conference Paper

Attentive Keypoint Identification: Progressive Spatiotemporal Refinement for Video-based Human Pose Estimation

  • Sifan Wu
  • Haipeng Chen
  • Yingda Lyu
  • Shaojing Fan
  • Zhigang Wang
  • Zhenguang Liu
  • Yingying Jiao

Video-based human pose estimation has vast applications such as action recognition, sports analytics, and crime detection. However, this task is challenging as it involves interpreting both spatial context and temporal dynamics to accurately localize human anatomical keypoints in video sequences. Current approaches, often based on attention mechanisms, perform well but struggle in challenging scenarios like rapid motion and pose occlusion. We attribute these failures to two fundamental limitations: spatial uniformity, where models indiscriminately assign attention to both joint-relevant features and background clutter, thereby introducing spatial noise; and temporal rigidity, an inability to adapt to large joint displacements, resulting in severe feature misalignment during rapid motion. To overcome these challenges, we introduce PSTPose, a novel progressive spatiotemporal refinement framework. Specifically, to address the spatial uniformity problem, we propose a Discriminative Feature Enhancement (DFE) module that emphasizes joint-relevant features and a Feature Cluster Grouping (FCG) module that forms compact, semantically meaningful regions. For the temporal rigidity problem, we introduce a Deformable Spatiotemporal Fusion (DSF) module that adaptively aligns features across consecutive frames via deformation-aware sampling. This design ensures robust keypoint localization, particularly in cluttered and dynamic scenes. Extensive experiments on three large-scale benchmarks (PoseTrack2017, PoseTrack2018, and PoseTrack21) demonstrate that PSTPose establishes a new state-of-the-art.
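
The abstract does not spell out the DSF module's internals; as a rough, hypothetical sketch of deformation-aware sampling between consecutive frames (the module name, offset head, and shapes below are our assumptions, not the paper's), one can predict per-pixel offsets from the frame pair and bilinearly resample the neighboring frame's features at the shifted locations:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableFusionSketch(nn.Module):
    """Illustrative deformation-aware sampling, not the paper's exact DSF."""
    def __init__(self, channels: int):
        super().__init__()
        # Predict a 2D offset per pixel from the concatenated frame features.
        self.offset_head = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, feat_t: torch.Tensor, feat_next: torch.Tensor) -> torch.Tensor:
        n, _, h, w = feat_t.shape
        offsets = self.offset_head(torch.cat([feat_t, feat_next], dim=1))  # (N, 2, H, W)
        # Base sampling grid in normalized [-1, 1] coordinates, (x, y) order.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=feat_t.device),
            torch.linspace(-1, 1, w, device=feat_t.device),
            indexing="ij",
        )
        base = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
        # Rescale pixel offsets to grid units, then sample the next frame.
        scale = torch.tensor([2.0 / max(w - 1, 1), 2.0 / max(h - 1, 1)],
                             device=feat_t.device)
        grid = base + offsets.permute(0, 2, 3, 1) * scale
        aligned = F.grid_sample(feat_next, grid, align_corners=True)
        return feat_t + aligned  # fuse the motion-aligned neighbor features
```

The point of the sketch is only that the sampling locations are learned per pixel, so large joint displacements can be followed rather than averaged away by a fixed grid.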

AAAI 2026 · Conference Paper

DiffusionPose: Markov-Optimized Diffusion Model for Human Pose Estimation

  • Zhigang Wang
  • Zhenguang Liu
  • Shaojing Fan
  • Sifan Wu
  • Yingying Jiao

Video-based human pose estimation has long been a nontrivial task due to its dynamic nature and challenging detection scenarios such as occlusion and defocus. Inspired by the success of diffusion models, researchers have applied them to video pose estimation, outperforming traditional joint detection methods. However, existing diffusion model-based methods still face challenges like slow convergence and unstable pose generation. To tackle these issues, we propose DiffusionPose, a novel framework for video pose estimation that integrates diffusion models with optimization strategies: (1) We combine the emerging Mamba with Transformers to balance global and local spatio-temporal modeling. (2) We integrate Markov Random Fields into the reverse diffusion process to enhance the denoising of pose heatmaps, particularly addressing the issue of confused generation of occluded joints. (3) We mathematically formulate a Markov objective to supervise the heatmap denoising process, enabling the model to generate anatomically plausible skeletons. Our method achieves state-of-the-art performance on three large-scale benchmark datasets. Interestingly, it shows surprising robustness in challenging video scenarios, improving the accuracy of the most difficult ankle joint by 16.9% compared to the previous best diffusion model-based method on the Challenging-PoseTrack dataset.
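
The abstract names a Markov objective for heatmap denoising without stating it. One plausible shape (notation ours, not the paper's) combines a unary denoising term with a pairwise term over the skeleton edges:

```latex
% Illustrative objective: fidelity of each denoised heatmap \hat{H}_j to the
% target H_j, plus a pairwise compatibility \psi over skeleton edges
% \mathcal{E}, balanced by \lambda.
\mathcal{L} \;=\; \sum_{j} \bigl\lVert \hat{H}_j - H_j \bigr\rVert_2^2
\;+\; \lambda \sum_{(i,j) \in \mathcal{E}} \psi\bigl(\hat{H}_i, \hat{H}_j\bigr)
```

A pairwise term of this kind is what would penalize anatomically implausible skeletons and the confused generation of occluded joints that the abstract describes.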

AAAI 2026 · Conference Paper

Dual Coding Theory in Action: Language-Assisted Human Pose Estimation in Videos

  • Sifan Wu
  • Haipeng Chen
  • Yingda Lyu
  • Shaojing Fan
  • Zhigang Wang
  • Zhenguang Liu
  • Yingying Jiao

Video-based human pose estimation aims to localize keypoints across frames, enabling robust analysis of human motion in applications such as sports, surveillance, and healthcare. However, existing methods rely solely on visual cues, limiting their robustness in complex scenes involving occlusion, motion blur, or poor lighting. In contrast, dual coding theory from psychology suggests that human cognition is inherently multimodal: we learn by integrating visual perception with linguistic context to form structured, semantic understandings of the world. Visual input provides concrete spatiotemporal grounding, while language offers symbolic abstraction that enhances reasoning and generalization. Motivated by this cognitive principle, we present the first framework that explicitly incorporates language as an auxiliary modality to enhance video-based pose estimation. To address the lack of paired video-text datasets, we first employ a Multimodal Large Language Model (MLLM) to generate textual descriptions of human interactions from videos. We then propose a novel coarse-to-fine multimodal alignment pipeline: a cross-modal semantic interaction module establishes initial grounding between spatiotemporal visual features and textual embeddings, while an optimal transport-based feature matching mechanism enforces fine-grained, geometry-aware alignment. This cognitively inspired design enables more accurate and robust pose estimation, especially under visually challenging conditions such as occlusion and motion blur. Extensive experiments on three benchmarks confirm that our method consistently outperforms state-of-the-art approaches.
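
The abstract does not specify the optimal-transport solver. Entropic OT matching of this kind is commonly computed with Sinkhorn iterations; below is a minimal, generic sketch in which the uniform marginals and cosine cost are our assumptions:

```python
import numpy as np

def sinkhorn_plan(vis: np.ndarray, txt: np.ndarray,
                  eps: float = 0.05, iters: int = 100) -> np.ndarray:
    """Entropic OT plan between visual tokens (n, d) and text tokens (m, d)."""
    vis = vis / np.linalg.norm(vis, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    cost = 1.0 - vis @ txt.T                        # cosine-distance cost (n, m)
    K = np.exp(-cost / eps)                         # Gibbs kernel
    a = np.full(vis.shape[0], 1.0 / vis.shape[0])   # uniform source marginal
    b = np.full(txt.shape[0], 1.0 / txt.shape[0])   # uniform target marginal
    u = np.ones_like(a)
    for _ in range(iters):                          # Sinkhorn fixed-point updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]              # transport plan (n, m)
```

The resulting plan provides soft, geometry-aware correspondences between visual and text tokens that a fine-grained alignment loss can then consume.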

AAAI 2026 · Conference Paper

EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models

  • Linglin Jing
  • Yuting Gao
  • Zhigang Wang
  • Wang Lan
  • Yiwen Tang
  • Weiyun Wang
  • Wenhai Wang
  • Qingpei Guo

Recent advancements have shown that the Mixture of Experts (MoE) approach significantly enhances the capacity of large language models (LLMs) and improves performance on downstream tasks. Building on these promising results, multi-modal large language models (MLLMs) have increasingly adopted MoE techniques. However, existing multi-modal MoE tuning methods typically face two key challenges: expert uniformity and router rigidity. Expert uniformity occurs because MoE experts are often initialized by simply replicating the FFN parameters from LLMs, leading to homogenized expert functions and weakening the intended diversification of the MoE architecture. Meanwhile, router rigidity stems from the prevalent use of static linear routers for expert selection, which fail to distinguish between visual and textual tokens, resulting in similar expert distributions for image and text. To address these limitations, we propose EvoMoE, an innovative MoE tuning framework. EvoMoE introduces a meticulously designed expert initialization strategy that progressively evolves multiple robust experts from a single trainable expert, a process termed expert evolution that specifically targets severe expert homogenization. Furthermore, we introduce the Dynamic Token-aware Router (DTR), a novel routing mechanism that allocates input tokens to appropriate experts based on their modality and intrinsic token values. This dynamic routing is facilitated by hypernetworks, which dynamically generate routing weights tailored for each individual token. Extensive experiments demonstrate that EvoMoE significantly outperforms other sparse MLLMs across a variety of multi-modal benchmarks, including MME, MMBench, TextVQA, and POPE. Our results highlight the effectiveness of EvoMoE in enhancing the performance of MLLMs by addressing the critical issues of expert uniformity and router rigidity.
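
The abstract describes hypernetworks that "dynamically generate routing weights tailored for each individual token". A minimal sketch of that idea follows; the layer sizes and two-layer hypernetwork are hypothetical, not EvoMoE's actual DTR:

```python
import torch
import torch.nn as nn

class DynamicTokenRouter(nn.Module):
    """Illustrative token-conditioned router: a hypernetwork emits per-token
    routing weights instead of sharing one static linear router."""
    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.num_experts = num_experts
        self.hyper = nn.Sequential(          # hypernetwork over tokens
            nn.Linear(dim, dim), nn.GELU(),
            nn.Linear(dim, dim * num_experts),
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (num_tokens, dim) -> per-token routing matrix (num_tokens, E, dim)
        w = self.hyper(tokens).view(-1, self.num_experts, tokens.shape[-1])
        logits = torch.einsum("ted,td->te", w, tokens)  # token-specific logits
        return logits.softmax(dim=-1)                   # expert distribution per token
```

Because the routing matrix is itself a function of the token, visual and textual tokens can induce genuinely different expert distributions, which a single static linear router cannot.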

AAAI 2025 · Conference Paper

Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation

  • Haipeng Chen
  • Sifan Wu
  • Zhigang Wang
  • Yifang Yin
  • Yingying Jiao
  • Yingda Lyu
  • Zhenguang Liu

Video-based human pose estimation has long been a fundamental yet challenging problem in computer vision. Previous studies focus on spatio-temporal modeling through the enhancement of architecture design and optimization strategies. However, they overlook the causal relationships in the joints, leading to models that may be overly tailored and thus generalize poorly to challenging scenes. Therefore, adequate causal reasoning capability and good model interpretability are both indispensable prerequisites for achieving reliable results. In this paper, we pioneer a causal perspective on pose estimation and introduce a causal-inspired multitask learning framework consisting of two stages. In the first stage, we endow the model with causal spatio-temporal modeling ability by introducing two self-supervised auxiliary tasks. Specifically, these auxiliary tasks enable the network to infer challenging keypoints based on observed keypoint information, thereby imbuing the model with causal reasoning capabilities and making it robust to challenging scenes. In the second stage, we argue that not all feature tokens contribute equally to pose estimation. Prioritizing causal (keypoint-relevant) tokens is crucial for achieving reliable results and improves the interpretability of the model. To this end, we propose a Token Causal Importance Selection module to identify causal tokens and non-causal tokens (e.g., background and objects). Additionally, non-causal tokens could provide potentially beneficial cues but may be redundant. We therefore introduce a non-causal token clustering module to merge similar non-causal tokens. Extensive experiments show that our method outperforms state-of-the-art methods on three large-scale benchmark datasets.

ECAI 2025 · Conference Paper

HPA-FedGAN: Federated Generative Adversarial Network Based on Hierarchical Prototype Alignment

  • Zhigang Wang
  • Xinhao Wang
  • Shihao Yan
  • Junfeng Zhao 0005

Generative Adversarial Networks (GANs) have achieved remarkable success in data synthesis tasks, but centralized training methods inherently pose risks of sensitive data exposure. Federated Generative Adversarial Networks (FedGANs) provide a privacy-preserving solution by enabling collaborative model training across distributed clients without exchanging raw data. However, existing FedGAN frameworks face significant challenges in practical scenarios involving non-independent and identically distributed (non-IID) client data and heterogeneous model architectures, often leading to degraded generation quality, mode collapse, and potential privacy risks. To address these issues, we propose HPA-FedGAN, a FedGAN framework leveraging hierarchical prototype alignment. Instead of directly aggregating model parameters, HPA-FedGAN abstracts local features into multi-granularity prototypes, which are aggregated on the server to form global prototypes. Clients then hierarchically align their local prototypes with the global ones, guiding local models toward consistent approximation of the global data distribution. This design enhances generation quality and diversity while improving privacy protection through feature abstraction. Experimental results demonstrate that in complex scenarios where model heterogeneity and non-IID data coexist, the HPA-FedGAN framework achieves significant performance improvements over state-of-the-art methods.
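
As a minimal illustration of the server-side step (our own simplification: a single granularity level, class-keyed prototypes, and count-weighted averaging; the paper's hierarchical variant would repeat this across feature levels):

```python
import numpy as np

def aggregate_prototypes(client_protos, client_counts):
    """Server-side aggregation sketch: average each class prototype across
    clients, weighted by how many local samples backed it.

    client_protos: list of {class_id: np.ndarray(d)} dicts, one per client
    client_counts: list of {class_id: int} sample counts, same keys
    """
    global_protos = {}
    for protos, counts in zip(client_protos, client_counts):
        for cls, proto in protos.items():
            vec, n = global_protos.get(cls, (np.zeros_like(proto), 0))
            global_protos[cls] = (vec + counts[cls] * proto, n + counts[cls])
    return {cls: vec / n for cls, (vec, n) in global_protos.items()}
```

Since only these abstracted feature prototypes leave the client, no raw data or model parameters are exchanged, which is the privacy argument the abstract makes.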

AAAI 2025 · Conference Paper

Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding

  • Xianqiang Gao
  • Pingrui Zhang
  • Delin Qu
  • Dong Wang
  • Zhigang Wang
  • Yan Ding
  • Bin Zhao

3D Object Affordance Grounding aims to predict the functional regions on a 3D object and has laid the foundation for a wide range of applications in robotics. Recent advances tackle this problem via learning a mapping between 3D regions and a single human-object interaction image. However, the geometric structure of the 3D object and the object in the human-object interaction image are not always consistent, leading to poor generalization. To address this issue, we propose to learn generalizable invariant affordance knowledge from multiple human-object interaction images within the same affordance category. Specifically, we introduce the Multi-Image Guided Invariant-Feature-Aware 3D Affordance Grounding (MIFAG) framework. It grounds 3D object affordance regions by identifying common interaction patterns across multiple human-object interaction images. First, the Invariant Affordance Knowledge Extraction Module (IAM) utilizes an iterative updating strategy to gradually extract aligned affordance knowledge from multiple images and integrate it into an affordance dictionary. Then, the Affordance Dictionary Adaptive Fusion Module (ADM) learns comprehensive point cloud representations that consider all affordance candidates in multiple images. In addition, we construct the Multi-Image and Point Affordance (MIPA) benchmark, on which our method outperforms existing state-of-the-art methods across various experimental comparisons.

AAAI 2025 · Conference Paper

Optimizing Human Pose Estimation Through Focused Human and Joint Regions

  • Yingying Jiao
  • Zhigang Wang
  • Zhenguang Liu
  • Shaojing Fan
  • Sifan Wu
  • Zheqi Wu
  • Zhuoyue Xu

Human pose estimation has given rise to a broad spectrum of novel and compelling applications, including action recognition, sports analysis, and surveillance. However, accurate video pose estimation remains an open challenge. One aspect that has been overlooked so far is that existing methods learn motion clues from all pixels rather than focusing on the target human body, making them easily misled and disrupted by unimportant information such as background changes or movements of other people. Additionally, while current Transformer-based pose estimation methods have demonstrated impressive performance with global modeling, they struggle with local context perception and precise positional identification. In this paper, we tackle these challenges from three aspects: (1) We propose a bilayer Human-Keypoint Mask module that performs coarse-to-fine visual token refinement, which gradually zooms in on the target human body and keypoints while masking out unimportant figure regions. (2) We further introduce a novel deformable cross attention mechanism and a bidirectional separation strategy to adaptively aggregate spatial and temporal motion clues from constrained surrounding contexts. (3) We mathematically formulate the deformable cross attention, constraining the model to focus solely on regions centered on the target person's body. Empirically, our method achieves state-of-the-art performance on three large-scale benchmark datasets. A remarkable highlight is that our method achieves 84.8 mean Average Precision (mAP) on the challenging wrist joint, significantly outperforming the 81.5 mAP achieved by the current state-of-the-art method on the PoseTrack2017 dataset.
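
The abstract states that the deformable cross attention is formulated mathematically but does not reproduce the formula. The single-scale deformable attention it presumably builds on, written in the style of Deformable DETR (not necessarily the paper's exact constrained form), is:

```latex
% Query z_q at reference point p_q attends to K learned sampling locations
% per head m; the A_{mqk} are normalized attention weights.
\mathrm{DeformAttn}\bigl(z_q, p_q, x\bigr) \;=\;
\sum_{m=1}^{M} W_m \Bigl[\, \sum_{k=1}^{K} A_{mqk} \cdot
W'_m\, x\bigl(p_q + \Delta p_{mqk}\bigr) \Bigr]
```

Restricting the reference point p_q and the learned offsets to a window around the detected person is one way to realize the constraint that attention stays centered on the target body.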

ECAI 2025 · Conference Paper

PFL-IDGAN: Personalized Federated Learning Framework Based on Interactive Dual Generative Adversarial Networks

  • Zhigang Wang
  • Yan Yang
  • Xiaochi Hou
  • Junfeng Zhao 0005

Federated learning (FL) enables collaborative model training without direct data exchange, promoting privacy-preserving data utilization. To address performance degradation caused by non-independent and identically distributed (non-IID) data, Personalized Federated Learning (PFL) allows each client to learn a model tailored to its local distribution. However, real-world personalized scenarios often involve not only data heterogeneity but also model heterogeneity across clients. Existing PFL methods struggle under the coexistence of both, as parameter aggregation requires identical model structures, while knowledge distillation often relies on shared public data. To tackle these challenges, we propose a novel PFL framework called Personalized Federated Learning based on Interactive Dual Generative Adversarial Networks (PFL-IDGAN). This framework leverages Generative Adversarial Networks (GANs) to augment local datasets, effectively mitigating label discrepancies and non-IID data issues across clients. Moreover, it introduces a dual adversarial learning mechanism that enables fine-grained knowledge transfer and collaboration across clients, while supporting heterogeneous model architectures. Extensive experiments demonstrate that the proposed PFL-IDGAN framework significantly outperforms existing baseline methods, particularly in settings with pronounced disparities in client models and data distributions.

AAAI 2025 · Conference Paper

SpatioTemporal Learning for Human Pose Estimation in Sparsely-Labeled Videos

  • Yingying Jiao
  • Zhigang Wang
  • Sifan Wu
  • Shaojing Fan
  • Zhenguang Liu
  • Zhuoyue Xu
  • Zheqi Wu

Human pose estimation in videos remains a challenge, largely due to the reliance on extensive manual annotation of large datasets, which is expensive and labor-intensive. Furthermore, existing approaches often struggle to capture long-range temporal dependencies and overlook the complementary relationship between temporal pose heatmaps and visual features. To address these limitations, we introduce STDPose, a novel framework that enhances human pose estimation by learning spatiotemporal dynamics in sparsely-labeled videos. STDPose incorporates two key innovations: 1) A novel Dynamic-Aware Mask to capture long-range motion context, allowing for a nuanced understanding of pose changes. 2) A system for encoding and aggregating spatiotemporal representations and motion dynamics to effectively model spatiotemporal relationships, improving the accuracy and robustness of pose estimation. STDPose establishes a new performance benchmark for both video pose propagation (i.e., propagating pose annotations from labeled frames to unlabeled frames) and pose estimation tasks, across three large-scale evaluation datasets. Additionally, utilizing pseudo-labels generated by pose propagation, STDPose achieves competitive performance with only 26.7% labeled data.

AAAI 2024 · Conference Paper

Color Event Enhanced Single-Exposure HDR Imaging

  • Mengyao Cui
  • Zhigang Wang
  • Dong Wang
  • Bin Zhao
  • Xuelong Li

Single-exposure high dynamic range (HDR) imaging aims to reconstruct the wide-range intensities of a scene by using its single low dynamic range (LDR) image, thus providing significant efficiency. Existing methods focus on restoring the luminance by inverting the tone-mapping process, while the color in the over-/under-exposed areas cannot be well restored due to the information loss of the single LDR image. To address this issue, we introduce color events into the imaging pipeline, which record asynchronous pixel-wise color changes in a high dynamic range, enabling edge-like scene perception under challenging lighting conditions. Specifically, we propose a joint framework that incorporates color events and a single LDR image to restore both the content and color of an HDR image, where an exposure-aware transformer (EaT) module is designed to propagate the informative hints, provided by the normally exposed LDR regions and the event streams, to the missing areas. In this module, an exposure-aware mask is estimated to suppress distractive information and strengthen the restoration of the over-/under-exposed regions. To our knowledge, we are the first to use color events to enhance single-exposure HDR imaging. We also contribute corresponding datasets, consisting of synthesized datasets and a real-world dataset collected by a DAVIS346-color camera. The datasets can be found at https://www.kaggle.com/datasets/mengyaocui/ce-hdr. Extensive experiments demonstrate the effectiveness of the proposed method.

NeurIPS 2024 · Conference Paper

LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Control and Rendering

  • Delin Qu
  • Qizhi Chen
  • Pingrui Zhang
  • Xianqiang Gao
  • Bin Zhao
  • Zhigang Wang
  • Dong Wang
  • Xuelong Li

This paper scales object-level reconstruction to complex scenes, advancing interactive scene reconstruction. We introduce two datasets, OmniSim and InterReal, featuring 28 scenes with multiple interactive objects. To tackle the challenge of inaccurate interactive motion recovery in complex scenes, we propose LiveScene, a scene-level language-embedded interactive radiance field that efficiently reconstructs and controls multiple objects. By decomposing the interactive scene into local deformable fields, LiveScene enables separate reconstruction of individual object motions, reducing memory consumption. Additionally, our interaction-aware language embedding localizes individual interactive objects, allowing for arbitrary control using natural language. Our approach demonstrates significant superiority in novel view synthesis, interactive scene control, and language grounding performance through extensive experiments. Project page: https://livescenes.github.io.

AAAI 2024 · Conference Paper

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

  • Yiwen Tang
  • Ray Zhang
  • Zoey Guo
  • Xianzheng Ma
  • Bin Zhao
  • Zhigang Wang
  • Dong Wang
  • Xuelong Li

The popularity of pre-trained large models has revolutionized downstream tasks across diverse fields, such as language, vision, and multi-modality. To minimize the adaptation cost for downstream tasks, many Parameter-Efficient Fine-Tuning (PEFT) techniques are proposed for language and 2D image pre-trained models. However, the specialized PEFT method for 3D pre-trained models is still under-explored. To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters. Specifically, for a pre-trained 3D model, we freeze most of its parameters, and only tune the newly added PEFT modules on downstream tasks, which consist of a Point-prior Prompt and a Geometry-aware Adapter. The Point-prior Prompt adopts a set of learnable prompt tokens, for which we propose to construct a memory bank with domain-specific knowledge, and utilize parameter-free attention to enhance the prompt tokens. The Geometry-aware Adapter aims to aggregate point cloud features within spatial neighborhoods to capture fine-grained geometric information through local interactions. Extensive experiments indicate that our Point-PEFT can achieve better performance than the full fine-tuning on various downstream tasks, while using only 5% of the trainable parameters, demonstrating the efficiency and effectiveness of our approach. Code is released at https://github.com/Ivan-Tang-3D/Point-PEFT.
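
The freezing recipe itself is generic and easy to picture; a minimal sketch follows (the keyword matching is our assumption, as actual module names depend on the Point-PEFT codebase):

```python
import torch.nn as nn

def mark_peft_trainable(model: nn.Module,
                        trainable_keywords=("prompt", "adapter")) -> None:
    """Freeze the pre-trained backbone; leave only PEFT modules trainable.
    The keyword match is illustrative: real names depend on the codebase."""
    total = trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name.lower() for k in trainable_keywords)
        total += param.numel()
        trainable += param.numel() if param.requires_grad else 0
    print(f"trainable params: {trainable}/{total} ({100 * trainable / total:.1f}%)")
```

Only the Point-prior Prompt and Geometry-aware Adapter parameters would then receive gradients, which is how a budget like the reported 5% of trainable parameters arises.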

IROS 2024 · Conference Paper

Revolutionizing Battery Disassembly: The Design and Implementation of a Battery Disassembly Autonomous Mobile Manipulator Robot (BEAM-1)

  • Yanlong Peng
  • Zhigang Wang
  • Yisheng Zhang
  • Shengmin Zhang
  • Nan Cai
  • Fan Wu
  • Ming Chen

The efficient disassembly of end-of-life electric vehicle batteries (EOL-EVBs) is crucial for green manufacturing and sustainable development. The current pre-programmed disassembly conducted by the Autonomous Mobile Manipulator Robot (AMMR) struggles to meet the disassembly requirements in dynamic environments, complex scenarios, and unstructured processes. In this paper, we propose a Battery Disassembly AMMR (BEAM-1) system based on Neuro-Symbolic AI. It detects the environmental state by leveraging a combination of multiple sensors and neural predicates and then translates this information into a quasi-symbolic space. In real time, it identifies the optimal sequence of action primitives through LLM-heuristic tree search, ensuring high-precision execution of these primitives. Additionally, it employs positional speculative sampling using intuitive networks and achieves the disassembly of various bolt types with a meticulously designed end-effector. Importantly, BEAM-1 is a continuously learning embodied intelligence system, capable of subjective reasoning like a human and possessing intuition. A large number of real-scene experiments have demonstrated that it can autonomously perceive, decide, and execute to complete the continuous disassembly of bolts in multiple, multi-category, and complex situations, with a success rate of 98.78%. This research attempts to use Neuro-Symbolic AI to give robots real autonomous reasoning, planning, and learning capabilities. BEAM-1 realizes a revolution in battery disassembly. Its framework can be easily ported to any robotic system to realize different application scenarios, which provides a ground-breaking idea for the design and implementation of future embodied intelligent robotic systems.

AAAI 2024 · Conference Paper

X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer

  • Linglin Jing
  • Ying Xue
  • Xu Yan
  • Chaoda Zheng
  • Dong Wang
  • Ruimao Zhang
  • Zhigang Wang
  • Hui Fang

The field of 4D point cloud understanding is rapidly developing with the goal of analyzing dynamic 3D point cloud sequences. However, it remains a challenging task due to the sparsity and lack of texture in point clouds. Moreover, the irregularity of point clouds poses a difficulty in aligning temporal information within video sequences. To address these issues, we propose a novel cross-modal knowledge transfer framework, called X4D-SceneFormer. This framework enhances 4D scene understanding by transferring texture priors from RGB sequences using a Transformer architecture with temporal relationship mining. Specifically, the framework is designed with a dual-branch architecture, consisting of a 4D point cloud transformer and a Gradient-aware Image Transformer (GIT). The GIT combines visual texture and temporal correlation features to offer rich semantics and dynamics for better point cloud representation. During training, we employ multiple knowledge transfer techniques, including temporal consistency losses and masked self-attention, to strengthen the knowledge transfer between modalities. This leads to enhanced performance during inference using single-modal 4D point cloud inputs. Extensive experiments demonstrate the superior performance of our framework on various 4D point cloud video understanding tasks, including action recognition, action segmentation, and semantic segmentation. Our results take first place on the HOI4D challenge, with 85.3% (+7.9%) accuracy on 4D action segmentation and 47.3% (+5.0%) mIoU on semantic segmentation, outperforming the previous state-of-the-art by a large margin. We release the code at https://github.com/jinglinglingling/X4D.

IJCAI 2022 · Conference Paper

Self-Guided Hard Negative Generation for Unsupervised Person Re-Identification

  • Dongdong Li
  • Zhigang Wang
  • Jian Wang
  • Xinyu Zhang
  • Errui Ding
  • Jingdong Wang
  • Zhaoxiang Zhang

Recent unsupervised person re-identification (re-ID) methods mostly apply pseudo labels from clustering algorithms as supervision signals. Despite great success, this fashion is very likely to aggregate different identities with similar appearances into the same cluster. As a result, the hard negative samples, which play an important role in training re-ID models, are significantly reduced. To alleviate this problem, we propose a self-guided hard negative generation method for unsupervised person re-ID. Specifically, a joint framework is developed which incorporates a hard negative generation network (HNGN) and a re-ID network. To continuously generate harder negative samples that provide effective supervision for contrastive learning, the two networks are alternately trained in an adversarial manner to improve each other, where the re-ID network guides the HNGN to generate challenging data and the HNGN enforces the re-ID network to enhance its discrimination ability. During inference, the performance of the re-ID network is improved without introducing any extra parameters. Extensive experiments demonstrate that the proposed method significantly outperforms a strong baseline and also achieves better results than state-of-the-art methods.
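
A minimal sketch of such an alternating adversarial schedule (stub linear networks and placeholder batches throughout; the real HNGN, re-ID model, and losses are far more elaborate):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in networks; the real re-ID model and HNGN are far larger.
reid = nn.Linear(128, 64)    # backbone stub: features -> embedding
hngn = nn.Linear(64, 64)     # generator stub: embedding -> hard negative
opt_reid = torch.optim.SGD(reid.parameters(), lr=1e-3)
opt_hngn = torch.optim.SGD(hngn.parameters(), lr=1e-3)

for step in range(100):
    feats = torch.randn(32, 128)                          # placeholder batch
    anchor = reid(feats)
    positive = anchor + 0.01 * torch.randn_like(anchor)   # placeholder positives

    # (1) HNGN step: make negatives harder, i.e. *raise* the re-ID loss.
    negative = hngn(anchor.detach())
    loss_g = -F.triplet_margin_loss(anchor.detach(), positive.detach(),
                                    negative, margin=0.3)
    opt_hngn.zero_grad(); loss_g.backward(); opt_hngn.step()

    # (2) re-ID step: stay discriminative against the fresh hard negatives.
    negative = hngn(anchor).detach()
    loss_d = F.triplet_margin_loss(anchor, positive, negative, margin=0.3)
    opt_reid.zero_grad(); loss_d.backward(); opt_reid.step()
```

The sign flip is the core of the scheme: the generator ascends the same loss the re-ID network descends, so negatives keep getting harder as training progresses.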

ICRA 2020 · Conference Paper

Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM

  • Xuesong Shi
  • Dongjiang Li
  • Pengpeng Zhao 0005
  • Qinbin Tian
  • Yuxin Tian
  • Qiwei Long
  • Chunhao Zhu
  • Jingwei Song

Service robots should be able to operate autonomously in dynamic and daily changing environments over an extended period of time. While Simultaneous Localization And Mapping (SLAM) is one of the most fundamental problems for robotic autonomy, most existing SLAM works are evaluated with data sequences that are recorded in a short period of time. In real-world deployment, there can be out-of-sight scene changes caused by both natural factors and human activities. For example, in home scenarios, most objects may be movable, replaceable or deformable, and the visual features of the same place may be significantly different on successive days. Such out-of-sight dynamics pose great challenges to the robustness of pose estimation, and hence to a robot's long-term deployment and operation. To differentiate the aforementioned problem from conventional works, which are usually evaluated in a static setting in a single run, the term lifelong SLAM is used here to address SLAM problems in an ever-changing environment over a long period of time. To accelerate lifelong SLAM research, we release the OpenLORIS-Scene datasets. The data are collected in real-world indoor scenes, multiple times in each place, to include scene changes in real life. We also design benchmarking metrics for lifelong SLAM, with which the robustness and accuracy of pose estimation are evaluated separately. The datasets and benchmark are available online at lifelong-robotic-vision.github.io/dataset/scene.

AAAI 2019 · Conference Paper

Triple Classification Using Regions and Fine-Grained Entity Typing

  • Tiansi Dong
  • Zhigang Wang
  • Juanzi Li
  • Christian Bauckhage
  • Armin B. Cremers

A triple in a knowledge graph takes the form (head, relation, tail). Triple Classification is used to determine the truth value of an unknown triple. This is a hard task for 1-to-N relations using the vector-based embedding approach. We propose a new region-based embedding approach using fine-grained type chains. A novel geometric process is presented to extend the vectors of pre-trained entities into n-balls (n-dimensional balls) under the condition that head balls shall contain their tail balls. Our algorithm achieves zero energy cost and therefore serves as a case study of perfectly imposing tree structures into vector space. An unknown triple (h, r, x) is predicted as true when x's n-ball is located in the r-subspace of h's n-ball, following the same construction as the known tails of h. The experiments are based on large datasets derived from the benchmark datasets WN11, FB13, and WN18. Our results show that the performance of the new method is related to the length of the type chain and the quality of the pre-trained entity embeddings, and that long chains with well-trained entity embeddings outperform other methods in the literature. Source code and datasets are located at https://github.com/GnodIsNait/mushroom.
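
The containment condition underlying the construction is plain geometry: ball(c_t, r_t) lies inside ball(c_h, r_h) exactly when the distance between centers plus the tail radius does not exceed the head radius. A tiny helper (ours, not the released code):

```python
import numpy as np

def ball_contains(c_head, r_head, c_tail, r_tail, tol=1e-9):
    """True iff ball(c_tail, r_tail) lies entirely inside ball(c_head, r_head)."""
    gap = np.linalg.norm(np.asarray(c_head) - np.asarray(c_tail)) + r_tail
    return bool(gap <= r_head + tol)

# Prediction rule from the abstract: a triple (h, r, x) is judged true when
# x's n-ball sits inside the r-subspace region of h's n-ball.
```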

IJCAI 2016 · Conference Paper

Text-Enhanced Representation Learning for Knowledge Graph

  • Zhigang Wang
  • Juanzi Li

Learning the representations of a knowledge graph has attracted significant research interest in the field of the intelligent Web. By regarding each relation as a translation from head entity to tail entity, translation-based methods including TransE, TransH, and TransR are simple and effective, achieving state-of-the-art performance. However, they still suffer from the following issues: (i) low performance when modeling 1-to-N, N-to-1, and N-to-N relations, and (ii) limited performance due to the structural sparseness of the knowledge graph. In this paper, we propose a novel knowledge graph representation learning method that takes advantage of the rich context information in a text corpus. The rich textual context information is incorporated to expand the semantic structure of the knowledge graph, and each relation is enabled to own different representations for different head and tail entities to better handle 1-to-N, N-to-1, and N-to-N relations. Experiments on multiple benchmark datasets show that our proposed method successfully addresses the above issues and significantly outperforms the state-of-the-art methods.
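
The translation-based scoring shared by these methods is compact enough to state directly: TransE models a relation as a vector offset, so a triple is plausible when h + r lands near t (a textbook sketch, not this paper's code):

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray, ord: int = 2) -> float:
    """TransE dissimilarity: smaller ||h + r - t|| means a more plausible triple."""
    return float(np.linalg.norm(h + r - t, ord=ord))
```

The 1-to-N weakness is visible here: with a single vector per relation, h + r must sit near every valid tail at once, collapsing distinct tails together; letting a relation take different representations for different head and tail entities, as proposed above, relaxes exactly this constraint.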

AAAI 2014 · Conference Paper

Cross-Lingual Knowledge Validation Based Taxonomy Derivation from Heterogeneous Online Wikis

  • Zhigang Wang
  • Juanzi Li
  • Shuangjie Li
  • Mingyang Li
  • Jie Tang
  • Kuo Zhang
  • Kun Zhang

Creating knowledge bases from crowd-sourced wikis, like Wikipedia, has attracted significant research interest in the field of the intelligent Web. However, the derived taxonomies usually contain many mistakenly imported taxonomic relations due to the difference between user-generated subsumption relations and semantic taxonomic relations. Current approaches to solving the problem still suffer from the following issues: (i) heuristic-based methods strongly rely on specific language-dependent rules, and (ii) corpus-based methods depend on a large-scale high-quality corpus, which is often unavailable. In this paper, we formulate the cross-lingual taxonomy derivation problem as the problem of cross-lingual taxonomic relation prediction. We investigate different linguistic heuristics and language-independent features, and propose a cross-lingual knowledge validation based dynamic adaptive boosting model to iteratively reinforce the performance of taxonomic relation prediction. The proposed approach successfully overcomes the above issues, and experiments show that our approach significantly outperforms state-of-the-art comparison methods.