Arrow Research search

Author name cluster

Dongsheng Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers (14)

AAAI Conference 2026 Conference Paper

Online Multi-Relational Clustering with Dominant View Mining

  • Zhengzhong Zhu
  • Pei Zhou
  • Dongsheng Wang
  • Li Cheng
  • Jiangping Zhu

Multi-relational graph clustering aims to uncover complex node interactions by leveraging multiple relational views, yet existing methods often suffer from two key limitations: they assume equal importance across views and decouple representation learning from clustering, both of which hinder overall performance. To address these issues, we propose OMC-DVM, a novel end-to-end Online Multi-Relational Graph Clustering with Dominant View Mining framework. OMC-DVM introduces two core innovations: (1) an unsupervised dominant view mining module that dynamically identifies the dominant view using Maximum Mean Discrepancy (MMD) and adaptively aligns the other views to it, mitigating view imbalance; (2) an online multi-relational clustering process that unifies representation learning and clustering into a single stage. By performing clustering-level contrastive learning, OMC-DVM directly generates cluster assignments in an end-to-end manner. Extensive experiments on both real-world and synthetic benchmark datasets demonstrate that OMC-DVM not only achieves state-of-the-art clustering performance but also effectively alleviates the view imbalance problem in multi-relational graphs.
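The MMD criterion used for dominant-view selection can be illustrated with the standard kernel two-sample estimate: a view whose embedding distribution is closest (in MMD) to the others is a natural "dominant" candidate. A minimal numpy sketch, where the function names and the RBF bandwidth are illustrative assumptions, not code from the paper:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Pairwise RBF kernel between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=1.0):
    # Biased estimate of squared Maximum Mean Discrepancy
    # between samples x (n, d) and y (m, d).
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())
```

Two samples from the same distribution give an MMD near zero; well-separated samples give a clearly positive value, which is what makes it usable as a view-discrepancy score.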

IJCAI Conference 2025 Conference Paper

TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning

  • Miaoge Li
  • Jingcai Guo
  • Richard Yi Da Xu
  • Dongsheng Wang
  • Xiaofeng Cao
  • Zhijie Rao
  • Song Guo

Compositional Zero-Shot Learning (CZSL) aims to recognize novel state-object compositions by leveraging the shared knowledge of their primitive components. Despite considerable progress, effectively calibrating the bias between semantically similar multimodal representations, as well as generalizing pre-trained knowledge to novel compositional contexts, remains an enduring challenge. In this paper, our interest is to revisit conditional transport (CT) theory and its homology to the visual-semantics interaction in CZSL, and further propose a novel Trisets Consistency Alignment framework (dubbed TsCA) that addresses these issues. Concretely, we utilize three distinct yet semantically homologous sets, i.e., patches, primitives, and compositions, to construct pairwise CT costs to minimize their semantic discrepancies. To further ensure consistency transfer within these sets, we implement a cycle-consistency constraint that refines the learning by guaranteeing the feature consistency of the self-mapping during transport flow, regardless of modality. Moreover, we extend the CT plans to an open-world setting, which enables the model to effectively filter out unfeasible pairs, thereby speeding up inference as well as increasing accuracy. Extensive experiments are conducted to verify the effectiveness of the proposed method. The code is available at https://github.com/keepgoingjkg/TsCA.

AAAI Conference 2023 Conference Paper

Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation

  • Dongsheng Wang
  • Xu Jia
  • Yang Zhang
  • Xinyu Zhang
  • Yaoyuan Wang
  • Ziyang Zhang
  • Dong Wang
  • Huchuan Lu

Event-based cameras are bio-inspired sensors that capture the brightness change of every pixel in an asynchronous manner. Compared with frame-based sensors, event cameras have microsecond-level latency and high dynamic range, hence showing great potential for object detection under high-speed motion and poor illumination conditions. Due to the sparse and asynchronous nature of event streams, most existing approaches resort to hand-crafted methods to convert event data into a 2D grid representation. However, these are sub-optimal in aggregating information from the event stream for object detection. In this work, we propose to learn an event representation optimized for event-based object detection. Specifically, event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as a 3D tensor representation. To fully exploit the information in event streams for detecting objects, a dual-memory aggregation network (DMANet) is proposed to leverage both long and short memory along event streams to aggregate effective information for object detection. Long memory is encoded in the hidden states of adaptive convLSTMs, while short memory is modeled by computing the spatial-temporal correlation between event pillars at neighboring time intervals. Extensive experiments on the recently released event-based automotive detection dataset demonstrate the effectiveness of the proposed method.
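The grid conversion the abstract describes, binning events by polarity and x-y-t position, can be sketched as a simple count tensor (a simplified numpy sketch; the actual DMANet pillar features are learned, and the function name and layout here are illustrative):

```python
import numpy as np

def events_to_grid(events, H, W, T):
    # events: (N, 4) array of (x, y, t, p), with t normalized to [0, 1)
    # and polarity p in {0, 1}. Returns a (2, T, H, W) count tensor.
    grid = np.zeros((2, T, H, W), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = np.clip((events[:, 2] * T).astype(int), 0, T - 1)
    p = events[:, 3].astype(int)
    # Unbuffered accumulation so repeated (p, t, y, x) cells sum correctly.
    np.add.at(grid, (p, t, y, x), 1.0)
    return grid
```

`np.add.at` is used instead of fancy-indexed `+=` so that multiple events falling into the same cell are all counted.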

NeurIPS Conference 2023 Conference Paper

Hierarchical Vector Quantized Transformer for Multi-class Unsupervised Anomaly Detection

  • Ruiying Lu
  • YuJie Wu
  • Long Tian
  • Dongsheng Wang
  • Bo Chen
  • Xiyang Liu
  • Ruimin Hu

Unsupervised image Anomaly Detection (UAD) aims to learn robust and discriminative representations of normal samples. While training a separate model per class incurs expensive computation and limits generalizability, this paper focuses on building a unified framework for multiple classes. Under such a challenging setting, popular reconstruction-based networks with the continuous-latent-representation assumption often suffer from the "identical shortcut" issue, where both normal and abnormal samples can be well recovered and become difficult to distinguish. To address this pivotal issue, we propose a hierarchical vector quantized prototype-oriented Transformer under a probabilistic framework. First, instead of learning continuous representations, we preserve the typical normal patterns as discrete iconic prototypes, and confirm the importance of Vector Quantization in preventing the model from falling into the shortcut. The vector quantized iconic prototypes are integrated into the Transformer for reconstruction, such that an abnormal data point is flipped to a normal data point. Second, we investigate an exquisite hierarchical framework to relieve the codebook collapse issue and replenish frail normal patterns. Third, a prototype-oriented optimal transport method is proposed to better regulate the prototypes and hierarchically evaluate the abnormal score. Evaluated on the MVTec-AD and VisA datasets, our model surpasses state-of-the-art alternatives and possesses good interpretability. The code is available at https://github.com/RuiyingLu/HVQ-Trans.
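The vector-quantization step, snapping each continuous feature to its nearest discrete prototype, is the mechanism that blocks the "identical shortcut": abnormal features get replaced by normal prototypes and thus reconstruct poorly. A minimal numpy sketch of the standard nearest-code lookup (names are illustrative; the real model learns the codebook jointly with the Transformer):

```python
import numpy as np

def vector_quantize(z, codebook):
    # z: (N, D) continuous features; codebook: (K, D) prototypes.
    # Each feature is replaced by its nearest prototype (L2 distance).
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx
```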

NeurIPS Conference 2023 Conference Paper

Tuning Multi-mode Token-level Prompt Alignment across Modalities

  • Dongsheng Wang
  • Miaoge Li
  • Xinyang Liu
  • MingSheng Xu
  • Bo Chen
  • Hanwang Zhang

Advancements in prompt tuning of vision-language models have underscored their potential in enhancing open-world visual concept comprehension. However, prior works primarily focus on single-mode (only one prompt for each modality) and holistic-level (image or sentence) semantic alignment, which fails to capture sample diversity, leading to sub-optimal prompt discovery. To address this limitation, we propose a multi-mode token-level tuning framework that leverages optimal transport to learn and align a set of prompt tokens across modalities. Specifically, we rely on two essential factors: 1) multi-mode prompt discovery, which guarantees diverse semantic representations, and 2) token-level alignment, which helps explore fine-grained similarity. Consequently, the similarity can be calculated as a hierarchical transportation problem between the modality-specific sets. Extensive experiments on popular image recognition benchmarks show the superior generalization and few-shot abilities of our approach. The qualitative analysis demonstrates that the learned prompt tokens have the ability to capture diverse visual concepts.
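Token-level alignment of this kind is typically solved with entropic optimal transport via Sinkhorn iterations. A generic numpy sketch of the standard Sinkhorn algorithm (the paper's hierarchical formulation is not reproduced; the function name and parameters here are illustrative):

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, iters=500):
    # Entropic OT: find a plan P with row sums a and column sums b that
    # approximately minimizes <P, cost> - eps * H(P).
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):           # alternating marginal projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]
```

The converged plan respects both marginals, and the transport cost `(P * cost).sum()` serves as the fine-grained similarity between the two token sets.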

NeurIPS Conference 2022 Conference Paper

Alleviating "Posterior Collapse'' in Deep Topic Models via Policy Gradient

  • Yewen Li
  • Chaojie Wang
  • Zhibin Duan
  • Dongsheng Wang
  • Bo Chen
  • Bo An
  • Mingyuan Zhou

Deep topic models have proven to be a promising way to extract hierarchical latent representations from documents represented as high-dimensional bag-of-words vectors. However, the representation capability of existing deep topic models is still limited by the phenomenon of "posterior collapse", which has been widely criticized in deep generative models and results in the higher-level latent representations exhibiting similar or meaningless patterns. To this end, in this paper we first develop a novel deep-coupling generative process for existing deep topic models, which incorporates skip connections into the generation of documents, enforcing strong links between a document and its multi-layer latent representations. After that, utilizing data augmentation techniques, we reformulate the deep-coupling generative process as a Markov decision process and develop a corresponding Policy Gradient (PG) based training algorithm, which can further alleviate the information reduction at higher layers. Extensive experiments demonstrate that our methods effectively alleviate "posterior collapse" in deep topic models, providing higher-quality latent document representations.
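The Policy Gradient idea, weighting log-probability gradients by baselined rewards, can be sketched generically as the score-function (REINFORCE) estimator (a minimal numpy sketch; the function name and the mean-reward baseline are illustrative assumptions, not the paper's estimator):

```python
import numpy as np

def reinforce_estimate(grad_logp, rewards, baseline=None):
    # Score-function (REINFORCE) gradient estimate:
    #   g ~= E[(R - b) * grad log p(action)]
    # grad_logp: (N, D) per-sample gradients of log-probabilities;
    # rewards:   (N,)   per-sample returns;
    # baseline:  a variance-reducing constant (mean reward if None).
    b = rewards.mean() if baseline is None else baseline
    return ((rewards - b)[:, None] * grad_logp).mean(axis=0)
```

Subtracting the baseline leaves the estimator unbiased while reducing its variance, which is what makes PG training of discrete latent structures practical.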

NeurIPS Conference 2022 Conference Paper

HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding

  • Yishi Xu
  • Dongsheng Wang
  • Bo Chen
  • Ruiying Lu
  • Zhibin Duan
  • Mingyuan Zhou

Embedded topic models are able to learn interpretable topics even with large and heavy-tailed vocabularies. However, they generally hold the Euclidean embedding space assumption, leading to a basic limitation in capturing hierarchical relations. To this end, we present a novel framework that introduces hyperbolic embeddings to represent words and topics. With the tree-likeness property of hyperbolic space, the underlying semantic hierarchy among words and topics can be better exploited to mine more interpretable topics. Furthermore, due to the superiority of hyperbolic geometry in representing hierarchical data, tree-structure knowledge can also be naturally injected to guide the learning of a topic hierarchy. Therefore, we further develop a regularization term based on the idea of contrastive learning to inject prior structural knowledge efficiently. Experiments on both topic taxonomy discovery and document representation demonstrate that the proposed framework achieves improved performance against existing embedded topic models.
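The tree-likeness the abstract relies on comes from the geometry of the Poincaré ball, where distances blow up toward the boundary, so a hierarchy can be embedded with the root near the origin and leaves near the rim. The standard Poincaré-ball distance (a generic numpy sketch, not code from HyperMiner):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    # Geodesic distance in the Poincare ball (||u||, ||v|| < 1):
    #   d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))
    uu = (u * u).sum()
    vv = (v * v).sum()
    d2 = ((u - v) ** 2).sum()
    return np.arccosh(1.0 + 2.0 * d2 / max((1 - uu) * (1 - vv), eps))
```

A point at Euclidean norm 0.99 is vastly farther from the origin than one at 0.1, which is why hyperbolic space can host exponentially branching taxonomies with low distortion.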

NeurIPS Conference 2022 Conference Paper

Knowledge-Aware Bayesian Deep Topic Model

  • Dongsheng Wang
  • Yishi Xu
  • Miaoge Li
  • Zhibin Duan
  • Chaojie Wang
  • Bo Chen
  • Mingyuan Zhou

We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling. Although embedded topic models (ETMs) and their variants have achieved promising performance in text analysis, they mainly focus on mining word co-occurrence patterns, ignoring potentially easy-to-obtain prior topic hierarchies that could help enhance topic coherence. While several knowledge-based topic models have recently been proposed, they are either only applicable to shallow hierarchies or sensitive to the quality of the provided prior knowledge. To this end, we develop a novel deep ETM that jointly models the documents and the given prior knowledge by embedding the words and topics into the same space. Guided by the provided domain knowledge, the proposed model tends to discover topic hierarchies that are organized into interpretable taxonomies. Moreover, with a technique for adapting a given graph, our extended version allows the structure of the prior knowledge to be fine-tuned to match the target corpus. Extensive experiments show that our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.

NeurIPS Conference 2021 Conference Paper

TopicNet: Semantic Graph-Guided Topic Discovery

  • Zhibin Duan
  • Yishi Xu
  • Bo Chen
  • Dongsheng Wang
  • Chaojie Wang
  • Mingyuan Zhou

Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner and automatically organize them into a topic hierarchy. However, it is unclear how to incorporate prior beliefs, such as knowledge graphs, to guide the learning of the topic hierarchy. To address this issue, we introduce TopicNet, a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence the learning. TopicNet represents each topic as a Gaussian-distributed embedding vector, projects the topics of all layers into a shared embedding space, and explores both the symmetric and asymmetric similarities between Gaussian embedding vectors to incorporate prior semantic hierarchies. With a variational auto-encoding inference network, the model parameters are optimized by minimizing the evidence lower bound and supervised loss via stochastic gradient descent. Experiments on widely used benchmarks show that TopicNet outperforms related deep topic models in discovering deeper interpretable topics and mining better document representations.

NeurIPS Conference 2020 Conference Paper

Deep Relational Topic Modeling via Graph Poisson Gamma Belief Network

  • Chaojie Wang
  • Hao Zhang
  • Bo Chen
  • Dongsheng Wang
  • Zhengjue Wang
  • Mingyuan Zhou

To analyze a collection of interconnected documents, relational topic models (RTMs) have been developed to describe both the link structure and document content, exploring their underlying relationships via a single-layer latent representation with limited expressive capability. To better utilize the document network, we first propose graph Poisson factor analysis (GPFA) that constructs a probabilistic model for interconnected documents and also provides closed-form Gibbs sampling update equations, moving beyond sophisticated approximate assumptions of existing RTMs. Extending GPFA, we develop a novel hierarchical RTM named graph Poisson gamma belief network (GPGBN), and further introduce two different Weibull distribution based variational graph auto-encoders for efficient model inference and effective network information aggregation. Experimental results demonstrate that our models extract high-quality hierarchical latent document representations, leading to improved performance over baselines on various graph analytic tasks.

AAAI Conference 2018 Conference Paper

Computation Error Analysis of Block Floating Point Arithmetic Oriented Convolution Neural Network Accelerator Design

  • Zhourui Song
  • Zhenyu Liu
  • Dongsheng Wang

The heavy burdens of computation and off-chip traffic impede deploying large-scale convolutional neural networks on embedded platforms. Since CNNs exhibit strong tolerance to computation errors, employing block floating point (BFP) arithmetic in CNN accelerators can efficiently reduce hardware cost and data traffic while maintaining classification accuracy. In this paper, we verify the effects of word-width definitions in BFP on CNN performance without retraining. Several typical CNN models, including VGG16, ResNet-18, ResNet-50 and GoogLeNet, were tested. Experiments revealed that an 8-bit mantissa, including the sign bit, in the BFP representation induced less than 0.3% accuracy loss. In addition, we analyze the computational errors in theory and derive an upper bound on the noise-to-signal ratio (NSR), which provides practical guidance for BFP-based CNN engine design.
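Block floating point keeps one shared exponent per block and stores only a short signed mantissa per value. A minimal sketch of an 8-bit-mantissa BFP quantize/dequantize round trip (numpy; the function name, exponent choice, and rounding scheme are illustrative assumptions, not the paper's exact hardware format):

```python
import numpy as np

def bfp_quantize(block, mantissa_bits=8):
    # One shared exponent for the whole block, derived from its max
    # magnitude; each value's mantissa (sign bit counted in
    # mantissa_bits) is a rounded, clipped integer.
    max_abs = np.abs(block).max()
    if max_abs == 0:
        return np.zeros_like(block)
    shared_exp = np.floor(np.log2(max_abs)) + 1
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    lo, hi = -2 ** (mantissa_bits - 1), 2 ** (mantissa_bits - 1) - 1
    mantissa = np.clip(np.round(block / scale), lo, hi)
    return mantissa * scale
```

Because all values in a block share one exponent, the multiply-accumulate datapath can use fixed-point mantissa arithmetic, which is where the hardware savings come from.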

NeurIPS Conference 2018 Conference Paper

HitNet: Hybrid Ternary Recurrent Neural Network

  • Peiqi Wang
  • Xinfeng Xie
  • Lei Deng
  • Guoqi Li
  • Dongsheng Wang
  • Yuan Xie

Quantization is a promising technique to reduce the model size, memory footprint, and massive computation operations of recurrent neural networks (RNNs) for embedded devices with limited resources. Although extreme low-bit quantization has achieved impressive success on convolutional neural networks, it still suffers from significant accuracy degradation on RNNs at the same low-bit precision. In this paper, we first investigate the accuracy degradation of RNN models under different quantization schemes, as well as the distribution of tensor values in the full precision model. Our observation reveals that, due to the difference between the distributions of weights and activations, different quantization methods are suitable for different parts of the model. Based on this observation, we propose HitNet, a hybrid ternary recurrent neural network, which bridges the accuracy gap between the full precision model and the quantized model. In HitNet, we develop a hybrid quantization method to quantize weights and activations. Moreover, we introduce a sloping factor, motivated by prior work on Boltzmann machines, into the activation functions, further closing the accuracy gap. Overall, HitNet quantizes RNN models into ternary values, {-1, 0, 1}, significantly outperforming state-of-the-art quantization methods on RNN models. We test it on typical RNN models such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU). For example, we improve the perplexity per word (PPW) of a ternary LSTM on the Penn Tree Bank (PTB) corpus from 126 (the state-of-the-art result to the best of our knowledge) to 110.3, compared with 97.2 for the full precision model, and that of a ternary GRU from 142 to 113.5, compared with 102.7 for the full precision model.
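Threshold-based ternarization, the standard family of schemes that ternary RNNs like HitNet build on, maps each weight to {-1, 0, +1} plus a per-tensor scale (a generic numpy sketch; the threshold factor 0.7, the scale rule, and the names are illustrative, not HitNet's exact hybrid method):

```python
import numpy as np

def ternarize(w, t=0.7):
    # Weights with magnitude below the threshold become 0;
    # the rest keep only their sign.
    thr = t * np.abs(w).mean()
    q = np.sign(w) * (np.abs(w) > thr)
    # Per-tensor scale: mean magnitude of the surviving weights.
    nz = q != 0
    scale = np.abs(w[nz]).mean() if nz.any() else 0.0
    return q, scale
```

At inference time, the matrix product with `q` needs only additions and subtractions, with a single multiply by `scale` per output, which is the source of the claimed savings on embedded hardware.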

ICRA Conference 2001 Conference Paper

Characterization of the Analytical Boundary of the Workspace for 3-6 SPS Parallel Manipulator

  • Qizhi Wang
  • Dongsheng Wang
  • Min Tan

An analytical boundary is presented that yields all surfaces enveloping the workspace of the 3-6 SPS parallel manipulator, based on the parametric equations and the condition of parallel motion. The planform and profile of the workspace are analyzed and proved. For a six-degrees-of-freedom parallel manipulator, this analytic-geometry method of delineating the workspace can completely depict the configuration of the workspace, improving on simulation and numerical techniques.