Author name cluster

Weiqing Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

2 author rows

ICML Conference 2025 Conference Paper

Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems

Taejin Park
Ivan Medennikov
Kunal Dhawan
Weiqing Wang
He Huang 0012
Nithin Rao Koluguri
Krishna C. Puvvada
Jagadeesh Balam

Sortformer is an encoder-based speaker diarization model designed for supervising speaker tagging in speech-to-text models. Instead of relying solely on permutation invariant loss (PIL), Sortformer introduces Sort Loss to resolve the permutation problem, either independently or in tandem with PIL. In addition, we propose a streamlined multi-speaker speech-to-text architecture that leverages Sortformer for speaker supervision, embedding speaker labels into the encoder using sinusoidal kernel functions. This design addresses the speaker permutation problem through sorted objectives, effectively bridging timestamps and tokens to supervise speaker labels in the output transcriptions. Experiments demonstrate that Sort Loss can boost speaker diarization performance, and incorporating the speaker supervision from Sortformer improves multi-speaker transcription accuracy. We anticipate that the proposed Sortformer and multi-speaker architecture will enable the seamless integration of speaker tagging capabilities into foundational speech-to-text systems and multimodal large language models (LLMs), offering an easily adoptable and user-friendly mechanism to enhance their versatility and performance in speaker-aware tasks. The code and trained models are made publicly available through the NVIDIA NeMo Framework.

Details

AAAI Conference 2023 Conference Paper

DNG: Taxonomy Expansion by Exploring the Intrinsic Directed Structure on Non-gaussian Space

Songlin Zhai
Weiqing Wang
Yuanfang Li
Yuan Meng

Taxonomy expansion is the process of incorporating a large number of additional nodes (i.e., ''queries'') into an existing taxonomy (i.e., ''seed''), with the most important step being the selection of appropriate positions for each query. Enormous efforts have been made by exploring the seed's structure. However, existing approaches are deficient in their mining of structural information in two ways: poor modeling of the hierarchical semantics and failure to capture directionality of the is-a relation. This paper seeks to address these issues by explicitly denoting each node as the combination of inherited feature (i.e., structural part) and incremental feature (i.e., supplementary part). Specifically, the inherited feature originates from ''parent'' nodes and is weighted by an inheritance factor. With this node representation, the hierarchy of semantics in taxonomies (i.e., the inheritance and accumulation of features from ''parent'' to ''child'') could be embodied. Additionally, based on this representation, the directionality of the is-a relation could be easily translated into the irreversible inheritance of features. Inspired by the Darmois-Skitovich Theorem, we implement this irreversibility by a non-Gaussian constraint on the supplementary feature. A log-likelihood learning objective is further utilized to optimize the proposed model (dubbed DNG), whereby the required non-Gaussianity is also theoretically ensured. Extensive experimental results on two real-world datasets verify the superiority of DNG relative to several strong baselines.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

Newton–Cotes Graph Neural Networks: On the Time Evolution of Dynamic Systems

Lingbing Guo
Weiqing Wang
Zhuo Chen
Ningyu Zhang
Zequn Sun
Yixuan Lai
Qiang Zhang
Huajun Chen

Reasoning system dynamics is one of the most important analytical approaches for many scientific studies. With the initial state of a system as input, the recent graph neural networks (GNNs)-based methods are capable of predicting the future state distant in time with high accuracy. Although these methods have diverse designs in modeling the coordinates and interacting forces of the system, we show that they actually share a common paradigm that learns the integration of the velocity over the interval between the initial and terminal coordinates. However, their integrand is constant w. r. t. time. Inspired by this observation, we propose a new approach to predict the integration based on several velocity estimations with Newton–Cotes formulas and prove its effectiveness theoretically. Extensive experiments on several benchmarks empirically demonstrate consistent and significant improvement compared with the state-of-the-art methods.

PDF Details

TIST Journal 2018 Journal Article

TPM

Weiqing Wang
Hongzhi Yin
Xingzhong Du
Quoc Viet Hung Nguyen
Xiaofang Zhou

With the rapid development of location-based social networks (LBSNs), spatial item recommendation has become an important way of helping users discover interesting locations to increase their engagement with location-based services. The availability of spatial, temporal, and social information in LBSNs offers an unprecedented opportunity to enhance the spatial item recommendation. Many previous works studied spatial and social influences on spatial item recommendation in LBSNs. Due to the strong correlations between a user’s check-in time and the corresponding check-in location, which include the sequential influence and temporal cyclic effect, it is essential for spatial item recommender system to exploit the temporal effect to improve the recommendation accuracy. Leveraging temporal information in spatial item recommendation is, however, very challenging, considering (1) when integrating sequential influences, users’ check-in data in LBSNs has a low sampling rate in both space and time, which renders existing location prediction techniques on GPS trajectories ineffective, and the prediction space is extremely large, with millions of distinct locations as the next prediction target, which impedes the application of classical Markov chain models; (2) there are various temporal cyclic patterns (i.e., daily, weekly, and monthly) in LBSNs, but existing work is limited to one specific pattern; and (3) there is no existing framework that unifies users’ personal interests, temporal cyclic patterns, and the sequential influence of recently visited locations in a principled manner. In light of the above challenges, we propose a Temporal Personalized Model ( TPM ), which introduces a novel latent variable topic-region to model and fuse sequential influence, cyclic patterns with personal interests in the latent and exponential space. The advantages of modeling the temporal effect at the topic-region level include a significantly reduced prediction space, an effective alleviation of data sparsity, and a direct expression of the semantic meaning of users’ spatial activities. Moreover, we introduce two methods to model the effect of various cyclic patterns. The first method is a time indexing scheme that encodes the effect of various cyclic patterns into a binary code. However, the indexing scheme faces the data sparsity problem in each time slice. To deal with this data sparsity problem, the second method slices the time according to each cyclic pattern separately and explores these patterns in a joint additive model. Furthermore, we design an asymmetric Locality Sensitive Hashing (ALSH) technique to speed up the online top- k recommendation process by extending the traditional LSH. We evaluate the performance of TPM on two real datasets and one large-scale synthetic dataset. The performance of TPM in recommending cold-start items is also evaluated. The results demonstrate a significant improvement in TPM’s ability to recommend spatial items, in terms of both effectiveness and efficiency, compared with the state-of-the-art methods.

Details DOI

TIST Journal 2017 Journal Article

ST-SAGE

Weiqing Wang
Hongzhi Yin
Ling Chen
Yizhou Sun
Shazia Sadiq
Xiaofang Zhou

With the rapid development of location-based social networks (LBSNs), spatial item recommendation has become an important mobile application, especially when users travel away from home. However, this type of recommendation is very challenging compared to traditional recommender systems. A user may visit only a limited number of spatial items, leading to a very sparse user-item matrix. This matrix becomes even sparser when the user travels to a distant place, as most of the items visited by a user are usually located within a short distance from the user’s home. Moreover, user interests and behavior patterns may vary dramatically across different time and geographical regions. In light of this, we propose ST-SAGE, a spatial-temporal sparse additive generative model for spatial item recommendation in this article. ST-SAGE considers both personal interests of the users and the preferences of the crowd in the target region at the given time by exploiting both the co-occurrence patterns and content of spatial items. To further alleviate the data-sparsity issue, ST-SAGE exploits the geographical correlation by smoothing the crowd’s preferences over a well-designed spatial index structure called the spatial pyramid. To speed up the training process of ST-SAGE, we implement a parallel version of the model inference algorithm on the GraphLab framework. We conduct extensive experiments; the experimental results clearly demonstrate that ST-SAGE outperforms the state-of-the-art recommender systems in terms of recommendation effectiveness, model training efficiency, and online recommendation efficiency.

Details DOI