Author name cluster

Quan Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers
1 author row

Possible papers

JBHI Journal 2026 Journal Article

$\text{P}^\text{2}$RS: A Quantitative Rating Scale for Pain Assessment based on Pulse Wave Characterization

  • Yue He
  • Yi Sun
  • Ke Sun
  • Wei Bin
  • Quan Wang
  • Heng Yang
  • Xinxin Li

For pain intensity assessment, there are currently 11 main rating scales, from the primitive Visual Analog Scale (VAS) to the elaborate Measure of Intermittent and Constant Osteoarthritis Pain (ICOAP). However, they all depend on a self-report mechanism, making their results so subjective that the consistency, comparability and reference value are barely satisfactory. Inspired by the phenomenon that discomfort may give rise to throbbing of the radial artery, we develop an objective rating scale that quantifies the severity of pain by the degree of “lateral instability” of an arterial pulse wave. To monitor this lateral instability, an ultra-small piezoresistive pressure sensor is fabricated in an area of 0.4 × 0.4 $\text{mm}^\text{2}$. With 18 such sensors, we build a flexible tactile sensing dense-array with a pitch of only 0.65 mm. Overlying the radial artery perpendicularly to the blood flow direction, the dense-array succeeds in observing the cross-section of a pulse wave. The barycenter of the cross-section of each wave cycle is taken as the feature point to represent its lateral shape and drift. The standard deviation of the barycenters' horizontal coordinates is thereby calculated as the pulsatile perceptual rating scale ($\text{P}^\text{2}$RS) to reflect the degree of lateral instability, that is, our scale of pain intensity. Among 86 clinical samples, the pain threshold is 0.11, as determined by a binary classification model based on a support vector machine. In terms of consistency with previous rating scales, the average correlation coefficient reaches 0.804 among 43 pain samples.
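The scale described in this abstract (per-cycle barycenter, then the standard deviation of the barycenters' horizontal coordinates) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of one pressure profile per pulse cycle, the millimetre units, and the choice of population (rather than sample) standard deviation are all assumptions. Only the 0.65 mm pitch and the 0.11 threshold come from the abstract.

```python
from statistics import pstdev

PITCH_MM = 0.65        # sensor pitch stated in the abstract
PAIN_THRESHOLD = 0.11  # threshold stated in the abstract (units assumed)


def barycenter(pressures, pitch=PITCH_MM):
    """Pressure-weighted mean lateral position across one row of sensors.

    `pressures` holds one reading per sensor for a single pulse cycle
    (hypothetical input format).
    """
    total = sum(pressures)
    if total == 0:
        raise ValueError("cycle has no pressure signal")
    return sum(i * pitch * p for i, p in enumerate(pressures)) / total


def p2rs(cycles, pitch=PITCH_MM):
    """Standard deviation of the per-cycle barycenters.

    Population standard deviation is an assumption; the abstract does not
    say which estimator is used.
    """
    return pstdev([barycenter(c, pitch) for c in cycles])


def exceeds_pain_threshold(cycles):
    """Compare the score against the threshold reported in the abstract."""
    return p2rs(cycles) > PAIN_THRESHOLD
```

A pulse whose barycenter stays at the same sensor from cycle to cycle scores 0, while one that drifts laterally between cycles scores higher.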

AAAI Conference 2026 Conference Paper

Anatomical Region-Guided Contrastive Decoding: A Plug-and-Play Strategy for Mitigating Hallucinations in Medical VLMs

  • Xiao Liang
  • Chenxi Liu
  • Zhi Ma
  • Di Wang
  • Bin Jing
  • Quan Wang
  • Yuanyuan Shi

Medical Vision-Language Models (MedVLMs) show immense promise in clinical applicability. However, their reliability is hindered by hallucinations, where models often fail to derive answers from visual evidence, instead relying on learned textual priors. Existing mitigation strategies for MedVLMs have distinct limitations: training-based methods rely on costly expert annotations, limiting scalability, while training-free interventions like contrastive decoding, though data-efficient, apply a global, untargeted correction whose effects in complex real-world clinical settings can be unreliable. To address these challenges, we introduce Anatomical Region-Guided Contrastive Decoding (ARCD), a plug-and-play strategy that mitigates hallucinations by providing targeted, region-specific guidance. Our module leverages an anatomical mask to direct a three-tiered contrastive decoding process. By dynamically re-weighting at the token, attention, and logits levels, it verifiably steers the model's focus onto specified regions, reinforcing anatomical understanding and suppressing factually incorrect outputs. Extensive experiments across diverse datasets, including chest X-ray, CT, brain MRI, and ocular ultrasound, demonstrate our method's effectiveness in improving regional understanding, reducing hallucinations, and enhancing overall diagnostic accuracy.
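For readers unfamiliar with contrastive decoding, the logit-level step can be sketched generically as below. The abstract does not give ARCD's actual formulation, so this shows only the common contrastive form, in which logits from a guided forward pass are amplified relative to logits from an unguided pass. The function names and the `alpha` weight are hypothetical, and the token-level and attention-level re-weighting mentioned in the abstract are not covered.

```python
import math


def contrastive_logits(guided, unguided, alpha=1.0):
    """Generic contrastive adjustment: (1 + alpha) * guided - alpha * unguided.

    `guided` and `unguided` are next-token logits from two forward passes
    (for example with and without a region mask; this pairing is assumed).
    """
    return [(1 + alpha) * g - alpha * u for g, u in zip(guided, unguided)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]
```

With `alpha = 0` the adjustment leaves the guided logits unchanged; larger values push probability toward tokens that the guided pass favours more than the unguided pass does.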

JBHI Journal 2026 Journal Article

DiffSpkSync: A Muscle Synergy-Guided Spiking Diffusion Model for EMG Signal Generation to Improve Gesture Recognition Performance

  • Kejia Su
  • Bo Wan
  • Jiayang Huang
  • Zhi-Qiang Zhang
  • Junhao Zhang
  • Pengfei Yang
  • Quan Wang

High-density surface electromyography (HD-sEMG) based hand gesture recognition (HGR) has shown great promise for intuitive human-machine interaction. However, the performance of HGR models is often hindered by a scarcity of available training data, especially in the fields of gesture recognition, rehabilitation, and medicine. To address this issue, we propose DiffSpkSync, a novel generative framework that integrates (1) muscle synergy-guided diffusion modeling for physiologically plausible signal reconstruction, (2) spiking neuron-based sparsification to reduce energy cost, and (3) a time-series mixup strategy to preserve local dynamics during augmentation. Experiments on a public Hyser dataset and a self-collected XDHDEMG dataset demonstrate that training gesture classifiers with data augmented by DiffSpkSync consistently improves classification accuracy in both intrasession and intersession scenarios. Comparative results further demonstrate superior performance over representative generative baselines, including VAE, DCGAN, DANN-CRC, and PatchEMG. Furthermore, real-time validation demonstrates that the proposed method achieves an average end-to-end latency of 130.22 ms and an average prediction accuracy of 95.87%, supporting its applicability in real-world applications.

YNIMG Journal 2026 Journal Article

Distinct and coordinated contributions of hippocampus and frontal eye field to novelty exploration and revisitation

  • Ziwei Tian
  • Sha Huang
  • Dingyang Liu
  • Zhiquan Yang
  • Sushan Li
  • Bingliang Hu
  • Quan Wang
  • Li Feng

Humans dynamically alternate between exploring novel information and revisiting familiar content, yet the neural mechanisms that separately govern these behaviors remain incompletely understood. In this study, we combined intracranial electrophysiology with eye-tracking to investigate the respective roles of the hippocampus (HP) and frontal eye field (FEF) at the level of individual fixations during a visual working memory task. During novelty exploration, hippocampal high-theta activity increased before fixation onset, suggesting a role in anticipatory attention, while FEF theta oscillations were enhanced following fixation, consistent with sustained attention that facilitated novelty processing. During revisitation, HP might drive gaze through low-theta-gamma phase-amplitude coupling associated with a potential memory retrieval process, and subsequently enhanced low-theta oscillations related to potential memory consolidation. Stronger theta-phase synchrony between the FEF and HP was also observed during this revisitation process. Together, these findings reveal distinct yet coordinated FEF-HP dynamics that underpin attention-driven exploration and memory-guided revisitation, providing new electrophysiological evidence for the interaction between attention and memory. Our findings may help inform future research on targeted interventions for attention or memory impairment.

AAAI Conference 2026 Conference Paper

FineRef: Fine-Grained Error Reflection and Correction for Long-Form Generation with Citations

  • Yixing Peng
  • Licheng Zhang
  • Shancheng Fang
  • Yi Liu
  • Peijian Gu
  • Quan Wang

Generating with citations is crucial for trustworthy Large Language Models (LLMs), yet even advanced LLMs often produce mismatched or irrelevant citations. Existing methods over-optimize citation fidelity while overlooking relevance to the user query, which degrades answer quality and robustness in real-world settings with noisy or irrelevant retrieved content. Moreover, the prevailing single-pass paradigm struggles to deliver optimal answers in long-form generation that requires multiple citations. To address these limitations, we propose FineRef, a framework based on Fine-grained error Reflection, which explicitly teaches the model to self-identify and correct two key citation errors (mismatch and irrelevance) on a per-citation basis. FineRef follows a two-stage training strategy. The first stage instills an “attempt–reflect–correct” behavioral pattern via supervised fine-tuning, using fine-grained and controllable reflection data constructed by specialized lightweight models. An online self-reflective bootstrapping strategy is designed to improve generalization by iteratively enriching training data with verified, self-improving examples. To further enhance the self-reflection and correction capability, the second stage applies process-level reinforcement learning with a multi-dimensional reward scheme that promotes reflection accuracy, answer quality, and correction gain. Experiments on the ALCE benchmark demonstrate that FineRef significantly improves both citation performance and answer accuracy. Our 7B model outperforms GPT-4 by up to 18% in Citation F1 and 4% in EM Recall, while also surpassing the state-of-the-art model across key evaluation metrics. FineRef also exhibits strong generalization and robustness in domain transfer settings and noisy retrieval scenarios.

AAAI Conference 2026 Conference Paper

In-Token Rationality Optimization: Towards Accurate and Concise LLM Reasoning via Self-Feedback

  • Mingye Zhu
  • Yi Liu
  • Zheren Fu
  • Quan Wang
  • Yongdong Zhang

Training Large Language Models (LLMs) for chain-of-thought reasoning presents a significant challenge: supervised fine-tuning on a single "golden" rationale hurts generalization as it penalizes equally valid alternatives, whereas reinforcement learning with verifiable rewards struggles with credit assignment and prohibitive computational cost. To tackle these limitations, we introduce InTRO (In-Token Rationality Optimization), a new framework that enables both token-level exploration and self-feedback for accurate and concise reasoning. Instead of directly optimizing an intractable objective over all valid reasoning paths, InTRO leverages correction factors—token-wise importance weights estimated by the information discrepancy between the generative policy and its answer-conditioned counterpart, for informative next-token selection. This approach allows the model to perform token-level exploration and receive self-generated feedback within a single forward pass, ultimately encouraging accurate and concise rationales. Across six math-reasoning benchmarks, InTRO consistently outperforms other baselines, raising solution accuracy by up to 20% relative to the base model. Its chains of thought are also notably more concise, exhibiting reduced verbosity. Beyond this, InTRO enables cross-domain transfer, successfully adapting to out-of-domain reasoning tasks that extend beyond the realm of mathematics, demonstrating robust generalization.

AAAI Conference 2026 Conference Paper

RSVG-ZeroOV: Exploring a Training-Free Framework for Zero-Shot Open-Vocabulary Visual Grounding in Remote Sensing Images

  • Ke Li
  • Di Wang
  • Ting Wang
  • Fuyu Dong
  • Yiming Zhang
  • Luyao Zhang
  • Xiangyu Wang
  • Shaofeng Li

Remote sensing visual grounding (RSVG) aims to localize objects in remote sensing images based on free-form natural language expressions. Existing approaches are typically constrained to closed-set vocabularies, limiting their applicability in open-world scenarios. While recent attempts leverage generic foundation models for open-vocabulary RSVG, they rely heavily on expensive high-quality datasets and time-consuming fine-tuning. To address these limitations, we propose RSVG-ZeroOV, a training-free framework that aims to explore the potential of frozen generic foundation models for zero-shot open-vocabulary RSVG. Specifically, RSVG-ZeroOV comprises three key stages: (i) Overview: We utilize a vision-language model (VLM) to obtain cross-attention maps that capture semantic correlations between text queries and visual regions. (ii) Focus: By leveraging the fine-grained modeling priors of a diffusion model (DM), we fill in gaps in structural and shape information of objects, which are often overlooked by the VLM. (iii) Evolve: A simple yet effective attention evolution module is introduced to suppress irrelevant activations, yielding purified segmentation masks over the referred objects. Without cumbersome task-specific training, RSVG-ZeroOV offers an efficient and scalable solution. Extensive experiments demonstrate that the proposed framework consistently outperforms existing weakly-supervised and zero-shot methods.

EAAI Journal 2025 Journal Article

A multiple convolution and bilayer acceleration model for precise and efficient early urban fire detection in complex scenarios

  • Pei Shi
  • Jun Lu
  • Yachen Xu
  • Quan Wang
  • Yonghong Zhang
  • Liang Kuang
  • Deji Chen
  • Guangyan Huang

AI advancement enables earlier and more effective urban fire detection, crucial for slowing fire spread. However, hardware limitations make precise and efficient detection under limited resources a major challenge. Moreover, earlier detection of fire requires the identification of smoke, which further increases the difficulty for detection algorithms, since smoke's inherently low-contrast visual properties blur its features into the surrounding background. In this paper, we propose a novel multiple convolutions and bilayer accelerate (MCBA) model for effective early urban fire detection in terms of precision, light weight and efficiency, which builds on the mainstream You Only Look Once version 8 (YOLOv8) to train and test the early fire detection model. In our MCBA model, three optimization techniques have been developed to balance light weight and precision. First, it designs a new multi-convolution (MC) structure to reduce the size of the original backbone network by avoiding complex or skipping connections. Second, the model includes a novel design of a bilayer accelerate mechanism (BAM) at the neck to minimize the interference of redundant background information in multiple scenarios. Third, we provide a precision compensation strategy (PCS) at the neck to enhance the feature extraction and aggregation capabilities, enabling effective detection of small fire areas. The experiments demonstrate that our proposed MCBA model achieves higher performance in terms of precision and efficiency compared with 17 counterpart detection models. It exhibits superior performance with minimal parameter count and the lowest computational complexity among the compared methods. The model shows strong potential for deployment in early urban fire detection across a variety of real-world scenarios.

JBHI Journal 2025 Journal Article

A Trusted Medical Image Zero-Watermarking Scheme Based on DCNN and Hyperchaotic System

  • Ruotong Xiang
  • Gang Liu
  • Min Dang
  • Quan Wang
  • Rong Pan

Zero-watermarking methods provide a means of lossless protection and have been adopted to protect the copyright of medical images, which require high integrity. However, most existing studies have focused only on robustness, with little analysis of or experimentation on discriminability. Therefore, this paper proposes a trusted robust zero-watermarking scheme for medical images based on a deep convolutional neural network (DCNN) and a hyperchaotic encryption system. First, the medical image is converted into several feature map matrices by a specific convolution layer of the DCNN. Then, a stable Gram matrix is obtained by calculating the collinear correlation between different channels in the feature map matrices. Finally, the Gram matrices of the medical image and the feature map matrices of the watermark image are fused by the trained DCNN to generate the zero-watermark. Meanwhile, we propose two feature evaluation criteria for finding differentiated eigenvalues. The eigenvalue is used as the explicit key to encrypt the generated zero-watermark by Lorenz hyperchaotic encryption, which enhances security and discriminability. The experimental results show that the proposed scheme can resist common image attacks and geometric attacks and is distinguishable in experiments, making it applicable to the copyright protection of medical images.
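The Gram-matrix step mentioned in this abstract can be illustrated with a small sketch. It computes the standard Gram matrix of a stack of feature maps, where entry (i, j) is the inner product of flattened channels i and j. Whether the paper normalizes the result, and which layer supplies the feature maps, is not stated in the abstract; the function name and the nested-list input format are assumptions made for illustration.

```python
def gram_matrix(feature_maps):
    """Channel-by-channel inner products of flattened feature maps.

    `feature_maps` is a list of channels, each a 2D list (H x W).
    Returns a C x C symmetric matrix as nested lists. No normalization
    is applied (an assumption; the abstract does not specify one).
    """
    # Flatten each H x W channel into a single vector.
    flat = [[v for row in channel for v in row] for channel in feature_maps]
    return [
        [sum(a * b for a, b in zip(fi, fj)) for fj in flat]
        for fi in flat
    ]
```

Because each entry sums over all spatial positions, the matrix records how strongly pairs of channels co-activate while discarding where in the image they do so.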

AAAI Conference 2025 Conference Paper

ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA

  • Jiaang Li
  • Quan Wang
  • Zhongnan Wang
  • Yongdong Zhang
  • Zhendong Mao

Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Most model editing methods are solely designed for single-time use and result in a significant forgetting effect in lifelong editing scenarios, where sequential edits are conducted over time. Previous approaches manage sequential edits by freezing original parameters and discretely allocating new parameters for each knowledge update. However, these methods lack robustness to minor input variations due to the discrete mapping between data and parameters. To overcome this challenge, we propose ELDER, a novel approach to create a continuous association between data and adapters. ELDER integrates multiple LoRAs through a router network and is trained to establish a smooth data-adapter association, thereby enhancing the edit robustness and generalization to semantically equivalent inputs. To ensure inputs containing the same knowledge will be processed by the same LoRAs, we design a novel loss to guide the model to link LoRA allocations with edit knowledge. Furthermore, we propose a deferral mechanism to retain the original LLM capabilities post-edit. Extensive experiments on GPT-2 XL and LLaMA2-7B demonstrate that ELDER effectively edits models in the lifelong setting, outperforming eight baselines while exhibiting strong scalability and preserving LLMs' general abilities on downstream tasks.
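The idea of combining several LoRAs through a router can be sketched as a soft mixture added to a frozen base layer. This is a generic mixture-of-LoRA forward pass written for illustration only: the abstract does not specify ELDER's routing rule, its guiding loss, or its deferral mechanism, so the softmax gating, the function names, and the argument layout below are all assumptions.

```python
import math


def matvec(matrix, vector):
    """Multiply a nested-list matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_lora(x, base_weight, loras, router_weight):
    """Frozen base output plus a gate-weighted sum of low-rank updates.

    loras: list of (A, B) pairs, with A of shape r x d_in and B of shape
    d_out x r, so each update is B @ (A @ x).
    router_weight: n_loras x d_in matrix producing one score per LoRA
    (a linear router is assumed here).
    """
    gates = softmax(matvec(router_weight, x))
    out = matvec(base_weight, x)  # base parameters stay frozen
    for gate, (A, B) in zip(gates, loras):
        delta = matvec(B, matvec(A, x))
        out = [o + gate * d for o, d in zip(out, delta)]
    return out, gates
```

Because the gates vary continuously with the input, nearby inputs receive similar adapter mixtures, which is the kind of smooth data-adapter association the abstract contrasts with discrete parameter allocation.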

AAAI Conference 2025 Conference Paper

FD2-Net: Frequency-Driven Feature Decomposition Network for Infrared-Visible Object Detection

  • Ke Li
  • Di Wang
  • Zhangyuan Hu
  • Shaofeng Li
  • Weiping Ni
  • Lin Zhao
  • Quan Wang

Infrared-visible object detection (IVOD) seeks to harness the complementary information in infrared and visible images, thereby enhancing the performance of detectors in complex environments. However, existing methods often neglect the frequency characteristics of complementary information, such as the abundant high-frequency details in visible images and the valuable low-frequency thermal information in infrared images, thus constraining detection performance. To solve this problem, we introduce a novel Frequency-Driven Feature Decomposition Network for IVOD, called FD2-Net, which effectively captures the unique frequency representations of complementary information across multimodal visual spaces. Specifically, we propose a feature decomposition encoder, wherein the high-frequency unit (HFU) utilizes discrete cosine transform to capture representative high-frequency features, while the low-frequency unit (LFU) employs dynamic receptive fields to model the multi-scale context of diverse objects. Next, we adopt a parameter-free complementary strengths strategy to enhance multimodal features through seamless inter-frequency recoupling. Furthermore, we innovatively design a multimodal reconstruction mechanism that recovers image details lost during feature extraction, further leveraging the complementary information from infrared and visible images to enhance overall representational capacity. Extensive experiments demonstrate that FD2-Net outperforms state-of-the-art (SoTA) models across various IVOD benchmarks, i.e. LLVIP (96.2% mAP), FLIR (82.9% mAP), and M3FD (83.5% mAP).

YNIMG Journal 2024 Journal Article

A comprehensive overview of diffuse correlation spectroscopy: Theoretical framework, recent advances in hardware, analysis, and applications

  • Quan Wang
  • Mingliang Pan
  • Lucas Kreiss
  • Saeed Samaei
  • Stefan A. Carp
  • Johannes D. Johansson
  • Yuanzhe Zhang
  • Melissa Wu

Diffuse correlation spectroscopy (DCS) is a powerful tool for assessing microvascular hemodynamics in deep tissues. Recent advances in sensors, lasers, and deep learning have further boosted the development of new DCS methods. However, newcomers might feel overwhelmed, not only by the already-complex DCS theoretical framework but also by the broad range of component options and system architectures. To facilitate new entry to this exciting field, we present a comprehensive review of DCS hardware architectures (continuous-wave, frequency-domain, and time-domain) and summarize corresponding theoretical models. Further, we discuss new applications of highly integrated silicon single-photon avalanche diode (SPAD) sensors in DCS, compare SPADs with existing sensors, and review other components (lasers, sensors, and correlators), as well as data analysis tools, including deep learning. Potential applications in medical diagnosis are discussed and an outlook on future directions is provided, to offer effective guidance for embarking on DCS research.

AAAI Conference 2024 Conference Paper

Benchmarking Large Language Models on Controllable Generation under Diversified Instructions

  • Yihan Chen
  • Benfeng Xu
  • Quan Wang
  • Yi Liu
  • Zhendong Mao

While large language models (LLMs) have exhibited impressive instruction-following capabilities, it is still unclear whether and to what extent they can respond to explicit constraints that might be entailed in various instructions. As a significant aspect of LLM alignment, it is thus important to formulate such a specialized set of instructions as well as investigate the resulting behavior of LLMs. To address this gap, we propose a new benchmark CoDI-Eval to systematically and comprehensively evaluate LLMs' responses to instructions with various constraints. We construct a large collection of constraints-attributed instructions as a test suite focused on both generalization and coverage. Specifically, we advocate an instruction diversification process to synthesize diverse forms of constraint expression and also deliberate the candidate task taxonomy with even finer-grained sub-categories. Finally, we automate the entire evaluation process to facilitate further developments. Different from existing studies on controllable text generation, CoDI-Eval extends the scope to the prevalent instruction-following paradigm for the first time. We provide extensive evaluations of representative LLMs (e.g., ChatGPT, Vicuna) on CoDI-Eval, revealing their limitations in following instructions with specific constraints and showing that there is still a significant gap between open-source and commercial closed-source LLMs. We believe this benchmark will facilitate research into improving the controllability of LLMs' responses to instructions. Our data and code are available at https://github.com/Xt-cyh/CoDI-Eval.

NeurIPS Conference 2022 Conference Paper

Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields

  • Keqiang Sun
  • Shangzhe Wu
  • Zhaoyang Huang
  • Ning Zhang
  • Quan Wang
  • Hongsheng Li

Capitalizing on the recent advances in image generation models, existing controllable face image synthesis methods are able to generate high-fidelity images with some levels of controllability, e.g., controlling the shapes, expressions, textures, and poses of the generated face images. However, these methods focus on 2D image generative models, which are prone to producing inconsistent face images under large expression and pose changes. In this paper, we propose a new NeRF-based conditional 3D face synthesis framework, which enables 3D controllability over the generated face images by imposing explicit 3D conditions from 3D face priors. At its core is a conditional Generative Occupancy Field (cGOF) that effectively enforces the shape of the generated face to commit to a given 3D Morphable Model (3DMM) mesh. To achieve accurate control over fine-grained 3D face shapes of the synthesized image, we additionally incorporate a 3D landmark loss as well as a volume warping loss into our synthesis algorithm. Experiments validate the effectiveness of the proposed method, which is able to generate high-fidelity face images and shows more precise 3D controllability than state-of-the-art 2D-based controllable face synthesis methods.

IJCAI Conference 2022 Conference Paper

MFAN: Multi-modal Feature-enhanced Attention Networks for Rumor Detection

  • Jiaqi Zheng
  • Xi Zhang
  • Sanchuan Guo
  • Quan Wang
  • Wenyu Zang
  • Yongdong Zhang

Rumor spreaders are increasingly taking advantage of multimedia content to attract and mislead news consumers on social media. Although recent multimedia rumor detection models have exploited both textual and visual features for classification, they do not integrate the social structure features simultaneously, which have shown promising performance for rumor identification. It is challenging to combine the heterogeneous multi-modal data in consideration of their complex relationships. In this work, we propose a novel Multi-modal Feature-enhanced Attention Networks (MFAN) for rumor detection, which makes the first attempt to integrate textual, visual, and social graph features in one unified framework. Specifically, it considers both the complement and alignment relationships between different modalities to achieve better fusion. Moreover, it takes into account the incomplete links in the social network data due to data collection constraints and proposes to infer hidden links to learn better social graph features. The experimental results show that MFAN can detect rumors effectively and outperform state-of-the-art methods.

AAAI Conference 2021 Conference Paper

Deep Metric Learning with Self-Supervised Ranking

  • Zheren Fu
  • Yan Li
  • Zhendong Mao
  • Quan Wang
  • Yongdong Zhang

Deep metric learning aims to learn a deep embedding space, where similar objects are pulled together and different objects are pushed apart. Existing approaches typically use inter-class characteristics, e.g., class-level information or instance-level similarity, to obtain semantic relevance of data points and get a large margin between different classes in the embedding space. However, the intra-class characteristics, e.g., local manifold structure or relative relationship within the same class, are usually overlooked in the learning process. Hence the data structure cannot be fully exploited and the output embeddings are limited in retrieval. More importantly, retrieval results lack a good ranking. This paper presents a novel self-supervised ranking auxiliary framework, which captures intra-class characteristics as well as inter-class characteristics for better metric learning. Our method defines specific transform functions to simulate the local intra-class structure change in the initial image domain, and formulates a self-supervised learning procedure to fully exploit this property and preserve it in the embedding space. Extensive experiments on three standard benchmarks show that our method outperforms the state-of-the-art methods on both retrieval and ranking by 2%-4%.

AAAI Conference 2021 Conference Paper

Entity Structure Within and Throughout: Modeling Mention Dependencies for Document-Level Relation Extraction

  • Benfeng Xu
  • Quan Wang
  • Yajuan Lyu
  • Yong Zhu
  • Zhendong Mao

Entities, as the essential elements in relation extraction tasks, exhibit certain structure. In this work, we formulate such structure as distinctive dependencies between mention pairs. We then propose SSAN, which incorporates these structural dependencies within the standard self-attention mechanism and throughout the overall encoding stage. Specifically, we design two alternative transformation modules inside each self-attention building block to produce attentive biases so as to adaptively regularize its attention flow. Our experiments demonstrate the usefulness of the proposed entity structure and the effectiveness of SSAN. It significantly outperforms competitive baselines, achieving new state-of-the-art results on three popular document-level relation extraction datasets. We further provide ablation and visualization to show how the entity structure guides the model for better relation extraction. Our code is publicly available.

IJCAI Conference 2018 Conference Paper

Deep Multi-View Concept Learning

  • Cai Xu
  • Ziyu Guan
  • Wei Zhao
  • Yunfei Niu
  • Quan Wang
  • Zhiheng Wang

Multi-view data is common in real-world datasets, where different views describe distinct perspectives. To better summarize the consistent and complementary information in multi-view data, researchers have proposed various multi-view representation learning algorithms, typically based on factorization models. However, most previous methods were focused on shallow factorization models which cannot capture the complex hierarchical information. Although a deep multi-view factorization model has been proposed recently, it fails to explicitly discern consistent and complementary information in multi-view data and does not consider conceptual labels. In this work we present a semi-supervised deep multi-view factorization method, named Deep Multi-view Concept Learning (DMCL). DMCL performs nonnegative factorization of the data hierarchically, and tries to capture semantic structures and explicitly model consistent and complementary information in multi-view data at the highest abstraction level. We develop a block coordinate descent algorithm for DMCL. Experiments conducted on image and document datasets show that DMCL performs well and outperforms baseline methods.

AAAI Conference 2018 Conference Paper

Knowledge Graph Embedding With Iterative Guidance From Soft Rules

  • Shu Guo
  • Quan Wang
  • Lihong Wang
  • Bin Wang
  • Li Guo

Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Combining such an embedding model with logic rules has recently attracted increasing attention. Most previous attempts made a one-time injection of logic rules, ignoring the interactive nature between embedding learning and logical inference. And they focused only on hard rules, which always hold with no exception and usually require extensive manual effort to create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a novel paradigm of KG embedding with iterative guidance from soft rules. RUGE enables an embedding model to learn simultaneously from 1) labeled triples that have been directly observed in a given KG, 2) unlabeled triples whose labels are going to be predicted iteratively, and 3) soft rules with various confidence levels extracted automatically from the KG. In the learning process, RUGE iteratively queries rules to obtain soft labels for unlabeled triples, and integrates such newly labeled triples to update the embedding model. Through this iterative procedure, knowledge embodied in logic rules may be better transferred into the learned embeddings. We evaluate RUGE in link prediction on Freebase and YAGO. Experimental results show that: 1) with rule knowledge injected iteratively, RUGE achieves significant and consistent improvements over state-of-the-art baselines; and 2) despite their uncertainties, automatically extracted soft rules are highly beneficial to KG embedding, even those with moderate confidence levels. The code and data used for this paper can be obtained from https://github.com/iieir-km/RUGE.

NeurIPS Conference 2018 Conference Paper

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

  • Ye Jia
  • Yu Zhang
  • Ron Weiss
  • Quan Wang
  • Jonathan Shen
  • Fei Ren
  • Zhifeng Chen
  • Patrick Nguyen

We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training. Our system consists of three independently trained components: (1) a speaker encoder network, trained on a speaker verification task using an independent dataset of noisy speech from thousands of speakers without transcripts, to generate a fixed-dimensional embedding vector from seconds of reference speech from a target speaker; (2) a sequence-to-sequence synthesis network based on Tacotron 2, which generates a mel spectrogram from text, conditioned on the speaker embedding; (3) an auto-regressive WaveNet-based vocoder that converts the mel spectrogram into a sequence of time domain waveform samples. We demonstrate that the proposed model is able to transfer the knowledge of speaker variability learned by the discriminatively-trained speaker encoder to the new task, and is able to synthesize natural speech from speakers that were not seen during training. We quantify the importance of training the speaker encoder on a large and diverse speaker set in order to obtain the best generalization performance. Finally, we show that randomly sampled speaker embeddings can be used to synthesize speech in the voice of novel speakers dissimilar from those used in training, indicating that the model has learned a high quality speaker representation.

IJCAI Conference 2015 Conference Paper

Knowledge Base Completion Using Embeddings and Rules

  • Quan Wang
  • Bin Wang
  • Li Guo

Knowledge bases (KBs) are often greatly incomplete, necessitating a demand for KB completion. A promising approach is to embed KBs into latent spaces and make inferences by learning and operating on latent representations. Such embedding models, however, do not make use of any rules during inference and hence have limited accuracy. This paper proposes a novel approach which incorporates rules seamlessly into embedding models for KB completion. It formulates inference as an integer linear programming (ILP) problem, with the objective function generated from embedding models and the constraints translated from rules. Solving the ILP problem results in a number of facts which 1) are the most preferred by the embedding models, and 2) comply with all the rules. By incorporating rules, our approach can greatly reduce the solution space and significantly improve the inference accuracy of embedding models. We further provide a slacking technique to handle noise in KBs, by explicitly modeling the noise with slack variables. Experimental results on two publicly available data sets show that our approach significantly and consistently outperforms state-of-the-art embedding models in KB completion. Moreover, the slacking technique is effective in identifying erroneous facts and ambiguous entities, with a precision higher than 90%.