Arrow Research

Author name cluster

Kun Qian

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers (25)

AAAI Conference 2026 Conference Paper

Bridging the Modality Reliability Gap in Drug-Target Interaction Prediction via a Confidence-aware Multimodal Fusion Framework

  • Jie Yang
  • Junxiong Zhang
  • Kun Qian
  • Qingyu Yang
  • Weikai Li
  • Zhen Cheng

With the rapid advancement of deep learning, drug-target interaction (DTI) prediction has seen substantial performance gains. However, existing methodologies face a critical, yet unaddressed challenge: the Modality Reliability Gap. This gap arises from the unpredictable variance in the informativeness and reliability of 1D sequence versus 3D structural data across different drug-target pairs, critically limiting model robustness and domain generalization. To bridge it, we introduce DrugCMF, a novel DTI prediction method built on a Confidence-aware Multimodal Fusion framework designed specifically to close the Modality Reliability Gap. DrugCMF employs a four-stage approach: (1) it extracts rich features by using four pre-trained models to obtain token-level embeddings from both 1D sequences and 3D structures; (2) it preserves modality informativeness by independently learning interaction patterns within each modality through a Token-level Interaction module; (3) it explicitly quantifies the reliability gap with a novel confidence estimation mechanism that dynamically learns weights for each modality; and (4) it bridges the gap by using these confidence scores to guide a learnable cross-modal fusion module that adaptively fuses information from the most trustworthy source. By methodically addressing the Modality Reliability Gap, DrugCMF significantly outperforms SOTA methods.
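
As a rough illustration of steps (3) and (4), the sketch below gates two pooled modality embeddings with learned per-sample confidence weights; the module and its dimensions are hypothetical and not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConfidenceFusion(nn.Module):
    """Minimal sketch: per-sample confidence weights gate two modality embeddings."""
    def __init__(self, dim: int):
        super().__init__()
        # One confidence head per modality, producing a scalar logit per sample.
        self.conf_seq = nn.Linear(dim, 1)
        self.conf_struct = nn.Linear(dim, 1)
        self.classifier = nn.Linear(dim, 1)  # binary interaction prediction

    def forward(self, z_seq: torch.Tensor, z_struct: torch.Tensor) -> torch.Tensor:
        # z_seq, z_struct: (batch, dim) pooled embeddings from 1D / 3D encoders.
        logits = torch.cat([self.conf_seq(z_seq), self.conf_struct(z_struct)], dim=1)
        w = torch.softmax(logits, dim=1)            # (batch, 2) modality confidences
        fused = w[:, :1] * z_seq + w[:, 1:] * z_struct
        return self.classifier(fused).squeeze(-1)   # interaction score per pair

fusion = ConfidenceFusion(dim=128)
score = fusion(torch.randn(4, 128), torch.randn(4, 128))
print(score.shape)  # torch.Size([4])
```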

JBHI Journal 2026 Journal Article

Can Information Representations Inspired by the Human Auditory Perception Benefit Computer Audition-Based Disease Detection? An Interpretable Comparative Study

  • Zhihua Wang
  • Haojie Zhang
  • Yang Tan
  • Rui Wang
  • Kun Qian
  • Bin Hu
  • Yoshiharu Yamamoto
  • Björn W. Schuller

Computer audition-based methods have attracted a great deal of attention in the field of disease detection due to their significant advantages, e.g., non-invasive and convenient operation. Among them, the introduction of information representations inspired by human auditory perception, e.g., Mel-frequency transformation, gives these methods great potential to approach and even exceed the limits of the human auditory system. However, according to previous research, it remains challenging to fairly assess whether information representations inspired by human auditory perception have a significant positive effect on disease detection. Moreover, performance differences among various information representations and their underlying causes are yet to be thoroughly investigated and analyzed. To this end, we propose an interpretable comparative study on information representations inspired by human auditory perception for disease detection. First, the detection accuracy of different information representations is investigated on two sound datasets (one psychological and one physiological disease) using a classical model and the proposed Temporal-Spatial Multi-Scale Perception Network. Then, the noise robustness of these information representations is compared by introducing Gaussian noise at varying signal-to-noise ratios (SNRs). Finally, by combining the human auditory perception mechanism with explainable AI techniques, we analyze the reasons for performance differences among various information representations from qualitative and quantitative perspectives. Experimental results demonstrate that information representations inspired by human auditory perception can improve the performance of disease detection with statistical significance. Furthermore, Gammatone Frequency Cepstral Coefficients (GFCCs) outperform other information representations by achieving the highest accuracy, particularly under noisy conditions. The interpretable results further reveal the underlying reasons for GFCCs' superior performance, highlighting their ability to capture critical auditory features robustly across varying noise levels. These findings emphasize the potential of auditory perception-inspired representations in advancing computer audition-based disease detection systems and provide a solid foundation for future research in this domain.
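
The noise-robustness comparison hinges on mixing Gaussian noise at controlled SNRs; a minimal NumPy sketch of that standard procedure (not the study's code) follows.

```python
import numpy as np

def add_noise_at_snr(signal: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Mix white Gaussian noise into `signal` so the mixture has the target SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

clean = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))  # 1 s toy tone at 16 kHz
noisy = add_noise_at_snr(clean, snr_db=5.0)
```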

JBHI Journal 2025 Journal Article

A Review of AIoT-Based Human Activity Recognition: From Application to Technique

  • Wen Qi
  • Xiangmin Xu
  • Kun Qian
  • Björn W. Schuller
  • Giancarlo Fortino
  • Andrea Aliverti

This scoping review redefines the Artificial Intelligence-based Internet of Things (AIoT)-driven Human Activity Recognition (HAR) field by systematically extrapolating from various application domains to deduce potential techniques and algorithms. We distill a general model with adaptive learning and optimization mechanisms by conducting a detailed analysis of human activity types and utilizing contact or non-contact devices. The review presents various system integration mathematical paradigms driven by multimodal data fusion, covering predictions of complex behaviors and redefining valuable methods, devices, and systems for HAR. Additionally, it establishes benchmarks for behavior recognition across different application requirements, from simple localized actions to group activities, and summarizes open research directions, including data diversity and volume, computational limitations, interoperability, real-time recognition, data security, and privacy concerns. Finally, this review aims to serve as a comprehensive and foundational resource for researchers delving into the complex and burgeoning realm of AIoT-enhanced HAR, providing insights and guidance for future innovations and developments.

JBHI Journal 2025 Journal Article

An On-Board Executable Multi-Feature Transfer-Enhanced Fusion Model for Three-Lead EEG Sensor-Assisted Depression Diagnosis

  • Fuze Tian
  • Haojie Zhang
  • Yang Tan
  • Lixian Zhu
  • Lin Shen
  • Kun Qian
  • Bin Hu
  • Björn W. Schuller

The development of affective computing and medical electronic technologies has led to the emergence of Artificial Intelligence (AI)-based methods for the early detection of depression. However, previous studies have often overlooked the necessity for an AI-assisted diagnosis system to be wearable and accessible in practical scenarios for depression recognition. In this work, we present an on-board executable multi-feature transfer-enhanced fusion model for our custom-designed wearable three-lead Electroencephalogram (EEG) sensor, based on EEG data collected from 73 depressed patients and 108 healthy controls. Experimental results show that the proposed model exhibits low computational complexity (65.0 K parameters), a promising Floating-Point Operations (FLOPs) count (25.6 M), real-time processing (1.5 s/execution), and low power consumption (320.8 mW). Furthermore, it requires only 202.0 KB of Random Access Memory (RAM) and 279.6 KB of Read-Only Memory (ROM) when deployed on the EEG sensor. Despite its low computational and spatial complexity, the model achieves a notable classification accuracy of 95.2%, specificity of 94.0%, and sensitivity of 96.9% under independent test conditions. These results underscore the potential of deploying the model on the wearable three-lead EEG sensor for assisting in the diagnosis of depression.
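
The abstract is explicit about parameter and memory budgets; the sketch below shows how such budgets are tallied in PyTorch for an illustrative lightweight 1D CNN, which is not the paper's architecture.

```python
import torch
import torch.nn as nn

# Illustrative lightweight 1D CNN over 3-channel EEG windows; NOT the paper's model.
model = nn.Sequential(
    nn.Conv1d(3, 16, kernel_size=7, stride=2), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 2),  # depressed vs. healthy control
)

n_params = sum(p.numel() for p in model.parameters())
size_kb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1024
print(f"{n_params} parameters, ~{size_kb:.1f} KB as float32")
```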

JBHI Journal 2025 Journal Article

FedKDC: Consensus-Driven Knowledge Distillation for Personalized Federated Learning in EEG-Based Emotion Recognition

  • Xihang Qiu
  • Wanyong Qiu
  • Ye Zhang
  • Kun Qian
  • Chun Li
  • Bin Hu
  • Björn W. Schuller
  • Yoshiharu Yamamoto

Federated learning (FL) has gained prominence in electroencephalogram (EEG)-based emotion recognition because of its ability to enable secure collaborative training without centralized data. However, traditional FL faces challenges due to model and data heterogeneity in smart healthcare settings. For example, medical institutions have varying computational resources, which creates a need for personalized local models. Moreover, EEG data from medical institutions typically face data heterogeneity issues stemming from limitations in participant availability, ethical constraints, and cultural differences among subjects, which can slow model convergence and degrade model performance. To address these challenges, we propose FedKDC, a novel FL framework that incorporates clustered knowledge distillation (CKD). This method introduces a consensus-based distributed learning mechanism to facilitate the clustering process. It then enhances the convergence speed through intraclass distillation and reduces the negative impact of heterogeneity through interclass distillation. Additionally, we introduce a DriftGuard mechanism to mitigate client drift, along with an entropy reducer to decrease the entropy of aggregated knowledge. The framework is validated on the SEED, SEED-IV, SEED-FRA, and SEED-GER datasets, demonstrating its effectiveness in scenarios where both the data and the models are heterogeneous. Experimental results show that FedKDC outperforms other FL frameworks in emotion recognition, achieving a maximum average accuracy of 85.2%, and in convergence efficiency, with faster and more stable convergence.
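
The intraclass and interclass distillation steps build on the standard soft-label knowledge distillation loss, sketched below; FedKDC's clustering and consensus mechanisms are not reproduced here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """Standard soft-label KD: KL divergence between temperature-scaled distributions."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

loss = distillation_loss(torch.randn(8, 3), torch.randn(8, 3))  # 3 emotion classes
```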

ICRA Conference 2025 Conference Paper

LiLoc: Lifelong Localization Using Adaptive Submap Joining and Egocentric Factor Graph

  • Yixin Fang
  • Yanyan Li 0001
  • Kun Qian
  • Federico Tombari
  • Yue Wang
  • Gim Hee Lee

This paper proposes LiLoc, a versatile graph-based lifelong localization framework using LiDAR, which enhances timeliness by maintaining a single central session while improving accuracy through multi-modal factors between the central and subsidiary sessions. First, an adaptive submap joining strategy is employed to generate prior submaps (keyframes and poses) for the central session, and to provide priors for subsidiaries when constraints are needed for robust localization. Next, a coarse-to-fine pose initialization for subsidiary sessions is performed using vertical recognition and ICP refinement in the global coordinate frame. To elevate the accuracy of subsequent localization, we propose an egocentric factor graph (EFG) module that integrates IMU preintegration, LiDAR odometry, and scan-match factors in a joint optimization manner. Specifically, the scan-match factors are constructed by a novel propagation model that efficiently distributes the prior constraints as edges to the relevant prior pose nodes, weighted by noise estimates based on keyframe registration errors. Additionally, the framework supports flexible switching between two modes, relocalization (RLM) and incremental localization (ILM), based on a proposed overlap-based mechanism that selects or updates the prior submaps from the central session. LiLoc is tested on public and custom datasets, demonstrating accurate localization performance compared with state-of-the-art methods. Our code will be publicly available at https://github.com/Yixin-F/LiLoc.
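
The coarse-to-fine initialization ends in ICP refinement; the core of each ICP iteration, once correspondences are fixed, is the closed-form rigid alignment below. This is a generic Kabsch sketch, not the LiLoc code.

```python
import numpy as np

def rigid_align(P: np.ndarray, Q: np.ndarray):
    """Closed-form R, t with R @ p_i + t ~= q_i for corresponding 3D points (N, 3)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                    # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

P = np.random.rand(100, 3)
R_true = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
Q = P @ R_true.T + np.array([0.5, -0.2, 1.0])
R, t = rigid_align(P, Q)                         # recovers R_true and the translation
```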

TMLR Journal 2025 Journal Article

Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning

  • Jiacheng Lin
  • Tian Wang
  • Kun Qian

We propose Rec-R1, a general reinforcement learning framework that bridges large language models (LLMs) with recommendation systems through closed-loop optimization. Unlike prompting and supervised fine-tuning (SFT), Rec-R1 directly optimizes LLM generation using feedback from a fixed, black-box recommendation model, without relying on synthetic SFT data from proprietary models like GPT-4o. This avoids the substantial cost and effort required for data distillation. To verify the effectiveness of Rec-R1, we evaluate it on three representative tasks: product search, sequential recommendation, and product re-ranking. Experimental results demonstrate that Rec-R1 not only consistently outperforms prompting- and SFT-based methods, but also achieves remarkable gains over strong discriminative baselines, even when used with simple retrievers such as BM25. More impressively, Rec-R1 preserves the general-purpose capabilities of the LLM, in contrast to SFT, which often impairs instruction-following and reasoning. These findings suggest Rec-R1 as a promising foundation for continual task-specific adaptation without catastrophic forgetting.
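
Optimizing generation against a fixed, black-box reward is the classic policy-gradient setting; below is a toy REINFORCE sketch in which a categorical policy picks among candidate actions and a stub function stands in for a frozen recommender's score. The reward values and setup are invented.

```python
import torch

def black_box_reward(action: int) -> float:
    # Stub: imagine an NDCG returned by a fixed, black-box recommendation model.
    return [0.1, 0.9, 0.3, 0.2][action]

logits = torch.zeros(4, requires_grad=True)       # policy parameters over 4 candidates
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    reward = black_box_reward(action.item())      # no gradient flows through the reward
    loss = -dist.log_prob(action) * reward        # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()

print(torch.softmax(logits, dim=0))  # probability mass concentrates on action 1
```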

JBHI Journal 2024 Journal Article

Automated Cough Sound Analysis for Detecting Childhood Pneumonia

  • Roneel V. Sharan
  • Kun Qian
  • Yoshiharu Yamamoto

Pneumonia is one of the leading causes of death in children. Prompt diagnosis and treatment can help prevent these deaths, particularly in resource-poor regions where deaths due to pneumonia are highest. Clinical symptom-based screening of childhood pneumonia yields excessive false positives, highlighting the necessity for additional rapid diagnostic tests. Cough is a prevalent symptom of acute respiratory illnesses, and the sound of a cough can indicate the underlying pathological changes resulting from respiratory infections. In this study, we propose a fully automated approach to evaluating cough sounds to distinguish pneumonia from other acute respiratory diseases in children. The proposed method involves cough sound denoising, cough sound segmentation, and cough sound classification. The denoising algorithm utilizes multi-conditional spectral mapping with a multilayer perceptron network, while the segmentation algorithm detects cough sounds directly from the denoised audio waveform. From the segmented cough signal, we extract various handcrafted features and feature embeddings from a pretrained deep learning network. A multilayer perceptron is trained on the combined feature set for detecting pneumonia. The proposed method is evaluated using a dataset comprising cough sounds from 173 children diagnosed with either pneumonia or other acute respiratory diseases. On average, the denoising algorithm improved the signal-to-noise ratio by 44%. Furthermore, a sensitivity and specificity of 91% and 86%, respectively, are achieved in cough segmentation, and 82% and 71%, respectively, in detecting childhood pneumonia from cough sounds alone. This demonstrates its potential as a rapid diagnostic tool, for example when deployed via smartphone.
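
A minimal sketch of the spectral-mapping idea behind the denoiser, assuming paired noisy/clean log-spectral frames; the layer sizes and data here are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Sketch of MLP spectral mapping: noisy log-magnitude frames -> clean frames.
n_bins = 257  # e.g. one-sided bins of a 512-point STFT
denoiser = nn.Sequential(
    nn.Linear(n_bins, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, n_bins),
)

noisy = torch.randn(32, n_bins)   # batch of noisy log-spectral frames (toy data)
clean = torch.randn(32, n_bins)   # paired clean targets (toy data)
loss = nn.functional.mse_loss(denoiser(noisy), clean)
loss.backward()                   # gradient for one training step
```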

NeurIPS Conference 2024 Conference Paper

Consent in Crisis: The Rapid Decline of the AI Data Commons

  • Shayne Longpre
  • Robert Mahari
  • Ariel Lee
  • Campbell Lund
  • Hamidah Oderinwale
  • William Brannon
  • Nayan Saxena
  • Naana Obeng-Marnu

General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how codified data use preferences are changing over time. We observe a proliferation of AI-specific clauses to limit use, acute differences in restrictions on AI developers, as well as general inconsistencies between websites' expressed intentions in their Terms of Service and their robots.txt. We diagnose these as symptoms of ineffective web protocols, not designed to cope with the widespread re-purposing of the internet for AI. Our longitudinal analyses show that in a single year (2023-2024) there has been a rapid crescendo of data restrictions from web sources, rendering ~5%+ of all tokens in C4, or 28%+ of the most actively maintained, critical sources in C4, fully restricted from use. For Terms of Service crawling restrictions, a full 45% of C4 is now restricted. If respected or enforced, these restrictions are rapidly biasing the diversity, freshness, and scaling laws for general-purpose AI systems. We hope to illustrate the emerging crises in data consent, for both developers and creators. The foreclosure of much of the open web will impact not only commercial AI, but also non-commercial AI and academic research.
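
The robots.txt side of such an audit can be probed with the Python standard library alone; the crawler user-agent strings below are common examples and the URL is a placeholder, not the paper's tooling.

```python
from urllib.robotparser import RobotFileParser

# Check whether a site's robots.txt allows named AI crawlers to fetch a page.
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

for agent in ["GPTBot", "CCBot", "Google-Extended", "*"]:
    allowed = rp.can_fetch(agent, "https://example.com/some/article")
    print(f"{agent:16s} allowed: {allowed}")
```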

JBHI Journal 2024 Journal Article

Fed-MStacking: Heterogeneous Federated Learning With Stacking Misaligned Labels for Abnormal Heart Sound Detection

  • Wanyong Qiu
  • Yifan Feng
  • Yuying Li
  • Yi Chang
  • Kun Qian
  • Bin Hu
  • Yoshiharu Yamamoto
  • Björn W. Schuller

Ubiquitous sensing has been widely applied in smart healthcare, providing an opportunity for intelligent heart sound auscultation. However, smart devices contain sensitive information, raising user privacy concerns. To this end, federated learning (FL) has been adopted as an effective solution, enabling decentralised learning without data sharing, thus preserving data privacy in the Internet of Health Things (IoHT). Nevertheless, traditional FL requires the same architectural models to be trained across local clients and global servers, leading to a lack of model heterogeneity and client personalisation. For medical institutions with private data clients, this study proposes Fed-MStacking, a heterogeneous FL framework that incorporates a stacking ensemble learning strategy to support clients in building their own models. The secondary objective of this study is to address scenarios involving local clients with data characterised by inconsistent labelling. Specifically, the local client contains only one case type, and the data cannot be shared within or outside the institution. To train a global multi-class classifier, we aggregate missing class information from all clients at each institution and build meta-data, which then participates in FL training via a meta-learner. We apply the proposed framework to a multi-institutional heart sound database. The experiments utilise random forests (RFs), feedforward neural networks (FNNs), and convolutional neural networks (CNNs) as base classifiers. The results show that the heterogeneous stacking of local models performs better compared to homogeneous stacking.
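
Stripped of the federated machinery, heterogeneous stacking is the classic stacked ensemble: diverse base classifiers feed a meta-learner. A centralized scikit-learn sketch with RF and MLP base classifiers follows; the CNN bases, meta-data aggregation, and FL training loop are omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Toy multi-class data standing in for heart-sound feature vectors.
X, y = make_classification(n_samples=500, n_features=20, n_classes=3,
                           n_informative=8, random_state=0)

# Heterogeneous base classifiers; a logistic-regression meta-learner stacks them.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("mlp", MLPClassifier(max_iter=500, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X, y)
print(stack.score(X, y))
```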

JBHI Journal 2023 Journal Article

Depression Recognition From EEG Signals Using an Adaptive Channel Fusion Method via Improved Focal Loss

  • Jian Shen
  • Yanan Zhang
  • Huajian Liang
  • Zeguang Zhao
  • Kexin Zhu
  • Kun Qian
  • Qunxi Dong
  • Xiaowei Zhang

Depression is a serious and common psychiatric disease characterized by emotional and cognitive dysfunction. In addition, the rates of clinical diagnosis and treatment for depression are low. Therefore, the accurate recognition of depression is important for its effective treatment. Electroencephalogram (EEG) signals, which can objectively reflect the inner states of human brains, are regarded as promising physiological tools that can enable effective and efficient clinical depression diagnosis and recognition. However, one of the challenges regarding EEG-based depression recognition involves sufficiently optimizing the spatial information derived from the multichannel space of EEG signals. Consequently, we propose an adaptive channel fusion method via improved focal loss (FL) functions for depression recognition based on EEG signals to effectively address this challenge. In this method, we propose two improved FL functions that can enhance the separability of hard examples by upweighting their losses as optimization objectives and can optimize the channel weights by a proposed adaptive channel fusion framework. The experimental results obtained on two EEG datasets show that the developed channel fusion method can achieve improved classification performance. The learned channel weights include the individual characteristics of each EEG epoch, which can effectively optimize the spatial information of each EEG epoch via the channel fusion method. In addition, the proposed method performs better than the state-of-the-art channel fusion methods.
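
The improved FL functions extend the standard focal loss, which upweights hard examples by scaling cross-entropy with (1 - p_t)^gamma; the baseline form is sketched below (the paper's modifications and the channel fusion framework are not reproduced).

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma: float = 2.0):
    """Standard focal loss: scale CE by (1 - p_t)^gamma to upweight hard examples."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)                 # probability assigned to the true class
    return ((1.0 - p_t) ** gamma * ce).mean()

loss = focal_loss(torch.randn(16, 2), torch.randint(0, 2, (16,)))  # binary example
```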

JBHI Journal 2022 Journal Article

Capturing Time Dynamics From Speech Using Neural Networks for Surgical Mask Detection

  • Shuo Liu
  • Adria Mallol-Ragolta
  • Tianhao Yan
  • Kun Qian
  • Emilia Parada-Cabaleiro
  • Bin Hu
  • Bjorn W. Schuller

The importance of detecting whether a person wears a face mask while speaking has increased tremendously since the outbreak of SARS-CoV-2 (COVID-19), as wearing a mask can help to reduce the spread of the virus and mitigate the public health crisis. Besides affecting human speech characteristics related to frequency, face masks cause temporal interferences in speech, altering the pace, rhythm, and pronunciation speed. In this regard, this paper presents two effective neural network models to detect surgical masks from audio. The proposed architectures are both based on Convolutional Neural Networks (CNNs), chosen as an optimal approach for the spatial processing of the audio signals. One architecture applies a Long Short-Term Memory (LSTM) network to model the time-dependencies. Through an additional attention mechanism, the LSTM-based architecture enables the extraction of more salient temporal information. The other architecture (named ConvTx) retrieves the relative position of a sequence through the positional encoder of a transformer module. In order to assess to what extent both architectures can complement each other when modelling temporal dynamics, we also explore the combination of LSTMs and Transformers in three hybrid models. Finally, we also investigate whether data augmentation techniques, such as using transitions between audio frames, and gender-dependent frameworks might impact the performance of the proposed architectures. Our experimental results show that one of the hybrid models achieves the best performance, surpassing existing state-of-the-art results for the task at hand.
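
A compact sketch of the CNN-to-LSTM-to-attention pipeline described for the first architecture; all layer sizes are invented for illustration and do not reflect the paper's configuration.

```python
import torch
import torch.nn as nn

class CnnLstmAttn(nn.Module):
    """Sketch: CNN over spectrogram frames, LSTM over time, attention pooling."""
    def __init__(self, n_mels=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.lstm = nn.LSTM(16 * (n_mels // 2), hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.head = nn.Linear(hidden, 2)           # mask / no mask

    def forward(self, x):                          # x: (batch, 1, n_mels, frames)
        h = self.cnn(x)                            # (batch, 16, n_mels/2, frames)
        h = h.flatten(1, 2).transpose(1, 2)        # (batch, frames, features)
        out, _ = self.lstm(h)                      # (batch, frames, hidden)
        a = torch.softmax(self.attn(out), dim=1)   # attention weights over time
        ctx = (a * out).sum(dim=1)                 # weighted temporal pooling
        return self.head(ctx)

logits = CnnLstmAttn()(torch.randn(4, 1, 64, 100))  # -> (4, 2)
```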

AAAI Conference 2021 Conference Paper

A Student-Teacher Architecture for Dialog Domain Adaptation Under the Meta-Learning Setting

  • Kun Qian
  • Wei Wei
  • Zhou Yu

Numerous new dialog domains are being created every day, while collecting data for these domains is extremely costly since it involves human interactions. Therefore, it is essential to develop algorithms that can adapt to different domains efficiently when building data-driven dialog models. Most recent research on domain adaptation focuses on giving the model a better initialization, rather than optimizing the adaptation process. We propose an efficient domain-adaptive task-oriented dialog system model, which incorporates a meta-teacher model to emphasize the different impacts between generated tokens with respect to the context. We first train our base dialog model and meta-teacher model adversarially in a meta-learning setting on rich-resource domains. The meta-teacher learns to quantify the importance of tokens under different contexts across different domains. During adaptation, the meta-teacher guides the dialog model to focus on important tokens in order to achieve better adaptation efficiency. We evaluate our model on two multi-domain datasets, MultiWOZ and Google Schema-Guided Dialogue, and achieve state-of-the-art performance.
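
During adaptation, the meta-teacher's guidance amounts to a per-token weighted cross-entropy over the generated sequence; a minimal sketch follows, with random weights standing in for teacher-produced importance scores.

```python
import torch
import torch.nn.functional as F

def weighted_token_ce(logits, targets, token_weights):
    """Cross-entropy where each target token's loss is scaled by a teacher weight."""
    # logits: (batch, seq, vocab); targets: (batch, seq); token_weights: (batch, seq)
    ce = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    return (token_weights * ce).sum() / token_weights.sum()

logits = torch.randn(2, 5, 100)
targets = torch.randint(0, 100, (2, 5))
weights = torch.rand(2, 5)        # stand-in for meta-teacher importance scores
loss = weighted_token_ce(logits, targets, weights)
```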

JBHI Journal 2021 Journal Article

Can Machine Learning Assist Locating the Excitation of Snore Sound? A Review

  • Kun Qian
  • Christoph Janott
  • Maximilian Schmitt
  • Zixing Zhang
  • Clemens Heiser
  • Werner Hemmert
  • Yoshiharu Yamamoto
  • Bjorn W. Schuller

In the past three decades, snoring (affecting more than 30% of the adult UK population) has been increasingly studied in the transdisciplinary research community spanning medicine and engineering. Early work demonstrated that the snore sound can carry important information about the status of the upper airway, which facilitates the development of non-invasive, acoustic-based approaches for the diagnosis and screening of obstructive sleep apnoea and other sleep disorders. Nonetheless, clinical practice increasingly demands methods that localise the excitation of the snore sound rather than only detect sleep disorders. In order to further the relevant studies and attract more attention, we provide a comprehensive review of the state-of-the-art machine learning techniques for automatically classifying snore sounds. First, we introduce the background and definition of the problem. Second, we illustrate the current work in detail and explain potential applications. Finally, we discuss the limitations and challenges in the snore sound classification task. Overall, our review provides comprehensive guidance for researchers wishing to contribute to this area.

AAAI Conference 2020 Short Paper

Adabot: Fault-Tolerant Java Decompiler (Student Abstract)

  • Zhiming Li
  • Qing Wu
  • Kun Qian

Reverse engineering has been an extremely important field in software engineering, as it helps us to better understand and analyze the internal architecture and interrelations of executables. Classical Java reverse engineering tasks include disassembly and decompilation. Traditional Abstract Syntax Tree (AST)-based disassemblers and decompilers are strictly rule-defined and thus highly fault-intolerant when bytecode obfuscation is introduced for security reasons. In this work, we view decompilation as a statistical machine translation task and propose a decompilation framework that is fully based on the self-attention mechanism. Through better adaptation to the linguistic uniqueness of bytecode, our model outperforms rule-based models and previous work based on the recurrence mechanism.
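
Treating decompilation as translation implies a standard self-attention encoder-decoder over bytecode and source token streams; here is a skeleton using torch.nn.Transformer, with invented vocabulary sizes and positional encodings omitted for brevity. It illustrates the setup only, not Adabot itself.

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Skeleton: self-attention encoder-decoder for bytecode -> source translation."""
    def __init__(self, src_vocab=1000, tgt_vocab=1000, d_model=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.tf = nn.Transformer(d_model=d_model, nhead=4,
                                 num_encoder_layers=2, num_decoder_layers=2,
                                 batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Causal mask so each target position attends only to earlier positions.
        mask = self.tf.generate_square_subsequent_mask(tgt_ids.size(1))
        h = self.tf(self.src_emb(src_ids), self.tgt_emb(tgt_ids), tgt_mask=mask)
        return self.out(h)                          # (batch, tgt_len, tgt_vocab)

model = TinySeq2Seq()
logits = model(torch.randint(0, 1000, (2, 16)),     # bytecode token ids
               torch.randint(0, 1000, (2, 12)))     # shifted source token ids
```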

AAAI Conference 2020 Conference Paper

End-to-End Trainable Non-Collaborative Dialog System

  • Yu Li
  • Kun Qian
  • Weiyan Shi
  • Zhou Yu

End-to-end task-oriented dialog models have achieved promising performance on collaborative tasks, where users willingly coordinate with the system to complete a given task. In non-collaborative settings, however, such as negotiation and persuasion, users and systems do not share a common goal. As a result, compared to collaborative tasks, people use social content to build rapport and trust in these non-collaborative settings in order to advance their goals. To handle social content, we introduce a hierarchical intent annotation scheme, which can be generalized to different non-collaborative dialog tasks. Building upon TransferTransfo (Wolf et al. 2019), we propose an end-to-end neural network model to generate diverse coherent responses. Our model utilizes intent and semantic slots as the intermediate sentence representation to guide the generation process. In addition, we design a filter to select appropriate responses based on whether these intermediate representations fit the designed task and conversation constraints. Our non-collaborative dialog model guides users to complete the task while simultaneously keeping them engaged. We test our approach on our newly proposed ANTISCAM dataset and an existing PERSUASIONFORGOOD dataset. Both automatic and human evaluations suggest that our model outperforms multiple baselines in these two non-collaborative tasks.

JBHI Journal 2020 Journal Article

Machine Listening for Heart Status Monitoring: Introducing and Benchmarking HSS—The Heart Sounds Shenzhen Corpus

  • Fengquan Dong
  • Kun Qian
  • Zhao Ren
  • Alice Baird
  • Xinjian Li
  • Zhenyu Dai
  • Bo Dong
  • Florian Metze

Auscultation of the heart is a widely studied technique, which requires precise hearing from practitioners as a means of distinguishing subtle differences in heartbeat rhythm. This technique is popular due to its non-invasive nature, and can be an early diagnosis aid for a range of cardiac conditions. Machine listening approaches can support this process, monitoring continuously and allowing for a representation of both mild and chronic heart conditions. Despite this potential, relevant databases and benchmark studies are scarce. In this paper, we introduce our publicly accessible database, the Heart Sounds Shenzhen Corpus (HSS), which was first released during the recent INTERSPEECH 2018 ComParE Heart Sound sub-challenge. Additionally, we provide a survey of machine learning work in the area of heart sound recognition, as well as a benchmark for HSS utilising standard acoustic features and machine learning models. At best, our support vector machine with log-Mel features achieves 49.7% unweighted average recall on a three-category task (normal, mild, moderate/severe).
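
The benchmark pairing of log-Mel features with an SVM, scored by unweighted average recall (macro-averaged recall in scikit-learn), can be sketched on toy data as follows; random noise stands in for HSS recordings, so the score is near chance.

```python
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.metrics import recall_score

def log_mel_features(y, sr=16000, n_mels=64):
    """Clip-level feature: time-averaged log-Mel spectrogram."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(S).mean(axis=1)

rng = np.random.default_rng(0)
# Toy stand-ins for 1 s heart-sound clips over three classes (normal/mild/mod-severe).
X = np.stack([log_mel_features(rng.standard_normal(16000)) for _ in range(60)])
y = rng.integers(0, 3, size=60)

clf = SVC().fit(X[:40], y[:40])
uar = recall_score(y[40:], clf.predict(X[40:]), average="macro")  # UAR
print(f"UAR: {uar:.3f}")
```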

AAAI Conference 2020 System Paper

PARTNER: Human-in-the-Loop Entity Name Understanding with Deep Learning

  • Kun Qian
  • Poornima Chozhiyath Raman
  • Yunyao Li
  • Lucian Popa

Entity name disambiguation is an important step in many text-based AI tasks. Entity names usually have internal semantic structures that are useful for resolving different variations of the same entity. We present PARTNER, a deep learning-based interactive system for entity name understanding. Powered by effective active learning and weak supervision, PARTNER can learn deep learning-based models for identifying entity name structure with low human effort. PARTNER also allows the user to design complex normalization and variant generation functions without coding skills.

JBHI Journal 2020 Journal Article

Snore-GANs: Improving Automatic Snore Sound Classification With Synthesized Data

  • Zixing Zhang
  • Jing Han
  • Kun Qian
  • Christoph Janott
  • Yanan Guo
  • Bjorn Schuller

One of the frontier issues severely hampering the development of automatic snore sound classification (ASSC) is the lack of sufficient supervised training data. To cope with this problem, we propose a novel data augmentation approach based on semi-supervised conditional generative adversarial networks (scGANs), which aims to automatically learn a mapping strategy from a random noise space to the original data distribution. The proposed approach has the capability of synthesizing “realistic” high-dimensional data well, while requiring no additional annotation process. To handle the mode collapse problem of GANs, we further introduce an ensemble strategy to enhance the diversity of the generated data. The systematic experiments conducted on the widely used Munich-Passau snore sound corpus demonstrate that the scGANs-based systems can remarkably outperform other classic data augmentation systems, and are also competitive with other recently reported systems for ASSC.
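
Underneath the semi-supervised conditioning and ensembling lies the basic GAN game: a generator maps noise to feature vectors while a discriminator separates real from synthetic. A minimal unconditional sketch follows; dimensions and data are illustrative, and the scGAN-specific parts are not reproduced.

```python
import torch
import torch.nn as nn

dim_z, dim_x = 16, 64                     # noise dim, acoustic-feature dim (illustrative)
G = nn.Sequential(nn.Linear(dim_z, 64), nn.ReLU(), nn.Linear(64, dim_x))
D = nn.Sequential(nn.Linear(dim_x, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, dim_x)             # stand-in for real acoustic features
ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

# One discriminator step: push real toward 1, fake toward 0.
fake = G(torch.randn(32, dim_z)).detach()
loss_d = bce(D(real), ones) + bce(D(fake), zeros)
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# One generator step: try to fool the discriminator.
fake = G(torch.randn(32, dim_z))
loss_g = bce(D(fake), ones)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```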

AAAI Conference 2019 Conference Paper

Knowledge Refinement via Rule Selection

  • Phokion G. Kolaitis
  • Lucian Popa
  • Kun Qian

In several different applications, including data transformation and entity resolution, rules are used to capture aspects of knowledge about the application at hand. Often, a large set of such rules is generated automatically or semi-automatically, and the challenge is to refine the encapsulated knowledge by selecting a subset of rules based on the expected operational behavior of the rules on available data. In this paper, we carry out a systematic complexity-theoretic investigation of the following rule selection problem: given a set of rules specified by Horn formulas, and a pair of an input database and an output database, find a subset of the rules that minimizes the total error, that is, the number of false positive and false negative errors arising from the selected rules. We first establish computational hardness results for the decision problems underlying this minimization problem, as well as upper and lower bounds for its approximability. We then investigate a bi-objective optimization version of the rule selection problem in which both the total error and the size of the selected rules are taken into account. We show that testing for membership in the Pareto front of this bi-objective optimization problem is DP-complete. Finally, we show that a similar DP-completeness result holds for a bi-level optimization version of the rule selection problem, where one minimizes first the total error and then the size.
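
The total-error objective is concrete enough to state in code: model each rule by the facts it derives on the input database, then count the false positives and false negatives of a selected subset against the output database. A brute-force sketch on a toy instance follows; the paper studies hardness and approximability, not this enumeration.

```python
from itertools import combinations

# Each rule is modeled by the set of output facts it derives on the input database.
rules = {"r1": {1, 2, 3}, "r2": {3, 4}, "r3": {5}, "r4": {2, 6}}
target = {1, 2, 3, 5}   # the given output database

def total_error(selected):
    derived = set().union(*(rules[r] for r in selected)) if selected else set()
    fp = len(derived - target)   # derived but not in the output database
    fn = len(target - derived)   # in the output database but not derived
    return fp + fn

best = min((subset for k in range(len(rules) + 1)
            for subset in combinations(rules, k)), key=total_error)
print(best, total_error(best))  # ('r1', 'r3') with 0 errors
```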