Author name cluster

Bing Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

71 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Continual Out-of-Distribution Detection with Analytic Neural Collapse

  • Saleh Momeni
  • Changnan Xiao
  • Bing Liu

Continual learning (CL) aims to enable models to incrementally learn from a sequence of tasks without forgetting previously acquired knowledge. While most prior work focuses on closed-world settings, where all test instances are assumed from the set of learned classes, real-world applications require models to handle both CL and out-of-distribution (OOD) samples. A key insight from recent studies on deep neural networks is the phenomenon of Neural Collapse (NC), which occurs in the terminal phase of training when the loss approaches zero. Under NC, class features collapse to their means, and classifier weights align with these means, enabling effective prototype-based strategies such as nearest class mean, for both classification and OOD detection. However, in CL, catastrophic forgetting (CF) prevents the model from naturally reaching this desirable regime. In this paper, we propose a novel method called Analytic Neural Collapse (AnaNC) that analytically creates the NC properties in the feature space of a frozen pre-trained model with no training, overcoming CF. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in continual OOD detection and learning, highlighting the effectiveness of our method in this challenging scenario.
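The prototype-based strategy mentioned in the abstract, nearest class mean used for both classification and OOD detection, can be sketched as follows. This is only an illustration of the generic idea and not the AnaNC method itself: the function names, the Euclidean distance, and the `ood_threshold` parameter are assumptions introduced here, and the features are assumed to come from a frozen extractor so that class means stored for earlier tasks never change.

```python
import math

def class_means(features, labels):
    """Return {label: mean feature vector} for equal-length feature vectors."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def ncm_predict(x, means, ood_threshold):
    """Return (label, distance) for the nearest class mean.

    The label is None when x is farther than ood_threshold from every mean,
    i.e. the sample is flagged as out-of-distribution (hypothetical rule).
    """
    best_label, best_dist = None, math.inf
    for label, mu in means.items():
        d = math.dist(x, mu)
        if d < best_dist:
            best_label, best_dist = label, d
    return (best_label if best_dist <= ood_threshold else None), best_dist

# Learning a new task only adds prototypes; old ones are left untouched.
means = class_means([[0, 0], [0, 2]], ["a", "a"])
means.update(class_means([[10, 10], [10, 12]], ["b", "b"]))
```

Because each task only adds entries to `means`, nothing learned earlier is overwritten, which is the sense in which a prototype store over frozen features sidesteps forgetting.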

AAAI Conference 2026 Conference Paper

Negative Entity Suppression for Zero-Shot Captioning with Synthetic Images

  • Zimao Lu
  • Hui Xu
  • Bing Liu
  • Ke Wang

Text-only training provides an attractive approach to address data scarcity challenges in zero-shot image captioning (ZIC), avoiding the expense of collecting paired image-text annotations. However, although these approaches perform well within training domains, they suffer from poor cross-domain generalization, often producing hallucinated content when encountering novel visual environments. Retrieval-based methods attempt to mitigate this limitation by leveraging external knowledge, but they can paradoxically exacerbate hallucination when retrieved captions contain entities irrelevant to the inputs. We introduce the concept of negative entities (objects that appear in the generated caption but are absent from the input) and propose Negative Entity Suppression (NES) to tackle this challenge. NES seamlessly integrates three stages: (1) it employs synthetic images to ensure consistent image-to-text retrieval across both training and inference; (2) it filters negative entities from retrieved content to enhance accuracy; and (3) it applies attention-level suppression using identified negative entities to further minimize the impact of hallucination-prone features. Evaluation across multiple benchmarks demonstrates that NES maintains competitive in-domain performance while improving cross-domain transfer and reducing hallucination rates, achieving new state-of-the-art results in ZIC.

NeurIPS Conference 2025 Conference Paper

AnaCP: Toward Upper-Bound Continual Learning via Analytic Contrastive Projection

  • Saleh Momeni
  • Changnan Xiao
  • Bing Liu

This paper studies the problem of class-incremental learning (CIL), a core setting within continual learning where a model learns a sequence of tasks, each containing a distinct set of classes. Traditional CIL methods, which do not leverage pre-trained models (PTMs), suffer from catastrophic forgetting (CF) due to the need to incrementally learn both feature representations and the classifier. The integration of PTMs into CIL has recently led to efficient approaches that treat the PTM as a fixed feature extractor combined with analytic classifiers, achieving state-of-the-art performance. However, they still face a major limitation: the inability to continually adapt feature representations to best suit the CIL tasks, leading to suboptimal performance. To address this, we propose AnaCP (Analytic Contrastive Projection), a novel method that preserves the efficiency of analytic classifiers while enabling incremental feature adaptation without gradient-based training, thereby eliminating the CF caused by gradient updates. Our experiments show that AnaCP not only outperforms existing baselines but also achieves the accuracy level of joint training, which is regarded as the upper bound of CIL.

AAAI Conference 2025 Conference Paper

Continual Learning Using a Kernel-Based Method Over Foundation Models

  • Saleh Momeni
  • Sahisnu Mazumder
  • Bing Liu

Continual learning (CL) learns a sequence of tasks incrementally. This paper studies the challenging CL setting of class-incremental learning (CIL). CIL has two key challenges: catastrophic forgetting (CF) and inter-task class separation (ICS). Despite numerous proposed methods, these issues remain persistent obstacles. This paper proposes a novel CIL method, called Kernel Linear Discriminant Analysis (KLDA), that can effectively avoid CF and ICS problems. It leverages only the powerful features learned in a foundation model (FM). However, directly using these features proves suboptimal. To address this, KLDA incorporates the Radial Basis Function (RBF) kernel and its Random Fourier Features (RFF) to enhance the feature representations from the FM, leading to improved performance. When a new task arrives, KLDA computes only the mean for each class in the task and updates a shared covariance matrix for all learned classes based on the kernelized features. Classification is performed using Linear Discriminant Analysis. Our empirical evaluation using text and image classification datasets demonstrates that KLDA significantly outperforms baselines. Remarkably, without relying on replay data, KLDA achieves accuracy comparable to joint training of all classes, which is considered the upper bound for CIL performance.
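The steps described in the abstract (Random Fourier Features for an RBF kernel, one mean per class, a shared covariance updated as tasks arrive, and LDA for classification) can be sketched as below. This is a minimal illustration under stated assumptions and not the authors' implementation: the class name `RffLdaSketch`, the `gamma` and `ridge` parameters, equal class priors, and the use of raw input vectors in place of foundation-model features are all choices made here for the example.

```python
import numpy as np

class RffLdaSketch:
    """Hypothetical sketch: RFF-kernelized features + incremental LDA."""

    def __init__(self, in_dim, rff_dim=256, gamma=0.5, ridge=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # RFF for k(x, y) = exp(-gamma * ||x - y||^2): W ~ N(0, 2*gamma*I).
        self.W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(in_dim, rff_dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=rff_dim)
        self.rff_dim = rff_dim
        self.ridge = ridge                      # regularizer (assumption)
        self.means = {}                         # class -> mean kernelized feature
        self.scatter = np.zeros((rff_dim, rff_dim))  # shared within-class scatter
        self.count = 0

    def transform(self, X):
        return np.sqrt(2.0 / self.rff_dim) * np.cos(X @ self.W + self.b)

    def learn_task(self, X, y):
        """Add one mean per new class and update the shared scatter matrix."""
        Z = self.transform(X)
        for c in np.unique(y):
            Zc = Z[y == c]
            mu = Zc.mean(axis=0)
            self.means[int(c)] = mu
            centered = Zc - mu
            self.scatter += centered.T @ centered
            self.count += len(Zc)

    def predict(self, X):
        """LDA discriminant with a shared covariance and equal priors."""
        Z = self.transform(X)
        cov = self.scatter / max(self.count, 1) + self.ridge * np.eye(self.rff_dim)
        labels = sorted(self.means)
        M = np.stack([self.means[c] for c in labels])   # (classes, rff_dim)
        A = np.linalg.solve(cov, M.T)                   # cov^{-1} mu_c per column
        scores = Z @ A - 0.5 * np.sum(M * A.T, axis=1)
        return np.array(labels)[np.argmax(scores, axis=1)]
```

Learning a task touches only the new class means and the accumulated scatter, so no parameters tied to earlier classes are modified and no replay data is needed in this sketch.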

IJCAI Conference 2025 Conference Paper

Counterfactual Knowledge Maintenance for Unsupervised Domain Adaptation

  • Yao Li
  • Yong Zhou
  • Jiaqi Zhao
  • Wen-Liang Du
  • Rui Yao
  • Bing Liu

Traditional unsupervised domain adaptation (UDA) struggles to extract rich semantics due to backbone limitations. Recent large-scale pre-trained visual-language models (VLMs) have shown strong zero-shot learning capabilities in UDA tasks. However, directly using VLMs results in a mixture of semantic and domain-specific information, complicating knowledge transfer. Complex scenes with subtle semantic differences are prone to misclassification, which in turn can result in the loss of features that are crucial for distinguishing between classes. To address these challenges, we propose a novel counterfactual knowledge maintenance UDA framework. Specifically, we employ counterfactual disentanglement to separate the representation of semantic information from domain features, thereby reducing domain bias. Furthermore, to clarify ambiguous visual information specific to classes, we maintain the discriminative knowledge of both visual and textual information. This approach synergistically leverages multimodal information to preserve modality-specific distinguishable features. We conducted extensive experimental evaluations on several public datasets to demonstrate the effectiveness of our method. The source code is available at https://github.com/LiYaolab/CMKUDA

ICML Conference 2025 Conference Paper

DiMa: Understanding the Hardness of Online Matching Problems via Diffusion Models

  • Boyu Zhang
  • Aocheng Shen
  • Bing Liu
  • Qiankun Zhang
  • Bin Yuan
  • Jing Wang
  • Shenghao Liu
  • Xianjun Deng

We explore the potential of AI-enhanced combinatorial optimization theory, taking online bipartite matching (OBM) as a case study. In the theoretical study of OBM, the hardness corresponds to a performance upper bound of a specific online algorithm or any possible online algorithms. Typically, these upper bounds derive from challenging instances meticulously designed by theoretical computer scientists. Zhang et al. (ICML 2024) recently provide an example demonstrating how reinforcement learning techniques enhance the hardness result of a specific OBM model. Their attempt is inspiring but preliminary. It is unclear whether their methods can be applied to other OBM problems with similar breakthroughs. This paper takes a further step by introducing DiMa, a unified and novel framework that aims at understanding the hardness of OBM problems based on denoising diffusion probabilistic models (DDPMs). DiMa models the process of generating hard instances as denoising steps, and optimizes them by a novel reinforcement learning algorithm, named shortcut policy gradient (SPG). We first examine DiMa on the classic OBM problem by reproducing its known hardest input instance in literature. Further, we apply DiMa to two well-known variants of OBM, for which the exact hardness remains an open problem, and we successfully improve their theoretical state-of-the-art upper bounds.

ICLR Conference 2025 Conference Paper

Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models

  • Lucas Bandarkar
  • Benjamin Muller
  • Pritish Yuvraj
  • Rui Hou
  • Nayan Singhal
  • Hongjiang Lv
  • Bing Liu

Model merging, such as model souping, is the practice of combining different models with the same architecture together without further training. In this work, we present a model merging methodology that addresses the difficulty of fine-tuning Large Language Models (LLMs) for target tasks in non-English languages, where task-specific data is often unavailable. We focus on mathematical reasoning and without in-language math data, facilitate cross-lingual transfer by composing language and math capabilities. Starting from the same pretrained model, we fine-tune separate "experts" on math instruction data in English and on generic instruction data in the target language. We then replace the top and bottom transformer layers of the math expert directly with layers from the language expert, which consequently enhances math performance in the target language. The resulting merged models outperform the individual experts and other merging methods on the math benchmark, MGSM, by 10% across four major languages where math instruction data is scarce. In addition, this layer swapping is simple, inexpensive, and intuitive, as it is based on an interpretative analysis of the most important parameter changes during the fine-tuning of each expert. The ability to successfully re-compose LLMs for cross-lingual transfer in this manner opens up future possibilities to combine model expertise, create modular solutions, and transfer reasoning capabilities across languages all post hoc.
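The swap described in the abstract, replacing the top and bottom transformer layers of the math expert with the corresponding layers of the language expert, can be sketched on plain Python lists. This is an illustrative sketch only: the function name `swap_layers` and the `n_bottom` / `n_top` counts are assumptions made here, since the abstract does not say how many layers are exchanged, and each list element stands in for one layer's parameters.

```python
def swap_layers(math_expert, language_expert, n_bottom, n_top):
    """Return the math expert's layers with the outer layers taken from
    the language expert. Both experts must share the same architecture."""
    if len(math_expert) != len(language_expert):
        raise ValueError("experts must have the same number of layers")
    n = len(math_expert)
    if n_bottom < 0 or n_top < 0 or n_bottom + n_top > n:
        raise ValueError("invalid number of layers to swap")
    merged = list(math_expert)            # middle layers stay from the math expert
    for i in range(n_bottom):             # bottom (input-side) layers
        merged[i] = language_expert[i]
    for i in range(n - n_top, n):         # top (output-side) layers
        merged[i] = language_expert[i]
    return merged

# Toy usage: strings stand in for per-layer parameter tensors.
math_layers = [f"math_{i}" for i in range(6)]
lang_layers = [f"lang_{i}" for i in range(6)]
merged_layers = swap_layers(math_layers, lang_layers, n_bottom=2, n_top=1)
```

No training step appears anywhere in the sketch, which matches the abstract's point that the merge is applied post hoc to two experts fine-tuned from the same pretrained model.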

IJCAI Conference 2025 Conference Paper

Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking

  • Shenglan Li
  • Rui Yao
  • Yong Zhou
  • Hancheng Zhu
  • Kunyang Sun
  • Bing Liu
  • Zhiwen Shao
  • Jiaqi Zhao

To reduce the reliance on large-scale annotations, self-supervised RGB-T tracking approaches have garnered significant attention. However, the omission of the object region by erroneous pseudo-labels or the introduction of background noise affects the efficiency of modality fusion, while pseudo-label noise triggered by similar-object noise can further affect the tracking performance. In this paper, we propose GDSTrack, a novel approach that introduces dynamic graph fusion and temporal diffusion to address the above challenges in self-supervised RGB-T tracking. GDSTrack dynamically fuses the modalities of neighboring frames, treats them as distractor noise, and leverages the denoising capability of a generative model. Specifically, by constructing an adjacency matrix via an Adjacency Matrix Generator (AMG), the proposed Modality-guided Dynamic Graph Fusion (MDGF) module uses a dynamic adjacency matrix to guide graph attention, focusing on and fusing the object’s coherent regions. Temporal Graph-Informed Diffusion (TGID) models MDGF features from neighboring frames as interference, thus improving robustness against similar-object noise. Extensive experiments conducted on four public RGB-T tracking datasets demonstrate that GDSTrack outperforms the existing state-of-the-art methods. The source code is available at https://github.com/LiShenglana/GDSTrack.

AIJ Journal 2025 Journal Article

Open-world continual learning: Unifying novelty detection and continual learning

  • Gyuhak Kim
  • Changnan Xiao
  • Tatsuya Konishi
  • Zixuan Ke
  • Bing Liu

As AI agents are increasingly used in the real open world with unknowns or novelties, they need the ability to (1) recognize objects that (a) they have learned before and (b) detect items that they have never seen or learned, and (2) learn the new items incrementally to become more and more knowledgeable and powerful. (1) is called novelty detection or out-of-distribution (OOD) detection and (2) is called class incremental learning (CIL), which is a setting of continual learning (CL). In existing research, OOD detection and CIL are regarded as two completely different problems. This paper first provides a theoretical proof that good OOD detection for each task within the set of learned tasks (called closed-world OOD detection) is necessary for successful CIL. We show this by decomposing CIL into two sub-problems: within-task prediction (WP) and task-id prediction (TP), and proving that TP is correlated with closed-world OOD detection. The key theoretical result is that regardless of whether WP and OOD detection (or TP) are defined explicitly or implicitly by a CIL algorithm, good WP and good closed-world OOD detection are necessary and sufficient conditions for good CIL, which unifies novelty or OOD detection and continual learning (CIL, in particular). We call this traditional CIL the closed-world CIL as it does not detect future OOD data in the open world. The paper then proves that the theory can be generalized or extended to open-world CIL, which is the proposed open-world continual learning, that can perform CIL in the open world and detect future or open-world OOD data. Based on the theoretical results, new CIL methods are also designed, which outperform strong baselines in CIL accuracy and in continual OOD detection by a large margin.

ICML Conference 2025 Conference Paper

Understanding the Unfairness in Network Quantization

  • Bing Liu
  • Wenjun Miao
  • Boyu Zhang
  • Qiankun Zhang
  • Bin Yuan
  • Jing Wang
  • Shenghao Liu
  • Xianjun Deng

Network quantization, one of the most widely studied model compression methods, effectively quantizes a floating-point model to obtain a fixed-point one with negligible accuracy loss. Although great success was achieved in reducing the model size, it may exacerbate the unfairness in model accuracy across different groups of datasets. This paper considers two widely used algorithms: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), with an attempt to understand how they cause this critical issue. Theoretical analysis with empirical verifications reveals two responsible factors, as well as how they influence a metric of fairness in depth. A comparison between PTQ and QAT is then made, explaining an observation that QAT behaves even worse than PTQ in fairness, although it often preserves a higher accuracy at lower bit-widths in quantization. Finally, the paper finds out that several simple data augmentation methods can be adopted to alleviate the disparate impacts of quantization, based on a further observation that class imbalance produces distinct values of the aforementioned factors among different attribute classes. We experiment on either imbalanced (UTK-Face and FER2013) or balanced (CIFAR-10 and MNIST) datasets using ResNet and VGG models for empirical evaluation.

YNIMG Journal 2024 Journal Article

Associations of quantitative susceptibility mapping with cortical atrophy and brain connectome in Alzheimer's disease: A multi-parametric study

  • Haojie Chen
  • Aocai Yang
  • Weijie Huang
  • Lei Du
  • Bing Liu
  • Kuan Lv
  • Jixin Luan
  • Pianpian Hu

Aberrant susceptibility due to iron level abnormality and brain network disconnections are observed in Alzheimer's disease (AD), with disrupted iron homeostasis hypothesized to be linked to AD pathology and neuronal loss. However, whether associations exist between abnormal quantitative susceptibility mapping (QSM), brain atrophy, and altered brain connectome in AD remains unclear. Based on multi-parametric brain imaging data from 30 AD patients and 26 healthy controls enrolled at the China-Japan Friendship Hospital, we investigated the abnormality of the QSM signal and volumetric measure across 246 brain regions in AD patients. The structural and functional connectomes were constructed based on diffusion MRI tractography and functional connectivity, respectively. The network topology was quantified using graph theory analyses. We identified seven brain regions with both reduced cortical thickness and abnormal QSM (p < 0.05) in AD, including the right superior frontal gyrus, left superior temporal gyrus, right fusiform gyrus, left superior parietal lobule, right superior parietal lobule, left inferior parietal lobule, and left precuneus. Correlations between cortical thickness and network topology computed across patients in the AD group resulted in statistically significant correlations in five of these regions, with higher correlations in functional compared to structural topology. We computed the correlation between network topological metrics, QSM value and cortical thickness across regions at both individual and group-averaged levels, resulting in a measure we call spatial correlations. We found a decrease in the spatial correlation of QSM and the global efficiency of the structural network in AD patients at the individual level. These findings may provide insights into the complex relationships among QSM, brain atrophy, and brain connectome in AD.

TMLR Journal 2024 Journal Article

Continual Learning: Applications and the Road Forward

  • Eli Verwimp
  • Rahaf Aljundi
  • Shai Ben-David
  • Matthias Bethge
  • Andrea Cossu
  • Alexander Gepperth
  • Tyler L. Hayes
  • Eyke Hüllermeier

Continual learning is a subfield of machine learning, which aims to allow machine learning models to continuously learn on new data, by accumulating knowledge without forgetting what was learned in the past. In this work, we take a step back, and ask: "Why should one care about continual learning in the first place?". We set the stage by examining recent continual learning papers published at four major machine learning conferences, and show that memory-constrained settings dominate the field. Then, we discuss five open problems in machine learning, and even though they might seem unrelated to continual learning at first sight, we show that continual learning will inevitably be part of their solution. These problems are model editing, personalization and specialization, on-device learning, faster (re-)training and reinforcement learning. Finally, by comparing the desiderata from these unsolved problems and the current assumptions in continual learning, we highlight and discuss four future directions for continual learning research. We hope that this work offers an interesting perspective on the future of continual learning, while displaying its potential value and the paths we have to pursue in order to make it successful. This work is the result of the many discussions the authors had at the Dagstuhl seminar on Deep Continual Learning, in March 2023.

AAAI Conference 2024 Conference Paper

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation

  • Hao Liu
  • Xin Li
  • Mingming Gong
  • Bing Liu
  • Yunfei Wu
  • Deqiang Jiang
  • Yinsong Liu
  • Xing Sun

Recently, the Table Structure Recognition (TSR) task, which aims to convert table structure into machine-readable formats, has received increasing interest in the community. Despite impressive success, most methods based on a single table component cannot perform well on irregular tables that suffer from both complicated inner structure and exterior capture distortion. In this paper, we frame this as the Complex TSR problem, where the performance degradation of existing methods is attributable to their inefficient component usage and redundant post-processing. To mitigate it, we shift our perspective from table component extraction towards the efficient use of multiple components, which awaits further exploration in the field. Specifically, we propose a seminal method, termed GrabTab, equipped with a newly proposed Component Deliberator, to handle various types of tables in a unified framework. Thanks to its progressive deliberation mechanism, GrabTab can flexibly accommodate most complex tables by selecting reasonable components, without complicated post-processing. Quantitative experimental results on public benchmarks demonstrate that our method significantly outperforms the state of the art, especially in more challenging scenes.

ICML Conference 2024 Conference Paper

Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective

  • Yang Chen
  • Cong Fang
  • Zhouchen Lin
  • Bing Liu

Foundation Models (FMs) have demonstrated remarkable insights into the relational dynamics of the world, leading to the crucial question: how do these models acquire an understanding of world hybrid relations? Traditional statistical learning, particularly for prediction problems, may overlook the rich and inherently structured information from the data, especially regarding the relationships between objects. We introduce a mathematical model that formalizes relational learning as hypergraph recovery to study pre-training of FMs. In our framework, the world is represented as a hypergraph, with data abstracted as random samples from hyperedges. We theoretically examine the feasibility of a Pre-Trained Model (PTM) to recover this hypergraph and analyze the data efficiency in a minimax near-optimal style. By integrating rich graph theories into the realm of PTMs, our mathematical framework offers powerful tools for an in-depth understanding of pre-training from a unique perspective and can be used under various scenarios. As an example, we extend the framework to entity alignment in multimodal learning.

NeurIPS Conference 2024 Conference Paper

Replay-and-Forget-Free Graph Class-Incremental Learning: A Task Profiling and Prompting Approach

  • Chaoxi Niu
  • Guansong Pang
  • Ling Chen
  • Bing Liu

Class-incremental learning (CIL) aims to continually learn a sequence of tasks, with each task consisting of a set of unique classes. Graph CIL (GCIL) follows the same setting but needs to deal with graph tasks (e.g., node classification in a graph). The key characteristic of CIL lies in the absence of task identifiers (IDs) during inference, which causes a significant challenge in separating classes from different tasks (i.e., inter-task class separation). Being able to accurately predict the task IDs can help address this issue, but it is a challenging problem. In this paper, we show theoretically that accurate task ID prediction on graph data can be achieved by a Laplacian smoothing-based graph task profiling approach, in which each graph task is modeled by a task prototype based on Laplacian smoothing over the graph. It guarantees that the task prototypes of the same graph task are nearly the same with a large smoothing step, while those of different tasks are distinct due to differences in graph structure and node attributes. Further, to avoid the catastrophic forgetting of the knowledge learned in previous graph tasks, we propose a novel graph prompting approach for GCIL which learns a small discriminative graph prompt for each task, essentially resulting in a separate classification model for each task. The prompt learning requires the training of a single graph neural network (GNN) only once on the first task, and no data replay is required thereafter, thereby yielding a GCIL model that is both replay-free and forget-free. Extensive experiments on four GCIL benchmarks show that i) our task prototype-based method can achieve 100% task ID prediction accuracy on all four datasets, ii) our GCIL model significantly outperforms state-of-the-art competing methods by at least 18% in average CIL accuracy, and iii) our model is fully free of forgetting on the four datasets.
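The idea of profiling a graph task with a prototype obtained by Laplacian smoothing, then predicting the task ID by comparing prototypes, can be sketched as follows. The abstract does not give the exact smoothing operator or prototype definition, so everything concrete here is an assumption: smoothing is taken to be repeated neighbourhood averaging with self-loops, the prototype is the mean of the smoothed node features, and matching uses Euclidean distance. All function names are hypothetical.

```python
import math

def smooth(adj, feats, steps):
    """Repeatedly replace each node's features with the average over the
    node and its neighbours (one simple form of Laplacian smoothing)."""
    for _ in range(steps):
        new_feats = []
        for v in range(len(feats)):
            group = adj[v] + [v]                      # neighbours plus self-loop
            dim = len(feats[v])
            new_feats.append(
                [sum(feats[u][i] for u in group) / len(group) for i in range(dim)]
            )
        feats = new_feats
    return feats

def task_prototype(adj, feats, steps=3):
    """Mean of the smoothed node features (assumed prototype definition)."""
    smoothed = smooth(adj, feats, steps)
    dim = len(smoothed[0])
    return [sum(row[i] for row in smoothed) / len(smoothed) for i in range(dim)]

def predict_task(prototypes, query):
    """Return the ID of the stored prototype closest to the query prototype."""
    return min(prototypes, key=lambda t: math.dist(prototypes[t], query))

# Toy graphs: a triangle (task "A") and a 3-node path (task "B").
adj_a, feats_a = [[1, 2], [0, 2], [0, 1]], [[3.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
adj_b, feats_b = [[1], [0, 2], [1]], [[0.0, 9.0], [0.0, 8.0], [1.0, 9.0]]
prototypes = {"A": task_prototype(adj_a, feats_a), "B": task_prototype(adj_b, feats_b)}
```

On the triangle, a single smoothing step already makes every node equal to the graph mean, which illustrates why prototypes computed from the same graph task tend to agree once enough smoothing is applied.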

YNICL Journal 2024 Journal Article

Right superior frontal gyrus: A potential neuroimaging biomarker for predicting short-term efficacy in schizophrenia

  • Yongfeng Yang
  • Xueyan Jin
  • Yongjiang Xue
  • Xue Li
  • Yi Chen
  • Ning Kang
  • Wei Yan
  • Peng Li

Antipsychotic drug treatment for schizophrenia (SZ) can alter brain structure and function, but it is unclear if specific regional changes are associated with treatment outcome. Therefore, we examined the effects of antipsychotic drug treatment on regional grey matter (GM) density, white matter (WM) density, and functional connectivity (FC) as well as associations between regional changes and treatment efficacy. SZ patients (n = 163) and healthy controls (HCs) (n = 131) were examined by structural magnetic resonance imaging (sMRI) at baseline, and a subset of SZ patients (n = 77) were re-examined after 8 weeks of second-generation antipsychotic treatment to assess changes in regional GM and WM density. In addition, 88 SZ patients and 81 HCs were examined by resting-state functional MRI (rs-fMRI) at baseline and the patients were re-examined post-treatment to examine FC changes. The Positive and Negative Syndrome Scale (PANSS) and MATRICS Consensus Cognitive Battery (MCCB) were applied to measure psychiatric symptoms and cognitive impairments in SZ. SZ patients were then stratified into response and non-response groups according to PANSS score change (≥50% decrease or <50% decrease, respectively). The GM density of the right cingulate gyrus, WM density of the right superior frontal gyrus (SFG) plus 5 other WM tracts were reduced in the response group compared to the non-response group. The FC values between the right anterior cingulate and paracingulate gyrus and left thalamus were reduced in the entire SZ group (n = 88) after treatment, while FC between the right inferior temporal gyrus (ITG) and right medial superior frontal gyrus (SFGmed) was increased in the response group. There were no significant changes in regional FC among the non-response group after treatment and no correlations with symptom or cognition test scores. These findings suggest that the right SFG is a critical target of antipsychotic drugs and that WM density and FC alterations within this region could be used as potential indicators in predicting the outcome of antipsychotic treatment in SZ.

TIST Journal 2023 Journal Article

Attention-guided Adversarial Attack for Video Object Segmentation

  • Rui Yao
  • Ying Chen
  • Yong Zhou
  • Fuyuan Hu
  • Jiaqi Zhao
  • Bing Liu
  • Zhiwen Shao

Video Object Segmentation (VOS) methods have made many breakthroughs with the help of the continuous development and advancement of deep learning. However, the deep learning model is vulnerable to malicious adversarial attacks, which mislead the model to make wrong decisions by adding adversarial perturbation that humans cannot perceive to the input image. Threats to deep learning models remind us that video object segmentation methods are also vulnerable to attacks, thereby threatening their security. Therefore, we study adversarial attacks on the VOS task to better identify the vulnerabilities of the VOS method, which in turn provides an opportunity to improve its robustness. In this paper, we propose an attention-guided adversarial attack method, which uses spatial attention blocks to capture features with global dependencies to construct correlations between consecutive video frames, and performs multipath aggregation to effectively integrate spatial-temporal perturbation, thereby guiding the deconvolution network to generate adversarial examples with strong attack capability. Specifically, the class loss function is designed to enable the deconvolution network to better activate noise in other regions and suppress the activation related to the object class based on the enhanced feature map of the object class. At the same time, attentional feature loss is designed to enhance the transferability against attack. The experimental results on the DAVIS dataset show that the proposed attention-guided adversarial attack method can significantly reduce the segmentation accuracy of OSVOS, and the J & F mean on DAVIS 2016 can reach 73.6% drop rate. The generated adversarial examples are also highly transferable to other video object segmentation models.

IJCAI Conference 2023 Conference Paper

Do We Need an Encoder-Decoder to Model Dynamical Systems on Networks?

  • Bing Liu
  • Wei Luo
  • Gang Li
  • Jing Huang
  • Bo Yang

As deep learning gains popularity in modelling dynamical systems, we expose an underappreciated misunderstanding relevant to modelling dynamics on networks. Strongly influenced by graph neural networks, latent vertex embeddings are naturally adopted in many neural dynamical network models. However, we show that embeddings tend to induce a model that fits observations well but simultaneously has incorrect dynamical behaviours. Recognising that previous studies narrowly focus on short-term predictions during the transient phase of a flow, we propose three tests for correct long-term behaviour, and illustrate how an embedding-based dynamical model fails these tests, and analyse the causes, particularly through the lens of topological conjugacy. In doing so, we show that the difficulties can be avoided by not using embedding. We propose a simple embedding-free alternative based on parametrising two additive vector-field components. Through extensive experiments, we verify that the proposed model can reliably recover a broad class of dynamics on different network topologies from time series data.

YNIMG Journal 2023 Journal Article

Evaluation of whole-brain oxygen metabolism in Alzheimer's disease using QSM and quantitative BOLD

  • Aocai Yang
  • Hangwei Zhuang
  • Lei Du
  • Bing Liu
  • Kuan Lv
  • Jixin Luan
  • Pianpian Hu
  • Feng Chen

OBJECTIVE: ) perturbation in Alzheimer's disease (AD) and investigate the relationship between regional cerebral oxygen metabolism and global cognition. METHODS: analyses were performed. The associations between these measures in substructures of deep brain gray matter and MMSE scores were assessed. RESULTS: values in the bilateral hippocampus positively correlated with the MMSE score. CONCLUSION: in the hippocampus may be a useful tool for monitoring cognitive impairment.

AAAI Conference 2023 Conference Paper

Label-Specific Feature Augmentation for Long-Tailed Multi-Label Text Classification

  • Pengyu Xu
  • Lin Xiao
  • Bing Liu
  • Sijin Lu
  • Liping Jing
  • Jian Yu

Multi-label text classification (MLTC) involves tagging a document with its most relevant subset of labels from a label set. In real applications, labels usually follow a long-tailed distribution, where most labels (called tail labels) are associated with only a small number of documents, which limits the performance of MLTC. To alleviate this low-resource problem, researchers introduced a simple but effective strategy, data augmentation (DA). However, most existing DA approaches struggle in multi-label settings. The main reason is that the augmented documents for one label may inevitably influence the other co-occurring labels and further exacerbate the long-tailed problem. To mitigate this issue, we propose a new pair-level augmentation framework for MLTC, called Label-Specific Feature Augmentation (LSFA), which augments only positive feature-label pairs for the tail labels. LSFA contains two main parts. The first learns label-specific document representations in a high-level latent space; the second augments tail-label features in the latent space by transferring the documents' second-order statistics (intra-class semantic variations) from head labels to tail labels. Finally, we design a new loss function for adjusting classifiers based on the augmented datasets. The whole learning procedure can be effectively trained. Comprehensive experiments on benchmark datasets show that the proposed LSFA outperforms state-of-the-art counterparts.

NeurIPS Conference 2022 Conference Paper

A Theoretical Study on Solving Continual Learning

  • Gyuhak Kim
  • Changnan Xiao
  • Tatsuya Konishi
  • Zixuan Ke
  • Bing Liu

Continual learning (CL) learns a sequence of tasks incrementally. There are two popular CL settings, class incremental learning (CIL) and task incremental learning (TIL). A major challenge of CL is catastrophic forgetting (CF). While a number of techniques are already available to effectively overcome CF for TIL, CIL remains highly challenging. So far, little theoretical study has been done to provide principled guidance on how to solve the CIL problem. This paper performs such a study. It first shows that probabilistically, the CIL problem can be decomposed into two sub-problems: Within-task Prediction (WP) and Task-id Prediction (TP). It further proves that TP is correlated with out-of-distribution (OOD) detection, which connects CIL and OOD detection. The key conclusion of this study is that regardless of whether WP and TP or OOD detection are defined explicitly or implicitly by a CIL algorithm, good WP and good TP or OOD detection are necessary and sufficient for good CIL performance. Additionally, TIL is simply WP. Based on the theoretical result, new CIL methods are also designed, which outperform strong baselines in both CIL and TIL settings by a large margin.
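
The decomposition named in the abstract above can be illustrated with a small sketch. It assumes that a task-id predictor supplies a probability for each task and that each task has its own within-task classifier; the function names and the numbers are hypothetical and are not taken from the paper. The sketch only applies the product rule, P(task k, class j | x) = P(task k | x) · P(class j | x, task k).

```python
def cil_probabilities(task_probs, within_task_probs):
    """Combine task-id prediction (TP) with within-task prediction (WP).

    task_probs[k]           ~ P(task k | x)             (hypothetical TP output)
    within_task_probs[k][j] ~ P(class j | x, task k)    (hypothetical WP output)
    Returns a dict mapping (k, j) to the product of the two probabilities.
    """
    joint = {}
    for k, p_task in enumerate(task_probs):
        for j, p_class in enumerate(within_task_probs[k]):
            joint[(k, j)] = p_task * p_class
    return joint


def predict(task_probs, within_task_probs):
    """Return the (task, class) pair with the highest joint probability."""
    joint = cil_probabilities(task_probs, within_task_probs)
    return max(joint, key=joint.get)


# Illustrative numbers only: two tasks with two classes each.
example_task_probs = [0.7, 0.3]
example_within = [[0.6, 0.4], [0.1, 0.9]]
example_joint = cil_probabilities(example_task_probs, example_within)
example_prediction = predict(example_task_probs, example_within)
```

In this toy case a confident within-task classifier for the second task (0.9) is still outvoted because the task-id probability is low, which is one way to see why both factors need to be good.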

YNICL Journal 2022 Journal Article

Abnormal patterns of regional homogeneity and functional connectivity across the adolescent first-episode, adult first-episode and adult chronic schizophrenia

  • Yongfeng Yang
  • Yuqing Sun
  • Yuliang Zhang
  • Xueyan Jin
  • Zheng Li
  • Minli Ding
  • Han Shi
  • Qing Liu

Functional deficits in schizophrenia (SZ) are observed prior to the onset of psychosis and differ at different stages of SZ. However, there is a paucity of studies focused on adolescent first-episode SZ (AOS), adult first-episode SZ (AFES), and adult chronic SZ (CHSZ). In this study, we investigated regional activity and the corresponding functional connectivity alterations, comparing the three disease stages simultaneously. The subjects comprised 49 patients with AOS, 57 patients with AFES, 51 patients with CHSZ, 41 adolescent healthy controls, and 138 adult healthy controls. We compared regional homogeneity (ReHo) between patients at each disease stage and matched healthy controls. We focused on the shared brain regions that showed significant differences between SZ patients at the three different disease stages and healthy controls. Further analysis was conducted to explore whether the patterns of the whole brain functional connectivity alterations were similar. The putamen and medial frontal gyrus (MFG) showed consistently abnormal patterns in AOS, AFES, and CHSZ. Commonly decreased ReHo values in the MFG and increased ReHo values in the bilateral putamen were found in AOS, AFES, and CHSZ. Functional connectivity of the MFG remained a common abnormality across the different SZ stages. In conclusion, ReHo abnormalities in the MFG and the putamen may be common abnormal patterns of brain function in the three different stages of SZ. The vmPFC-dlPFC FC abnormality commonly occurs in adolescence and adulthood. This study may provide a more comprehensive understanding of the neurodevelopmental abnormality across the AOS, AFES, and CHSZ.

AAAI Conference 2022 Conference Paper

Adaptive Orthogonal Projection for Batch and Online Continual Learning

  • Yiduo Guo
  • Wenpeng Hu
  • Dongyan Zhao
  • Bing Liu

Catastrophic forgetting is a key obstacle to continual learning. One of the state-of-the-art approaches is orthogonal projection. The idea of this approach is to learn each task by updating the network parameters or weights only in the direction orthogonal to the subspace spanned by all previous task inputs. This ensures no interference with tasks that have been learned. The system OWM that uses the idea performs very well against other state-of-the-art systems. In this paper, we first discuss an issue that we discovered in the mathematical derivation of this approach and then propose a novel method, called AOP (Adaptive Orthogonal Projection), to resolve it, which results in significant accuracy gains in empirical evaluations in both the batch and online continual learning settings without saving any previous training data as in replay-based methods.
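
The general idea of orthogonal projection described above can be sketched in plain Python. This is a generic Gram-Schmidt projection, not the OWM or AOP algorithm from the paper; all names are hypothetical. It assumes a single linear unit whose output is the dot product of a weight vector and an input, so an update orthogonal to every previous input leaves the outputs on those inputs unchanged.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(update, previous_inputs):
    """Remove from `update` its component inside span(previous_inputs)."""
    result = list(update)
    for b in orthonormal_basis(previous_inputs):
        c = dot(result, b)
        result = [ri - c * bi for ri, bi in zip(result, b)]
    return result


# Illustrative numbers only: two previous inputs and a raw weight update.
previous_inputs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
raw_update = [0.5, -2.0, 3.0]
projected_update = project_out(raw_update, previous_inputs)
```

Here only the third coordinate of the update survives, because the first two directions are already spanned by the previous inputs.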

EAAI Journal 2022 Journal Article

Correcting biased value estimation in mixing value-based multi-agent reinforcement learning by multiple choice learning

  • Bing Liu
  • Yuxuan Xie
  • Lei Feng
  • Ping Fu

Multi-agent reinforcement learning (MARL) has become more and more popular over the past decades, and many value-based MARL methods have been proposed in the past few years. Neural networks play important roles in these methods and are used to predict the value of the state–action pair, i.e., the Q-value, and the actions of agents are chosen based on this value. However, inaccurate prediction by the neural network leads to biased Q-value estimation, which causes inefficient usage of the experience data and poor performance. Unlike ensemble methods that just reduce the variance of predictions, multiple choice learning (MCL) methods exploit the cooperation among all the candidate models. This paper corrects the biased Q-value by exploiting the collaboration between the ensemble model and MARL to obtain a more stable and more precise Q-value estimator. In this paper, a new MARL method called Multiple Choice QMIX is developed to address the biased Q-value issue, which also extends the application scenarios of MCL methods. Specifically, we propose a voting network that learns the confidence level of each estimator and can thus provide the best prediction by combining their results. A voting hindsight loss is also proposed to encourage the voting network to overcome the overestimation of the Q-value. We also conduct experiments on four challenging tasks of the StarCraft II micromanagement benchmark. Experimental results show that our method obtains a faster convergence rate and more stable performance in multi-agent tasks.

AAAI Conference 2022 Conference Paper

Ensemble Semi-supervised Entity Alignment via Cycle-Teaching

  • Kexuan Xin
  • Zequn Sun
  • Wen Hua
  • Bing Liu
  • Wei Hu
  • Jianfeng Qu
  • Xiaofang Zhou

Entity alignment aims to find identical entities in different knowledge graphs. Although embedding-based entity alignment has recently achieved remarkable progress, training data insufficiency remains a critical challenge. Conventional semi-supervised methods also suffer from incorrect entity alignment in newly proposed training data. To resolve these issues, we design an iterative cycle-teaching framework for semi-supervised entity alignment. The key idea is to train multiple entity alignment models (called aligners) simultaneously and let each aligner iteratively teach its successor the proposed new entity alignment. We propose a diversity-aware alignment selection method to choose reliable entity alignment for each aligner. We also design a conflict resolution mechanism to resolve the alignment conflict when combining the new alignment of an aligner and that from its teacher. In addition, considering the influence of cycle-teaching order, we design a strategy to arrange the optimal order that can maximize the overall performance of multiple aligners. The cycle-teaching process can break the limitations of each model's learning capability and reduce the noise in new training data, leading to improved performance. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed cycle-teaching framework, which significantly outperforms the state-of-the-art models when the training data is insufficient and the new entity alignment has much noise.

AAAI Conference 2022 Conference Paper

Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP

  • Sepideh Esmaeilpour
  • Bing Liu
  • Eric Robertson
  • Lei Shu

In an out-of-distribution (OOD) detection problem, samples of known classes (also called in-distribution classes) are used to train a special classifier. In testing, the classifier can (1) classify the test samples of known classes to their respective classes and also (2) detect samples that do not belong to any of the known classes (i.e., they belong to some unknown or OOD classes). This paper studies the problem of zero-shot out-of-distribution (OOD) detection, which still performs the same two tasks in testing but has no training except using the given known class names. This paper proposes a novel and yet simple method (called ZOC) to solve the problem. ZOC builds on top of the recent advances in zero-shot classification through multi-modal representation learning. It first extends the pre-trained language-vision model CLIP by training a text-based image description generator on top of CLIP. In testing, it uses the extended model to generate candidate unknown class names for each test sample and computes a confidence score based on both the known class names and candidate unknown class names for zero-shot OOD detection. Experimental results on 5 benchmark datasets for OOD detection demonstrate that ZOC outperforms the baselines by a large margin.
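
One plausible way to turn known and candidate unknown class names into a single score is sketched below. The abstract does not give the exact formula, so this is an assumption: the score is taken to be the softmax probability mass that falls on the candidate unknown names. The similarity values are hypothetical stand-ins for image-text similarities and no CLIP model is called.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ood_score(known_sims, candidate_sims):
    """Softmax mass on candidate unknown labels (assumed scoring rule).

    known_sims     : similarities between the image and known class names
    candidate_sims : similarities between the image and generated candidate names
    A higher value suggests the sample is more likely out-of-distribution.
    """
    probs = softmax(list(known_sims) + list(candidate_sims))
    return sum(probs[len(known_sims):])


# Illustrative numbers only.
in_dist_score = ood_score([5.0, 1.0], [0.5])   # a known name matches best
out_dist_score = ood_score([1.0, 0.8], [5.0])  # a candidate name matches best
```

A threshold on this score would then separate in-distribution from OOD samples; how that threshold is chosen is outside this sketch.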

NeurIPS Conference 2021 Conference Paper

Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning

  • Zixuan Ke
  • Bing Liu
  • Nianzu Ma
  • Hu Xu
  • Lei Shu

Continual learning (CL) learns a sequence of tasks incrementally with the goal of achieving two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge transfer (KT) across tasks. However, most existing techniques focus only on overcoming CF and have no mechanism to encourage KT, and thus do not do well in KT. Although several papers have tried to deal with both CF and KT, our experiments show that they suffer from serious CF when the tasks do not have much shared knowledge. Another observation is that most current CL methods do not use pre-trained models, but it has been shown that such models can significantly improve the end task performance. For example, in natural language processing, fine-tuning a BERT-like pre-trained language model is one of the most effective approaches. However, for CL, this approach suffers from serious CF. An interesting question is how to make the best use of pre-trained models for CL. This paper proposes a novel model called CTR to solve these problems. Our experimental results demonstrate the effectiveness of CTR.

NeurIPS Conference 2021 Conference Paper

BNS: Building Network Structures Dynamically for Continual Learning

  • Qi Qin
  • Wenpeng Hu
  • Han Peng
  • Dongyan Zhao
  • Bing Liu

Continual learning (CL) of a sequence of tasks is often accompanied by the catastrophic forgetting (CF) problem. Existing research has achieved remarkable results in overcoming CF, especially for task continual learning. However, limited work has been done to achieve another important goal of CL, knowledge transfer. In this paper, we propose a technique (called BNS) to do both. The novelty of BNS is that it dynamically builds a network to learn each new task to overcome CF and to transfer knowledge across tasks at the same time. Experimental results show that when the tasks are different (with little shared knowledge), BNS can already outperform the state-of-the-art baselines. When the tasks are similar and have shared knowledge, BNS outperforms the baselines by a large margin due to its knowledge transfer capability.

AAAI Conference 2021 Conference Paper

Continual Learning by Using Information of Each Class Holistically

  • Wenpeng Hu
  • Qi Qin
  • Mengyu Wang
  • Jinwen Ma
  • Bing Liu

Continual learning (CL) incrementally learns a sequence of tasks while solving the catastrophic forgetting (CF) problem. Existing methods mainly try to deal with CF directly. In this paper, we propose to avoid CF by considering the features of each class holistically rather than only the discriminative information for classifying the classes seen so far. This latter approach is prone to CF because the discriminative information for old classes may not be sufficiently discriminative for the new class to be learned. Consequently, in learning each new task, the network parameters for previous tasks have to be revised, which causes CF. With the holistic consideration, after adding new tasks, the system can still do well for previous tasks. The proposed technique is called Per-class Continual Learning (PCL). PCL has two key novelties. (1) It proposes a one-class learning based technique for CL, which considers features of each class holistically and represents a new approach to solving the CL problem. (2) It proposes a method to extract discriminative information after training to further improve the accuracy. Empirical evaluation shows that PCL markedly outperforms the state-of-the-art baselines for one or more classes per task. More tasks also result in more gains.

AAAI Conference 2021 Conference Paper

Lifelong and Continual Learning Dialogue Systems: Learning during Conversation

  • Bing Liu
  • Sahisnu Mazumder

Dialogue systems, also called chatbots, are now used in a wide range of applications. However, they still have some major weaknesses. One key weakness is that they are typically trained from manually-labeled data and/or written with handcrafted rules, and their knowledge bases (KBs) are also compiled by human experts. Due to the huge amount of manual effort involved, they are difficult to scale and also tend to produce many errors owing to their limited ability to understand natural language and the limited knowledge in their KBs. Thus, the level of user satisfaction is often low. In this paper, we propose to dramatically improve the situation by endowing chatbots with the ability to continually learn (1) new world knowledge, (2) new language expressions to ground them to actions, and (3) new conversational skills, during conversation by themselves so that as they chat more and more with users, they become more and more knowledgeable and are better and better able to understand diverse natural language expressions and to improve their conversational skills.

YNICL Journal 2021 Journal Article

Multisite schizophrenia classification by integrating structural magnetic resonance imaging data with polygenic risk score

  • Ke Hu
  • Meng Wang
  • Yong Liu
  • Hao Yan
  • Ming Song
  • Jun Chen
  • Yunchun Chen
  • Huaning Wang

Previous brain structural magnetic resonance imaging studies reported that patients with schizophrenia have brain structural abnormalities, which have been used to discriminate schizophrenia patients from normal controls. However, most existing studies identified schizophrenia patients at a single site, and the genetic features closely associated with highly heritable schizophrenia were not considered. In this study, we performed standardized feature extraction on brain structural magnetic resonance images and on genetic data to separate schizophrenia patients from normal controls. A total of 1010 participants, 508 schizophrenia patients and 502 normal controls, were recruited from 8 independent sites across China. Classification experiments were carried out using different machine learning methods and input features. We tested a support vector machine, logistic regression, and an ensemble learning strategy using 3 feature sets of interest: (1) imaging features: gray matter volume, (2) genetic features: polygenic risk scores, and (3) a fusion of imaging features and genetic features. The performance was assessed by leave-one-site-out cross-validation. Finally, some important brain and genetic features were identified. We found that the models with both imaging and genetic features as input performed better than models with either alone. The average accuracy of the classification models with the best performance in the cross-validation was 71.6%. The genetic feature that measured the cumulative risk of the genetic variants most associated with schizophrenia contributed the most to the classification. Our work took the first step toward considering both structural brain alterations and genome-wide genetic factors in a large-scale multisite schizophrenia classification. Our findings may provide insight into the underlying pathophysiology and risk mechanisms of schizophrenia.
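
The leave-one-site-out cross-validation mentioned above can be sketched as a simple fold generator. This is a minimal illustration that assumes each participant carries a site label; the function name and the toy site labels are hypothetical, and no classifier from the study is reproduced.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices), one fold per site.

    sites[i] is the acquisition site of participant i. Every participant
    from the held-out site goes to the test set and all others to training.
    """
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train, test


# Illustrative labels only: six participants from three sites.
example_sites = ["A", "A", "B", "C", "C", "C"]
example_folds = list(leave_one_site_out(example_sites))
```

Holding out a whole site at a time means the reported accuracy reflects generalization to data from a scanner and population the model never saw during training.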

AAAI Conference 2021 Conference Paper

Predictive Adversarial Learning from Positive and Unlabeled Data

  • Wenpeng Hu
  • Ran Le
  • Bing Liu
  • Feng Ji
  • Jinwen Ma
  • Dongyan Zhao
  • Rui Yan

This paper studies learning from positive and unlabeled examples, known as PU learning. It proposes a novel PU learning method called Predictive Adversarial Networks (PAN) based on GAN (Generative Adversarial Networks). GAN learns a generator to generate data (e.g., images) to fool a discriminator which tries to determine whether the generated data belong to a (positive) training class. PU learning can be cast as trying to identify (not generate) likely positive instances from the unlabeled set to fool a discriminator that determines whether the identified likely positive instances from the unlabeled set are indeed positive. However, directly applying GAN is problematic because GAN focuses on only the positive data. The resulting PU learning method will have high precision but low recall. We propose a new objective function based on KL-divergence. Evaluation using both image and text data shows that PAN outperforms state-of-the-art PU learning methods and also a direct adaptation of GAN for PU learning.

NeurIPS Conference 2020 Conference Paper

Continual Learning of a Mixed Sequence of Similar and Dissimilar Tasks

  • Zixuan Ke
  • Bing Liu
  • Xingchang Huang

Existing research on continual learning of a sequence of tasks focused on dealing with catastrophic forgetting, where the tasks are assumed to be dissimilar and have little shared knowledge. Some work has also been done to transfer previously learned knowledge to the new task when the tasks are similar and have shared knowledge. However, in the most general case, a CL system should have not only the above two capabilities but also the backward knowledge transfer capability, so that future tasks may help improve the past models whenever possible. To the best of our knowledge, no technique has been proposed to learn a sequence of mixed similar and dissimilar tasks that can deal with forgetting and also transfer knowledge forward and backward. This paper proposes such a technique to learn both types of tasks in the same network. For dissimilar tasks, the algorithm focuses on dealing with forgetting, and for similar tasks, the algorithm focuses on selectively transferring the knowledge learned from some similar previous tasks to improve the new task learning. Additionally, the algorithm automatically detects whether a new task is similar to any previous tasks. Empirical evaluation using sequences of mixed tasks demonstrates the effectiveness of the proposed model.

NeurIPS Conference 2020 Conference Paper

HRN: A Holistic Approach to One Class Learning

  • Wenpeng Hu
  • Mengyu Wang
  • Qi Qin
  • Jinwen Ma
  • Bing Liu

Existing neural network based one-class learning methods mainly use various forms of auto-encoders or GAN style adversarial training to learn a latent representation of the given one class of data. This paper proposes an entirely different approach based on a novel regularization, called holistic regularization (or H-regularization), which enables the system to consider the data holistically rather than produce a model that is biased towards some features. Combined with a proposed 2-norm instance-level data normalization, we obtain an effective one-class learning method, called HRN. To our knowledge, the proposed regularization and the normalization method have not been reported before. Experimental evaluation using both benchmark image classification and traditional anomaly detection datasets shows that HRN markedly outperforms the existing state-of-the-art deep/non-deep learning models.

AAAI Conference 2020 Conference Paper

Learning on the Job: Online Lifelong and Continual Learning

  • Bing Liu

One of the hallmarks of human intelligence is the ability to learn continuously, accumulate the knowledge learned in the past and use the knowledge to help learn more and learn better. It is hard to imagine a truly intelligent system without this capability. This type of learning differs significantly from the classic machine learning (ML) paradigm of isolated single-task learning. Although there is already research on learning a sequence of tasks incrementally under the names of lifelong learning or continual learning, they still follow the traditional two-phase separate training and testing paradigm in learning each task. The tasks are also given by the user. This paper adds on-the-job learning to the mix to emphasize the need to learn during application (thus online) after the model has been deployed, which traditional ML cannot do. It aims to leverage the learned knowledge to discover new tasks, interact with humans and the environment, make inferences, and incrementally learn the new tasks on the fly during applications in a self-supervised and interactive manner. This is analogous to human on-the-job learning after formal training. We use chatbots and self-driving cars as examples to discuss the need, some initial work, and key challenges and opportunities in building this capability.

YNICL Journal 2019 Journal Article

Characterization of white matter changes along fibers by automated fiber quantification in the early stages of Alzheimer's disease

  • Xin Zhang
  • Yu Sun
  • Weiping Li
  • Bing Liu
  • Wenbo Wu
  • Hui Zhao
  • Renyuan Liu
  • Yue Zhang

Brain white matter fiber bundles in patients with mild cognitive impairment (MCI) and Alzheimer's disease (AD) have abnormalities not usually seen in unaffected subjects. An ideal algorithm for the localization-specific properties of white matter integrity might reveal changes in tissue properties that vary along each tract, whereas previous studies detected only the mean DTI parameters of each fiber. The aim of this study was to investigate whether these abnormalities of nerve fiber tracts are localized to specific regions of the tracts or spread throughout and to analyze which of the examined fiber tracts are involved in the early stages of Alzheimer's disease. In this study, we utilized VBA, TBSS as well as AFQ together to comprehensively investigate the white matter fiber impairment on 25 CE patients, 29 MCI patients and 34 normal control (NC) subjects. Two tract profiles, fractional anisotropy (FA) and mean diffusivity (MD), were extracted to evaluate the white matter integrity at 100 locations along each of 20 fiber tracts, and we then validated the results with 27 CE patients, 21 MCI patients and 22 NC from the ADNI cohort. We also compared AFQ with VBA and TBSS in our cohort. In comparison with NC, AD patients showed widespread FA reduction in 25% (5/20) and MD increase in 65% (13/20) of the examined fiber tracts. The MCI patients showed a regional FA reduction in 5% (1/20) of the examined fiber tracts (right cingulum cingulate) and MD increase in 5% (1/20) of the examined fiber tracts (left arcuate fasciculus). Among these changed tracts, only the right cingulum cingulate showed widespread disruption of myelin and/or fiber axons in MCI and aggravated deterioration in AD, findings supported by FA/MD changes both by the mean and FA changes by pointwise methods and TBSS. The AFQ findings from the ADNI cohort also showed some similarity with our cohort, especially in the pointwise comparison of MD profiles between AD vs. NC.
Furthermore, the pattern of white matter abnormalities was different across neuronal fiber tracts; for example, the MCI and AD patients showed similar FA reduction in the middle part of the right cingulum cingulate, whereas the anterior part was not damaged. However, the left arcuate fasciculus showed MD elevation located at the temporal part of the fibers in the MCI patients and expanding to the temporal and middle part of the fibers in AD patients. Thus, AFQ may be a complementary alternative to VBA and TBSS, and may provide new insights into white matter degeneration in MCI and its association with AD.

IJCAI Conference 2019 Conference Paper

GSN: A Graph-Structured Network for Multi-Party Dialogues

  • Wenpeng Hu
  • Zhangming Chan
  • Bing Liu
  • Dongyan Zhao
  • Jinwen Ma
  • Rui Yan

Existing neural models for dialogue response generation assume that utterances are sequentially organized. However, many real-world dialogues involve multiple interlocutors (i.e., multi-party dialogues), where the assumption does not hold as utterances from different interlocutors can occur "in parallel." This paper generalizes existing sequence-based models to a Graph-Structured neural Network (GSN) for dialogue modeling. The core of GSN is a graph-based encoder that can model the information flow along the graph-structured dialogues (two-party sequential dialogues are a special case). Experimental results show that GSN significantly outperforms existing sequence-based models.

TIST Journal 2019 Journal Article

Reconstruction of Hidden Representation for Robust Feature Extraction

  • Zeng Yu
  • Tianrui Li
  • Ning Yu
  • Yi Pan
  • Hongmei Chen
  • Bing Liu

This article aims to develop a new and robust approach to feature representation. Motivated by the success of Auto-Encoders, we first theoretically analyze and summarize the general properties of all algorithms that are based on traditional Auto-Encoders: (1) The reconstruction error of the input cannot be lower than a lower bound, which can be viewed as a guiding principle for reconstructing the input. Additionally, when the input is corrupted with noises, the reconstruction error of the corrupted input also cannot be lower than a lower bound. (2) The reconstruction of a hidden representation achieving its ideal situation is the necessary condition for the reconstruction of the input to reach the ideal state. (3) Minimizing the Frobenius norm of the Jacobian matrix of the hidden representation has a deficiency and may result in a much worse local optimum value. We believe that minimizing the reconstruction error of the hidden representation is more robust than minimizing the Frobenius norm of the Jacobian matrix of the hidden representation. Based on the above analysis, we propose a new model termed Double Denoising Auto-Encoders (DDAEs), which uses corruption and reconstruction on both the input and the hidden representation. We demonstrate that the proposed model is highly flexible and extensible and has a potentially better capability to learn invariant and robust feature representations. We also show that our model is more robust than Denoising Auto-Encoders (DAEs) for dealing with noises or inessential features. Furthermore, we detail how to train DDAEs with two different pretraining methods by optimizing the objective function in a combined and separate manner, respectively. Comparative experiments illustrate that the proposed model is significantly better for representation learning than the state-of-the-art models.

IJCAI Conference 2019 Conference Paper

Spectral Perturbation Meets Incomplete Multi-view Data

  • Hao Wang
  • Linlin Zong
  • Bing Liu
  • Yan Yang
  • Wei Zhou

Beyond existing multi-view clustering, this paper studies a more realistic clustering scenario, referred to as incomplete multi-view clustering, where a number of data instances are missing in certain views. To tackle this problem, we explore spectral perturbation theory. In this work, we show a strong link between perturbation risk bounds and incomplete multi-view clustering. That is, as the similarity matrix fed into spectral clustering is a quantity bounded in magnitude O(1), we transfer the missing problem from data to similarity and tailor a matrix completion method for the incomplete similarity matrix. Moreover, we show that the minimization of perturbation risk bounds among different views maximizes the final fusion result across all views. This provides a solid fusion criterion for multi-view data. We motivate and propose a Perturbation-oriented Incomplete multi-view Clustering (PIC) method. Experimental results demonstrate the effectiveness of the proposed method.

AAAI Conference 2018 Conference Paper

Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models

  • Bing Liu
  • Tong Yu
  • Ian Lane
  • Ole Mengshoel

Dialog response selection is an important step towards natural response generation in conversational agents. Existing work on neural conversational models mainly focuses on offline supervised learning using a large set of context-response pairs. In this paper, we focus on online learning of response selection in retrieval-based dialog systems. We propose a contextual multi-armed bandit model with a nonlinear reward function that uses distributed representation of text for online response selection. A bidirectional LSTM is used to produce the distributed representations of dialog context and responses, which serve as the input to a contextual bandit. In learning the bandit, we propose a customized Thompson sampling method that is applied to a polynomial feature space in approximating the reward. Experimental results on the Ubuntu Dialogue Corpus demonstrate significant performance gains of the proposed method over conventional linear contextual bandits. Moreover, we report encouraging response selection performance of the proposed neural bandit model using the Recall@k metric for a small set of online training samples.
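
The Recall@k metric named above can be illustrated with a short sketch. The abstract does not spell out the evaluation protocol, so this assumes the common response-selection definition: for each dialog context the model ranks a list of candidate responses, and a hit is counted when the true response appears among the top k. The function name and the toy data are hypothetical.

```python
def recall_at_k(ranked_candidates, true_responses, k):
    """Fraction of contexts whose true response is ranked in the top k.

    ranked_candidates[i] : candidate responses for context i, best first
    true_responses[i]    : the ground-truth response for context i
    """
    hits = sum(
        1
        for ranked, truth in zip(ranked_candidates, true_responses)
        if truth in ranked[:k]
    )
    return hits / len(true_responses)


# Illustrative data only: two contexts with three ranked candidates each.
example_rankings = [["a", "b", "c"], ["x", "y", "z"]]
example_truths = ["b", "z"]
example_recalls = [recall_at_k(example_rankings, example_truths, k) for k in (1, 2, 3)]
```

Recall@k is non-decreasing in k by construction, so results are usually reported for several small values of k.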

AAAI Conference 2018 Conference Paper

Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory

  • Hao Zhou
  • Minlie Huang
  • Tianyang Zhang
  • Xiaoyan Zhu
  • Bing Liu

Perception and expression of emotion are key factors in the success of dialogue systems or conversational agents. However, this problem has not been studied in large-scale conversation generation so far. In this paper, we propose Emotional Chatting Machine (ECM) that can generate appropriate responses not only in content (relevant and grammatical) but also in emotion (emotionally consistent). To the best of our knowledge, this is the first work that addresses the emotion factor in large-scale conversation generation. ECM addresses the factor using three new mechanisms that respectively (1) model the high-level abstraction of emotion expressions by embedding emotion categories, (2) capture the change of implicit internal emotion states, and (3) use explicit emotion expressions with an external emotion vocabulary. Experiments show that the proposed model can generate responses appropriate not only in content but also in emotion.

IJCAI Conference 2018 Conference Paper

Lifelong Domain Word Embedding via Meta-Learning

  • Hu Xu
  • Bing Liu
  • Lei Shu
  • Philip S. Yu

Learning high-quality domain word embeddings is important for achieving good performance in many NLP tasks. General-purpose embeddings trained on large-scale corpora are often sub-optimal for domain-specific applications. However, domain-specific tasks often do not have large in-domain corpora for training high-quality domain embeddings. In this paper, we propose a novel lifelong learning setting for domain embedding. That is, when performing the new domain embedding, the system has seen many past domains, and it tries to expand the new in-domain corpus by exploiting the corpora from the past domains via meta-learning. The proposed meta-learner characterizes the similarities of the contexts of the same word in many domain corpora, which helps retrieve relevant data from the past domains to expand the new domain corpus. Experimental results show that domain embeddings produced from such a process improve the performance of the downstream tasks.

IJCAI Conference 2017 Conference Paper

Context-aware Path Ranking for Knowledge Base Completion

  • Sahisnu Mazumder
  • Bing Liu

Knowledge base (KB) completion aims to infer missing facts from existing ones in a KB. Among various approaches, path ranking (PR) algorithms have received increasing attention in recent years. PR algorithms enumerate paths between entity-pairs in a KB and use those paths as features to train a model for missing fact prediction. Due to their good performances and high model interpretability, several methods have been proposed. However, most existing methods suffer from scalability (high RAM consumption) and feature explosion (trains on an exponentially large number of features) problems. This paper proposes a Context-aware Path Ranking (C-PR) algorithm to solve these problems by introducing a selective path exploration strategy. C-PR learns global semantics of entities in the KB using word embedding and leverages the knowledge of entity semantics to enumerate contextually relevant paths using bidirectional random walk. Experimental results on three large KBs show that the path features (fewer in number) discovered by C-PR not only improve predictive performance but also are more interpretable than existing baselines.
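As a rough illustration of the path-enumeration step that path ranking algorithms rely on, the sketch below collects every relation sequence of bounded length between an entity pair in a toy KB. It is a plain exhaustive enumeration, not the selective, embedding-guided bidirectional random walk of C-PR; the function name, the `^-1` inverse-edge notation, and the toy triples are all assumptions made for this example.

```python
from collections import defaultdict


def enumerate_relation_paths(triples, source, target, max_len=3):
    """Return the set of relation sequences that link source to target.

    Each sequence of at most max_len edges would serve as one path
    feature in a plain path-ranking setup (no selective exploration).
    """
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        # Add an inverse edge so a path may traverse a relation backwards.
        graph[tail].append((relation + "^-1", head))

    paths = set()

    def walk(node, relations, visited):
        if node == target and relations:
            paths.add(tuple(relations))
            return
        if len(relations) == max_len:
            return
        for relation, neighbour in graph[node]:
            if neighbour not in visited:
                walk(neighbour, relations + [relation], visited | {neighbour})

    walk(source, [], {source})
    return paths


# Hypothetical toy knowledge base, invented for illustration only.
toy_kb = [
    ("alice", "born_in", "paris"),
    ("paris", "city_of", "france"),
    ("alice", "works_for", "acme"),
    ("acme", "based_in", "france"),
]
features = enumerate_relation_paths(toy_kb, "alice", "france")
```

Exhaustive enumeration like this is what grows exponentially with path length on a large KB, which is the feature-explosion problem the abstract describes.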

YNICL Journal 2017 Journal Article

Polygenic risk for five psychiatric disorders and cross-disorder and disorder-specific neural connectivity in two independent populations

  • Tianqi Wang
  • Xiaolong Zhang
  • Ang Li
  • Meifang Zhu
  • Shu Liu
  • Wen Qin
  • Jin Li
  • Chunshui Yu

Major psychiatric disorders, including attention deficit hyperactivity disorder (ADHD), autism (AUT), bipolar disorder (BD), major depressive disorder (MDD), and schizophrenia (SZ), are highly heritable and polygenic. Evidence suggests that these five disorders have both shared and distinct genetic risks and neural connectivity abnormalities. To measure aggregate genetic risks, the polygenic risk score (PGRS) was computed. Two independent general populations (N = 360 and N = 323) were separately examined to investigate whether the cross-disorder PGRS and PGRS for a specific disorder were associated with individual variability in functional connectivity. Consistent altered functional connectivity was found with the bilateral insula: for the left supplementary motor area and the left superior temporal gyrus with the cross-disorder PGRS, for the left insula and right middle and superior temporal lobe associated with the PGRS for autism, for the bilateral midbrain, posterior cingulate, cuneus, and precuneus associated with the PGRS for BD, and for the left angular gyrus and the left dorsolateral prefrontal cortex associated with the PGRS for schizophrenia. No significant functional connectivity was found associated with the PGRS for ADHD and MDD. Our findings indicated that genetic effects on the cross-disorder and disorder-specific neural connectivity of common genetic risk loci are detectable in the general population. Our findings also indicated that polygenic risk contributes to the main neurobiological phenotypes of psychiatric disorders and that identifying cross-disorder and specific functional connectivity related to polygenic risks may elucidate the neural pathways for these disorders.

AAAI Conference 2016 Conference Paper

Identifying Search Keywords for Finding Relevant Social Media Posts

  • Shuai Wang
  • Zhiyuan Chen
  • Bing Liu
  • Sherry Emery

In almost any application of social media analysis, the user is interested in studying a particular topic or research question. Collecting posts or messages relevant to the topic from a social media source is a necessary step. Due to the huge size of social media sources (e.g., Twitter and Facebook), one has to use some topic keywords to search for possibly relevant posts. However, gathering a good set of keywords is a very tedious and time-consuming task. It often involves a lengthy iterative process of searching and manual reading. In this paper, we propose a novel technique to help the user identify topical search keywords. Our experiments are carried out on identifying such keywords for five (5) real-life application topics to be used for searching relevant tweets from the Twitter API. The results show that the proposed method is highly effective.

AAAI Conference 2016 Conference Paper

Improving Opinion Aspect Extraction Using Semantic Similarity and Aspect Associations

  • Qian Liu
  • Bing Liu
  • Yuanlin Zhang
  • Doo Soon Kim
  • Zhiqiang Gao

Aspect extraction is a key task of fine-grained opinion mining. Although it has been studied by many researchers, it remains highly challenging. This paper proposes a novel unsupervised approach to make a major improvement. The approach is based on the framework of lifelong learning and is implemented with two forms of recommendations that are based on semantic similarity and aspect associations respectively. Experimental results using eight review datasets show the effectiveness of the proposed approach.

IJCAI Conference 2015 Conference Paper

Automated Rule Selection for Aspect Extraction in Opinion Mining

  • Qian Liu
  • Zhiqiang Gao
  • Bing Liu
  • Yuanlin Zhang

Aspect extraction aims to extract fine-grained opinion targets from opinion texts. Recent work has shown that the syntactical approach, which employs rules about grammar dependency relations between opinion words and aspects, performs quite well. This approach is highly desirable in practice because it is unsupervised and domain independent. However, the rules need to be carefully selected and tuned manually so as not to produce too many errors. Although it is easy to evaluate the accuracy of each rule automatically, it is not easy to select a set of rules that produces the best overall result due to the overlapping coverage of the rules. In this paper, we propose a novel method to select an effective set of rules. To our knowledge, this is the first work that selects rules automatically. Our experiment results show that the proposed method can select a subset of a given rule set to achieve significantly better results than the full rule set and the existing state-of-the-art CRF-based supervised method.

AAAI Conference 2015 Conference Paper

Extracting Verb Expressions Implying Negative Opinions

  • Huayi Li
  • Arjun Mukherjee
  • Jianfeng Si
  • Bing Liu

Identifying aspect-based opinions has been studied extensively in recent years. However, existing work primarily focused on adjective, adverb, and noun expressions. Clearly, verb expressions can imply opinions too. We found that in many domains verb expressions can be even more important to applications because they often describe major issues of products or services. These issues enable brands and businesses to directly improve their products or services. To the best of our knowledge, this problem has not received much attention in the literature. In this paper, we make an attempt to solve this problem. Our proposed method first extracts verb expressions from reviews and then employs Markov Networks to model rich linguistic features and long distance relationships to identify negative issue expressions. Since our training data is obtained from titles of reviews whose labels are automatically inferred from review ratings, our approach is applicable to any domain without manual involvement. Experimental results using real-life review datasets show that our approach outperforms strong baselines.

YNIMG Journal 2015 Journal Article

The cortical surface area of the insula mediates the effect of DBH rs7040170 on novelty seeking

  • Jin Li
  • Yue Cui
  • Karen Wu
  • Bing Liu
  • Yun Zhang
  • Chao Wang
  • Tianzi Jiang

Novelty seeking (NS) is a personality trait important for adaptive functioning, but an excessive level of NS has been linked to psychiatric disorders such as ADHD and substance abuse. Previous research has investigated separately the neural and genetic bases of the NS trait, but results were mixed and neural and genetic bases have yet to be examined within the same study. In this study, we examined the interrelationships among the dopamine beta-hydroxylase (DBH) gene, brain structure, and the NS trait in 359 healthy Han Chinese subjects. We focused on the DBH gene because it encodes a key enzyme for dopamine metabolism, and because NS is believed to be related to the dopaminergic system and has been reported to be associated with DBH variation. Results showed a significant positive association between the cortical surface area of the left insula and NS score. Furthermore, the DBH genetic polymorphism at the SNP rs7040170 was strongly associated with both the surface area of the left insula and NS score, with G carriers having a larger left insula surface area and a higher NS score than AA homozygotes. Subsequent path analysis suggested that the insula partially mediated the association between the DBH gene and the NS trait. Our data provided the first evidence for the involvement of the insula in the dopamine–NS relationship. Future studies of molecular mechanisms underlying the NS personality trait and related psychiatric disorders should consider the mediation effect of the neural structure.

YNIMG Journal 2013 Journal Article

KIBRA gene variants are associated with synchronization within the default-mode and executive control networks

  • Dawei Wang
  • Bing Liu
  • Wen Qin
  • Junping Wang
  • Yunting Zhang
  • Tianzi Jiang
  • Chunshui Yu

Genetic variation at the KIBRA rs17070145 polymorphism has been linked to episodic memory, executive function, and Alzheimer's disease (AD), which are related to the structural and functional integrity of the default-mode network (DMN) and executive control network (ECN). We hypothesize that the KIBRA polymorphism could modulate the structure and function of the DMN and ECN in healthy young subjects, which might underlie the association between this gene and cognitive function. To test our hypothesis, we analyzed the resting-state synchronization in the DMN and ECN in 288 young, healthy Chinese Han subjects. We found that carriers of the KIBRA C-allele demonstrated an increased synchronization in the posterior cingulate cortex (PCC) and medial prefrontal cortex (MPFC) of the DMN and in the right anterior insula, bilateral caudate nuclei, and bilateral dorsal anterior cingulate cortices (dACC) of the ECN compared to individuals with a TT genotype. Moreover, KIBRA C-allele carriers also showed a smaller gray matter volume (GMV) in the MPFC and bilateral dACCs than TT individuals. In contrast, there were no significant genotype differences in the synchronization of either the visual network or the sensorimotor network. These findings suggest that the polymorphism in the KIBRA gene affects GMV and the function of the DMN and ECN. This increased synchronization is likely a reflection of compensation for the regional gray matter deficits in these networks in young healthy subjects. The association between KIBRA polymorphisms and the DMN and ECN should be further explored in a healthy older population and in patients with AD.

IS Journal 2013 Journal Article

Knowledge-Based Approaches to Concept-Level Sentiment Analysis

  • Erik Cambria
  • Bjorn Schuller
  • Bing Liu
  • Haixun Wang
  • Catherine Havasi

The guest editors introduce novel approaches to opinion mining and sentiment analysis that go beyond a mere word-level analysis of text and provide concept-level methods. Such approaches allow a more efficient passage from (unstructured) textual information to (structured) machine-processable data, in potentially any domain.

IJCAI Conference 2013 Conference Paper

Leveraging Multi-Domain Prior Knowledge in Topic Models

  • Zhiyuan Chen
  • Arjun Mukherjee
  • Bing Liu
  • Meichun Hsu
  • Malu Castellanos
  • Riddhiman Ghosh

Topic models have been widely used to identify topics in text corpora. It is also known that purely unsupervised models often result in topics that are not comprehensible in applications. In recent years, a number of knowledge-based models have been proposed, which allow the user to input prior knowledge of the domain to produce more coherent and meaningful topics. In this paper, we go one step further to study how the prior knowledge from other domains can be exploited to help topic modeling in the new domain. This problem setting is important from both the application and the learning perspectives because knowledge is inherently accumulative. We human beings gain knowledge gradually and use the old knowledge to help solve new problems. To achieve this objective, existing models have some major difficulties. In this paper, we propose a novel knowledge-based model, called MDK-LDA, which is capable of using prior knowledge from multiple domains. Our evaluation results will demonstrate its effectiveness.

IS Journal 2013 Journal Article

Statistical Approaches to Concept-Level Sentiment Analysis

  • Erik Cambria
  • Bjorn Schuller
  • Bing Liu
  • Haixun Wang
  • Catherine Havasi

The guest editors introduce novel statistical approaches to concept-level sentiment analysis that go beyond a mere syntactic-driven analysis of text and provide semantic-based methods. Such approaches allow a more efficient passage from (unstructured) textual information to (structured) machine-processable data, in potentially any domain.

YNIMG Journal 2013 Journal Article

Variant in OXTR gene and functional connectivity of the hypothalamus in normal subjects

  • Junping Wang
  • Wen Qin
  • Bing Liu
  • Dawei Wang
  • Yunting Zhang
  • Tianzi Jiang
  • Chunshui Yu

The oxytocin receptor gene (OXTR) rs53576A has been associated with autism spectrum disorders (ASDs). A smaller hypothalamic volume has been reported in healthy male A-allele carriers than in male GG homozygotes and in patients with ASDs than in healthy controls. These findings prompt the hypothesis that male AA homozygotes may have weaker hypothalamic functional connectivity when compared to male G-allele carriers. We calculated local functional connectivity density (FCD) using a voxel-wise data-driven approach based on resting-state functional MRI data in 270 young healthy subjects. Both the main effect of genotype and the gender-by-genotype interaction were considered. Of the whole brain, only the local FCD of the hypothalamus exhibited the main effect of genotype. Post-hoc testing revealed significantly lower local FCD in male AA homozygotes compared to male G-allele carriers, although there was only a trend of significance in the gender-by-genotype interaction. We further analyzed the resting-state functional connectivity (rsFC) of the hypothalamic region that demonstrated significant genotype differences in local FCD. We found a significant gender-by-genotype interaction in rsFC between the hypothalamic region and the left dorsolateral prefrontal cortex, but no significant main effect of genotype was found. Post-hoc testing revealed that this rsFC was significantly weaker in male AA homozygotes compared to male G-allele carriers. Our findings identify gender-dependent mechanisms of OXTR rs53576 gene variation impacting the functional connectivity of the hypothalamus in healthy individuals and suggest that these mechanisms are important for understanding ASDs.

TIST Journal 2012 Journal Article

Identify Online Store Review Spammers via Social Review Graph

  • Guan Wang
  • Sihong Xie
  • Bing Liu
  • Philip S. Yu

Online shopping reviews provide valuable information for customers to compare the quality of products, store services, and many other aspects of future purchases. However, spammers are joining this community and trying to mislead consumers by writing fake or unfair reviews. Previous attempts have used reviewers’ behaviors, such as text similarity and rating patterns, to detect spammers. These studies are able to identify certain types of spammers, for instance, those who post many similar reviews about one target. However, in reality, there are other kinds of spammers who can manipulate their behaviors to act just like normal reviewers, and thus cannot be detected by the available techniques. In this article, we propose a novel concept of review graph to capture the relationships among all reviewers, reviews and stores that the reviewers have reviewed as a heterogeneous graph. We explore how interactions between nodes in this graph could reveal the cause of spam and propose an iterative computation model to identify suspicious reviewers. In the review graph, we have three kinds of nodes, namely, reviewer, review, and store. We capture their relationships by introducing three fundamental concepts, the trustiness of reviewers, the honesty of reviews, and the reliability of stores, and identifying their interrelationships: a reviewer is more trustworthy if the person has written more honest reviews; a store is more reliable if it has more positive reviews from trustworthy reviewers; and a review is more honest if many other honest reviews support it. This is the first time such intricate relationships have been identified for spam detection and captured in a graph model. We further develop an effective computation method based on the proposed graph model. Different from any existing approaches, we do not use any review text information. Our model is thus complementary to existing approaches and able to find more difficult and subtle spamming activities, which are agreed upon by human judges after they evaluate our results.

IS Journal 2012 Journal Article

Product Feature Grouping for Opinion Mining

  • Zhongwu Zhai
  • Bing Liu
  • Jingyuan Wang
  • Hua Xu
  • Peifa Jia

A constrained semisupervised learning method classifies words and phrases into feature groups, making it easier to produce an opinion summary of various product reviews.

AAAI Conference 2011 Conference Paper

Identifying Evaluative Sentences in Online Discussions

  • Zhongwu Zhai
  • Bing Liu
  • Lei Zhang
  • Hua Xu
  • Peifa Jia

Much of opinion mining research focuses on product reviews because reviews are opinion-rich and contain little irrelevant information. However, this cannot be said about online discussions and comments. In such postings, the discussions can get highly emotional and heated with many emotional statements, and even personal attacks. As a result, many of the postings and sentences do not express positive or negative opinions about the topic being discussed. To find people’s opinions on a topic and its different aspects, which we call evaluative opinions, those irrelevant sentences should be removed. The goal of this research is to identify evaluative opinion sentences. A novel unsupervised approach is proposed to solve the problem, and our experimental results show that it performs well.

TCS Journal 2011 Journal Article

Probabilistic approximations of ODEs based bio-pathway dynamics

  • Bing Liu
  • David Hsu
  • P.S. Thiagarajan

Bio-chemical networks are often modeled as systems of ordinary differential equations (ODEs). Such systems will not admit closed form solutions and hence numerical simulations will have to be used to perform analyses. However, the number of simulations required to carry out tasks such as parameter estimation can become very large. To get around this, we propose a discrete probabilistic approximation of the ODE dynamics. We do so by discretizing the value and the time domain and assuming a distribution of initial states w.r.t. the discretization. Then we sample a representative set of initial states according to the assumed initial distribution and generate a corresponding set of trajectories through numerical simulations. Finally, using the structure of the signaling pathway we encode these trajectories compactly as a dynamic Bayesian network. This approximation of the signaling pathway dynamics has several advantages. First, the discretized nature of the approximation helps to bridge the gap between the accuracy of the results obtained by ODE simulation and the limited precision of experimental data used for model construction and verification. Second and more importantly, many interesting pathway properties can be analyzed efficiently through standard Bayesian inference techniques instead of resorting to a large number of ODE simulations. We have tested our method on ODE models of the EGF-NGF signaling pathway [1] and the segmentation clock pathway [2]. The results are very promising in terms of accuracy and efficiency.
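The sampling-and-discretization idea can be sketched on a one-variable toy system. The example below assumes a hypothetical decay ODE dx/dt = -rate * x, a uniform initial distribution on [0, 10], Euler integration, and equal-width value intervals; none of these choices come from the paper. It estimates, for each discrete time point, the probability of moving from one value interval to another, which is the kind of conditional probability table a dynamic Bayesian network node would hold.

```python
import random
from collections import Counter, defaultdict


def simulate_decay(x0, rate, dt, steps):
    """Euler simulation of the toy ODE dx/dt = -rate * x."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * (-rate * xs[-1]))
    return xs


def to_bin(x, lo, hi, n_bins):
    """Index of the equal-width interval of [lo, hi] that contains x."""
    idx = int((x - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def estimate_transitions(n_samples=500, n_bins=5, steps=40, sample_every=10, seed=0):
    """Estimate P(interval at t+1 | interval at t) from sampled trajectories."""
    rng = random.Random(seed)
    counts = defaultdict(Counter)
    for _ in range(n_samples):
        x0 = rng.uniform(0.0, 10.0)  # assumed initial distribution
        xs = simulate_decay(x0, rate=0.5, dt=0.01, steps=steps)
        # Keep only the values at the discrete time points.
        observed = [to_bin(x, 0.0, 10.0, n_bins) for x in xs[::sample_every]]
        for t, (current, nxt) in enumerate(zip(observed, observed[1:])):
            counts[(t, current)][nxt] += 1
    return {
        key: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for key, row in counts.items()
    }


# Keys are (time index, current interval); values map next interval to probability.
table = estimate_transitions()
```

Once such tables exist, questions about the dynamics can be answered by probabilistic inference over the tables rather than by running further simulations.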

YNIMG Journal 2010 Journal Article

Haplotypes of catechol-O-methyltransferase modulate intelligence-related brain white matter integrity

  • Bing Liu
  • Jun Li
  • Chunshui Yu
  • Yonghui Li
  • Yong Liu
  • Ming Song
  • Ming Fan
  • Kuncheng Li

Twin studies have indicated a common genetic origin for intelligence and for variations in brain morphology. Our previous diffusion tensor imaging studies found an association between intelligence and white matter integrity of specific brain regions or tracts. However, specific genetic determinants of the white matter integrity of these brain regions and tracts are still unclear. In this study, we assess whether and how catechol-O-methyltransferase (COMT) gene polymorphisms affect brain white matter integrity. We genotyped twelve single nucleotide polymorphisms (SNPs) within the COMT gene and performed haplotype analyses on data from 79 healthy subjects. Our subjects had the same three major COMT haplotypes (termed the HPS, APS and LPS haplotypes) as previous studies have reported as regulating significantly different levels of enzymatic activity and dopamine. We used the mean fractional anisotropy (FA) values from four regions and five tracts of interest to assess the effect of COMT polymorphisms, including the well-studied val158met SNP and the three main haplotypes that we had identified, on intelligence-related white matter integrity. We identified an association between the mean FA values of two regions in the bilateral prefrontal lobes and the COMT haplotypes, rather than between them and val158met. The haplotype-FA value associations modulated nonlinearly and fit an inverted U-model. Our findings suggest that COMT haplotypes can nonlinearly modulate the intelligence-related white matter integrity of the prefrontal lobes by more significantly influencing prefrontal dopamine variations than does val158met.

IJCAI Conference 2009 Conference Paper

  • Guang Qiu
  • Bing Liu
  • Jiajun Bu
  • Chun Chen

In most sentiment analysis applications, the sentiment lexicon plays a key role. However, it is hard, if not impossible, to collect and maintain a universal sentiment lexicon for all application domains because different words may be used in different domains. The main existing technique extracts such sentiment words from a large domain corpus based on different conjunctions and the idea of sentiment coherency in a sentence. In this paper, we propose a novel propagation approach that exploits the relations between sentiment words and topics or product features that the sentiment words modify, and also sentiment words and product features themselves to extract new sentiment words. As the method propagates information through both sentiment words and features, we call it double propagation. The extraction rules are designed based on relations described in dependency trees. A new method is also proposed to assign polarities to newly discovered sentiment words in a domain. Experimental results show that our approach is able to extract a large number of new sentiment words. The polarity assignment method is also effective.
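The propagation loop can be sketched as follows, assuming the dependency relations have already been extracted by a parser and reduced to three hypothetical lists: modifier pairs (sentiment word, feature), conjunctions between sentiment words, and conjunctions between features. The function name and the toy data are illustrative only, and polarity assignment is left out.

```python
def double_propagation(seed_sentiments, modifies, sentiment_conj, feature_conj):
    """Grow sentiment-word and feature sets from a small seed lexicon."""
    sentiments = set(seed_sentiments)
    features = set()
    changed = True
    while changed:
        changed = False
        for word, feature in modifies:
            # A known sentiment word reveals the feature it modifies.
            if word in sentiments and feature not in features:
                features.add(feature)
                changed = True
            # A known feature reveals the sentiment word modifying it.
            if feature in features and word not in sentiments:
                sentiments.add(word)
                changed = True
        for a, b in sentiment_conj:
            # Conjoined sentiment words: one known implies the other.
            if (a in sentiments) != (b in sentiments):
                sentiments.update((a, b))
                changed = True
        for a, b in feature_conj:
            # Conjoined features: one known implies the other.
            if (a in features) != (b in features):
                features.update((a, b))
                changed = True
    return sentiments, features


# Hypothetical relations from camera reviews, invented for illustration.
found_sentiments, found_features = double_propagation(
    seed_sentiments={"good"},
    modifies=[("good", "screen"), ("sharp", "screen"),
              ("sharp", "lens"), ("flimsy", "strap")],
    sentiment_conj=[("sharp", "vivid")],
    feature_conj=[("lens", "battery")],
)
```

In this toy run "flimsy" and "strap" are never reached, because no relation connects them to anything derived from the seed.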

IJCAI Conference 2007 Conference Paper

  • Xiao-li Li
  • Bing Liu
  • See-Kiong Ng

Traditional classification involves building a classifier using labeled training examples from a set of predefined classes and then applying the classifier to classify test instances into the same set of classes. In practice, this paradigm can be problematic because the test data may contain instances that do not belong to any of the previously defined classes. Detecting such unexpected instances in the test set is an important issue in practice. The problem can be formulated as learning from positive and unlabeled examples (PU learning). However, current PU learning algorithms require a large proportion of negative instances in the unlabeled set to be effective. This paper proposes a novel technique to solve this problem in the text classification domain. The technique first generates a single artificial negative document AN. The sets P and {AN} are then used to build a naïve Bayesian classifier. Our experiment results show that this method is significantly better than existing techniques.

AAAI Conference 2004 Conference Paper

Mining Opinion Features in Customer Reviews

  • Minqing Hu
  • Bing Liu

It is a common practice that merchants selling products on the Web ask their customers to review the products and associated services. As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in the hundreds. This makes it difficult for a potential customer to read them in order to make a decision on whether to buy the product. In this project, we aim to summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we are only interested in the specific features of the product that customers have opinions on and also whether the opinions are positive or negative. We do not summarize the reviews by selecting or rewriting a subset of the original sentences from the reviews to capture their main points as in the classic text summarization. In this paper, we only focus on mining opinion/product features that the reviewers have commented on. A number of techniques are presented to mine such features. Our experimental results show that these techniques are highly effective.

IS Journal 2004 Journal Article

Mining Web Pages for Data Records

  • Bing Liu
  • R. Grossman
  • Yanhong Zhai

Data mining to extract information from Web pages can help provide value-added services. The MDR (mining data records) system exploits Web page structure and uses a string-matching algorithm to mine contiguous and noncontiguous data records.

AAAI Conference 2004 Conference Paper

Text Classification by Labeling Words

  • Bing Liu
  • Wee Sun Lee

Traditionally, text classifiers are built from labeled training examples. Labeling is usually done manually by human experts (or the users), which is a labor intensive and time consuming process. In the past few years, researchers investigated various forms of semi-supervised learning to reduce the burden of manual labeling. In this paper, we propose a different approach. Instead of labeling a set of documents, the proposed method labels a set of representative words for each class. It then uses these words to extract a set of documents for each class from a set of unlabeled documents to form the initial training set. The EM algorithm is then applied to build the classifier. The key issue of the approach is how to obtain a set of representative words for each class. One way is to ask the user to provide them, which is difficult because the user usually can only give a few words (which are insufficient for accurate learning). We propose a method to solve the problem. It combines clustering and feature selection. The technique can effectively rank the words in the unlabeled set according to their importance. The user then selects/labels some words from the ranked list for each class. This process requires less effort than providing words with no help or manual labeling of documents. Our results show that the new method is highly effective and promising.

IJCAI Conference 2003 Conference Paper

Learning to Classify Texts Using Positive and Unlabeled Data

  • Xiaoli Li
  • Bing Liu

In traditional text classification, a classifier is built using labeled training documents of every class. This paper studies a different problem. Given a set P of documents of a particular class (called positive class) and a set U of unlabeled documents that contains documents from class P and also other types of documents (called negative class documents), we want to build a classifier to classify the documents in U into documents from P and documents not from P. The key feature of this problem is that there is no labeled negative document, which makes traditional text classification techniques inapplicable. In this paper, we propose an effective technique to solve the problem. It combines the Rocchio method and the SVM technique for classifier building. Experimental results show that the new method outperforms existing methods significantly.
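A minimal sketch of how a Rocchio classifier might be used to pick likely negative documents out of U, assuming documents are already term-weight vectors of equal length. The unlabeled set is provisionally treated as the negative class; the weights alpha=16 and beta=4 are commonly used Rocchio defaults assumed here, and the subsequent SVM step is not shown.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def mean(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rocchio_reliable_negatives(positive, unlabeled, alpha=16.0, beta=4.0):
    """Indices of unlabeled documents closer to the negative prototype."""
    p_mean = mean([normalize(d) for d in positive])
    u_mean = mean([normalize(d) for d in unlabeled])
    # Prototype vectors, with U provisionally treated as the negative class.
    c_pos = [alpha * p - beta * u for p, u in zip(p_mean, u_mean)]
    c_neg = [alpha * u - beta * p for p, u in zip(p_mean, u_mean)]
    return [i for i, d in enumerate(unlabeled)
            if cosine(d, c_neg) > cosine(d, c_pos)]


# Hypothetical three-term document vectors, invented for illustration.
P = [[3, 0, 1], [2, 1, 0]]
U = [[3, 1, 0], [0, 2, 3], [0, 3, 2], [1, 0, 4]]
reliable_negatives = rocchio_reliable_negatives(P, U)  # -> [1, 2, 3]
```

The documents selected this way could then serve as the negative training set for a second-stage classifier, while the first document of U, which resembles P, is left unlabeled.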

IJCAI Conference 2003 Conference Paper

Web Page Cleaning for Web Mining through Feature Weighting

  • Lan Yi
  • Bing Liu

Unlike conventional data or text, Web pages typically contain a large amount of information that is not part of the main contents of the pages, e.g., banner ads, navigation bars, and copyright notices. Such irrelevant information (which we call Web page noise) in Web pages can seriously harm Web mining, e.g., clustering and classification. In this paper, we propose a novel feature weighting technique to deal with Web page noise to enhance Web mining. This method first builds a compressed structure tree to capture the common structure and comparable blocks in a set of Web pages. It then uses an information based measure to evaluate the importance of each node in the compressed structure tree. Based on the tree and its node importance values, our method assigns a weight to each word feature in its content block. The resulting weights are used in Web mining. We evaluated the proposed technique with two Web mining tasks, Web page clustering and Web page classification. Experimental results show that our weighting method is able to dramatically improve the mining results.

AAAI Conference 2000 Conference Paper

Intuitive Representation of Decision Trees Using General Rules and Exceptions

  • Bing Liu
  • Wynne Hsu

Producing too many rules is a major problem with many data mining techniques. This paper argues that one of the key reasons for the large number of rules is that an inefficient knowledge representation scheme has been used. The current predominant representation of the discovered knowledge is the if-then rules. This representation often severely fragments the knowledge that exists in the data, thereby resulting in a large number of rules. The fragmentation also makes the discovered rules hard to understand and to use. In this paper, we propose a more efficient representation scheme, called general rules & exceptions. In this representation, a unit of knowledge consists of a single general rule and a set of exceptions. This scheme reduces the complexity of the discovered knowledge substantially. It is also intuitive and easy to understand. This paper focuses on using the representation to express the knowledge embedded in a decision tree. An algorithm that converts a decision tree to the new representation is presented. Experiment results show that the new representation dramatically simplifies the decision tree. Real-life applications also confirm that this representation is more intuitive to human users.
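One way to picture a unit of knowledge in this representation is a small data structure holding one general rule plus its exceptions. The class names, the attribute=value rule format, and the loan example below are assumptions made for illustration; the conversion algorithm from a decision tree is not shown.

```python
class Rule:
    """An if-then rule: a conjunction of attribute=value tests and a label."""

    def __init__(self, conditions, label):
        self.conditions = conditions
        self.label = label

    def matches(self, case):
        return all(case.get(attr) == value
                   for attr, value in self.conditions.items())


class GeneralRuleWithExceptions:
    """One general rule together with the exceptions that override it."""

    def __init__(self, general, exceptions=()):
        self.general = general
        self.exceptions = list(exceptions)

    def classify(self, case):
        # The unit applies only when the general rule's conditions hold.
        if not self.general.matches(case):
            return None
        # An exception implicitly inherits the general rule's conditions.
        for exception in self.exceptions:
            if exception.matches(case):
                return exception.label
        return self.general.label


# Hypothetical loan-approval knowledge, invented for illustration.
unit = GeneralRuleWithExceptions(
    Rule({"employed": True}, "approve"),
    [Rule({"debt": "high"}, "reject")],
)
```

Written as flat if-then rules, the same knowledge would need the `employed` test repeated in every rule, which is the fragmentation the abstract refers to.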

IJCAI Conference 1997 Conference Paper

Discovering Interesting Holes in Data

  • Bing Liu
  • Liang-Ping Ku
  • Wynne Hsu

Current machine learning and discovery techniques focus on discovering rules or regularities that exist in data. An important aspect of the research that has been ignored in the past is the learning or discovering of interesting holes in the database. If we view each case in the database as a point in a k-dimensional space, then a hole is simply a region in the space that contains no data point. Clearly, not every hole is interesting. Some holes are obvious because it is known that certain value combinations are not possible. Some holes exist because there are insufficient cases in the database. However, in some situations, empty regions do carry important information. For instance, they could warn us about some missing value combinations that are either not known before or are unexpected. Knowing these missing value combinations may lead to significant discoveries. In this paper, we propose an algorithm to discover holes in databases.
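The notion of a hole as an empty region can be illustrated with a deliberately simple grid scan. The abstract does not describe the proposed algorithm, so the sketch below is only a stand-in: it splits each dimension into equal-width cells and reports the cells that contain no data point. The function name, the grid resolution, and the sample points are all assumptions.

```python
from itertools import product


def empty_cells(points, bounds, cells_per_dim):
    """Return the grid cells (as index tuples) that contain no data point."""

    def cell_of(point):
        index = []
        for value, (lo, hi) in zip(point, bounds):
            i = int((value - lo) / (hi - lo) * cells_per_dim)
            # Clamp so that a value on the upper bound falls in the last cell.
            index.append(min(max(i, 0), cells_per_dim - 1))
        return tuple(index)

    occupied = {cell_of(p) for p in points}
    all_cells = product(range(cells_per_dim), repeat=len(bounds))
    return sorted(cell for cell in all_cells if cell not in occupied)


# Hypothetical 2-D cases: no case has high values on both attributes.
cases = [(1, 1), (1, 8), (8, 1), (2, 9), (9, 2)]
holes = empty_cells(cases, bounds=[(0, 10), (0, 10)], cells_per_dim=2)
```

Here the only empty cell is the one where both attributes are high. Whether such a hole is interesting, or merely reflects an impossible combination or too few cases, is the judgment the abstract discusses.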

AAAI Conference 1996 Conference Paper

Post-Analysis of Learned Rules

  • Bing Liu

Rule induction research implicitly assumes that after producing the rules from a dataset, these rules will be used directly by an expert system or a human user. In real-life applications, the situation may not be as simple as that, particularly, when the user of the rules is a human being. The human user almost always has some previous concepts or knowledge about the domain represented by the dataset. Naturally, he/she wishes to know how the new rules compare with his/her existing knowledge. In dynamic domains where the rules may change over time, it is important to know what the changes are. These aspects of research have largely been ignored in the past. With the increasing use of machine learning techniques in practical applications such as data mining, this issue of post analysis of rules warrants greater emphasis and attention. In this paper, we propose a technique to deal with this problem. A system has been implemented to perform the post analysis of classification rules generated by systems such as C4.5. The proposed technique is general and highly interactive. It will be particularly useful in data mining and data analysis.

AAAI Conference 1996 Conference Paper

Using Constraints to Model Disjunctions in Rule-Based Reasoning

  • Bing Liu

Rule-based systems have long been widely used for building expert systems to perform practical knowledge intensive tasks. One important issue that has not been addressed satisfactorily is the disjunction, and this significantly limits their problem solving power. In this paper, we show that some important types of disjunction can be modeled with Constraint Satisfaction Problem (CSP) techniques, employing their simple representation schemes and efficient algorithms. A key idea is that disjunctions are represented as constraint variables, relations among disjunctions are represented as constraints, and rule chaining is integrated with constraint solving. In this integration, a constraint variable or a constraint is regarded as a special fact, and rules can be written with constraints and information about constraints. Chaining of rules may trigger constraint propagation, and constraint propagation may cause firing of rules. A prototype system (called CFR) based on this idea has been implemented.
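The interplay described above can be sketched roughly as follows: each disjunction is a variable whose domain holds the alternatives, a binary constraint lists the compatible pairs, and a rule firing that narrows one domain triggers propagation. This is a simplified illustration and not the CFR system; the function names and the fault/symptom example are hypothetical.

```python
def propagate(domains, constraints):
    """Prune domain values that have no support under the binary constraints.

    domains:     dict mapping variable name -> set of remaining alternatives
    constraints: dict mapping (x, y) -> set of allowed (x_value, y_value) pairs
    """
    changed = True
    while changed:
        changed = False
        for (x, y), allowed in constraints.items():
            new_x = {a for a in domains[x]
                     if any((a, b) in allowed for b in domains[y])}
            new_y = {b for b in domains[y]
                     if any((a, b) in allowed for a in new_x)}
            if new_x != domains[x] or new_y != domains[y]:
                domains[x], domains[y] = new_x, new_y
                changed = True
    return domains


def resolved_facts(domains):
    # A disjunction narrowed to one alternative becomes an ordinary fact
    # that further rules could match on.
    return {var: next(iter(vals)) for var, vals in domains.items() if len(vals) == 1}


# Two disjunctions: the fault is the pump OR the valve; the symptom is a leak OR noise.
domains = {"fault": {"pump", "valve"}, "symptom": {"leak", "noise"}}
constraints = {("fault", "symptom"): {("pump", "noise"), ("valve", "leak")}}

# Suppose a rule fires and asserts the observed symptom; this narrows a domain...
domains["symptom"] = {"leak"}
# ...which triggers constraint propagation and resolves the other disjunction.
propagate(domains, constraints)
facts = resolved_facts(domains)
```

In a fuller system the newly resolved facts would be fed back into the rule engine, closing the loop between chaining and propagation that the abstract describes.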