Arrow Research search

Author name cluster

Lu Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers

36

AAAI Conference 2026 Conference Paper

Conditional Information Bottleneck for Multimodal Fusion: Overcoming Shortcut Learning in Sarcasm Detection

  • Yihua Wang
  • Qi Jia
  • Cong Xu
  • Feiyu Chen
  • Yuhan Liu
  • Haotian Zhang
  • Liang Jin
  • Lu Liu

Multimodal sarcasm detection is a complex task that requires distinguishing subtle complementary signals across modalities while filtering out irrelevant information. Many advanced methods rely on shortcuts learned from datasets rather than extracting the intended sarcasm-related features, and our experiments show that such shortcut learning impairs the model's generalization in real-world scenarios. Furthermore, we reveal the weaknesses of current modality fusion strategies for multimodal sarcasm detection through systematic experiments, highlighting the necessity of focusing on effective modality fusion for complex emotion recognition. To address these challenges, we construct MUStARD++R by removing shortcut signals from MUStARD++. Then, a Multimodal Conditional Information Bottleneck (MCIB) model is introduced to enable efficient multimodal fusion for sarcasm detection. Experimental results show that MCIB achieves the best performance without relying on shortcut learning.

AAAI Conference 2026 Conference Paper

Explainable Depression Assessment from Face Videos by Weakly Supervised Learning

  • Rongfan Liao
  • Xiangyu Kong
  • Shiqing Tang
  • Lang He
  • Changzeng Fu
  • Weicheng Xie
  • Xiaofeng Liu
  • Lu Liu

Existing video-based automatic depression assessment (ADA) approaches frequently achieve video-level depression assessment by aggregating features or predictions of individual frames or equal-length segments within the given video. While their performance has been largely enhanced by recent advanced deep learning models, they typically fail to explicitly consider the varied importance of depression-related behavioural cues across different video segments, i.e., segments within one video may contain behaviours reflecting varying levels of depression. Underestimating segment-level variations can obscure the detection of facial behaviour cues associated with depression, thereby undermining the accuracy and interpretability of video-based depression detection systems. In this paper, we propose a novel video-based ADA approach that specifically identifies and differentiates video segments that exhibit depression-related facial behaviours across varying temporal durations, providing clear insights into how each segment contributes to the video-level depression prediction. To achieve this, a novel weakly supervised strategy is proposed to compare segment-level behaviours with the video-level depression label, enabling the model to assign depression-relevance scores to video segments at multiple temporal scales and attend selectively to those most indicative of depressive states. Extensive experiments on the AVEC 2013 and AVEC 2014 face video depression datasets demonstrate the effectiveness of our approach.

AAAI Conference 2026 Conference Paper

TimeCAP: A Channel-Aware Pre-Training Framework for Multivariate Time Series Forecasting

  • Chuanru Ren
  • Yao Lu
  • Tianjin Huang
  • Haowen Zheng
  • Hengde Zhu
  • Yunyin Li
  • Hengxiao Li
  • Lu Liu

Amid recent advances in multivariate time series forecasting, self-supervised learning has emerged as a promising paradigm for deriving transferable knowledge from multi-domain data. Despite its effectiveness, existing approaches exhibit two critical limitations: (1) underestimating the significance of multivariate dependencies in learning generalizable representations and (2) failing to reconcile the complementary strengths of autoregressive and one-shot generative paradigms. In this work, we propose TimeCAP, a novel channel-aware pre-training framework that internalizes latent causal relationships among variables inherent in multi-domain data, and effectively transfers the acquired knowledge to downstream applications. Technically, we present a flexible channel-grouping learning approach, complemented by an adaptive meta-routing mechanism, enabling TimeCAP to recognize intra-group local patterns in parallel while maintaining global coherence. Intra- and inter-group multivariate dependencies are captured through self- and cross-attention with a channel-aware mask, which strictly confines interactions to time-aligned, fine-grained multivariate tokens. To seamlessly unify the two advanced generative paradigms, we propose a novel dynamic dual-head decoding and optimization strategy, empowering TimeCAP to leverage critical dependencies in the output series while avoiding cumulative errors over time. In the few-shot evaluation, TimeCAP achieves average MSE and MAE reductions of 11.8% and 6% over leading baselines, while also outperforming state-of-the-art models in full-shot and zero-shot settings by large margins.

JBHI Journal 2025 Journal Article

A Clinical Data Based Framework for Outcome Forecasting in Patients With Pneumonia

  • Rui Gao
  • Robert C. Free
  • Ashiq Anjum
  • Xiang Sun
  • Gerrit Woltmann
  • Lu Liu

Respiratory diseases are a major cause of death globally, placing a significant burden on healthcare services. Early-stage clinical decision-making is crucial for enabling personalized, prioritized treatment and more efficient allocation of healthcare resources. Clinicians can intervene proactively and develop appropriate treatment plans for patients when provided with vital information such as mortality prediction, deterioration detection, and length-of-stay prediction. To precisely predict such vitals, it is essential to leverage sequential information that is inherent in clinical variables. In this paper, we employ a unified framework for patient outcome forecasting in pneumonia patients. The proposed model utilizes clinical time-series data of varying lengths, along with static admission information, to effectively capture the sequential information of clinical variables. Additionally, we model the imbalanced distribution of mortality prediction and deterioration detection through weight constraints, and we account for the right-skewed distribution of length-of-stay data to enhance the robustness of the model. Furthermore, we develop a data splitting strategy to track dynamic changes in model performance at different timestamps, helping to bridge the gap between testing conditions and real-world scenarios. We conduct experiments on the CAP-AI dataset, collected from the University Hospitals of Leicester with the involvement of clinicians; it is based on real-world clinical data from patients admitted with pneumonia-related diagnoses. Extensive experimental results demonstrate the effectiveness and robustness of our approach whilst predicting patient outcomes in a clinical setting.

JBHI Journal 2025 Journal Article

Bidirectional Prototype-Guided Consistency Constraint for Semi-Supervised Fetal Ultrasound Image Segmentation

  • Chongwen Lyu
  • Kai Han
  • Lu Liu
  • Jun Chen
  • Lele Ma
  • Zheng Pang
  • Zhe Liu

Fetal ultrasound (US) image segmentation plays an important role in fetal development assessment, maternal pregnancy management, and intrauterine surgery planning. However, obtaining large-scale, accurately annotated fetal US imaging data is time-consuming and labor-intensive, posing challenges to the application of deep learning in this field. To address this challenge, we propose a semi-supervised fetal US image segmentation method based on a bidirectional prototype-guided consistency constraint (BiPCC). BiPCC utilizes the prototype to bridge labeled and unlabeled data and establishes interaction between them. Specifically, the model generates pseudo-labels using prototypes from labeled data and then utilizes these pseudo-labels to generate pseudo-prototypes for segmenting the labeled data inversely, thereby achieving bidirectional consistency. Additionally, uncertainty-based cross-supervision is incorporated to provide additional supervision signals, thereby enhancing the quality of pseudo-labels. Extensive experiments on two fetal US datasets demonstrate that BiPCC outperforms state-of-the-art methods for semi-supervised fetal US segmentation. Furthermore, experimental results on two additional medical segmentation datasets exhibit BiPCC's outstanding generalization capability for diverse medical image segmentation tasks. Our proposed method offers novel insight into semi-supervised fetal US image segmentation and holds promise for further advancing the development of intelligent healthcare.

AAAI Conference 2025 Conference Paper

PerReactor: Offline Personalised Multiple Appropriate Facial Reaction Generation

  • Hengde Zhu
  • Xiangyu Kong
  • Weicheng Xie
  • Xin Huang
  • Xilin He
  • Lu Liu
  • Linlin Shen
  • Wei Zhang

In dyadic human-human interactions, individuals may express multiple different facial reactions in response to the same/similar behaviours expressed by their conversational partners, depending on their personalised behaviour patterns. As a result, frequently-employed reconstruction loss-based strategies lead the training of previous automatic facial reaction generation (FRG) models to not only suffer from the 'one-to-many mapping' problem, but also fail to comprehensively consider the quality of the generated facial reactions. Besides, none of them considered such personalised behaviour patterns in generating facial reactions. In this paper, we propose the first adversarial FRG model training strategy that jointly learns appropriateness and realism discriminators to provide comprehensive task-specific supervision for training the target facial reaction generators, and reformulates the 'one-to-many (facial reactions) mapping' training problem as a 'one-to-one (distribution) mapping' training task, i.e., the FRG model is trained to output a distribution representing multiple appropriate/plausible facial reactions for each input human behaviour. In addition, our approach also serves as the first offline FRG approach that considers personalised behaviour patterns in generating target individuals' facial reactions. Experiments show that our PerReactor not only substantially outperformed all existing offline solutions in generating more appropriate, diverse and realistic facial reactions, but also is the first approach that can effectively generate personalised appropriate facial reactions.

NeurIPS Conference 2025 Conference Paper

REOBench: Benchmarking Robustness of Earth Observation Foundation Models

  • Xiang Li
  • Yong Tao
  • Siyuan Zhang
  • Siwei Liu
  • Zhitong Xiong
  • Chunbo Luo
  • Lu Liu
  • Mykola Pechenizkiy

Earth observation foundation models have shown strong generalization across multiple Earth observation tasks, but their robustness under real-world perturbations remains underexplored. To bridge this gap, we introduce REOBench, the first comprehensive benchmark for evaluating the robustness of Earth observation foundation models across six tasks and twelve types of image corruptions, including both appearance-based and geometric perturbations. To ensure realistic and fine-grained evaluation, our benchmark focuses on high-resolution optical remote sensing images, which are widely used in critical applications such as urban planning and disaster response. We conduct a systematic evaluation of a broad range of models trained using masked image modeling, contrastive learning, and vision-language pre-training paradigms. Our results reveal that (1) existing Earth observation foundation models experience significant performance degradation when exposed to input corruptions. (2) The severity of degradation varies across tasks, model architectures, backbone sizes, and types of corruption, with performance drop varying from less than 1% to over 25%. (3) Vision-language models show enhanced robustness, particularly in multimodal tasks. REOBench underscores the vulnerability of current Earth observation foundation models to real-world corruptions and provides actionable insights for developing more robust and reliable models.

AAAI Conference 2024 Conference Paper

A Fast Exact Solver with Theoretical Analysis for the Maximum Edge-Weighted Clique Problem

  • Lu Liu
  • Mingyu Xiao
  • Yi Zhou

The maximum vertex-weighted clique problem (MVWCP) and the maximum edge-weighted clique problem (MEWCP) are two natural extensions of the fundamental maximum clique problem. In this paper, we systematically study MEWCP and make the following major contributions: (1) We show that MEWCP is NP-hard even when the minimum degree of the graph is n-2, in contrast to MVWCP which is polynomial-time solvable when the minimum degree of the graph is at least n-3. This result distinguishes the complexity of the two problems for the first time. (2) To address MEWCP, we develop an efficient branch-and-bound algorithm called MEWCat with both practical and theoretical performance guarantees. In practice, MEWCat utilizes a new upper bound tighter than existing ones, which allows for more efficient pruning of branches. In theory, we prove a running-time bound of O*(1.4423^n) for MEWCat, which breaks the trivial bound of O*(2^n) in the research line of practical exact MEWCP solvers for the first time. (3) Empirically, we evaluate the performance of MEWCat on various benchmark instances. The experiments demonstrate that MEWCat outperforms state-of-the-art exact solvers significantly. For instance, on 16 DIMACS graphs that the state-of-the-art solver BBEWC fails to solve within 7200 seconds, MEWCat solves all of them with an average time of less than 1000 seconds. On real-world graphs, MEWCat achieves an average speedup of over 36x.
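To make the objective concrete: a clique's score in MEWCP is the sum of the weights of all edges inside it. A brute-force reference solver (exponential time, purely illustrative; this is not the branch-and-bound MEWCat algorithm from the paper, and the function name is our own) can be sketched as:

```python
from itertools import combinations

def mewc_brute_force(n, edge_weights):
    """Exhaustive maximum edge-weighted clique search.

    n: number of vertices, labelled 0..n-1.
    edge_weights: dict mapping frozenset({u, v}) -> weight; a missing
    key means the edge is absent. Exponential time -- a baseline for
    checking small instances only.
    """
    best_weight, best_clique = 0, ()
    for k in range(2, n + 1):
        for subset in combinations(range(n), k):
            pairs = list(combinations(subset, 2))
            # subset is a clique only if every vertex pair is an edge
            if all(frozenset(p) in edge_weights for p in pairs):
                total = sum(edge_weights[frozenset(p)] for p in pairs)
                if total > best_weight:
                    best_weight, best_clique = total, subset
    return best_weight, best_clique
```

Note that a single heavy edge can beat a larger clique of light edges, which is part of why edge-weighted bounds are harder to make tight than vertex-weighted ones.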

NeurIPS Conference 2024 Conference Paper

Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline

  • Qi Jia
  • Baoyu Fan
  • Cong Xu
  • Lu Liu
  • Liang Jin
  • Guoguang Du
  • Zhenhua Guo
  • Yaqian Zhao

Existing video multi-modal sentiment analysis mainly focuses on the sentiment expressed by people within the video, yet often neglects the induced sentiment of viewers while watching the videos. The induced sentiment of viewers is essential for inferring the public response to videos and has broad applications in analyzing public societal sentiment, advertising effectiveness and other areas. Micro videos and their related comments provide a rich application scenario for viewers’ induced sentiment analysis. In light of this, we introduce a novel research task, Multimodal Sentiment Analysis for Comment Response of Video Induced (MSA-CRVI), which aims to infer opinions and emotions from comments responding to micro videos. We also manually annotate a dataset named Comment Sentiment toward Micro Video (CSMV) to support this research. To our knowledge, it is the largest video multi-modal sentiment dataset in terms of scale and video duration, containing 107,267 comments and 8,210 micro videos totalling 68.83 hours of video. Since inferring the induced sentiment of a comment requires leveraging the video content, we propose the Video Content-aware Comment Sentiment Analysis (VC-CSA) method as a baseline to address the challenges inherent in this new task. Extensive experiments demonstrate that our method shows significant improvements over other established baselines. We make the dataset and source code publicly available at https://github.com/IEIT-AGI/MSA-CRVI.

JBHI Journal 2022 Journal Article

A Fully Deep Learning Paradigm for Pneumoconiosis Staging on Chest Radiographs

  • Wenjian Sun
  • Dongsheng Wu
  • Yang Luo
  • Lu Liu
  • Hongjing Zhang
  • Shuang Wu
  • Yan Zhang
  • Chenglong Wang

Pneumoconiosis staging has been a very challenging task, both for certified radiologists and computer-aided detection algorithms. Although deep learning has shown proven advantages in the detection of pneumoconiosis, it remains challenging in pneumoconiosis staging due to the stage ambiguity of pneumoconiosis and noisy samples caused by misdiagnosis when they are used in training deep learning models. In this article, we propose a fully deep learning pneumoconiosis staging paradigm that comprises a segmentation procedure and a staging procedure. The segmentation procedure extracts lung fields in chest radiographs through an Asymmetric Encoder-Decoder Network (AED-Net) that can mitigate the domain shift between multiple datasets. The staging procedure classifies the lung fields into four stages through our proposed deep log-normal label distribution learning and focal staging loss. The two cascaded procedures can effectively solve the problem of model overfitting caused by stage ambiguity and noisy labels of pneumoconiosis. Besides, we collect a clinical chest radiograph dataset of pneumoconiosis from the certified radiologist's diagnostic reports. The experimental results on this novel pneumoconiosis dataset confirm that the proposed deep pneumoconiosis staging paradigm achieves an Accuracy of 90.4%, a Precision of 84.8%, a Sensitivity of 78.4%, a Specificity of 95.6%, an F1-score of 80.9% and an Area Under the Curve (AUC) of 96%. In particular, we achieve 68.4% Precision, 76.5% Sensitivity, 95% Specificity, 72.2% F1-score and 89% AUC on the early pneumoconiosis ‘stage-1’.

JBHI Journal 2022 Journal Article

An Effective Semi-Supervised Approach for Liver CT Image Segmentation

  • Kai Han
  • Lu Liu
  • Yuqing Song
  • Yi Liu
  • Chengjian Qiu
  • Yangyang Tang
  • Qiaoying Teng
  • Zhe Liu

Despite the substantial progress made by deep networks in the field of medical image segmentation, they generally require sufficient pixel-level annotated data for training. The scale of training data remains the main bottleneck to obtaining a better deep segmentation model. Semi-supervised learning is an effective approach that alleviates the dependence on labeled data. However, most existing semi-supervised image segmentation methods usually do not generate high-quality pseudo labels to expand the training dataset. In this paper, we propose a deep semi-supervised approach for liver CT image segmentation by expanding a pseudo-labeling algorithm under the very low annotated-data paradigm. Specifically, the output features of labeled images from the pretrained network combine with corresponding pixel-level annotations to produce class representations according to the mean operation. Then pseudo labels of unlabeled images are generated by calculating the distances between unlabeled feature vectors and each class representation. To further improve the quality of pseudo labels, we adopt a series of operations to optimize pseudo labels. A more accurate segmentation network is obtained by expanding the training dataset and adjusting the contributions between supervised and unsupervised loss. Besides, a novel random patch strategy based on prior locations is introduced for unlabeled images in the training procedure. Extensive experiments show our method achieves more competitive results compared with other semi-supervised methods when fewer labeled slices of the LiTS dataset are available.
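The core pseudo-labeling step the abstract describes (mean class representations from labeled features, then nearest-representation assignment for unlabeled features) can be sketched as follows; the function names are illustrative, not from the paper's code:

```python
import numpy as np

def class_representations(labeled_feats, labels, num_classes):
    """Mean feature vector per class, averaged over labeled examples."""
    return np.stack([labeled_feats[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def assign_pseudo_labels(unlabeled_feats, reps):
    """Give each unlabeled feature the class of its nearest representation."""
    # pairwise distances: (N, 1, d) - (1, C, d) -> (N, C)
    dists = np.linalg.norm(unlabeled_feats[:, None, :] - reps[None, :, :], axis=-1)
    return dists.argmin(axis=1)
```

The paper then refines these pseudo labels further before adding them to the training set; this sketch only shows the initial assignment.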

NeurIPS Conference 2022 Conference Paper

Federated Learning from Pre-Trained Models: A Contrastive Learning Approach

  • Yue Tan
  • Guodong Long
  • Jie Ma
  • Lu Liu
  • Tianyi Zhou
  • Jing Jiang

Federated Learning (FL) is a machine learning paradigm that allows decentralized clients to learn collaboratively without sharing their private data. However, excessive computation and communication demands pose challenges to current FL frameworks, especially when training large-scale models. To prevent these issues from hindering the deployment of FL systems, we propose a lightweight framework where clients jointly learn to fuse the representations generated by multiple fixed pre-trained models rather than training a large-scale model from scratch. This leads us to a more practical FL problem by considering how to capture more client-specific and class-relevant information from the pre-trained models and jointly improve each client's ability to exploit those off-the-shelf models. Here, we design a Federated Prototype-wise Contrastive Learning (FedPCL) approach which shares knowledge across clients through their class prototypes and builds client-specific representations in a prototype-wise contrastive manner. Sharing prototypes rather than learnable model parameters allows each client to fuse the representations in a personalized way while keeping the shared knowledge in a compact form for efficient communication. We perform a thorough evaluation of the proposed FedPCL in the lightweight framework, measuring and visualizing its ability to fuse various pre-trained models on popular FL datasets.

AAAI Conference 2022 Conference Paper

FedProto: Federated Prototype Learning across Heterogeneous Clients

  • Yue Tan
  • Guodong Long
  • Lu Liu
  • Tianyi Zhou
  • Qinghua Lu
  • Jing Jiang
  • Chengqi Zhang

Heterogeneity across clients in federated learning (FL) usually hinders the optimization convergence and generalization performance when the aggregation of clients’ knowledge occurs in the gradient space. For example, clients may differ in terms of data distribution, network latency, input/output space, and/or model architecture, which can easily lead to the misalignment of their local gradients. To improve the tolerance to heterogeneity, we propose a novel federated prototype learning (FedProto) framework in which the clients and server communicate abstract class prototypes instead of gradients. FedProto aggregates the local prototypes collected from different clients, and then sends the global prototypes back to all clients to regularize the training of local models. The training on each client aims to minimize the classification error on the local data while keeping the resulting local prototypes sufficiently close to the corresponding global ones. Moreover, we provide a theoretical analysis of the convergence rate of FedProto under non-convex objectives. In experiments, we propose a benchmark setting tailored for heterogeneous FL, with FedProto outperforming several recent FL approaches on multiple datasets.
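The prototype exchange at the heart of FedProto is compact enough to sketch. A minimal NumPy version of client-side prototype computation and server-side aggregation (using plain unweighted averaging, which simplifies the paper's formulation; names are illustrative) might look like:

```python
import numpy as np

def local_prototypes(features, labels, num_classes):
    """Client side: mean embedding of each class present in local data."""
    return {c: features[labels == c].mean(axis=0)
            for c in range(num_classes) if (labels == c).any()}

def aggregate_prototypes(client_protos):
    """Server side: average each class prototype across clients that have it."""
    merged = {}
    for protos in client_protos:
        for c, p in protos.items():
            merged.setdefault(c, []).append(p)
    return {c: np.mean(ps, axis=0) for c, ps in merged.items()}
```

Each client would then add a regularization term pulling its local prototypes toward the returned global ones, alongside its local classification loss.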

ICLR Conference 2021 Conference Paper

Free Lunch for Few-shot Learning: Distribution Calibration

  • Shuo Yang 0006
  • Lu Liu
  • Min Xu 0001

Learning from a limited number of samples is challenging since the learned model can easily become overfitted based on the biased distribution formed by only a few training examples. In this paper, we calibrate the distribution of these few-sample classes by transferring statistics from the classes with sufficient examples. Then an adequate number of examples can be sampled from the calibrated distribution to expand the inputs to the classifier. We assume every dimension in the feature representation follows a Gaussian distribution so that the mean and the variance of the distribution can borrow from that of similar classes whose statistics are better estimated with an adequate number of samples. Our method can be built on top of off-the-shelf pretrained feature extractors and classification models without extra parameters. We show that a simple logistic regression classifier trained using the features sampled from our calibrated distribution can outperform the state-of-the-art accuracy on three datasets (~5% improvement on miniImageNet compared to the next best). The visualization of these generated features demonstrates that our calibrated distribution is an accurate estimation.
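The calibration step admits a short sketch. Assuming a diagonal Gaussian per feature dimension (as the abstract states), transferring statistics from the `k` nearest base classes might look like the following; `alpha` is a hypothetical spread constant and all names are illustrative, not the authors' code:

```python
import numpy as np

def calibrate_and_sample(support_feat, base_means, base_vars,
                         k=2, alpha=0.21, n_samples=100, seed=0):
    """Borrow mean/variance statistics from the k nearest base classes,
    then sample extra features for a few-shot class.

    support_feat: (d,) feature of one support example.
    base_means:   (C, d) per-class means of base classes.
    base_vars:    (C, d) per-class, per-dimension variances.
    """
    dists = np.linalg.norm(base_means - support_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    mean = np.vstack([base_means[nearest], support_feat]).mean(axis=0)
    var = base_vars[nearest].mean(axis=0) + alpha  # alpha: assumed extra spread
    rng = np.random.default_rng(seed)
    return rng.normal(mean, np.sqrt(var), size=(n_samples, mean.shape[0]))
```

The sampled features, together with the real support features, can then train an ordinary logistic regression classifier, matching the abstract's claim that no extra model parameters are needed.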

NeurIPS Conference 2021 Conference Paper

Recognizing Vector Graphics without Rasterization

  • Xinyang Jiang
  • Lu Liu
  • Caihua Shan
  • Yifei Shen
  • Xuanyi Dong
  • Dongsheng Li

In this paper, we consider a different data format for images: vector graphics. In contrast to raster graphics, which are widely used in image recognition, vector graphics can be scaled up or down to any resolution without aliasing or information loss, due to the analytic representation of the primitives in the document. Furthermore, vector graphics are able to give extra structural information on how low-level elements group together to form high-level shapes or structures. These merits of vector graphics have not been fully leveraged in existing methods. To explore this data format, we target the fundamental recognition tasks: object localization and classification. We propose an efficient CNN-free pipeline that does not render the graphic into pixels (i.e., rasterization), and takes the textual document of the vector graphics as input, called YOLaT (You Only Look at Text). YOLaT builds multi-graphs to model the structural and spatial information in vector graphics, and a dual-stream graph neural network is proposed to detect objects from the graph. Our experiments show that by directly operating on vector graphics, YOLaT outperforms raster-graphics-based object detection baselines in terms of both average precision and efficiency. Code is available at https://github.com/microsoft/YOLaT-VectorGraphicsRecognition.

AAAI Conference 2020 Conference Paper

Attribute Propagation Network for Graph Zero-Shot Learning

  • Lu Liu
  • Tianyi Zhou
  • Guodong Long
  • Jing Jiang
  • Chengqi Zhang

The goal of zero-shot learning (ZSL) is to train a model to classify samples of classes that were not seen during training. To address this challenging task, most ZSL methods relate unseen test classes to seen (training) classes via a predefined set of attributes that can describe all classes in the same semantic space, so the knowledge learned on the training classes can be adapted to unseen classes. In this paper, we aim to optimize the attribute space for ZSL by training a propagation mechanism to refine the semantic attributes of each class based on its neighbors and related classes on a graph of classes. We show that the propagated attributes can produce classifiers for zero-shot classes with significantly improved performance in different ZSL settings. The graph of classes is usually free or very cheap to acquire, such as the WordNet or ImageNet class hierarchies. When the graph is not provided, given predefined semantic embeddings of the classes, we can learn a mechanism to generate the graph in an end-to-end manner along with the propagation mechanism. However, this graph-aided technique has not been well explored in the literature. We introduce the “attribute propagation network (APNet)”, which is composed of 1) a graph propagation model generating an attribute vector for each class and 2) a parameterized nearest neighbor (NN) classifier categorizing an image into the class whose attribute vector is nearest to the image’s embedding. For better generalization over unseen classes, different from previous methods, we adopt a meta-learning strategy to train the propagation mechanism and the similarity metric for the NN classifier on multiple sub-graphs, each associated with a classification task over a subset of training classes. In experiments with two zero-shot learning settings and five benchmark datasets, APNet achieves either compelling performance or new state-of-the-art results.
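A drastically simplified, non-learned version of attribute propagation over a class graph is one round of mixing each class's attributes with its neighbors' average. This sketch uses a hypothetical mixing weight `alpha` in place of APNet's learned propagation module:

```python
import numpy as np

def propagate_attributes(adjacency, attributes, alpha=0.5):
    """One step of attribute propagation over a class graph.

    adjacency:  (C, C) 0/1 class-graph adjacency matrix.
    attributes: (C, d) predefined attribute vectors, one row per class.
    alpha:      mixing weight between a class's own attributes and the
                average of its neighbors' (illustrative; APNet learns
                this step rather than fixing it).
    """
    degree = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    neighbor_avg = (adjacency / degree) @ attributes  # row-normalized averaging
    return (1 - alpha) * attributes + alpha * neighbor_avg
```

Classification then reduces to nearest-neighbor search between an image embedding and the propagated attribute vectors.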

TIST Journal 2019 Journal Article

Energy-efficient Static Task Scheduling on VFI-based NoC-HMPSoCs for Intelligent Edge Devices in Cyber-physical Systems

  • Umair Ullah Tariq
  • Haider Ali
  • Lu Liu
  • John Panneerselvam
  • Xiaojun Zhai

The interlinked processing units in modern Cyber-Physical Systems (CPS) create a large network of connected embedded computing systems. Network-on-Chip (NoC)-based Multiprocessor System-on-Chip (MPSoC) architecture is becoming a de facto computing platform for real-time applications due to its higher performance and Quality-of-Service (QoS). As the number of processors in the multiprocessor systems of CPS has increased significantly, Voltage Frequency Island (VFI) has recently been adopted as an effective energy management mechanism in large-scale multiprocessor chip designs. In this article, we investigate energy-efficient and contention-aware static scheduling for tasks with precedence and deadline constraints on intelligent edge devices deploying heterogeneous VFI-based NoC-MPSoCs (VFI-NoC-HMPSoC) with DVFS-enabled processors. Unlike existing population-based optimization algorithms, we propose a novel population-based algorithm called ARSH-FATI that can dynamically switch between explorative and exploitative search modes at run-time. Our static scheduler ARSH-FATI collectively performs task mapping, scheduling, and voltage scaling; consequently, its performance is superior to the existing state-of-the-art approach proposed for homogeneous VFI-based NoC-MPSoCs. We also develop a communication contention-aware Earliest Edge Consistent Deadline First (EECDF) scheduling algorithm and a gradient descent-inspired voltage scaling algorithm called Energy Gradient Descent (EGD). We introduce a notion of Energy Gradient (EG) that guides EGD in its search for island voltage settings and minimizes the total energy consumption. We conducted experiments on eight real benchmarks adopted from the Embedded Systems Synthesis Benchmarks (E3S). Our static scheduling approach ARSH-FATI outperformed the state-of-the-art techniques, achieving average energy-efficiency improvements of ∼24% and ∼30% over CA-TMES-Search and CA-TMES-Quick, respectively.

NeurIPS Conference 2019 Conference Paper

Learning to Propagate for Graph Meta-Learning

  • Lu Liu
  • Tianyi Zhou
  • Guodong Long
  • Jing Jiang
  • Chengqi Zhang

Meta-learning extracts the common knowledge from learning different tasks and uses it for unseen tasks. It can significantly improve tasks that suffer from insufficient training data, e.g., few-shot learning. In most meta-learning methods, tasks are implicitly related by sharing parameters or an optimizer. In this paper, we show that a meta-learner that explicitly relates tasks on a graph describing the relations of their output dimensions (e.g., classes) can significantly improve few-shot learning. The graph’s structure is usually free or cheap to obtain but has rarely been explored in previous works. We develop a novel meta-learner of this type for prototype-based classification, in which a prototype is generated for each class, such that the nearest neighbor search among the prototypes produces an accurate classification. The meta-learner, called “Gated Propagation Network (GPN)”, learns to propagate messages between prototypes of different classes on the graph, so that learning the prototype of each class benefits from the data of other related classes. In GPN, an attention mechanism aggregates messages from neighboring classes of each class, with a gate choosing between the aggregated message and the message from the class itself. We train GPN on a sequence of tasks from many-shot to few-shot generated by subgraph sampling. During training, it is able to reuse and update previously achieved prototypes from the memory in a life-long learning cycle. In experiments, under different training-test discrepancy and test task generation settings, GPN outperforms recent meta-learning methods on two benchmark datasets. Code of GPN is publicly available at: https://github.com/liulu112601/Gated-Propagation-Net.

IJCAI Conference 2019 Conference Paper

Prototype Propagation Networks (PPN) for Weakly-supervised Few-shot Learning on Category Graph

  • Lu Liu
  • Tianyi Zhou
  • Guodong Long
  • Jing Jiang
  • Lina Yao
  • Chengqi Zhang

A variety of machine learning applications expect to achieve rapid learning from a limited number of labeled examples. However, the success of most current models is the result of heavy training on big data. Meta-learning addresses this problem by extracting common knowledge across different tasks that can be quickly adapted to new tasks. However, existing meta-learning methods do not fully explore weakly-supervised information, which is usually free or cheap to collect. In this paper, we show that weakly-labeled data can significantly improve the performance of meta-learning on few-shot classification. We propose the prototype propagation network (PPN), trained on few-shot tasks together with coarse-labeled data. Given a category graph of the targeted fine classes and some weakly-labeled coarse classes, PPN learns an attention mechanism which propagates the prototype of one class to another on the graph, so that the K-nearest neighbor (KNN) classifier defined on the propagated prototypes achieves high accuracy across different few-shot tasks. The training tasks are generated by subgraph sampling, and the training objective is obtained by accumulating the level-wise classification loss on the subgraph. On two benchmarks, PPN significantly outperforms most recent few-shot learning methods in different settings, even when they are also allowed to train on weakly-labeled data.

AAAI Conference 2018 Conference Paper

Attention-based Belief or Disbelief Feature Extraction for Dependency Parsing

  • Haoyuan Peng
  • Lu Liu
  • Yi Zhou
  • Junying Zhou
  • Xiaoqing Zheng

Existing neural dependency parsers usually encode each word in a sentence with bi-directional LSTMs and estimate the score of an arc from the LSTM representations of the head and the modifier, possibly missing context information relevant to the arc being considered. In this study, we propose a neural feature extraction method that learns to extract arc-specific features. We apply a neural-network-based attention method to collect evidence for and against each possible head-modifier pair, with which our model computes certainty scores of belief and disbelief and determines the final arc score by subtracting the disbelief score from the belief score. By explicitly introducing the two kinds of evidence, arc candidates can compete against each other based on more relevant information, especially when they share the same head or modifier. This makes it possible to better discriminate between two or more competing arcs by presenting their rivals (disbelief evidence). Experiments on various datasets show that our arc-specific feature extraction mechanism significantly improves the performance of bi-directional LSTM-based models by explicitly modeling long-distance dependencies. For both English and Chinese, the proposed model achieves higher accuracy on the dependency parsing task than most existing neural attention-based models.
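The belief-minus-disbelief scoring can be sketched as below, assuming simple dot-product attention and two hypothetical parameter matrices `W_bel` and `W_dis`; the paper's actual parameterization and certainty computation may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def arc_score(context, head, mod, W_bel, W_dis):
    """Score a head-modifier arc as belief minus disbelief (illustrative).

    context:       (T, d) encoded word representations of the sentence
    head, mod:     (d,) representations of the candidate head and modifier
    W_bel, W_dis:  (2d, d) hypothetical attention parameter matrices for
                   collecting evidence for / against the arc
    """
    query = np.concatenate([head, mod])                  # arc-specific query
    bel = softmax(context @ (query @ W_bel)) @ context   # evidence for the arc
    dis = softmax(context @ (query @ W_dis)) @ context   # evidence against it
    # certainty of each evidence vector, measured against the arc's words
    s_bel = bel @ head + bel @ mod
    s_dis = dis @ head + dis @ mod
    return s_bel - s_dis
```

Because the query is built from the specific head-modifier pair, two candidate arcs sharing the same head attend to different evidence, which is the discrimination the abstract emphasizes.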

AAAI Conference 2018 Conference Paper

RNN-Based Sequence-Preserved Attention for Dependency Parsing

  • Yi Zhou
  • Junying Zhou
  • Lu Liu
  • Jiangtao Feng
  • Haoyuan Peng
  • Xiaoqing Zheng

Recurrent neural networks (RNNs) combined with an attention mechanism have proved useful for various NLP tasks, including machine translation, sequence labeling, and syntactic parsing. The attention mechanism is usually applied by estimating the weights (or importance) of the inputs and taking the weighted sum of the inputs as derived features. Although such features have demonstrated their effectiveness, they may fail to capture sequence information because a simple weighted sum is used to produce them. The order of the words matters to the meaning and structure of a sentence, especially for syntactic parsing, which aims to recover the structure from a sequence of words. In this study, we propose an RNN-based attention mechanism that captures relevant, sequence-preserved features from a sentence, and we use the derived features to perform dependency parsing. We evaluated graph-based and transition-based parsing models enhanced with the RNN-based sequence-preserved attention on both the English PTB and Chinese CTB datasets. The experimental results show that the enhanced systems achieved significant increases in parsing accuracy.
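The contrast with a plain weighted sum can be made concrete: feeding the attention-weighted inputs through an RNN in sentence order produces a feature that depends on word order, which a weighted sum does not. The vanilla-RNN cell and all names below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def sequence_preserved_attention(inputs, weights, Wx, Wh):
    """Derive a feature by running an RNN over attention-weighted inputs.

    inputs:  (T, d_in) word representations in sentence order
    weights: (T,) attention weights, one per word
    Wx:      (d_h, d_in) input-to-hidden matrix (hypothetical parameters)
    Wh:      (d_h, d_h) hidden-to-hidden matrix (hypothetical parameters)
    """
    h = np.zeros(Wh.shape[0])
    for x, w in zip(inputs, weights):
        # each input is scaled by its attention weight, then consumed in
        # order, so the final state reflects both relevance and position
        h = np.tanh(Wx @ (w * x) + Wh @ h)
    return h
```

A weighted sum `weights @ inputs` is invariant to jointly permuting the words and their weights; the recurrence above is not, which is exactly the property the abstract argues matters for parsing.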

AAAI Conference 2014 Conference Paper

Forecasting Potential Diabetes Complications

  • Yang Yang
  • Walter Luyten
  • Lu Liu
  • Marie-Francine Moens
  • Jie Tang
  • Juanzi Li

Diabetes complications often seriously afflict diabetes patients: over 68% of diabetes-related mortality is caused by diabetes complications. In this paper, we study the problem of automatically diagnosing diabetes complications from patients' lab test results. The problem has two main challenges: 1) feature sparseness: a patient undergoes only 1.26% of lab tests on average, and 65.5% of lab test types are performed on samples from fewer than 10 patients; 2) knowledge skewness: comprehensive, detailed domain knowledge of the associations between diabetes complications and lab tests is lacking. To address these challenges, we propose a novel probabilistic model called the Sparse Factor Graph Model (SparseFGM). SparseFGM projects sparse features onto a lower-dimensional latent space, which alleviates the sparseness problem. SparseFGM is also able to capture the associations between complications and lab tests, which helps handle the knowledge skewness. We evaluate the proposed model on a large collection of real medical records. SparseFGM significantly outperforms baselines (+20% in F1) and yields detailed associations between diabetes complications and lab tests.
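The first ingredient — projecting a very sparse patient-by-lab-test matrix onto a lower-dimensional latent space — can be illustrated with a truncated SVD, which here merely stands in for SparseFGM's learned factor-graph projection; the function name and choice of decomposition are assumptions for illustration.

```python
import numpy as np

def project_sparse_features(X, k):
    """Project sparse lab-test features onto a k-dimensional latent space.

    X: (patients, tests) matrix, mostly zeros because each patient
       undergoes only a small fraction of the available lab tests
    k: number of latent dimensions to keep
    """
    # truncated SVD: keep the k strongest directions of variation, so
    # patients with disjoint-but-correlated tests land near each other
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]
```

In the dense latent space, two patients who underwent disjoint but correlated sets of tests can still be compared, which is why such a projection alleviates the sparseness problem the abstract describes.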