Arrow Research search

Author name cluster

Ning Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

33 papers
2 author rows

Possible papers (33)

AAAI Conference 2026 Conference Paper

LoGoSeg: Integrating Local and Global Features for Open-Vocabulary Semantic Segmentation

  • Junyang Chen
  • Xiangbo Lv
  • Zhiqiang Kou
  • Xingdong Sheng
  • Ning Xu
  • Yiguo Qiao

Open-vocabulary semantic segmentation (OVSS) extends traditional closed-set segmentation by enabling pixel-wise annotation for both seen and unseen categories using arbitrary textual descriptions. While existing methods leverage vision-language models (VLMs) like CLIP, their reliance on image-level pretraining often results in imprecise spatial alignment, leading to mismatched segmentations in ambiguous or cluttered scenes. Moreover, most existing approaches lack strong object priors and region-level constraints, which can lead to object hallucination or missed detections, further degrading performance. To address these challenges, we propose LoGoSeg, an efficient single-stage framework that integrates three key innovations: (i) an object existence prior that dynamically weights relevant categories through global image-text similarity, effectively reducing hallucinations; (ii) a region-aware alignment module that establishes precise region-level visual-textual correspondences; and (iii) a dual-stream fusion mechanism that optimally combines local structural information with global semantic context. Unlike prior works, LoGoSeg eliminates the need for external mask proposals, additional backbones, or extra datasets, ensuring efficiency. Extensive experiments on six benchmarks (A-847, PC-459, A-150, PC-59, PAS-20, and PAS-20b) demonstrate its competitive performance and strong generalization in open-vocabulary settings.
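
As an illustration of the object-existence prior in (i), the sketch below re-weights per-category segmentation logits by global image-text similarity. The tensor names, the sigmoid squashing, and the temperature are illustrative assumptions, not LoGoSeg's actual module.

```python
import torch
import torch.nn.functional as F

def reweight_by_existence_prior(pixel_logits, image_feat, text_feats, temperature=0.07):
    """Down-weight categories whose text embedding is dissimilar to the global
    image embedding (illustrative sketch, not LoGoSeg's exact module).

    pixel_logits: (C, H, W) per-category segmentation logits
    image_feat:   (D,)      global image embedding (e.g., from a CLIP-like encoder)
    text_feats:   (C, D)    one text embedding per category name
    """
    image_feat = F.normalize(image_feat, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sim = text_feats @ image_feat                    # (C,) global image-text similarity
    existence = torch.sigmoid(sim / temperature)     # (C,) soft "is this category present?"
    # Categories unlikely to be present are suppressed before the pixel-wise argmax.
    return pixel_logits * existence[:, None, None]

# Toy usage with random tensors.
logits = torch.randn(5, 32, 32)      # 5 candidate categories
img = torch.randn(512)
txt = torch.randn(5, 512)
print(reweight_by_existence_prior(logits, img, txt).shape)  # torch.Size([5, 32, 32])
```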

JBHI Journal 2026 Journal Article

SpineVLM: A Markdown-Guided Structured Fine-Tuning Framework for Spine X-ray Report Generation

  • Dong Liu
  • Wenhui Li
  • Ning Xu
  • Guoge Han
  • Rui Hao
  • Xianzhu Liu
  • An-An Liu

Automated medical report generation in specialized fields like spine radiography is constrained by data scarcity and high annotation costs. Consequently, existing multimodal large language models (MLLMs) struggle in these settings, often missing minute, scattered spinal abnormalities. We introduce SpineVLM, a data-efficient framework for structured spine X-ray report generation. The framework is built upon the newly constructed SXRG dataset, comprising 10,468 image-report pairs developed via a hierarchical AI-assisted annotation pipeline. To optimize learning under limited data, we propose Markdown-Guided Structured Learning (MGSL), which reformulates unconstrained free-text synthesis into a structured completion task, acting as a strong regularizer. Furthermore, an unsupervised Region-Focused Inference (RFI) module powered by foundation models (DINOv2) isolates the vertebral column to enhance the perception of subtle lesions without requiring manual spatial annotations. Evaluated on a 7B-parameter vision-language backbone, SpineVLM achieves strong performance against ten baseline multimodal models across standard linguistic metrics. In a double-blind reader study, the system achieved a diagnostic F1-score of 0.866, comparable to specialist performance, while reducing clinical reporting time by over 41%. By open-sourcing the dataset and codebase, we provide, to our knowledge, the first quantitative benchmark for automated spine radiography report generation, together with a structured framework for this data-limited setting. All data and code will be publicly released at https://github.com/LiuDongDaniel/SpineVLM.

JBHI Journal 2025 Journal Article

Accurate Multi-Landmark Localization in 3D Ultra-High Resolution CT Images of the Ears Via Deep Reinforcement Learning and Transformer

  • Zhiwei Qu
  • Li Zhuo
  • Ning Xu
  • Hongxia Yin
  • Zhenchang Wang
  • Xiaoguang Li

Automated landmark localization can help radiologists quickly determine the locations of key structures or lesion areas in medical images. However, when facing large-volume 3D medical images, existing methods have very high computational complexity because they must encode the global image, so it is difficult for them to achieve accurate landmark localization at high speed. In this paper, an accurate multi-landmark localization method for ear 3D Ultra-High Resolution CT (U-HRCT) images is proposed. The method adopts a novel localization pipeline that combines Deep Reinforcement Learning (DRL) and a Transformer. First, the DRL algorithm quickly collects landmark-related local features. Second, the Transformer extracts the spatial position relationships between anatomical structures from these discrete local features to infer the coordinate position of each landmark. Because the costly process of encoding the global image is avoided, the proposed method can quickly localize multiple ear landmarks in 3D U-HRCT images. Finally, we propose a refinement module based on a dual-branch hybrid Multi-Layer Perceptron, which uses the fast multi-landmark localization results to learn the spatial position relationships between landmarks, further improving the accuracy and stability of landmark localization. Experimental results on the self-built ear 3D U-HRCT dataset and the publicly available 2D cephalometric dataset demonstrate that the proposed method achieves Successful Detection Rates of 96.71% and 89.97%, respectively, within a precision range of 2.0 mm, surpassing state-of-the-art multi-landmark localization methods while offering faster localization.

NeurIPS Conference 2025 Conference Paper

Bi-Level Knowledge Transfer for Multi-Task Multi-Agent Reinforcement Learning

  • Junkai Zhang
  • Jinmin He
  • Yifan Zhang
  • Yifan Zang
  • Ning Xu
  • Jian Cheng

Multi-Agent Reinforcement Learning (MARL) has achieved remarkable success in various real-world scenarios, but its high cost of online training makes it impractical to learn each task from scratch. To enable effective policy reuse, we consider the problem of zero-shot generalization from offline data across multiple tasks. While prior work focuses on transferring the individual skills of agents, we argue that effective policy transfer across tasks should also capture team-level coordination knowledge. In this paper, we propose Bi-Level Knowledge Transfer (BiKT) for Multi-Task MARL, which performs knowledge transfer at both the individual and team levels. At the individual level, we extract transferable individual skill embeddings from offline MARL trajectories. At the team level, we define tactics as coordinated patterns of skill combinations and capture them by leveraging the learned skill embeddings. We map skill combinations into compact tactic embeddings and then construct a tactic codebook. To incorporate both skills and tactics into decision-making, we design a bi-level decision transformer that infers them in sequence. BiKT leverages both the generalizability of individual skills and the diversity of tactics, enabling the learned policy to perform effectively across multiple tasks. Extensive experiments on the SMAC and MPE benchmarks demonstrate that BiKT achieves strong generalization to previously unseen tasks.

NeurIPS Conference 2025 Conference Paper

Can Class-Priors Help Single-Positive Multi-Label Learning?

  • Biao Liu
  • Ning Xu
  • Jie Wang
  • Xin Geng

Single-positive multi-label learning (SPMLL) is a weakly supervised multi-label learning problem, where each training example is annotated with only one positive label. Existing SPMLL methods typically assign pseudo-labels to unannotated labels under the assumption that the prior probabilities of all classes are identical. However, the class-prior of each category may differ significantly in real-world scenarios, so the predictive model does not perform as well as expected because this assumption does not hold in real-world applications. To alleviate this issue, a novel framework named Crisp, i.e., Class-pRiors Induced Single-Positive multi-label learning, is proposed. Specifically, a class-priors estimator is introduced whose estimated class-priors are theoretically guaranteed to converge to the ground-truth class-priors. In addition, based on the estimated class-priors, an unbiased risk estimator for classification is derived, and the corresponding risk minimizer is guaranteed to approximately converge to the optimal risk minimizer on fully supervised data. Experimental results on ten MLL benchmark datasets demonstrate the effectiveness and superiority of our method over existing SPMLL approaches.

NeurIPS Conference 2025 Conference Paper

Reduction-based Pseudo-label Generation for Instance-dependent Partial Label Learning

  • Congyu Qiao
  • Ning Xu
  • Yihao Hu
  • Xin Geng

Instance-dependent Partial Label Learning (ID-PLL) aims to learn a multi-class predictive model given training instances annotated with candidate labels related to features, among which the correct label is fixed but unknown. Previous works leverage the identification capability of the training model itself to iteratively refine the supervision information. However, these methods overlook a critical aspect of ID-PLL: within the original label space, the model may fail to distinguish some incorrect candidate labels that are strongly correlated with features from the correct labels. This leads to poor-quality supervision signals and creates a bottleneck in the training process. In this paper, we propose to leverage reduction-based pseudo-labels to alleviate the influence of incorrect candidate labels and to train our predictive model past this bottleneck. Specifically, reduction-based pseudo-labels are generated by performing weighted aggregation on the outputs of a multi-branch auxiliary model, with each branch trained in a label subspace that excludes certain labels. This ensures that each branch explicitly avoids the disturbance of the excluded labels, so the pseudo-labels provided for instances troubled by those labels can benefit from the unaffected branches. Theoretically, we demonstrate that reduction-based pseudo-labels exhibit greater consistency with the Bayes optimal classifier than pseudo-labels generated directly from the training predictive model.
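
The weighted aggregation of branch outputs can be pictured as in the sketch below: each branch predicts over a label subspace that excludes some labels, its prediction is lifted back to the full label space, the branches are averaged with weights, and the result is restricted to the candidate set. The function and variable names are hypothetical and the weighting scheme is simplified relative to the paper.

```python
import numpy as np

def reduction_pseudo_label(branch_probs, excluded, weights, candidate_mask):
    """Aggregate multi-branch predictions into a pseudo-label (illustrative
    sketch of the reduction idea, not the paper's exact procedure).

    branch_probs:   list of (|subspace|,) probability vectors, one per branch
    excluded:       list of label-index lists excluded by each branch
    weights:        (num_branches,) aggregation weights, summing to 1
    candidate_mask: (num_labels,) 1 for candidate labels, 0 otherwise
    """
    num_labels = candidate_mask.shape[0]
    agg = np.zeros(num_labels)
    for p, excl, w in zip(branch_probs, excluded, weights):
        kept = [j for j in range(num_labels) if j not in excl]
        full = np.zeros(num_labels)
        full[kept] = p                      # lift the subspace prediction back to the full space
        agg += w * full
    agg *= candidate_mask                   # pseudo-labels live inside the candidate set
    return agg / agg.sum()

# Two branches over 4 labels; branch 0 excludes label 3, branch 1 excludes label 0.
probs = [np.array([0.7, 0.2, 0.1]), np.array([0.6, 0.3, 0.1])]
print(reduction_pseudo_label(probs, [[3], [0]], np.array([0.5, 0.5]),
                             np.array([1, 1, 0, 1])))
```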

NeurIPS Conference 2025 Conference Paper

Uncertain Knowledge Graph Completion via Semi-Supervised Confidence Distribution Learning

  • Tianxing Wu
  • Shutong Zhu
  • Jingting Wang
  • Ning Xu
  • Guilin Qi
  • Haofen Wang

Uncertain knowledge graphs (UKGs) associate each triple with a confidence score to provide more precise knowledge representations. Since real-world UKGs suffer from incompleteness, UKG completion has recently attracted increasing attention, aiming to complete missing triples and their confidences. Current studies attempt to learn UKG embeddings to solve this problem, but they neglect the extremely imbalanced distribution of triple confidences, so the learned embeddings are insufficient for high-quality UKG completion. To address this issue, we propose a new semi-supervised Confidence Distribution Learning (ssCDL) method for UKG completion, in which each triple confidence is transformed into a confidence distribution to introduce more supervision information across different confidences and reinforce the embedding learning process. ssCDL iteratively learns UKG embeddings by relational learning on labeled data (i.e., existing triples with confidences) and on unlabeled data with pseudo labels (i.e., unseen triples with generated confidences), which are predicted by meta-learning to augment the training data and rebalance the distribution of triple confidences. Experiments on two UKG datasets demonstrate that ssCDL consistently outperforms state-of-the-art baselines on different evaluation metrics.
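
One generic way to turn a scalar triple confidence into a "confidence distribution" is to spread it over discretized confidence levels with a kernel, as sketched below. The bin layout and the Gaussian kernel are assumptions made for illustration and may differ from ssCDL's actual construction.

```python
import numpy as np

def confidence_to_distribution(conf, bins=np.linspace(0.0, 1.0, 11), sigma=0.1):
    """Turn a scalar triple confidence into a discrete distribution over
    confidence bins using a Gaussian kernel (generic illustration only;
    the paper's construction may differ)."""
    d = np.exp(-0.5 * ((bins - conf) / sigma) ** 2)
    return d / d.sum()

# A confidence of 0.85 becomes a distribution peaked around the 0.8-0.9 bins.
print(np.round(confidence_to_distribution(0.85), 3))
```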

NeurIPS Conference 2024 Conference Paper

What Makes Partial-Label Learning Algorithms Effective?

  • Jiaqi Lv
  • Yangfan Liu
  • Shiyu Xia
  • Ning Xu
  • Miao Xu
  • Gang Niu
  • Min-Ling Zhang
  • Masashi Sugiyama

A partial label (PL) specifies a set of candidate labels for an instance and partial-label learning (PLL) trains multi-class classifiers with PLs. Recently, many methods that incorporate techniques from other domains have shown strong potential. The expectation that stronger techniques would enhance performance has resulted in prominent PLL methods becoming not only highly complicated but also quite different from one another, making it challenging to choose the best direction for future algorithm design. While it is exciting to see higher performance, this leaves open a fundamental question: what makes a PLL method effective? We present a comprehensive empirical analysis of this question and summarize the success of PLL so far into some minimal algorithm design principles. Our findings reveal that high accuracy on benchmark-simulated datasets with PLs can misleadingly amplify the perceived effectiveness of some general techniques, which may improve representation learning but have limited impact on addressing the inherent challenges of PLs. We further identify the common behavior among successful PLL methods as a progressive transition from uniform to one-hot pseudo-labels, highlighting the critical role of mini-batch PL purification in achieving top performance. Based on our findings, we introduce a minimal working algorithm that is surprisingly simple yet effective, and propose an improved strategy to implement the design principles, suggesting a promising direction for improvements in PLL.
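
The "progressive transition from uniform to one-hot pseudo-labels" can be seen in the common mini-batch purification step sketched below, where the model's class probabilities are renormalized over each instance's candidate set. This is a generic illustration of that behavior, not the paper's minimal working algorithm.

```python
import torch

def purify_pseudo_labels(logits, candidate_mask):
    """One mini-batch purification step used, in some form, by many PLL methods:
    renormalize the model's class probabilities over each instance's candidate set.
    Early in training the output is near-uniform over candidates; as the model
    sharpens, the pseudo-labels drift toward one-hot. Illustrative sketch only.

    logits:         (B, C) raw model outputs
    candidate_mask: (B, C) 1 for candidate labels, 0 otherwise
    """
    probs = torch.softmax(logits, dim=1) * candidate_mask
    return probs / probs.sum(dim=1, keepdim=True)

logits = torch.zeros(1, 4)                      # untrained model: uniform over candidates
mask = torch.tensor([[1., 1., 0., 1.]])
print(purify_pseudo_labels(logits, mask))       # ~[0.333, 0.333, 0.000, 0.333]
```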

AAAI Conference 2023 Conference Paper

FanoutNet: A Neuralized PCB Fanout Automation Method Using Deep Reinforcement Learning

  • Haiyun Li
  • Jixin Zhang
  • Ning Xu
  • Mingyu Liu

In modern electronic manufacturing, multi-layer Printed Circuit Board (PCB) routing requires connecting hundreds of nets with perplexing topology under complex routing constraints and highly limited resources, and therefore demands intense effort and time from human engineers. PCB fanout, as a pre-design step for PCB routing, has proved to be an effective technique for reducing the complexity of PCB routing by pre-allocating resources and pre-routing. However, current PCB fanout design relies heavily on the experience of human engineers, and there is no existing industrial solution for PCB fanout automation, which limits the quality of PCB routing automation. To address the problem, we propose a neuralized PCB fanout method based on deep reinforcement learning. To the best of our knowledge, we are the first in the literature to propose an automation method for PCB fanout. We combine a Convolutional Neural Network (CNN) with an attention-based network to train our fanout policy model and value model. The models learn representations of the PCB layout and netlist to make decisions and evaluations in place of human engineers. We employ Proximal Policy Optimization (PPO) to update the parameters of the models. In addition, we apply our PCB fanout method to a PCB router to improve the quality of PCB routing. Extensive experimental results on real-world industrial PCB benchmarks demonstrate that our approach achieves 100% routability in all industrial cases and improves wire length by an average of 6.8%, a significant improvement over state-of-the-art methods.

AAAI Conference 2023 Conference Paper

Imbalanced Label Distribution Learning

  • Xingyu Zhao
  • Yuexuan An
  • Ning Xu
  • Jing Wang
  • Xin Geng

Label distribution covers a certain number of labels, representing the degree to which each label describes an instance. The learning process on the instances labeled by label distributions is called Label Distribution Learning (LDL). Although LDL has been applied successfully to many practical applications, one problem with existing LDL methods is that they are limited to data with balanced label information. However, annotation information in real-world data often exhibits imbalanced distributions, which significantly degrades the performance of existing methods. In this paper, we investigate the Imbalanced Label Distribution Learning (ILDL) problem. To handle this challenging problem, we delve into the characteristics of ILDL and empirically find that the representation distribution shift is the underlying reason for the performance degradation of existing methods. Inspired by this finding, we present a novel method named Representation Distribution Alignment (RDA). RDA aligns the distributions of feature representations and label representations to alleviate the impact of the distribution gap between the training set and the test set caused by the imbalance issue. Extensive experiments verify the superior performance of RDA. Our work fills the gap in benchmarks and techniques for practical ILDL problems.

NeurIPS Conference 2023 Conference Paper

Learning From Biased Soft Labels

  • Hua Yuan
  • Yu Shi
  • Ning Xu
  • Xu Yang
  • Xin Geng
  • Yong Rui

Since the advent of knowledge distillation, many researchers have been intrigued by the dark knowledge hidden in the soft labels generated by the teacher model. This prompts us to scrutinize the circumstances under which these soft labels are effective. Predominant existing theories implicitly require that the soft labels be close to the ground-truth labels. In this paper, however, we investigate whether biased soft labels are still effective. Here, bias refers to the discrepancy between the soft labels and the ground-truth labels. We present two indicators to measure the effectiveness of the soft labels. Based on the two indicators, we propose moderate conditions to ensure that the biased soft label learning problem is both classifier-consistent and Empirical Risk Minimization (ERM) learnable, which can hold even for heavily biased soft labels. We further design a heuristic method to train Skillful but Bad Teachers (SBTs); these teachers, with accuracy below 30%, can teach students to achieve accuracy over 90% on CIFAR-10, comparable to models trained on the original data. The proposed indicators adequately measure the effectiveness of the soft labels generated in this process. Moreover, our theoretical framework can be adapted to elucidate the effectiveness of soft labels in three weakly-supervised learning paradigms, namely incomplete supervision, partial label learning, and learning with additive noise. Experimental results demonstrate that our indicators can measure the effectiveness of biased soft labels generated by teachers or arising in these weakly-supervised learning paradigms.
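
For reference, the training signal under discussion is the standard soft-label objective sketched below (cross-entropy against the teacher's soft labels). The paper's two indicators and the SBT training heuristic are not reproduced here; this only shows the learning problem whose effectiveness the paper analyzes.

```python
import torch
import torch.nn.functional as F

def soft_label_loss(student_logits, teacher_probs, temperature=1.0):
    """Standard soft-label (distillation-style) objective: cross-entropy between
    the teacher's soft labels and the student's tempered predictions. The paper
    asks when this works even if teacher_probs are biased; this sketch only
    shows the training signal itself."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    return -(teacher_probs * log_p_student).sum(dim=1).mean()

# A deliberately "biased" teacher whose soft label puts little mass on the true class.
teacher = torch.tensor([[0.1, 0.6, 0.3]])
student_logits = torch.randn(1, 3, requires_grad=True)
loss = soft_label_loss(student_logits, teacher)
loss.backward()
print(float(loss))
```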

IJCAI Conference 2023 Conference Paper

Unreliable Partial Label Learning with Recursive Separation

  • Yu Shi
  • Ning Xu
  • Hua Yuan
  • Xin Geng

Partial label learning (PLL) is a typical weakly supervised learning problem in which each instance is associated with a candidate label set, among which only one label is true. However, the assumption that the ground-truth label is always among the candidate label set would be unrealistic, as the reliability of the candidate label sets in real-world applications cannot be guaranteed by annotators. Therefore, a generalized PLL named Unreliable Partial Label Learning (UPLL) is proposed, in which the true label may not be in the candidate label set. Due to the challenges posed by unreliable labeling, previous PLL methods will experience a marked decline in performance when applied to UPLL. To address the issue, we propose a two-stage framework named Unreliable Partial Label Learning with Recursive Separation (UPLLRS). In the first stage, a self-adaptive recursive separation strategy is proposed to separate the training set into a reliable subset and an unreliable subset. In the second stage, a disambiguation strategy is employed to progressively identify the ground-truth labels in the reliable subset. Simultaneously, semi-supervised learning methods are adopted to extract valuable information from the unreliable subset. Our method demonstrates state-of-the-art performance as evidenced by experimental results, particularly in situations of high unreliability. Code and supplementary materials are available at https://github.com/dhiyu/UPLLRS.

IJCAI Conference 2022 Conference Paper

Ambiguity-Induced Contrastive Learning for Instance-Dependent Partial Label Learning

  • Shi-Yu Xia
  • Jiaqi Lv
  • Ning Xu
  • Xin Geng

Partial label learning (PLL) learns from a typical form of weak supervision, where each training instance is labeled with a set of ambiguous candidate labels (CLs) instead of its exact ground-truth label. Most existing PLL works directly eliminate, rather than exploit, the label ambiguity, since they explicitly or implicitly assume that incorrect CLs are noise independent of the instance. A more practical setting in the wild, however, is instance-dependent: the CLs depend on both the true label and the instance itself, so each CL may describe the instance from some sensory channel and thereby provide noisy but genuinely valid information about the instance. In this paper, we leverage the additional information carried by this ambiguity and propose AmBiguity-induced contrastive LEarning (ABLE) under the framework of contrastive learning. Specifically, for each CL of an anchor, we select a group of samples currently predicted as that class as ambiguity-induced positives, based on which we synchronously learn a representor (RP) that minimizes the weighted sum of contrastive losses over all groups and a classifier (CS) that minimizes a classification loss. Although they are circularly dependent (RP requires the ambiguity-induced positives induced on the fly by CS, and CS needs the first half of RP as the representation extractor), ABLE enables RP and CS to be trained simultaneously within a coherent framework. Experiments on benchmark datasets demonstrate substantial improvements over state-of-the-art methods for learning from instance-dependent partially labeled data.

IJCAI Conference 2022 Conference Paper

Fusion Label Enhancement for Multi-Label Learning

  • Xingyu Zhao
  • Yuexuan An
  • Ning Xu
  • Xin Geng

Multi-label learning (MLL) refers to the problem of tagging a given instance with a set of relevant labels. In MLL, the implicit relative importance of the different labels describing a single instance generally differs, which has recently gained considerable attention and should be fully leveraged. Label enhancement (LE) has therefore been widely applied in various MLL tasks owing to its ability to effectively mine the implicit relative importance of different labels. However, because the label enhancement process in previous LE-based MLL methods is decoupled from the training of the predictive model, the objective of LE does not match the training process, which ultimately harms the whole learning system. In this paper, we propose a novel approach named Fusion Label Enhancement for Multi-label learning (FLEM) to effectively integrate the LE process and the training process. Specifically, we design a matching and interaction mechanism that leverages a novel interaction label enhancement loss to ensure that the recovered label distribution matches the needs of the predictive model. In the meantime, we present a unified label distribution loss for establishing the correspondence between the recovered label distribution and the training of the predictive model. With these losses, the label distributions recovered by the LE process can be efficiently utilized for training the predictive model. Experimental results on multiple benchmark datasets validate the effectiveness of the proposed approach.

AAAI Conference 2022 Conference Paper

Hierarchical Image Generation via Transformer-Based Sequential Patch Selection

  • Xiaogang Xu
  • Ning Xu

To synthesize images with preferred objects and interactions, a controllable approach is to generate the image from a scene graph and a large pool of object crops, where the spatial arrangement of the objects in the image is defined by the scene graph while their appearance is determined by the crops retrieved from the pool. In this paper, we propose a novel framework with such a semi-parametric generation strategy. First, to encourage the retrieval of mutually compatible crops, we design a sequential selection strategy in which the crop selection for each object is determined by the contents and locations of all object crops chosen previously. This process is implemented via a transformer trained with contrastive losses. Second, to generate the final image, our hierarchical generation strategy leverages hierarchical gated convolutions, which synthesize areas not covered by any image crops, and a patch-guided spatially adaptive normalization module, which ensures that the final generated images comply with the crop appearance and the scene graph. Evaluated on the challenging Visual Genome and COCO-Stuff datasets, our experimental results demonstrate the superiority of the proposed method over existing state-of-the-art methods.

AAAI Conference 2022 Conference Paper

Learngene: From Open-World to Your Learning Task

  • Qiu-Feng Wang
  • Xin Geng
  • Shu-Xia Lin
  • Shi-Yu Xia
  • Lei Qi
  • Ning Xu

Although deep learning has made significant progress on fixed large-scale datasets, it typically struggles to detect unknown/unseen classes in open-world scenarios, is over-parameterized, and overfits small samples. Biological systems overcome these difficulties well: individuals inherit an innate gene from collective creatures that have evolved over hundreds of millions of years, and then learn new skills from only a few examples. Inspired by this, we propose a practical collective-individual paradigm in which an evolution (expandable) network is trained on sequential tasks and then recognizes unknown classes in the real world. Moreover, the learngene, i.e., the gene for learning the initialization rules of the target model, is proposed to inherit the meta-knowledge from the collective model and reconstruct a lightweight individual model for the target task. In particular, a novel criterion based on gradient information is proposed to discover the learngene in the collective model. Finally, the individual model is trained with only a few samples on the target learning tasks. We demonstrate the effectiveness of our approach in an extensive empirical study and theoretical analysis.

NeurIPS Conference 2022 Conference Paper

One Positive Label is Sufficient: Single-Positive Multi-Label Learning with Label Enhancement

  • Ning Xu
  • Congyu Qiao
  • Jiaqi Lv
  • Xin Geng
  • Min-Ling Zhang

Multi-label learning (MLL) learns from examples each associated with multiple labels simultaneously, where the high cost of annotating all relevant labels for each training example is challenging for real-world applications. To cope with this challenge, we investigate single-positive multi-label learning (SPMLL), where each example is annotated with only one relevant label, and show that one can successfully learn a theoretically grounded multi-label classifier for the problem. In this paper, a novel SPMLL method named SMILE, i.e., Single-positive MultI-label learning with Label Enhancement, is proposed. Specifically, an unbiased risk estimator is derived, which is guaranteed to approximately converge to the optimal risk minimizer of fully supervised learning and shows that one positive label per instance is sufficient to train the predictive model. Then, the corresponding empirical risk estimator is established by recovering the latent soft labels as a label enhancement process, where the posterior density of the latent soft labels is approximated by a variational Beta density parameterized by an inference model. Experiments on benchmark datasets validate the effectiveness of the proposed method.

AAAI Conference 2021 Conference Paper

High-Resolution Deep Image Matting

  • Haichao Yu
  • Ning Xu
  • Zilong Huang
  • Yuqian Zhou
  • Humphrey Shi

Image matting is a key technique for image and video editing and composition. Conventionally, deep learning approaches take the whole input image and an associated trimap to infer the alpha matte using convolutional neural networks. Such approaches set the state of the art in image matting; however, they may fail in real-world matting applications due to hardware limitations, since real-world input images for matting are mostly of very high resolution. In this paper, we propose HDMatt, the first deep-learning-based image matting approach designed for high-resolution inputs. More concretely, HDMatt runs matting in a patch-based crop-and-stitch manner on high-resolution inputs, with a novel module design that addresses the contextual dependency and consistency issues between different patches. Compared with vanilla patch-based inference, which computes each patch independently, we explicitly model the cross-patch contextual dependency with a newly proposed Cross-Patch Contextual module (CPC) guided by the given trimap. Extensive experiments demonstrate the effectiveness of the proposed method and its necessity for high-resolution inputs. Our HDMatt approach also sets new state-of-the-art performance on the Adobe Image Matting and AlphaMatting benchmarks and produces impressive visual results on more real-world high-resolution images.
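
The "vanilla patch-based inference" baseline that HDMatt improves upon amounts to the independent crop-and-stitch loop sketched below; the patch size and the stand-in model are arbitrary, and the cross-patch context module itself is not shown.

```python
import torch

def patchwise_infer(model, image, patch=512):
    """Vanilla patch-based crop-and-stitch inference: each patch is processed
    independently and the outputs are pasted back (the baseline the abstract
    contrasts against; HDMatt additionally models cross-patch context).

    image: (1, C, H, W); model maps a patch to a single-channel alpha patch.
    """
    _, _, H, W = image.shape
    out = torch.zeros(1, 1, H, W)
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            crop = image[:, :, y:y + patch, x:x + patch]
            out[:, :, y:y + crop.shape[2], x:x + crop.shape[3]] = model(crop)
    return out

# Toy stand-in "model": average over input channels to get one output channel.
model = lambda p: p.mean(dim=1, keepdim=True)
print(patchwise_infer(model, torch.rand(1, 4, 1000, 1200)).shape)  # torch.Size([1, 1, 1000, 1200])
```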

NeurIPS Conference 2021 Conference Paper

Instance-Dependent Partial Label Learning

  • Ning Xu
  • Congyu Qiao
  • Xin Geng
  • Min-Ling Zhang

Partial label learning (PLL) is a typical weakly supervised learning problem, where each training example is associated with a set of candidate labels among which only one is true. Most existing PLL approaches assume that the incorrect labels in each training example are picked at random as the candidate labels. However, this assumption is not realistic, since the candidate labels are always instance-dependent. In this paper, we consider instance-dependent PLL and assume that each example is associated with a latent label distribution constituted by a real number for each label, representing the degree to which that label describes the feature. An incorrect label with a high degree is more likely to be annotated as a candidate label. Therefore, the latent label distribution is the essential labeling information in partially labeled examples and is worth leveraging for predictive model training. Motivated by this consideration, we propose a novel PLL method that recovers the label distribution as a label enhancement (LE) process and trains the predictive model iteratively in every epoch. Specifically, we assume the true posterior density of the latent label distribution takes the form of a variational approximate Dirichlet density parameterized by an inference model. The evidence lower bound is then derived for optimizing the inference model, and the label distributions generated from the variational posterior are utilized for training the predictive model. Experiments on benchmark and real-world datasets validate the effectiveness of the proposed method. Source code is available at https://github.com/palm-ml/valen.

IJCAI Conference 2021 Conference Paper

Video Summarization via Label Distributions Dual-Reward

  • Yongbiao Gao
  • Ning Xu
  • Xin Geng

Reinforcement learning maps a perceived state representation to actions, and it has been adopted to solve the video summarization problem. The reward is crucial for tackling video summarization with reinforcement learning, since the reward signal defines the goal of video summarization. However, existing reward mechanisms in reinforcement learning cannot handle the ambiguity that appears frequently in video summarization, i.e., the diverse perceptions different people have of the same video. To solve this problem, in this paper label distributions are mapped from the CNN- and LSTM-based state representation to capture the subjectiveness of video summaries. The dual-reward is designed by measuring the similarity between the user score distributions and the generated label distributions. Not only the average score but also the variance of the subjective opinions is considered in summary generation. Experimental results on several benchmark datasets show that our proposed method outperforms other approaches under various settings.

AAAI Conference 2020 Conference Paper

Channel Attention Is All You Need for Video Frame Interpolation

  • Myungsub Choi
  • Heewon Kim
  • Bohyung Han
  • Ning Xu
  • Kyoung Mu Lee

Prevailing video frame interpolation techniques rely heavily on optical flow estimation, which adds model complexity and computational cost and is susceptible to error propagation in challenging scenarios with large motion and heavy occlusion. To alleviate these limitations, we propose a simple but effective deep neural network for video frame interpolation that is end-to-end trainable and free of any motion estimation component. Our algorithm employs a special feature reshaping operation, referred to as PixelShuffle, together with channel attention, which replaces the optical flow computation module. The main idea behind the design is to distribute the information in a feature map across multiple channels and extract motion information by attending to the channels for pixel-level frame synthesis. The model derived from this principle turns out to be effective in the presence of challenging motion and occlusion. We construct a comprehensive evaluation benchmark and demonstrate that the proposed approach achieves outstanding performance compared to existing models that include an optical flow computation component.
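
A minimal sketch of the flow-free idea (reshape spatial detail into channels with pixel unshuffle/shuffle, then attend over channels) is given below. The squeeze-and-excitation-style attention, all layer sizes, and the absence of any learned synthesis layers are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (illustrative)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))          # global average pool -> per-channel weights
        return x * w[:, :, None, None]

def interpolate_sketch(frame0, frame1, attn, down=4):
    """Move spatial detail into channels, attend over channels, move it back.
    A toy stand-in for the flow-free design described in the abstract."""
    x = torch.cat([frame0, frame1], dim=1)       # (B, 6, H, W)
    x = F.pixel_unshuffle(x, down)               # (B, 6*down*down, H/down, W/down)
    x = attn(x)                                  # channel attention instead of motion estimation
    return F.pixel_shuffle(x, down)[:, :3]       # back to full resolution, 3 output channels

f0, f1 = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
attn = ChannelAttention(6 * 4 * 4)
print(interpolate_sketch(f0, f1, attn).shape)    # torch.Size([1, 3, 64, 64])
```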

NeurIPS Conference 2020 Conference Paper

Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation

  • Yuxi Li
  • Ning Xu
  • Jinlong Peng
  • John See
  • Weiyao Lin

In this paper, we attempt to incorporate a cyclic mechanism into the task of semi-supervised video object segmentation. By resorting to the accurate reference mask of the first frame, we aim to mitigate the error propagation problem present in most current video object segmentation pipelines. First, we propose a cyclic scheme for offline training of segmentation networks. Then, we extend the offline pipeline to an online method by introducing a simple gradient correction module while keeping efficiency comparable to other offline methods. Finally, we develop the cycle effective receptive field (cycle-ERF) from gradient correction to provide a new perspective for analyzing object-specific regions of interest. We conduct comprehensive experiments on the DAVIS17 and YouTube-VOS benchmarks, demonstrating that the introduced cyclic mechanism helps boost segmentation quality.

AAAI Conference 2020 Conference Paper

Finding Action Tubes with a Sparse-to-Dense Framework

  • Yuxi Li
  • Weiyao Lin
  • Tao Wang
  • John See
  • Rui Qian
  • Ning Xu
  • Limin Wang
  • Shugong Xu

The task of spatial-temporal action detection has attracted increasing attention among researchers. Existing dominant methods solve this problem by relying on short-term information and dense, serial detection on individual frames or clips. Despite their effectiveness, these methods make inadequate use of long-term information and are prone to inefficiency. In this paper, we propose, for the first time, an efficient framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner. The framework has two key characteristics: (1) both long-term and short-term sampled information are explicitly utilized in our spatiotemporal network, and (2) a new dynamic feature sampling module (DTS) is designed to effectively approximate the tube output while keeping the system tractable. We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets, achieving promising results that are competitive with state-of-the-art methods. The proposed sparse-to-dense strategy makes our framework about 7.6 times more efficient than the nearest competitor.

IJCAI Conference 2020 Conference Paper

Label Distribution for Learning with Noisy Labels

  • Yun-Peng Liu
  • Ning Xu
  • Yu Zhang
  • Xin Geng

The performance of deep neural networks (DNNs) crucially relies on the quality of labeling. In some situations labels are easily corrupted and therefore become noisy, so designing algorithms that deal with noisy labels is of great importance for learning robust DNNs. However, it is difficult to distinguish between clean labels and noisy labels, which becomes the bottleneck of many methods. To address the problem, this paper proposes a novel method named Label Distribution based Confidence Estimation (LDCE). LDCE estimates the confidence of the observed labels based on label distributions; the boundary between clean labels and noisy labels then becomes clear according to the confidence scores. To verify the effectiveness of the method, LDCE is combined with an existing learning algorithm to train robust DNNs. Experiments on both synthetic and real-world datasets substantiate the superiority of the proposed algorithm against state-of-the-art methods.

ICLR Conference 2020 Conference Paper

Minimizing FLOPs to Learn Efficient Sparse Representations

  • Biswajit Paria
  • Chih-Kuan Yeh
  • Ian En-Hsu Yen
  • Ning Xu
  • Pradeep Ravikumar
  • Barnabás Póczos

Deep representation learning has become one of the most widely adopted approaches for visual search, recommendation, and identification. Retrieval of such representations from a large database is, however, computationally challenging. Approximate methods based on learning compact representations have been widely explored for this problem, such as locality-sensitive hashing, product quantization, and PCA. In this work, in contrast to learning compact representations, we propose to learn high-dimensional and sparse representations that have similar representational capacity as dense embeddings while being more efficient, because sparse matrix multiplication can be much faster than dense multiplication. Following the key insight that the number of operations decreases quadratically with the sparsity of the embeddings provided the non-zero entries are distributed uniformly across dimensions, we propose a novel approach to learn such distributed sparse embeddings via a carefully constructed regularization function that directly minimizes a continuous relaxation of the number of floating-point operations (FLOPs) incurred during retrieval. Our experiments show that our approach is competitive with the other baselines and yields a similar or better speed-vs-accuracy tradeoff on practical datasets.
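
Under the stated insight, expected retrieval cost scales with the sum over dimensions of the squared activation probability, which suggests the differentiable surrogate sketched below (mean absolute activation standing in for the activation probability). This is one reading of the abstract, not necessarily the paper's exact regularizer.

```python
import torch

def flops_regularizer(activations):
    """Continuous surrogate for retrieval FLOPs: the expected number of
    multiplications is, up to a constant, sum_j p_j^2, where p_j is the
    probability that dimension j is non-zero. Replacing p_j with the mean
    absolute activation over the batch gives a differentiable penalty that
    pushes non-zeros to spread uniformly across dimensions. Sketch of the
    idea described in the abstract, under that interpretation.

    activations: (B, D) non-negative sparse embeddings (e.g., after a ReLU)
    """
    mean_abs = activations.abs().mean(dim=0)      # (D,) per-dimension usage
    return (mean_abs ** 2).sum()

# Two embeddings with the same total activation mass: the concentrated one
# (all mass on one dimension) is penalized more than the spread-out one.
concentrated = torch.tensor([[4.0, 0.0, 0.0, 0.0]])
uniform = torch.tensor([[1.0, 1.0, 1.0, 1.0]])
print(flops_regularizer(concentrated), flops_regularizer(uniform))  # 16.0 vs 4.0
```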

AAAI Conference 2020 Conference Paper

Partial Multi-Label Learning with Label Distribution

  • Ning Xu
  • Yun-Peng Liu
  • Xin Geng

Partial multi-label learning (PML) aims to learn from training examples each associated with a set of candidate labels, among which only a subset are valid for the training example. The common strategy to induce a predictive model is to disambiguate the candidate label set, for example by identifying the ground-truth label via the confidence of each candidate label or by estimating the noisy labels in the candidate label sets. Nonetheless, these strategies do not consider the essential label distribution corresponding to each instance, since the label distribution is not explicitly available in the training set. In this paper, a new partial multi-label learning strategy named PML-LD is proposed to learn from partial multi-label examples via label enhancement. Specifically, label distributions are recovered by leveraging the topological information of the feature space and the correlations among the labels. After that, a multi-class predictive model is learned by fitting a regularized multi-output regressor with the recovered label distributions. Experimental results on synthetic as well as real-world datasets clearly validate the effectiveness of PML-LD for solving PML problems.

IJCAI Conference 2020 Conference Paper

Video Question Answering on Screencast Tutorials

  • Wentian Zhao
  • Seokhwan Kim
  • Ning Xu
  • Hailin Jin

This paper presents a new video question answering task on screencast tutorials. We introduce a dataset consisting of question, answer and context triples from tutorial videos for a software application. Unlike other video question answering work, all the answers in our dataset are grounded to the domain knowledge base. A one-shot recognition algorithm is designed to extract the visual cues, which helps enhance the performance of video question answering. We also propose several baseline neural network architectures based on various aspects of video context from the dataset. The experimental results demonstrate that our proposed models significantly improve question answering performance by incorporating multi-modal contexts and domain knowledge.

AAAI Conference 2019 Conference Paper

Partial Label Learning via Label Enhancement

  • Ning Xu
  • Jiaqi Lv
  • Xin Geng

Partial label learning aims to learn from training examples each associated with a set of candidate labels, among which only one label is valid for the training example. The common strategy to induce a predictive model is to disambiguate the candidate label set, for example by identifying the ground-truth label iteratively or by treating each candidate label equally. Nonetheless, these strategies do not consider the generalized label distribution corresponding to each instance, since the generalized label distribution is not explicitly available in the training set. In this paper, a new partial label learning strategy named PL-LE is proposed to learn from partial label examples via label enhancement. Specifically, the generalized label distributions are recovered by leveraging the topological information of the feature space. After that, a multi-class predictive model is learned by fitting a regularized multi-output regressor with the generalized label distributions. Extensive experiments show that PL-LE performs favorably against state-of-the-art partial label learning approaches.

IJCAI Conference 2019 Conference Paper

Weakly Supervised Multi-Label Learning via Label Enhancement

  • Jiaqi Lv
  • Ning Xu
  • RenYi Zheng
  • Xin Geng

Weakly supervised multi-label learning (WSML) concentrates on a more challenging multi-label classification problem in which some labels in the training set are missing. Existing approaches make multi-label predictions by exploiting the incomplete logical labels directly, without considering the relative importance of each label to an instance. In this paper, a novel two-stage strategy named Weakly Supervised Multi-label Learning via Label Enhancement (WSMLLE) is proposed to learn from weakly supervised data via label enhancement. First, the relative importance of each label, i.e., the description degrees, is recovered by leveraging the structural information of the feature space and local correlations learned from the label space. Then, a tailored multi-label predictive model is induced by learning from the training instances with the recovered description degrees. To the best of our knowledge, this is the first attempt to unify the completion of the missing labels and the recovery of the description degrees within the same framework. Extensive experiments across a wide range of real-world datasets clearly validate the superiority of the proposed approach.

IJCAI Conference 2018 Conference Paper

Label Enhancement for Label Distribution Learning

  • Ning Xu
  • An Tao
  • Xin Geng

Label distribution is more general than both single-label annotation and multi-label annotation. It covers a certain number of labels, representing the degree to which each label describes the instance. The learning process on instances labeled by label distributions is called label distribution learning (LDL). Unfortunately, many training sets contain only simple logical labels rather than label distributions, because label distributions are difficult to obtain directly. One way to solve this problem is to recover the label distributions from the logical labels in the training set by leveraging the topological information of the feature space and the correlation among the labels. This process of recovering label distributions from logical labels is defined as label enhancement (LE), which reinforces the supervision information in the training sets. This paper proposes a novel LE algorithm called Graph Laplacian Label Enhancement (GLLE). Experimental results on one artificial dataset and fourteen real-world datasets show clear advantages of GLLE over several existing LE algorithms.
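
A minimal sketch of graph-based label enhancement in this spirit is shown below: logical labels are smoothed over a kNN graph of the feature space and renormalized into label distributions. The propagation scheme and all parameters are a generic stand-in, not GLLE's graph-Laplacian optimization.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def label_enhance(X, L, k=5, alpha=0.5, iters=20):
    """Recover label distributions from logical labels by propagating them
    over a kNN graph of the feature space (illustrative graph-smoothing
    sketch, not GLLE's exact formulation).

    X: (n, d) features; L: (n, c) 0/1 logical labels.
    """
    W = kneighbors_graph(X, k, mode="connectivity", include_self=False).toarray()
    W = np.maximum(W, W.T)                                  # symmetrize the adjacency
    P = W / W.sum(axis=1, keepdims=True)                    # row-normalized transition matrix
    D = L.astype(float)
    for _ in range(iters):
        D = alpha * P @ D + (1 - alpha) * L                 # smooth while anchoring on logical labels
    return D / D.sum(axis=1, keepdims=True)                 # rows become label distributions

X = np.random.randn(50, 8)
L = (np.random.rand(50, 3) > 0.5).astype(int)
L[L.sum(axis=1) == 0, 0] = 1                                # ensure every instance has a label
print(label_enhance(X, L)[0])
```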

IJCAI Conference 2018 Conference Paper

Multi-Level Policy and Reward Reinforcement Learning for Image Captioning

  • Anan Liu
  • Ning Xu
  • Hanwang Zhang
  • Weizhi Nie
  • Yuting Su
  • Yongdong Zhang

Image captioning is one of the most challenging hallmarks of AI, due to its complexity in visual and natural language understanding. As it is essentially a sequential prediction task, recent advances in image captioning use Reinforcement Learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods mainly rely on a single policy network and reward function, which do not fit well with the multi-level (word and sentence) and multi-modal (vision and language) nature of the task. To this end, we propose a novel multi-level policy and reward RL framework for image captioning. It contains two modules: 1) a Multi-Level Policy Network that adaptively fuses the word-level policy and the sentence-level policy for word generation; and 2) a Multi-Level Reward Function that collaboratively leverages both a vision-language reward and a language-language reward to guide the policy. Further, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments and analysis on MSCOCO and Flickr30k show that the proposed framework achieves competitive performance with respect to different evaluation metrics.

UAI Conference 2006 Conference Paper

Propagation of Delays in the National Airspace System

  • Kathryn B. Laskey
  • Ning Xu
  • Chun-Hung Chen

The National Airspace System (NAS) is a large and complex system with thousands of interrelated components: administration, control centers, airports, airlines, aircraft, passengers, etc. The complexity of the NAS creates many difficulties in management and control. One of the most pressing problems is flight delay. Delay creates high costs for airlines, complaints from passengers, and difficulties for airport operations. As demand on the system increases, the delay problem becomes more and more prominent. For this reason, it is essential for the Federal Aviation Administration to understand the causes of delay and to find ways to reduce it. Major contributing factors to delay are congestion at the origin airport, weather, increasing demand, and air traffic management (ATM) decisions such as Ground Delay Programs (GDPs). Delay is an inherently stochastic phenomenon: even if all known causal factors could be accounted for, macro-level NAS delays could not be predicted with certainty from micro-level aircraft information. This paper presents a stochastic model that uses Bayesian Networks (BNs) to model the relationships among the components of aircraft delay and the causal factors that affect delays. A case study on delays of departure flights from Chicago O'Hare International Airport (ORD) to Hartsfield-Jackson Atlanta International Airport (ATL) reveals how local and system-level environmental and human-caused factors combine to affect the components of delay, and how these components contribute to the final arrival delay at the destination airport.
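
A toy discrete Bayesian network in the same spirit is sketched below, with departure delay depending on weather and congestion and arrival delay depending on departure delay. All probabilities are made up for illustration, and the query is answered by brute-force enumeration rather than any particular BN library; the paper's ORD-to-ATL model is far richer.

```python
from itertools import product

# Toy discrete Bayesian network for delay propagation (illustrative numbers only).
# Structure: Weather -> DepartureDelay <- Congestion,  DepartureDelay -> ArrivalDelay
p_weather    = {"bad": 0.3, "good": 0.7}
p_congestion = {"high": 0.4, "low": 0.6}
p_dep = {  # P(DepartureDelay = "late" | Weather, Congestion)
    ("bad", "high"): 0.8, ("bad", "low"): 0.6,
    ("good", "high"): 0.4, ("good", "low"): 0.1,
}
p_arr = {"late": 0.7, "on_time": 0.05}  # P(ArrivalDelay = "late" | DepartureDelay)

def prob_arrival_late(weather=None):
    """P(ArrivalDelay = late), optionally conditioned on observed weather,
    computed by enumerating the remaining variables."""
    total = 0.0
    for w, c, d in product(p_weather, p_congestion, ("late", "on_time")):
        if weather is not None and w != weather:
            continue
        pw = 1.0 if weather is not None else p_weather[w]
        pd = p_dep[(w, c)] if d == "late" else 1 - p_dep[(w, c)]
        total += pw * p_congestion[c] * pd * p_arr[d]
    return total

print(prob_arrival_late())            # marginal probability of a late arrival
print(prob_arrival_late("bad"))       # same quantity given bad weather at the origin
```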