Arrow Research

Author name cluster

Jun Huang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers
2 author rows

Possible papers (28)

AAAI Conference 2026 Conference Paper

Consensus-Driven Multi-Agent Cognitive Reasoning for Enhancing the Emotional Intelligence of Large Language Models

  • Geng Tu
  • Dingming Li
  • Jun Huang
  • Ruifeng Xu

Large Language Models (LLMs) have demonstrated strong performance in various NLP tasks but remain limited in emotional intelligence (EI). Benchmarks such as EmoBench attribute this gap to deficiencies in cognitively demanding tasks that require inferring others’ latent mental states, intentions, and emotions in nuanced social contexts. To address this, we propose MACRo, a Multi-Agent Cognitive Reasoning framework that generates a structured Cognitive Chain of Thought comprising Situation, Clue, Thought, Action, and Emotion. Each component is generated by a specialized agent, enabling modular, interpretable multi-step reasoning. To ensure coherence and mitigate hallucinations, a coordinator agent verifies outputs, and a consensus game mechanism enforces alignment across reasoning steps. Extensive experiments on EmoBench show that MACRo significantly enhances both emotional understanding and application across LLMs. Further evaluations confirm its generalizability to real-world social applications such as emotional support conversations.
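
As a rough illustration of the kind of pipeline the abstract describes, the sketch below chains five agents over the Situation, Clue, Thought, Action, and Emotion components, with a coordinator verdict standing in for the paper's consensus game. `call_llm` is a stub, and none of the prompts or names come from the paper.

```python
# Minimal sketch of a MACRo-style cognitive chain; everything here is
# hypothetical scaffolding, not the paper's actual prompts or protocol.
STEPS = ["Situation", "Clue", "Thought", "Action", "Emotion"]

def call_llm(prompt: str) -> str:
    """Stub standing in for any chat-completion API call."""
    return f"<model output for: {prompt[:40]}...>"

def cognitive_chain(dialogue: str, max_rounds: int = 3) -> dict:
    chain = {}
    for _ in range(max_rounds):
        for step in STEPS:  # each specialized agent fills one component
            context = "\n".join(f"{k}: {v}" for k, v in chain.items())
            chain[step] = call_llm(
                f"You are the {step} agent.\nDialogue: {dialogue}\n"
                f"{context}\nProduce the {step} component."
            )
        # A coordinator agent checks the chain; the paper's consensus game
        # is reduced here to a single accept/retry verdict.
        verdict = call_llm(f"Coordinator: is this chain coherent? {chain}")
        if "yes" in verdict.lower():
            break
    return chain

print(cognitive_chain("A: I failed my exam again. B: Oh, okay."))
```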

JBHI Journal 2026 Journal Article

OPDoctorNet: Deep Learning Revolutionizes Opportunistic Screening of Osteoporosis Based on Clinical Data

  • Qiankun Jin
  • Qiyu Jia
  • Xiaoxia Zhou
  • Dian Jin
  • Xuewei Song
  • Zhiyuan Xie
  • Abudusalamu Alimujiang
  • Yancheng Li

Osteoporosis poses a significant global public health challenge, and timely detection and treatment are crucial for preventing fragility fractures in the elderly. However, opportunistic screening remains challenging. Despite the rapid development of deep learning, its potential in clinical data classification has yet to be fully realized, with traditional machine learning dominating. Therefore, deepening research on deep learning for clinical data recognition in osteoporosis screening holds practical significance. This study utilizes the latest artificial intelligence technology to develop the OPDoctorNet algorithm, combining the feature extraction strengths of Transformer and Mamba and innovatively proposing multiscale feature fusion and the FeatureBake Block to deeply extract global and local features. The algorithm improves osteoporosis recognition accuracy in clinical data and meets multi-task needs. Results show OPDoctorNet significantly outperforms traditional machine learning and other AI methods in accuracy, recall, and F1 scores, with strong robustness and generalization. Through the innovation of the FeatureBake Block, this study provides a groundbreaking solution for Transformer and Mamba feature processing, enabling efficient, accurate opportunistic osteoporosis screening. Additionally, using SHAP plots and feature importance mapping for visual analysis enhances interpretability, offering new ideas and methods for osteoporosis screening in clinical practice, aiding accurate, scientific clinical decision-making and promoting deep learning application in clinical data classification.

AAAI Conference 2026 Conference Paper

Probabilistic Deformation Consistency for Unsupervised Shape Matching

  • Yifan Xia
  • Tianwei Ye
  • Jun Huang
  • Xiaoguang Mei
  • Jiayi Ma

In this paper, we propose a novel unsupervised shape matching framework based on probabilistic deformation consistency in the spectral domain, termed PDCMatch. Axiomatic optimization methods suffer from expensive geodesic distance calculations and vulnerability to local optima, and learning-based methods typically lack geometric consistency in pointwise correspondences. To overcome both limitations, we develop a non-Euclidean probabilistic deformation model that jointly estimates the underlying deformation and the correspondence probability via a linear Expectation-Maximization procedure. Building on this formulation, we further design a task-specific deformation loss that explicitly encourages geometric smoothness and structural consistency in an unsupervised manner. This tailored loss function plays a central role in improving the matching performance across challenging scenarios. Extensive experiments on public benchmarks involving near-isometric shapes, anisotropic meshing, cross-dataset generalization, topological noise, and non-isometric shapes demonstrate that our method consistently outperforms state-of-the-art methods, highlighting both its effectiveness and generalizability.
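
The abstract does not spell out its linear Expectation-Maximization procedure; the sketch below shows a generic EM loop for soft point correspondences in the style of classic probabilistic registration (Gaussian-mixture E-step, soft-assignment M-step). It illustrates the alternating structure only, not the paper's actual non-Euclidean deformation model.

```python
# Generic EM-style soft correspondence estimation; illustrative, not PDCMatch.
import numpy as np

def em_soft_correspondence(X, Y, n_iters=20, sigma2=1.0, w=0.1):
    """X: (N, 3) target vertices; Y: (M, 3) source vertices.
    Returns soft correspondences P (M, N) and the displaced source points."""
    N, M = len(X), len(Y)
    for _ in range(n_iters):
        # E-step: posterior that target point n corresponds to source point m,
        # with a simplified uniform-outlier term of weight w in the denominator.
        d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(-1)      # (M, N)
        num = np.exp(-d2 / (2.0 * sigma2))
        denom = num.sum(axis=0, keepdims=True) + w / (1.0 - w) * M / N
        P = num / denom
        # M-step: shrink the noise variance and move each source point toward
        # its soft-assigned targets (a stand-in for a learned deformation).
        sigma2 = max((P * d2).sum() / (3.0 * P.sum()), 1e-6)
        Y = (P @ X) / np.clip(P.sum(axis=1, keepdims=True), 1e-12, None)
    return P, Y

P, Y_new = em_soft_correspondence(np.random.rand(50, 3), np.random.rand(40, 3))
```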

AAAI Conference 2026 Conference Paper

Zero-to-Hero: Empowering Video Appearance Transfer with Zero-Shot Initialization and Holistic Restoration

  • Tongtong Su
  • Chengyu Wang
  • Haipeng Liao
  • Jun Huang
  • Dongming Lu

Appearance editing according to user needs is a pivotal task in video editing. Existing text-guided methods often lead to ambiguities regarding user intentions and restrict fine-grained control over editing specific aspects of objects. To overcome these limitations, this paper introduces a novel approach named Zero-to-Hero, which focuses on reference-based video editing by disentangling the editing process into two distinct problems. It achieves this by first editing an anchor frame to satisfy user requirements as a reference image and then consistently propagating its appearance across the other frames in the video. To achieve accurate appearance propagation, in the first stage of Zero-to-Hero, we leverage correspondences within the original frames to guide the attention mechanism, which is more robust than previously proposed optical flow or temporal modules in memory-friendly video generative models, especially when dealing with objects exhibiting large motions. This offers a solid zero-shot initialization that ensures both accuracy and temporal consistency. However, intervention in the attention mechanism results in compounded imaging degradation with unknown blurring and color-missing issues. Following the Zero-Stage, our Hero-Stage holistically learns a conditional generative model for video restoration. To accurately evaluate appearance consistency, we construct a set of videos with multiple appearances using Blender, enabling a fine-grained and deterministic evaluation. Our method outperforms the best-performing baseline with a PSNR improvement of 2.6 dB.

IJCAI Conference 2025 Conference Paper

AdaptEdit: An Adaptive Correspondence Guidance Framework for Reference-Based Video Editing

  • Tongtong Su
  • Chengyu Wang
  • Bingyan Liu
  • Jun Huang
  • Dongming Lu

Video editing is a pivotal process for customizing video content according to user needs. However, existing text-guided methods often lead to ambiguities regarding user intentions and restrict fine-grained control for editing specific aspects in videos. To overcome these limitations, this paper introduces a novel approach named AdaptEdit, which focuses on reference-based video editing that disentangles the editing process. It achieves this by first editing a reference image and then adaptively propagating its appearance across other frames to complete the video editing. While previous propagation methods, such as optical flow and the temporal modules of recent video generative models, struggle with object deformations and large motions, we propose an adaptive correspondence strategy that accurately transfers the appearance from the reference frame to the target frames by leveraging inter-frame semantic correspondences in the original video. By implementing a proxy-editing task to optimize hyperparameters for image token-level correspondence, our method effectively balances the need to maintain the target frame's structure while preventing leakage of irrelevant appearance. To more accurately evaluate editing beyond the semantic-level consistency provided by CLIP-style models, we introduce a new dataset, PVA, which supports pixel-level evaluation. Our method outperforms the best-performing baseline with a clear PSNR improvement of 3.6 dB.

NeurIPS Conference 2025 Conference Paper

Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models

  • Daoyuan Chen
  • Yilun Huang
  • Xuchen Pan
  • Jiang Nana
  • Haibin Wang
  • Yilei Zhang
  • Ce Ge
  • Yushuo Chen

Foundation models demand advanced data processing for their vast, multimodal datasets. However, traditional frameworks struggle with the unique complexities of multimodal data. In response, we present Data-Juicer 2.0, a data processing system backed by 100+ data processing operators spanning text, image, video, and audio modalities, supporting more critical tasks including data analysis, synthesis, annotation, and foundation model post-training. With seamless compatibility and dedicated optimization for popular dataset hubs like Hugging Face and computing engines like Ray, it improves upon its predecessor in terms of usability, efficiency, and programmability. It features an easily accessible user interface layer that supports decoupled Python interactions, RESTful APIs, and conversational commands. Its new runtime layer offers adaptive execution across diverse scales and environments, abstracting away system complexities. Extensive empirical evaluations demonstrate Data-Juicer 2.0's remarkable performance and scalability, highlighting its capability to efficiently process TB-level data with 10k+ CPU cores. The system is publicly available and has been widely adopted in diverse research fields and real-world products such as Alibaba Cloud PAI. We actively maintain the system and share practical insights to foster research and applications of next-generation foundation models.

IJCAI Conference 2025 Conference Paper

Efficient Inter-Operator Scheduling for Concurrent Recommendation Model Inference on GPU

  • Shuxi Guo
  • Zikang Xu
  • Jiahao Liu
  • Jinyi Zhang
  • Qi Qi
  • Haifeng Sun
  • Jun Huang
  • Jianxin Liao

Deep learning-based recommendation systems are increasingly important in the industry. To meet strict SLA requirements, serving frameworks must efficiently handle concurrent queries. However, current serving systems fail to serve concurrent queries efficiently due to the following problems: (1) inefficient operator (op) scheduling due to the query-wise op launching mechanism, and (2) heavy contention caused by the mutable nature of recommendation model inference. This paper presents RecOS, a system designed to optimize concurrent recommendation model inference on GPUs. RecOS efficiently schedules ops from different queries by monitoring GPU workloads and assigning ops to the most suitable streams. This approach reduces contention and enhances inference efficiency by leveraging inter-op parallelism and op characteristics. To maintain correctness across multiple CUDA streams, RecOS introduces a unified asynchronous tensor management mechanism. Evaluations demonstrate that RecOS improves online service performance, reducing latency by up to 68%.
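
To make the stream-assignment idea concrete, here is a toy PyTorch sketch that dispatches independent ops onto CUDA streams via a naive least-loaded policy. The counter-based load proxy is a placeholder for RecOS's GPU workload monitoring, which the abstract does not detail.

```python
# Toy inter-op parallelism across CUDA streams; illustrative only.
import torch

if torch.cuda.is_available():
    streams = [torch.cuda.Stream() for _ in range(4)]
    pending = [0] * len(streams)  # crude per-stream load counter

    def launch(op, *tensors):
        # Pick the least-loaded stream. Kernel launches are asynchronous, so a
        # real scheduler (as in RecOS) must track completion events and actual
        # GPU workload rather than this naive counter.
        idx = min(range(len(streams)), key=pending.__getitem__)
        pending[idx] += 1
        try:
            with torch.cuda.stream(streams[idx]):
                return op(*tensors)
        finally:
            pending[idx] -= 1

    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")
    c = launch(torch.matmul, a, b)  # independent ops from concurrent
    d = launch(torch.relu, a)       # queries can overlap across streams
    torch.cuda.synchronize()
```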

IJCAI Conference 2025 Conference Paper

FastBlend: Enhancing Video Stylization Consistency via Model-Free Patch Blending

  • Zhongjie Duan
  • Chengyu Wang
  • Cen Chen
  • Weining Qian
  • Jun Huang
  • Mingyi Jin

With the emergence of diffusion models and the rapid development of image processing, generating artistic images in style transfer tasks has become effortless. However, these impressive image processing approaches face consistency issues in video processing due to the independent processing of each frame. In this paper, we propose a powerful, model-free approach called FastBlend to address the consistency problem in video stylization. FastBlend functions as a post-processor and can be seamlessly integrated with diffusion models to create a robust video stylization pipeline. Based on a patch-matching algorithm, we remap and blend the aligned content across multiple frames, thus compensating for inconsistent content with neighboring frames. Moreover, we propose a tree-like data structure and a specialized loss function, aiming to optimize computational efficiency and visual quality for different application scenarios. Extensive experiments have demonstrated the effectiveness of FastBlend. Compared with both independent video deflickering algorithms and diffusion-based video processing methods, FastBlend is capable of synthesizing more coherent and realistic videos.
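
A brute-force miniature of the remap-and-blend idea: for each patch of the stylized frame, find the best-matching patch in a neighboring frame and average the two. The exhaustive window search below replaces the paper's patch-matching algorithm and tree-like data structure, and exists only to show the mechanism.

```python
# Naive patch remap-and-blend between two frames; illustrative, not FastBlend.
import numpy as np

def blend_with_neighbor(frame_t, frame_n, patch=8):
    """frame_t, frame_n: float arrays (H, W, C) of stylized frames."""
    h = frame_t.shape[0] // patch * patch
    w = frame_t.shape[1] // patch * patch
    out = frame_t.copy()
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            p = frame_t[y:y+patch, x:x+patch]
            best, best_d = p, np.inf
            # search a small window around (y, x) in the neighboring frame
            for dy in range(-patch, patch + 1, patch // 2):
                for dx in range(-patch, patch + 1, patch // 2):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - patch and 0 <= xx <= w - patch:
                        q = frame_n[yy:yy+patch, xx:xx+patch]
                        d = float(((p - q) ** 2).sum())
                        if d < best_d:
                            best, best_d = q, d
            out[y:y+patch, x:x+patch] = (p + best) / 2  # blend aligned content
    return out

out = blend_with_neighbor(np.random.rand(64, 64, 3), np.random.rand(64, 64, 3))
```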

IJCAI Conference 2025 Conference Paper

Hallucination-Aware Prompt Optimization for Text-to-Video Synthesis

  • Jiapeng Wang
  • Chengyu Wang
  • Jun Huang
  • Lianwen Jin

The rapid advancements in AI-generated content (AIGC) have led to extensive research and application of deep text-to-video (T2V) synthesis models, such as OpenAI's Sora. These models typically rely on high-quality prompt-video pairs and detailed text prompts for model training in order to produce high-quality videos. To boost the effectiveness of Sora-like T2V models, we introduce VidPrompter, an innovative large multi-modal model supporting T2V applications with three key functionalities: (1) generating detailed prompts from raw videos, (2) enhancing prompts from videos grounded with short descriptions, and (3) refining simple user-provided prompts to elevate T2V video quality. We train VidPrompter using a hybrid multi-task paradigm and propose the hallucination-aware direct preference optimization (HDPO) technique to improve the multi-modal, multi-task prompt optimization process. Experiments on various tasks show our method surpasses strong baselines and other competitors.
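
HDPO is described as a hallucination-aware variant of direct preference optimization; for reference, the standard DPO objective it presumably builds on looks like the following. The hallucination-aware weighting itself is not reproduced, and the function names are illustrative.

```python
# Standard DPO loss over one preferred/rejected pair; HDPO's modification
# is not public in this abstract and is not implemented here.
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Log-probabilities of the chosen (w) and rejected (l) outputs under
    the policy and the frozen reference model, summed over tokens."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
```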

IJCAI Conference 2025 Conference Paper

Multimodal Image Matching Based on Cross-Modality Completion Pre-training

  • Meng Yang
  • Fan Fan
  • Jun Huang
  • Yong Ma
  • Xiaoguang Mei
  • Zhanchuan Cai
  • Jiayi Ma

The differences in imaging devices cause multimodal images to have modal differences and geometric distortions, complicating the matching task. Deep learning-based matching methods struggle with multimodal images due to the lack of large annotated multimodal datasets. To address these challenges, we propose XCP-Match based on cross-modality completion pre-training. XCP-Match has two phases. (1) Self-supervised cross-modality completion pre-training based on a real multimodal image dataset. We develop a novel pre-training model to learn cross-modal semantic features. The pre-training uses a masked image modeling method for cross-modality completion and introduces an attention-weighted contrastive loss to emphasize matching in overlapping areas. (2) Supervised fine-tuning for multimodal image matching based on the augmented MegaDepth dataset. XCP-Match constructs a complete matching framework to overcome geometric distortions and achieve precise matching. Two-phase training encourages the model to learn deep cross-modal semantic information, improving adaptation to modal differences without needing large annotated datasets. Experiments demonstrate that XCP-Match outperforms existing algorithms on public datasets.

IJCAI Conference 2024 Conference Paper

Cross-Scale Domain Adaptation with Comprehensive Information for Pansharpening

  • Meiqi Gong
  • Hao Zhang
  • Hebaixu Wang
  • Jun Chen
  • Jun Huang
  • Xin Tian
  • Jiayi Ma

Deep learning-based pansharpening methods typically use simulated data at the reduced-resolution scale for training. This limits their performance when the trained model is generalized to the full-resolution scale, owing to incomplete utilization of the information in panchromatic (PAN) images at the full-resolution scale and low generalization ability. In this paper, we adopt two targeted strategies to address the above two problems. On the one hand, we introduce a cross-scale comprehensive information capture module, which improves the information utilization of the original PAN image through fully-supervised reconstruction. On the other hand, we pioneer a domain adaptation strategy to tackle the problem of low generalization across different scales. Considering the intrinsic domain gap between different scales, we leverage the maximum mean discrepancy loss and the inherent pixel-level correlations between features at different scales to reduce the scale variance, thus boosting the generalization ability of our model. Experiments on various satellites demonstrate the superiority of our method over state-of-the-art methods in terms of information retention. Our code is publicly available at https://github.com/Meiqi-Gong/SDIPS.
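
The maximum mean discrepancy (MMD) loss mentioned in the abstract has a standard multi-bandwidth RBF-kernel estimator; a minimal version, with illustrative bandwidth choices, is:

```python
# RBF-kernel MMD between feature batches from the two scales.
import torch

def rbf_mmd(x, y, bandwidths=(1.0, 2.0, 4.0)):
    """x: (n, d) reduced-scale features; y: (m, d) full-scale features."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return sum(torch.exp(-d2 / (2 * s ** 2)) for s in bandwidths)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

loss = rbf_mmd(torch.randn(64, 128), torch.randn(64, 128))
```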

IJCAI Conference 2024 Conference Paper

Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models

  • Zhongjie Duan
  • Chengyu Wang
  • Cen Chen
  • Weining Qian
  • Jun Huang

Toon shading is a type of non-photorealistic rendering task in animation. Its primary purpose is to render objects with a flat and stylized appearance. As diffusion models have ascended to the forefront of image synthesis, this paper delves into an innovative form of toon shading based on diffusion models, aiming to directly render photorealistic videos into anime styles. In video stylization, existing methods encounter persistent challenges, notably in maintaining consistency and achieving high visual quality. In this paper, we model the toon shading problem as four subproblems, i.e., stylization, consistency enhancement, structure guidance, and colorization. To address the challenges in video stylization, we propose an effective toon shading approach called Diffutoon. Diffutoon is capable of rendering remarkably detailed, high-resolution, and extended-duration videos in anime style. It can also edit the video content according to input prompts via an additional branch. The efficacy of Diffutoon is evaluated through quantitative metrics and human evaluation. Notably, Diffutoon surpasses both open-source and closed-source baseline approaches in our experiments. Our work is accompanied by the release of both the source code and example videos on GitHub.

AAAI Conference 2024 Conference Paper

M2Doc: A Multi-Modal Fusion Approach for Document Layout Analysis

  • Ning Zhang
  • Hiuyi Cheng
  • Jiayu Chen
  • Zongyuan Jiang
  • Jun Huang
  • Yang Xue
  • Lianwen Jin

Document layout analysis is a crucial step for intelligent document understanding. However, many existing methods primarily focus on the visual aspects and overlook the textual features of documents. Although document pre-trained models utilize multi-modal features during the pre-training phase, they tend to operate as a unimodal pipeline when it comes to layout analysis tasks. Furthermore, current multi-modal methods perform worse than unimodal detectors on complex layout analysis datasets. To address these limitations, we propose an effective and pluggable multi-modal fusion approach named M2Doc, which fuses visual and textual features for better layout detection. M2Doc contains two pluggable multi-modal fusion modules, early-fusion and late-fusion, which align and fuse visual and textual features at the pixel level and block level. Owing to its concision and effectiveness, M2Doc can be easily applied to various detectors for better layout detection, including two-stage and end-to-end object detectors. Our experimental results demonstrate significant performance improvements in detectors equipped with M2Doc on datasets such as DocLayNet (+11.3 mAP) and M6Doc (+1.9 mAP). Furthermore, through the integration of the DINO detector with M2Doc, we achieve state-of-the-art results on DocLayNet (89.0 mAP), M6Doc (69.9 mAP), and PubLayNet (95.5 mAP). The code will be publicly released at https://github.com/johnning2333/M2Doc.

AAAI Conference 2024 Conference Paper

M2SD: Multiple Mixing Self-Distillation for Few-Shot Class-Incremental Learning

  • Jinhao Lin
  • Ziheng Wu
  • Weifeng Lin
  • Jun Huang
  • RongHua Luo

Few-shot Class-incremental learning (FSCIL) is a challenging task in machine learning that aims to recognize new classes from a limited number of instances while preserving the ability to classify previously learned classes without retraining the entire model. This presents challenges in updating the model with new classes using limited training data, particularly in balancing the acquisition of new knowledge with the retention of the old. We propose a novel method named Multiple Mixing Self-Distillation (M2SD) during the training phase to address these issues. Specifically, we propose a dual-branch structure that facilitates the expansion of the entire feature space to accommodate new classes. Furthermore, we introduce a feature enhancement component that can pass additional enhanced information back to the base network by self-distillation, resulting in improved classification performance upon adding new classes. After training, we discard both structures, leaving only the primary network to classify new class instances. Extensive experiments demonstrate that our approach achieves superior performance over previous state-of-the-art methods.

ICML Conference 2023 Conference Paper

SLAMB: Accelerated Large Batch Training with Sparse Communication

  • Hang Xu
  • Wenxuan Zhang 0001
  • Jiawei Fei
  • Yuzhe Wu
  • Tingwen Xie
  • Jun Huang
  • Yuchen Xie
  • Mohamed Elhoseiny

Distributed training of large deep neural networks requires frequent exchange of massive data between machines, thus communication efficiency is a major concern. Existing compressed communication methods are either not compatible with large batch optimization algorithms, or do not provide sufficient speedup at large scale. In this paper, we combine sparsification-based gradient compression with the layer-wise adaptive moments optimizer for large batch training (LAMB). We propose SLAMB, a novel communication-efficient optimizer that supports large batch sizes and scales to thousands of GPUs. SLAMB employs momentum masking, local error compensation, and element-wise adaptive rescaling to achieve accurate layer-wise weight updates, which translates to fast convergence for very large batches. Our empirical results show that, compared to the state-of-the-art, SLAMB transmits half the amount of data in large-batch BERT pre-training, without sacrificing accuracy. Moreover, SLAMB achieves excellent scalability in large computing infrastructures. For instance, SLAMB with 128 GPUs reduces the training time of Swin Transformer pre-training on ImageNet to 5.35 hours, which is 2 hours faster than the state-of-the-art. At the extreme, we trained BERT-XL (2.8B parameters) on 1,024 NVIDIA A100 GPUs, where SLAMB achieved 90% scaling efficiency.
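
Two communication-side ingredients named in the abstract, sparsification and local error compensation, can be sketched as follows; the top-k threshold rule and names are illustrative, and the LAMB trust-ratio update itself is omitted.

```python
# Top-k gradient sparsification with local error compensation: the residual
# of whatever was not transmitted is added back on the next step.
import torch

errors = {}  # per-parameter residual buffers

def compress(name, grad, k_ratio=0.01):
    buf = errors.setdefault(name, torch.zeros_like(grad))
    corrected = grad + buf                      # error compensation
    flat = corrected.abs().flatten()
    k = max(1, int(k_ratio * flat.numel()))
    threshold = flat.topk(k).values.min()
    mask = corrected.abs() >= threshold         # keep the k largest entries
    errors[name] = corrected * (~mask)          # residual kept locally
    return corrected * mask                     # sparse payload to all-reduce

g = torch.randn(1000)
sparse_g = compress("layer1.weight", g)
```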

AAAI Conference 2023 Conference Paper

Uncertainty-Aware Self-Training for Low-Resource Neural Sequence Labeling

  • Jianing Wang
  • Chengyu Wang
  • Jun Huang
  • Ming Gao
  • Aoying Zhou

Neural sequence labeling (NSL) aims at assigning labels for input language tokens, which covers a broad range of applications, such as named entity recognition (NER) and slot filling. However, the satisfying results achieved by traditional supervised-based approaches heavily depend on the large amounts of human annotation data, which may not be feasible in real-world scenarios due to data privacy and computation efficiency issues. This paper presents SeqUST, a novel uncertainty-aware self-training framework for NSL to address the labeled data scarcity issue and to effectively utilize unlabeled data. Specifically, we incorporate Monte Carlo (MC) dropout in Bayesian neural network (BNN) to perform uncertainty estimation at the token level and then select reliable language tokens from unlabeled data based on the model confidence and certainty. A well-designed masked sequence labeling task with a noise-robust loss supports robust training, which aims to suppress the problem of noisy pseudo labels. In addition, we develop a Gaussian-based consistency regularization technique to further improve the model robustness on Gaussian-distributed perturbed representations. This effectively alleviates the over-fitting dilemma originating from pseudo-labeled augmented data. Extensive experiments over six benchmarks demonstrate that our SeqUST framework effectively improves the performance of self-training, and consistently outperforms strong baselines by a large margin in low-resource scenarios.
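
Token-level uncertainty estimation via MC dropout can be sketched as below, assuming `model(input_ids)` returns per-token logits with dropout layers inside; predictive entropy is one common aggregation and not necessarily the paper's exact statistic.

```python
# Monte Carlo dropout: run the stochastic model T times and measure the
# spread of the averaged per-token predictions.
import torch

@torch.no_grad()
def mc_dropout_uncertainty(model, input_ids, n_samples=10):
    model.train()  # keep dropout active at inference time
    probs = torch.stack([
        torch.softmax(model(input_ids), dim=-1) for _ in range(n_samples)
    ])                                   # (T, batch, seq_len, n_labels)
    mean_p = probs.mean(0)
    entropy = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(-1)  # (batch, seq_len)
    return mean_p.argmax(-1), entropy    # pseudo labels + per-token uncertainty
```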

AAAI Conference 2022 Conference Paper

DKPLM: Decomposable Knowledge-Enhanced Pre-trained Language Model for Natural Language Understanding

  • Taolin Zhang
  • Chengyu Wang
  • Nan Hu
  • Minghui Qiu
  • Chengguang Tang
  • Xiaofeng He
  • Jun Huang

Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained models with relation triples injected from knowledge graphs to improve language understanding abilities. To guarantee effective knowledge injection, previous studies integrate models with knowledge encoders for representing knowledge retrieved from knowledge graphs. The operations for knowledge retrieval and encoding bring significant computational burdens, restricting the usage of such models in real-world applications that require high inference speed. In this paper, we propose a novel KEPLM named DKPLM that decomposes the knowledge injection process of pre-trained language models across the pre-training, fine-tuning, and inference stages, which facilitates the applications of KEPLMs in real-world scenarios. Specifically, we first detect knowledge-aware long-tail entities as the target for knowledge injection, enhancing the KEPLMs’ semantic understanding abilities and avoiding injecting redundant information. The embeddings of long-tail entities are replaced by “pseudo token representations” formed by relevant knowledge triples. We further design the relational knowledge decoding task for pre-training to force the models to truly understand the injected knowledge by relation triple reconstruction. Experiments show that our model outperforms other KEPLMs significantly over zero-shot knowledge probing tasks and multiple knowledge-aware language understanding tasks. We further show that DKPLM has a higher inference speed than other competing models due to the decomposing mechanism.

AAAI Conference 2022 Conference Paper

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

  • Runxin Xu
  • Fuli Luo
  • Chengyu Wang
  • Baobao Chang
  • Jun Huang
  • Songfang Huang
  • Fei Huang

Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With large quantities of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches only consider task-specific knowledge towards downstream tasks, but ignore the essential task-agnostic knowledge during pruning, which may cause the catastrophic forgetting problem and lead to poor generalization ability. To maintain both task-agnostic and task-specific knowledge in our pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. It is designed as a general framework, compatible with both structured and unstructured pruning. Unified in contrastive learning, CAP enables the pruned model to learn from the pre-trained model for task-agnostic knowledge and from the fine-tuned model for task-specific knowledge. Besides, to better retain the performance of the pruned model, the snapshots (i.e., the intermediate models at each pruning iteration) also serve as effective supervisions for pruning. Our extensive experiments show that adopting CAP consistently yields significant improvements, especially in extremely high sparsity scenarios. With only 3% model parameters reserved (i.e., 97% sparsity), CAP successfully achieves 99.2% and 96.3% of the original BERT performance in QQP and MNLI tasks. In addition, our probing experiments demonstrate that the model pruned by CAP tends to achieve better generalization ability.
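
One way to read the contrastive objective as code: the pruned model's representation of an input is pulled toward the pre-trained, fine-tuned, and snapshot models' representations of the same input and pushed away from other in-batch examples. The InfoNCE form below is an illustrative reading, not the paper's exact loss.

```python
# Contrastive-pruning-style loss: diagonal pairs (same input, different model)
# are positives; off-diagonal in-batch pairs are negatives.
import torch
import torch.nn.functional as F

def cap_style_loss(z_pruned, z_teachers, tau=0.1):
    """z_pruned: (B, d); z_teachers: list of (B, d) representations from the
    pre-trained, fine-tuned, and snapshot models."""
    z = F.normalize(z_pruned, dim=-1)
    loss = 0.0
    for zt in z_teachers:
        zt = F.normalize(zt, dim=-1)
        logits = z @ zt.t() / tau                       # (B, B) similarities
        labels = torch.arange(len(z), device=z.device)  # diagonal = positives
        loss = loss + F.cross_entropy(logits, labels)
    return loss / len(z_teachers)

loss = cap_style_loss(torch.randn(8, 32), [torch.randn(8, 32), torch.randn(8, 32)])
```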

AAAI Conference 2021 System Paper

EasyASR: A Distributed Machine Learning Platform for End-to-end Automatic Speech Recognition

  • Chengyu Wang
  • Mengli Cheng
  • Xu Hu
  • Jun Huang

We present EasyASR, a distributed machine learning platform for training and serving large-scale Automatic Speech Recognition (ASR) models, as well as collecting and processing audio data at scale. Our platform is built upon the Machine Learning Platform for AI of Alibaba Cloud. Its main functionality is to support efficient learning and inference for end-to-end ASR models on distributed GPU clusters. It allows users to learn ASR models with either pre-defined or user-customized network architectures via a simple user interface. On EasyASR, we have produced state-of-the-art results over several public datasets for Mandarin speech recognition.

AAAI Conference 2021 Conference Paper

KEML: A Knowledge-Enriched Meta-Learning Framework for Lexical Relation Classification

  • Chengyu Wang
  • Minghui Qiu
  • Jun Huang
  • Xiaofeng He

Lexical relations describe how concepts are semantically related, in the form of relation triples. The accurate prediction of lexical relations between concepts is challenging, due to the sparsity of patterns indicating the existence of such relations. We propose the Knowledge-Enriched Meta-Learning (KEML) framework to address lexical relation classification (LRC). In KEML, the LKB-BERT (Lexical Knowledge Base-BERT) model is first presented to learn concept representations from text corpora, with rich lexical knowledge injected by distant supervision. A probabilistic distribution of auxiliary tasks is defined to increase the model’s ability to recognize different types of lexical relations. We further propose a neural classifier integrated with special relation recognition cells, in order to combine meta-learning over the auxiliary task distribution and supervised learning for LRC. Experiments over multiple datasets show KEML outperforms state-of-the-art methods.

AAAI Conference 2021 Conference Paper

Reinforced History Backtracking for Conversational Question Answering

  • Minghui Qiu
  • Xinjing Huang
  • Cen Chen
  • Feng Ji
  • Chen Qu
  • Wei Wei
  • Jun Huang
  • Yin Zhang

Modeling the context history in multi-turn conversations has become a critical step towards a better understanding of the user query in question answering systems. To utilize the context history, most existing studies treat the whole context as input, which will inevitably face the following two challenges. First, modeling a long history can be costly as it requires more computation resources. Second, the long context history consists of a lot of irrelevant information that makes it difficult to model appropriate information relevant to the user query. To alleviate these problems, in this paper we propose a reinforcement learning-based method that captures and backtracks the related conversation history to boost model performance. Our method seeks to automatically backtrack the history information with the implicit feedback from the model performance. We further consider both immediate and delayed rewards to guide the reinforced backtracking policy. Extensive experiments on a large conversational question answering dataset show that the proposed method can help to alleviate the problems arising from longer context history. Meanwhile, experiments show that the method yields better performance than other strong baselines, and the actions made by the method are insightful.

IJCAI Conference 2020 Conference Paper

AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search

  • Daoyuan Chen
  • Yaliang Li
  • Minghui Qiu
  • Zhen Wang
  • Bofang Li
  • Bolin Ding
  • Hongbo Deng
  • Jun Huang

Large pre-trained language models such as BERT have shown their effectiveness in various natural language processing tasks. However, the huge parameter size makes them difficult to be deployed in real-time applications that require quick inference with limited resources. Existing methods compress BERT into small models while such compression is task-independent, i.e., the same compressed BERT for all different downstream tasks. Motivated by the necessity and benefits of task-oriented BERT compression, we propose a novel compression method, AdaBERT, that leverages differentiable Neural Architecture Search to automatically compress BERT into task-adaptive small models for specific tasks. We incorporate a task-oriented knowledge distillation loss to provide search hints and an efficiency-aware loss as search constraints, which enables a good trade-off between efficiency and effectiveness for task-adaptive BERT compression. We evaluate AdaBERT on several NLP tasks, and the results demonstrate that those task-adaptive compressed models are 12.7x to 29.3x faster than BERT in inference time and 11.5x to 17.0x smaller in terms of parameter size, while comparable performance is maintained.
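
Differentiable architecture search in miniature: each layer mixes candidate ops under softmax-weighted architecture parameters, so distillation and efficiency losses can backpropagate into the architecture choice. The candidate set and cost vector below are illustrative, not AdaBERT's actual search space.

```python
# A single differentiable "mixed op" layer plus an efficiency penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Linear(dim, dim),
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # arch params

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

# Illustrative efficiency-aware loss: heavier candidate ops cost more.
op_cost = torch.tensor([0.0, 1.0, 2.0])
layer = MixedOp(64)
out = layer(torch.randn(8, 64))
efficiency_loss = (F.softmax(layer.alpha, 0) * op_cost).sum()
```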

IJCAI Conference 2020 Conference Paper

Discovering Latent Class Labels for Multi-Label Learning

  • Jun Huang
  • Linchuan Xu
  • Jing Wang
  • Lei Feng
  • Kenji Yamanishi

Existing multi-label learning (MLL) approaches mainly assume all the labels are observed and construct classification models with a fixed set of target labels (known labels). However, in some real applications, multiple latent labels may exist outside this set and hide in the data, especially for large-scale data sets. Discovering and exploring the latent labels hidden in the data may not only find interesting knowledge but also help us to build a more robust learning model. In this paper, a novel approach named DLCL (i.e., Discovering Latent Class Labels for MLL) is proposed which can not only discover the latent labels in the training data but also predict new instances with the latent and known labels simultaneously. Extensive experiments show a competitive performance of DLCL against other state-of-the-art MLL approaches.