Arrow Research search

Author name cluster

Min Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

70 papers
2 author rows

Possible papers (70)

AAAI Conference 2026 Conference Paper

3D-ANC: Adaptive Neural Collapse for Robust 3D Point Cloud Recognition

  • Yuanmin Huang
  • Wenxuan Li
  • Mi Zhang
  • Xiaohan Zhang
  • Xiaoyu You
  • Min Yang

Deep neural networks have recently achieved notable progress in 3D point cloud recognition, yet their vulnerability to adversarial perturbations poses critical security challenges in practical deployments. Conventional defense mechanisms struggle to address the evolving landscape of multifaceted attack patterns. Through systematic analysis of existing defenses, we identify that their unsatisfactory performance primarily originates from an entangled feature space, where adversarial attacks can be performed easily. To this end, we present 3D-ANC, a novel approach that capitalizes on the Neural Collapse (NC) mechanism to orchestrate discriminative feature learning. In particular, NC describes the phenomenon in which last-layer features and classifier weights jointly evolve into a simplex equiangular tight frame (ETF) arrangement, establishing maximally separable class prototypes. However, leveraging this advantage in 3D recognition confronts two substantial challenges: (1) prevalent class imbalance in point cloud datasets, and (2) complex geometric similarities between object categories. To tackle these obstacles, our solution combines an ETF-aligned classification module with an adaptive training framework consisting of representation-balanced learning (RBL) and dynamic feature direction loss (FDL). 3D-ANC seamlessly empowers existing models to develop disentangled feature spaces despite the complexity of 3D data distributions. Comprehensive evaluations show that 3D-ANC significantly improves the robustness of models with various structures on two datasets. For instance, DGCNN's classification accuracy is elevated from 27.2% to 80.9% on ModelNet40 -- a 53.7% absolute gain that surpasses leading baselines by 34.0%.
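The simplex ETF geometry named in the abstract has a standard closed-form construction. Below is a minimal sketch of a fixed ETF classifier head; the dimensions are illustrative, and the paper's RBL and FDL training components are not shown.

```python
import torch

def simplex_etf(num_classes: int, feat_dim: int) -> torch.Tensor:
    """Build a simplex equiangular tight frame: num_classes maximally
    separated unit-norm class prototypes in feat_dim dimensions."""
    assert feat_dim >= num_classes
    # Random orthonormal basis U (feat_dim x K) via QR decomposition.
    u, _ = torch.linalg.qr(torch.randn(feat_dim, num_classes))
    k = num_classes
    # M = sqrt(K/(K-1)) * U (I - (1/K) 11^T); pairwise column cosines are -1/(K-1).
    return (k / (k - 1)) ** 0.5 * u @ (torch.eye(k) - torch.ones(k, k) / k)

# Frozen classifier head, as in ETF-aligned classification:
prototypes = simplex_etf(num_classes=40, feat_dim=256)  # e.g., ModelNet40
# logits = features @ prototypes, with `prototypes` kept fixed during training.
```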

AAAI Conference 2026 Conference Paper

Automatic Paper Reviewing with Heterogeneous Graph Reasoning over LLM-Simulated Reviewer-Author Debates

  • Shuaimin Li
  • Liyang Fan
  • Yufang Lin
  • Zeyang Li
  • Xian Wei
  • Shiwen Ni
  • Hamid Alinejad-Rokny
  • Min Yang

Existing paper review methods often rely on superficial manuscript features or directly on large language models (LLMs), which are prone to hallucinations, biased scoring, and limited reasoning capabilities. Moreover, these methods often fail to capture the complex argumentative reasoning and negotiation dynamics inherent in reviewer-author interactions. To address these limitations, we propose ReViewGraph (Reviewer-Author Debates Graph Reasoner), a novel framework that performs heterogeneous graph reasoning over LLM-simulated multi-round reviewer-author debates. In our approach, reviewer-author exchanges are simulated through LLM-based multi-agent collaboration. Diverse opinion relations (e.g., acceptance, rejection, clarification, and compromise) are then explicitly extracted and encoded as typed edges within a heterogeneous interaction graph. By applying graph neural networks to reason over these structured debate graphs, ReViewGraph captures fine-grained argumentative dynamics and enables more informed review decisions. Extensive experiments on three datasets demonstrate that ReViewGraph outperforms strong baselines with an average relative improvement of 15.73%, underscoring the value of modeling detailed reviewer-author debate structures.
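The typed-edge encoding described above is concrete enough to sketch; the relation names come from the abstract, while the triple format and node naming are assumptions.

```python
import networkx as nx

# Opinion relation types named in the abstract; the triple format is assumed.
OPINION_RELATIONS = {"acceptance", "rejection", "clarification", "compromise"}

def build_debate_graph(opinion_triples):
    """Encode extracted opinions as typed edges of a heterogeneous
    interaction graph, e.g. ("reviewer_1", "author", "rejection")."""
    g = nx.MultiDiGraph()
    for src, dst, relation in opinion_triples:
        assert relation in OPINION_RELATIONS
        g.add_edge(src, dst, type=relation)
    return g

g = build_debate_graph([("reviewer_1", "author", "rejection"),
                        ("author", "reviewer_1", "clarification")])
```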

AAAI Conference 2026 Conference Paper

SmartSight: Mitigating Hallucination in Video-LLMs Without Compromising Video Understanding via Temporal Attention Collapse

  • Yiming Sun
  • Mi Zhang
  • Feifei Li
  • Geng Hong
  • Min Yang

Despite Video Large Language Models (Video-LLMs) having rapidly advanced in recent years, perceptual hallucinations pose a substantial safety risk, which severely restricts their real-world applicability. While several methods for hallucination mitigation have been proposed, they often compromise the model's capacity for video understanding and reasoning. In this work, we propose SmartSight, a pioneering step to address this issue in a training-free manner by leveraging the model's own introspective capabilities. Specifically, SmartSight generates multiple candidate responses to uncover low-hallucinated outputs that are often obscured by standard greedy decoding. It assesses the degree of hallucination in each response using the Temporal Attention Collapse score, which measures whether the model over-focuses on trivial temporal regions of the input video when generating the response. To improve efficiency, SmartSight identifies the Visual Attention Vanishing point, enabling more accurate hallucination estimation and early termination of hallucinated responses, leading to a substantial reduction in decoding cost. Experiments show that SmartSight substantially lowers hallucinations for QwenVL-2.5-7B by 10.59% on VRIPT-HAL, while simultaneously enhancing video understanding and reasoning, boosting performance on VideoMMMU by 8.86%. These results highlight SmartSight's effectiveness in improving the reliability of open-source Video-LLMs.
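The abstract does not give the Temporal Attention Collapse formula; the sketch below is only a hedged proxy for "over-focusing on trivial temporal regions", measuring how far the frame-attention distribution is from uniform.

```python
import torch

def temporal_collapse_proxy(frame_attention: torch.Tensor) -> float:
    """Hedged proxy for a Temporal Attention Collapse score (not the
    paper's exact formula): one minus the normalized entropy of
    attention mass over frames, so 1.0 means total fixation on a
    single frame and 0.0 means uniform temporal coverage."""
    p = frame_attention / frame_attention.sum()
    entropy = -(p * (p + 1e-12).log()).sum()
    max_entropy = torch.log(torch.tensor(float(p.numel())))
    return (1.0 - entropy / max_entropy).item()

print(temporal_collapse_proxy(torch.tensor([0.97, 0.01, 0.01, 0.01])))  # near 1.0
```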

AAAI Conference 2025 Conference Paper

A New Formula for Sticker Retrieval: Reply with Stickers in Multi-Modal and Multi-Session Conversation

  • Bingbing Wang
  • Yiming Du
  • Bin Liang
  • Zhixin Bai
  • Min Yang
  • Baojun Wang
  • Kam-Fai Wong
  • Ruifeng Xu

Stickers are widely used in online chatting, as they can vividly express someone's intention, emotion, or attitude. Existing conversation research typically retrieves stickers based on a single session or the preceding textual information, which cannot adapt to the multi-modal and multi-session nature of real-world conversation. To this end, we introduce MultiChat, a new dataset for sticker retrieval in multi-modal, multi-session conversations, comprising 1,542 sessions with 50,192 utterances and 2,182 stickers. Based on the created dataset, we propose a novel Intent-Guided Sticker Retrieval (IGSR) framework that retrieves stickers for multi-modal, multi-session conversation histories, drawing support from intent learning. Specifically, we introduce sticker attributes to better leverage the sticker information in multi-modal conversation, and incorporate them with utterances to construct a memory bank. Further, we extract relevant memories for the current conversation from the memory bank to identify its intent, and then retrieve a sticker to respond with, guided by that intent. Extensive experiments on our MultiChat dataset reveal the robustness and effectiveness of our IGSR approach in multi-session, multi-modal scenarios.

NeurIPS Conference 2025 Conference Paper

Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance Boost

  • Runzhe Zhan
  • Zhihong Huang
  • Xinyi Yang
  • Lidia Chao
  • Min Yang
  • Derek Wong

Recent advancements in large reasoning models (LRMs) have introduced an intermediate "thinking" process prior to generating final answers, improving their reasoning capabilities on complex downstream tasks. However, the potential of LRMs as evaluators for machine translation (MT) quality remains underexplored. We provide the first systematic analysis of LRM-as-a-judge in MT evaluation. We identify key challenges, revealing that LRMs require tailored evaluation materials, tend to "overthink" simpler instances, and have scoring-mechanism issues that lead to overestimation. To address these, we propose to calibrate LRM thinking by training them on synthetic, human-like thinking trajectories. Our experiments on the WMT24 Metrics benchmarks demonstrate that this approach reduces thinking budgets by ~35x while concurrently improving evaluation performance across different LRM scales from 7B to 32B (e.g., R1-Distill-Qwen-7B achieves a +8.7 correlation-point improvement). These findings highlight the potential of efficiently calibrated LRMs to advance fine-grained automatic MT evaluation.

IS Journal 2025 Journal Article

Benchmarking Explainable Argumentation Dialogue via Freeman’s Theory

  • Yang Sun
  • Geng Tu
  • Wenpeng Lu
  • Min Yang
  • Erik Cambria
  • Ruifeng Xu

Argumentative dialogue involves structured exchanges of claims and supporting evidence, yet progress in building effective dialogue systems is limited by the scarcity of high-quality datasets. To address this, we introduce CMV-AD, a baseline dataset derived from the ChangeMyView corpus, designed for modeling structured argumentative interactions. We further propose FTCoT, a Freeman’s Theory-based Chain-of-Thought framework that enhances interpretability and reasoning in dialogue generation. FTCoT represents each dialogue turn with a structured quadruple: Dialogue Summary, User Argument, Assistant Argument, and Response Reasoning. We construct FTCoT using large language models (LLMs), leveraging their capabilities in reasoning and data annotation. Extensive automatic and human evaluations demonstrate the effectiveness of FTCoT in improving both the interpretability and quality of generated responses.

IJCAI Conference 2025 Conference Paper

CAN-ST: Clustering Adaptive Normalization for Spatio-temporal OOD Learning

  • Min Yang
  • Yang An
  • Jinliang Deng
  • Xiaoyu Li
  • Bin Xu
  • Ji Zhong
  • Xiankai Lu
  • Yongshun Gong

Spatio-temporal data mining is crucial for decision-making and planning in diverse domains. However, in real-world scenarios, training and testing data are often not independent or identically distributed due to rapid changes in data distributions over time and space, resulting in spatio-temporal out-of-distribution (OOD) challenges. This non-stationarity complicates accurate predictions and has motivated research efforts focused on mitigating non-stationarity through normalization operations. Existing methods, nonetheless, often address individual time series in isolation, neglecting correlations across series, which limits their capacity to handle complex spatio-temporal dynamics and results in suboptimal solutions. To overcome these challenges, we propose Clustering Adaptive Normalization (CAN-ST), a general and model-agnostic method that mitigates non-stationarity by capturing both localized distributional changes and shared patterns across nodes via adaptive clustering and a parameter register. As a plugin, CAN-ST can be easily integrated into various spatio-temporal prediction models. Extensive experiments on multiple datasets with diverse forecasting models demonstrate that CAN-ST consistently improves performance by over 20% on average and outperforms state-of-the-art normalization methods.
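The cluster-shared normalization idea described above can be sketched minimally; the code below assumes a fixed node-to-cluster assignment, whereas CAN-ST's adaptive clustering and parameter register are not shown.

```python
import torch

def cluster_wise_normalize(x: torch.Tensor, assign: torch.Tensor):
    """x: (batch, num_nodes, time) spatio-temporal window; `assign`
    maps each node to a cluster so that nodes sharing a distribution
    shift also share normalization statistics (a simplification of
    the CAN-ST mechanism)."""
    out = torch.empty_like(x)
    stats = {}
    for c in assign.unique().tolist():
        mask = assign == c
        mu, sigma = x[:, mask, :].mean(), x[:, mask, :].std() + 1e-6
        out[:, mask, :] = (x[:, mask, :] - mu) / sigma
        stats[c] = (mu, sigma)  # kept to de-normalize model predictions
    return out, stats
```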

AAAI Conference 2025 Conference Paper

Fine-Tuning Language Models with Collaborative and Semantic Experts

  • Jiaxi Yang
  • Binyuan Hui
  • Min Yang
  • Jian Yang
  • Lei Zhang
  • Qiang Qu
  • Junyang Lin

Recent advancements in large language models (LLMs) have broadened their application scope but revealed challenges in balancing capabilities across general knowledge, coding, and mathematics. To address this, we introduce a Collaborative and Semantic Experts (CoE) approach for supervised fine-tuning (SFT), which employs a two-phase training strategy. Initially, expert training fine-tunes the feed-forward network on specialized datasets, developing distinct experts in targeted domains. Subsequently, expert leveraging synthesizes these trained experts into a structured model with semantic guidance to activate specific experts, enhancing performance and interpretability. Evaluations on comprehensive benchmarks across MMLU, HumanEval, GSM8K, MT-Bench, and AlpacaEval confirm CoE's efficacy, demonstrating improved performance and expert collaboration in diverse tasks, significantly outperforming traditional SFT methods.

AAAI Conference 2025 Conference Paper

Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

  • Lei Zhang
  • Yunshui Li
  • Jiaming Li
  • Xiaobo Xia
  • Jiaxi Yang
  • Run Luo
  • Minzheng Wang
  • Longze Chen

Some of the latest released Code Large Language Models (Code LLMs) have been trained on repository-level code data, enabling them to perceive repository structures and utilize cross-file code information. This capability allows us to directly concatenate the content of repository code files in prompts to achieve repository-level code completion. However, in real development scenarios, directly concatenating all code repository files in a prompt can easily exceed the context window of Code LLMs, leading to a significant decline in completion performance. Additionally, overly long prompts can increase completion latency, negatively impacting the user experience. In this study, we conducted extensive experiments, including completion error analysis, topology dependency analysis, and cross-file content analysis, to investigate the factors affecting repository-level code completion. Based on the conclusions drawn from these preliminary experiments, we proposed a strategy called Hierarchical Context Pruning (HCP) to construct high-quality completion prompts. We applied HCP to six Code LLMs and evaluated them on the CrossCodeEval dataset. The experimental results showed that, compared to previous methods, the prompts constructed using our HCP strategy achieved higher completion accuracy on five out of six Code LLMs. Additionally, HCP managed to keep the prompt length around 8k tokens (whereas the full repository code is approximately 50k tokens), significantly improving completion throughput. Our code and data will be publicly available.
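The token-budget aspect of HCP can be sketched as a greedy selection; this is a hedged simplification in which the snippet relevance scores (which HCP derives from analyses such as the dependency topology) are assumed to be precomputed.

```python
def prune_repository_context(snippets, budget_tokens=8000):
    """Hedged sketch of the budget idea behind HCP: rank cross-file
    snippets by a precomputed relevance score and keep the best until
    the ~8k-token prompt budget reported in the abstract is filled."""
    kept, used = [], 0
    for s in sorted(snippets, key=lambda s: s["relevance"], reverse=True):
        if used + s["n_tokens"] <= budget_tokens:
            kept.append(s)
            used += s["n_tokens"]
    return kept

ctx = prune_repository_context([
    {"path": "utils.py", "relevance": 0.9, "n_tokens": 3000},
    {"path": "legacy.py", "relevance": 0.1, "n_tokens": 7000},
])
```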

ICML Conference 2025 Conference Paper

InfoCons: Identifying Interpretable Critical Concepts in Point Clouds via Information Theory

  • Feifei Li
  • Mi Zhang
  • Zhaoxiang Wang
  • Min Yang

Interpretability of point cloud (PC) models becomes imperative given their deployment in safety-critical scenarios such as autonomous vehicles. We focus on attributing PC model outputs to interpretable critical concepts, defined as meaningful subsets of the input point cloud. To enable human-understandable diagnostics of model failures, an ideal critical subset should be faithful (preserving points that causally influence predictions) and conceptually coherent (forming semantically meaningful structures that align with human perception). We propose InfoCons, an explanation framework that applies information-theoretic principles to decompose the point cloud into 3D concepts, enabling the examination of their causal effect on model predictions with learnable priors. We evaluate InfoCons on synthetic datasets for classification, comparing it qualitatively and quantitatively with four baselines. We further demonstrate its scalability and flexibility on two real-world datasets and in two applications that utilize critical scores of PC.

IROS Conference 2025 Conference Paper

Inverse-Free and Data-Driven Motion Tracking Control for Redundant Robot with Fuzzy Recurrent Neural Network

  • Min Yang
  • Siying Zhu
  • Hui Zhang

Precise motion tracking control with unknown structural knowledge and noise disturbance for redundant robots remains a critical and unresolved challenge. This article proposes a novel data-driven fuzzy discrete recurrent neural network (D²-FDRNN) model to address two fundamental limitations of existing models: dependency on known kinematic knowledge and fixed sampling schemes. First, a Jacobian pseudo-inverse estimator is developed to reconstruct the manipulator's necessary kinematic knowledge using input and output data, eliminating the need for explicit Jacobian inversion. Second, a fuzzy logic-based adaptive sampling strategy dynamically adjusts the step size to balance computational efficiency and tracking precision. In addition, a Kalman filter algorithm is applied to reduce the impact of noise. Rigorous proofs confirm the model's exponential convergence and noise immunity. To validate the proposed D²-FDRNN model, simulations and physical experiments are carried out. The source code is available at https://github.com/YingluckZ/DD-FDRNN.git.

AILAW Journal 2025 Journal Article

LLMES: an LLMs-based expert system for quality management system audits

  • Yunhan Li
  • Zeyang Shi
  • Yujing Li
  • Yuntao Chen
  • Gengshen Wu
  • Min Yang

Many organizations still face significant challenges in document audits, such as contract reviews, accounting audits, and compliance checks, which require substantial manpower and time. Current artificial intelligence solutions struggle with ambiguous requirements, unclear standards, and complex documents, limiting their scalability and effectiveness. To address these challenges, we propose a novel architecture, LLMES, designed to generate a concise and independent audit checklist. This approach allows Large Language Models to perform audits and produce results without prior fine-tuning. LLMES effectively integrates complex legal documents, expert knowledge, and mandatory regulatory guidelines with large language modeling to enhance artificial-intelligence-assisted auditing. In our experiments auditing a medical device quality management system using a 292-item sampled checklist, LLMES achieved an F1 score of 0.895 on core datasets, significantly outperforming the 0.696 score of direct auditing. Additionally, LLMES attained an F1 score of 0.876 ± 0.034 across full datasets and 0.905 on a complete 1696-item checklist. The source code and datasets are publicly available on our GitHub repository: https://github.com/lyxx3rd/LLMES

TIST Journal 2025 Journal Article

MGRL4RE: A Multi-Graph Representation Learning Approach for Urban Region Embedding

  • Meng Chen
  • Zechen Li
  • Hongwei Jia
  • Xin Shao
  • Jun Zhao
  • Qiang Gao
  • Min Yang
  • Yilong Yin

Using multi-modal data to learn region representations has gained popularity for its ability to reveal diverse socioeconomic features in cities. However, many studies focus solely on semantic features from points-of-interest (POIs), neglecting the issue of spatial imbalance. This article introduces a Multi-Graph Representation Learning framework for Region Embedding (MGRL4RE), which leverages both inter-region and intra-region correlations through two main components: multi-graph construction based on various region correlations and multi-graph representation learning. The construction module creates a multi-graph reflecting various correlations among regions, utilizing geo-tagged POIs, region data, and human mobility data. Specifically, we assess a region’s importance relative to its spatial context (neighborhood) and develop spatially invariant semantic features to address spatial imbalance. Furthermore, the representation learning module generates comprehensive and effective region representations via multi-view embedding fusion. Our extensive experiments across various downstream tasks, including land use clustering, region popularity prediction, and crime prediction, confirm that our model significantly outperforms existing state-of-the-art region embedding methods.

IROS Conference 2025 Conference Paper

Novel Data-Driven Repetitive Motion Control Scheme for Redundant Manipulators With Zeroing Neurodynamics

  • Min Yang
  • Kaixu Chen
  • Hui Zhang

Repetitive motion control of redundant manipulators typically requires precise kinematic models to construct Jacobian matrices. However, model-based approaches are inherently limited when manipulator parameters are unavailable or only partially known. This paper introduces a novel data-driven discrete zeroing neurodynamics (DDZN) model for repetitive motion control. Specifically, a Jacobian matrix estimation method based on data-driven technology is proposed, which eliminates the need for prior models by leveraging historical input-output information. By integrating the Jacobian matrix estimation with a discrete zeroing neurodynamics (DZN) model, the approach enables simultaneous trajectory tracking and repeatable configuration recovery without relying on structural parameters. Theoretical analysis verifies the performance of the DDZN model in noisy environments. Furthermore, extensive experimental results validate its reliability and superior performance compared with various models.
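Estimating a Jacobian from historical input-output pairs, as both IROS abstracts describe, has a classic model-free recipe in Broyden's rank-one update; the sketch below is offered as a stand-in under that assumption, not as the paper's exact estimator.

```python
import numpy as np

def jacobian_update(J_est, dtheta, dx, eps=1e-9):
    """Broyden-style rank-one update of an estimated Jacobian from one
    joint-step/end-effector-step pair: correct J_est along the
    direction of the most recent joint motion."""
    dtheta = dtheta.reshape(-1, 1)
    dx = dx.reshape(-1, 1)
    residual = dx - J_est @ dtheta
    return J_est + residual @ dtheta.T / (dtheta.T @ dtheta + eps)

# Tracking-loop sketch: command joints with the pseudo-inverse of the
# current estimate, then refine the estimate from the measured response:
# dtheta_cmd = np.linalg.pinv(J_est) @ dx_desired
```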

NeurIPS Conference 2025 Conference Paper

OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-time Emotional Speech Synthesis

  • Run Luo
  • Ting-En Lin
  • Haonan Zhang
  • Yuchuan Wu
  • Xiong Liu
  • Yongbin Li
  • Longze Chen
  • Jiaming Li

Recent advancements in omnimodal learning have significantly improved understanding and generation across images, text, and speech, yet these developments remain predominantly confined to proprietary models. The lack of high-quality omnimodal datasets and the challenges of real-time emotional speech synthesis have notably hindered progress in open-source research. To address these limitations, we introduce OpenOmni, a two-stage training framework that integrates omnimodal alignment and speech generation to develop a state-of-the-art omnimodal large language model. In the alignment phase, a pretrained speech model undergoes further training on image-text tasks, enabling (near) zero-shot generalization from vision to speech, outperforming models trained on tri-modal datasets. In the speech generation phase, a lightweight decoder is trained on speech tasks with direct preference optimization, which enables real-time emotional speech synthesis with high fidelity. Extensive experiments demonstrate that OpenOmni surpasses state-of-the-art models across omnimodal, vision-language, and speech-language benchmarks. It achieves a 4-point absolute improvement on OmniBench over the leading open-source model VITA, despite using 5× fewer training examples and a smaller model size (7B vs. 7×8B). Besides, OpenOmni achieves real-time speech generation with under one second of latency in non-autoregressive mode, reducing inference time by 5× compared to autoregressive methods, and improves emotion classification accuracy by 7.7%. The codebase is available at https://github.com/RainBowLuoCS/OpenOmni.

NeurIPS Conference 2025 Conference Paper

R1-ShareVL: Incentivizing Reasoning Capabilities of Multimodal Large Language Models via Share-GRPO

  • Huanjin Yao
  • Qixiang Yin
  • Jingyi Zhang
  • Min Yang
  • Yibo Wang
  • Wenhao Wu
  • Fei Su
  • Li Shen

In this work, we aim to incentivize the reasoning ability of Multimodal Large Language Models (MLLMs) via reinforcement learning (RL) and develop an effective approach that mitigates the sparse-reward and advantage-vanishing issues during RL. To this end, we propose Share-GRPO, a novel RL approach that tackles these issues by exploring and sharing diverse reasoning trajectories over an expanded question space. Specifically, Share-GRPO first expands the question space for a given question via data transformation techniques, then encourages the MLLM to effectively explore diverse reasoning trajectories over the expanded question space, and shares the discovered reasoning trajectories across the expanded questions during RL. In addition, Share-GRPO shares reward information during advantage computation, estimating solution advantages hierarchically across and within question variants, which allows more accurate estimation of relative advantages and improves the stability of policy training. Extensive evaluations over six widely used reasoning benchmarks showcase the superior performance of our method. Code is available at https://github.com/HJYao00/R1-ShareVL.
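The hierarchical advantage sharing can be sketched as a blend of within-variant and pooled normalization; the 50/50 weighting and the exact normalization are assumptions, not the paper's formula.

```python
import numpy as np

def shared_advantages(rewards_by_variant, blend=0.5):
    """Hedged sketch of hierarchical advantage sharing: normalize
    rewards within each question variant, also normalize against the
    pooled rewards of all variants, and blend the two estimates."""
    pooled = np.concatenate(rewards_by_variant)
    g_mu, g_sd = pooled.mean(), pooled.std() + 1e-8
    advantages = []
    for r in rewards_by_variant:
        within = (r - r.mean()) / (r.std() + 1e-8)   # within one variant
        across = (r - g_mu) / g_sd                   # across all variants
        advantages.append(blend * within + (1 - blend) * across)
    return advantages

advs = shared_advantages([np.array([1.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0])])
```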

NeurIPS Conference 2025 Conference Paper

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

  • Xeron Du
  • Yifan Yao
  • Kaijing Ma
  • Bingli Wang
  • Tianyu Zheng
  • Minghao Liu
  • Yiming Liang
  • Xiaolong Jin

Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields, particularly in light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model Gemini-2.5-Pro achieved the highest accuracy of 63.56% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.

NeurIPS Conference 2025 Conference Paper

The Future Unmarked: Watermark Removal in AI-Generated Images via Next-Frame Prediction

  • Huming Qiu
  • Zhaoxiang Wang
  • Mi Zhang
  • Xiaohan Zhang
  • Xiaoyu You
  • Min Yang

Image watermarking embeds imperceptible signals into AI-generated images for deepfake detection and provenance verification. Although recent semantic-level watermarking methods demonstrate strong resistance against conventional pixel-level removal attacks, their robustness against more advanced removal strategies remains underexplored, raising concerns about their reliability in practical scenarios. Existing removal attacks primarily operate in the pixel domain without altering image semantics, which limits their effectiveness against semantic-level watermarks. In this paper, we propose Next Frame Prediction Attack (NFPA), the first semantic-level removal attack. Unlike pixel-level attacks, NFPA formulates watermark removal as a video generation task: it treats the watermarked image as the initial frame and aims to subtly manipulate the image semantics to generate the next-frame image, i.e., the unwatermarked image. We conduct a comprehensive evaluation on eight state-of-the-art image watermarking schemes, demonstrating that NFPA consistently outperforms thirteen removal attack baselines in terms of the trade-off between watermark removal and image quality. Our results reveal the vulnerabilities of current image watermarking methods and highlight the urgent need for more robust watermarks.

AAAI Conference 2025 Conference Paper

Training on the Benchmark Is Not All You Need

  • Shiwen Ni
  • Xiangtao Kong
  • Chengming Li
  • Xiping Hu
  • Ruifeng Xu
  • Jia Zhu
  • Min Yang

The success of Large Language Models (LLMs) relies heavily on the huge amount of pre-training data learned in the pre-training phase. The opacity of the pre-training process and the training data causes the results of many benchmark tests to become unreliable. If any model has been trained on a benchmark test set, it can seriously hinder the health of the field. In order to automate and efficiently test the capabilities of large language models, numerous mainstream benchmarks adopt a multiple-choice format. As swapping the contents of multiple-choice options does not affect the meaning of the question itself, we propose a simple and effective data leakage detection method based on this property. Specifically, we shuffle the contents of the options in the data to generate the corresponding derived datasets, and then detect data leakage based on the model's log probability distribution over the derived datasets. If one ordering's log probability is both the maximum and an outlier in this set, it indicates that the data is leaked. Our method is able to work under gray-box conditions without access to model training data or weights, effectively identifying data leakage from benchmark test sets in model pre-training data, including both normal scenarios and complex scenarios where options may have been shuffled intentionally or unintentionally. Through experiments based on two LLMs and benchmark designs, we demonstrate the effectiveness of our method. In addition, we evaluate the degree of data leakage of 35 mainstream open-source LLMs on four benchmark datasets, give a ranking of the leaked LLMs for each benchmark, and find that the Qwen family of LLMs has the highest degree of data leakage.
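The shuffle-and-score test is concrete enough to sketch end to end; the model choice and the outlier threshold below are placeholders, and the paper's exact statistical test may differ.

```python
import itertools
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM exposing log-likelihoods works.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_logprob(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean NLL per predicted token
    return -loss.item() * (ids.shape[1] - 1)

def leakage_signal(question: str, options: list, z_thresh: float = 2.0) -> bool:
    """Score the original option order against its shuffles: a leaked
    item tends to give the original order the highest, outlying
    log-probability (the z threshold is illustrative)."""
    def render(opts):
        return question + "\n" + "\n".join(
            f"{chr(65 + i)}. {o}" for i, o in enumerate(opts))
    perms = list(itertools.permutations(options))  # perms[0] = original order
    scores = [sequence_logprob(render(p)) for p in perms]
    mu = sum(scores) / len(scores)
    sd = (sum((s - mu) ** 2 for s in scores) / len(scores)) ** 0.5
    z = (scores[0] - mu) / (sd + 1e-8)
    return scores[0] == max(scores) and z > z_thresh
```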

NeurIPS Conference 2025 Conference Paper

VCM: Vision Concept Modeling with Adaptive Vision Token Compression via Instruction Fine-Tuning

  • Run Luo
  • Renke Shan
  • Longze Chen
  • Ziqiang Liu
  • Lu Wang
  • Min Yang
  • Xiaobo Xia

Large vision-language models (LVLMs) have emerged as foundational tools for real-world AI applications. Despite their remarkable capabilities, current LVLMs process entire images at the token level, leading to significant inefficiencies compared to human cognition, which selectively focuses on high-level vision concepts. This token-level redundancy becomes increasingly problematic for high-resolution images and long video sequences, resulting in large computational costs and limited scalability in practical applications. To address this limitation, we introduce the concept of a vision concept model, a novel paradigm that enables LVLMs to dynamically extract the most relevant vision concepts from complex inputs, based on task-specific instructions. To optimize this vision concept modeling process, we propose VCM, a self-supervised framework that leverages vision-language correlations across diverse instances. VCM is designed to learn meaningful vision concepts without the need for expensive concept-level annotations. At its core, it employs a forward-backward optimization algorithm that supports LVLMs to adjust concept granularity and spatial alignment dynamically. Experiments demonstrate that VCM remarkably reduces computational costs (e.g., achieving up to 85% fewer FLOPs for LLaVA-1.5-7B), while maintaining strong performance across a series of vision-language tasks. The codebase is available at https://github.com/RainBowLuoCS/VCM.

AAAI Conference 2024 Conference Paper

Counterfactual-Enhanced Information Bottleneck for Aspect-Based Sentiment Analysis

  • Mingshan Chang
  • Min Yang
  • Qingshan Jiang
  • Ruifeng Xu

Despite their notable success in aspect-based sentiment analysis (ABSA), deep neural networks are susceptible to spurious correlations between input features and output labels, leading to poor robustness. In this paper, we propose a novel Counterfactual-Enhanced Information Bottleneck framework (called CEIB) to reduce spurious correlations for ABSA. CEIB extends the information bottleneck (IB) principle to a factual-counterfactual balancing setting by integrating augmented counterfactual data, with the goal of learning a robust ABSA model. Concretely, we first devise a multi-pattern prompting method, which utilizes the large language model (LLM) to generate high-quality counterfactual samples from the original samples. Then, we employ the information bottleneck principle and separate the mutual information into factual and counterfactual parts. In this way, we can learn effective and robust representations for the ABSA task by balancing the predictive information of these two parts. Extensive experiments on five benchmark ABSA datasets show that our CEIB approach achieves superior prediction performance and robustness over the state-of-the-art baselines. Code and data to reproduce the results in this paper are available at: https://github.com/shesshan/CEIB.

AAAI Conference 2024 Conference Paper

DiffusionTrack: Diffusion Model for Multi-Object Tracking

  • Run Luo
  • Zikai Song
  • Lintao Ma
  • Jinlin Wei
  • Wei Yang
  • Min Yang

Multi-object tracking (MOT) is a challenging vision task that aims to detect individual objects within a single frame and associate them across multiple frames. Recent MOT approaches can be categorized into two-stage tracking-by-detection (TBD) methods and one-stage joint detection and tracking (JDT) methods. Despite the success of these approaches, they also suffer from common problems, such as harmful global or local inconsistency, a poor trade-off between robustness and model complexity, and a lack of flexibility across different scenes within the same video. In this paper, we propose a simple but robust framework that formulates object detection and association jointly as a consistent denoising diffusion process from paired noise boxes to paired ground-truth boxes. This novel progressive denoising diffusion strategy substantially augments the tracker's effectiveness, enabling it to discriminate between various objects. During the training stage, paired object boxes diffuse from paired ground-truth boxes to a random distribution, and the model learns detection and tracking simultaneously by reversing this noising process. During inference, the model refines a set of paired randomly generated boxes into the detection and tracking results in a flexible one-step or multi-step denoising diffusion process. Extensive experiments on three widely used MOT benchmarks, including MOT17, MOT20, and DanceTrack, demonstrate that our approach achieves competitive performance compared to the current state-of-the-art methods. Code is available at https://github.com/RainBowLuoCS/DiffusionTrack.

NeurIPS Conference 2024 Conference Paper

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

  • Ziqiang Liu
  • Feiteng Fang
  • Xi Feng
  • Xinrun Du
  • Chenhao Zhang
  • Zekun Wang
  • Yuelin Bai
  • Qixuan Zhao

The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate models' higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made significant findings. Initially, a substantial gap is observed between the performance of MLLMs and humans on II-Bench: the best MLLM attains 74.8% accuracy, whereas human accuracy averages 90%, peaking at an impressive 98%. Subsequently, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, it is observed that most models exhibit enhanced accuracy when image sentiment polarity hints are incorporated into the prompts. This observation underscores a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.

AAAI Conference 2024 Conference Paper

RRL: Recommendation Reverse Learning

  • Xiaoyu You
  • Jianwei Xu
  • Mi Zhang
  • Zechen Gao
  • Min Yang

As societies become increasingly aware of data privacy, regulations require that private information about users be removed from both databases and ML models, a requirement colloquially called "the right to be forgotten". Such privacy problems in recommendation systems, which hold large amounts of private data, are drawing increasing attention. Recent research suggests dividing the preference data into multiple shards, training submodels on these shards, and forgetting a user's personal preference data by retraining the submodels of the marked shards. Despite the gain in computational efficiency compared with retraining from scratch, the overall recommendation performance deteriorates after dividing the shards because the collaborative information contained in the training data is broken. In this paper, we propose Recommendation Reverse Learning (RRL), a forgetting framework for recommendation models that neither separates the training data nor jeopardizes recommendation performance. Given the trained recommendation model and marked preference data, we devise a Reverse BPR Objective (RBPR Objective) to fine-tune the recommendation model and force it to forget the marked data. Nevertheless, as the recommendation model encodes complex collaborative information among users, we propose to utilize the Fisher Information Matrix (FIM) to estimate the influence of reverse learning on other users' collaborative information and guide the updates of representations. We conduct experiments on two representative recommendation models and three public benchmark datasets to verify the efficiency of RRL. To verify forgetting completeness, we use RRL to make a recommendation model poisoned by shilling attacks forget the malicious users.
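The abstract does not spell out the RBPR form; one plausible reading, offered here as an assumption, is to flip the preference ordering of standard BPR on the marked interactions so that fine-tuning unlearns them.

```python
import torch.nn.functional as F

def bpr_loss(pos_scores, neg_scores):
    # Standard BPR: push a user's positive items above sampled negatives.
    return -F.logsigmoid(pos_scores - neg_scores).mean()

def reverse_bpr_loss(marked_scores, neg_scores):
    # Hypothetical RBPR reading (not confirmed by the abstract): reverse
    # the ordering on marked, to-be-forgotten interactions so the model
    # is fine-tuned to rank them below sampled negatives.
    return -F.logsigmoid(neg_scores - marked_scores).mean()
```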

AAAI Conference 2024 Conference Paper

Urban Region Embedding via Multi-View Contrastive Prediction

  • Zechen Li
  • Weiming Huang
  • Kai Zhao
  • Min Yang
  • Yongshun Gong
  • Meng Chen

Recently, learning urban region representations utilizing multi-modal data (information views) has become increasingly popular, for deep understanding of the distributions of various socioeconomic features in cities. However, previous methods usually blend multi-view information in a posterior stage, falling short in learning coherent and consistent representations across different views. In this paper, we form a new pipeline to learn consistent representations across varying views, and propose the multi-view Contrastive Prediction model for urban Region embedding (ReCP), which leverages the multiple information views from point-of-interest (POI) and human mobility data. Specifically, ReCP comprises two major modules, namely an intra-view learning module utilizing contrastive learning and feature reconstruction to capture the unique information from each single view, and an inter-view learning module that perceives the consistency between the two views using a contrastive prediction learning scheme. We conduct thorough experiments on two downstream tasks to assess the proposed model, i.e., land use clustering and region popularity prediction. The experimental results demonstrate that our model significantly outperforms state-of-the-art baseline methods in urban region representation learning.

AAAI Conference 2023 Conference Paper

A Generative Approach for Script Event Prediction via Contrastive Fine-Tuning

  • Fangqi Zhu
  • Jun Gao
  • Changlong Yu
  • Wei Wang
  • Chen Xu
  • Xin Mu
  • Min Yang
  • Ruifeng Xu

Script event prediction aims to predict the subsequent event given the context. This requires the capability to infer the correlations between events. Recent works have attempted to improve event correlation reasoning by using pretrained language models and incorporating external knowledge (e.g., discourse relations). Though promising results have been achieved, some challenges still remain. First, the pretrained language models adopted by current works ignore event-level knowledge, resulting in an inability to capture the correlations between events well. Second, modeling correlations between events with discourse relations is limited because it can only capture explicit correlations between events with discourse markers, and cannot capture many implicit correlations. To this end, we propose a novel generative approach for this task, in which a pretrained language model is fine-tuned with an event-centric pretraining objective and predicts the next event within a generative paradigm. Specifically, we first introduce a novel event-level blank infilling strategy as the learning objective to inject event-level knowledge into the pretrained language model, and then design a likelihood-based contrastive loss for fine-tuning the generative model. Instead of using an additional prediction layer, we perform prediction by using sequence likelihoods generated by the generative model. Our approach models correlations between events in a soft way without any external knowledge. The likelihood-based prediction eliminates the need to use additional networks to make predictions and is somewhat interpretable since it scores each word in the event. Experimental results on the multi-choice narrative cloze (MCNC) task demonstrate that our approach achieves better results than other state-of-the-art baselines. Our code will be available at https://github.com/zhufq00/mcnc.

AAAI Conference 2023 Conference Paper

Balanced Meta Learning and Diverse Sampling for Lifelong Task-Oriented Dialogue Systems

  • Qiancheng Xu
  • Min Yang
  • Ruifeng Xu

In real-world scenarios, it is crucial to build a lifelong task-oriented dialogue system (TDS) that continually adapts to new knowledge without forgetting previously acquired experiences. Existing approaches mainly focus on mitigating catastrophic forgetting in lifelong TDS. However, the transfer ability to generalize the accumulated old knowledge to new tasks is underexplored. In this paper, we propose a two-stage lifelong task-oriented dialogue generation method to mitigate catastrophic forgetting and encourage knowledge transfer simultaneously, inspired by the human learning process. In the first stage, we learn task-specific masks which adaptively preserve the knowledge of each visited task, so as to mitigate catastrophic forgetting; here the model acquires the task-specific knowledge tailored to each task. In the second stage, we bring together the knowledge from the encountered tasks and consolidate it. To this end, we devise a balanced meta learning strategy for both forward and backward knowledge transfer in the lifelong learning process. In particular, we perform meta-update with a meta-test set sampled from the current training data for forward knowledge transfer. In addition, we employ an uncertainty-based sampling strategy to select and store representative dialogue samples into episodic memory and perform meta-update with a meta-test set sampled from the memory for backward knowledge transfer. With extensive experiments on 29 tasks, we show that MetaLTDS outperforms the strong baselines in terms of both effectiveness and efficiency. For reproducibility, we submit our code at: https://github.com/travis-xu/MetaLTDS.

AAAI Conference 2023 Conference Paper

Black-Box Adversarial Attack on Time Series Classification

  • Daizong Ding
  • Mi Zhang
  • Fuli Feng
  • Yuanmin Huang
  • Erling Jiang
  • Min Yang

With the increasing use of deep neural networks (DNNs) in time series classification (TSC), recent work reveals the threat of adversarial attacks, where the adversary can construct adversarial examples to cause model mistakes. However, existing research on adversarial attacks against TSC typically adopts an unrealistic white-box setting with model details transparent to the adversary. In this work, we study a more rigorous black-box setting with attack detection applied, which restricts gradient access and requires the adversarial example to also be stealthy. Theoretical analyses reveal that the key lies in estimating the black-box gradient with the diversity and non-convexity of TSC models resolved, and in restricting the l0 norm of the perturbation to construct adversarial samples. Towards this end, we propose a new framework named BlackTreeS, which solves the hard optimization issue of adversarial example construction with two simple yet effective modules. In particular, we propose a tree search strategy to find influential positions in a sequence, and independently estimate the black-box gradients for these positions. Extensive experiments on three real-world TSC datasets and five DNN-based models validate the effectiveness of BlackTreeS, e.g., it improves the attack success rate from 19.3% to 27.3%, and decreases the detection success rate from 90.9% to 6.8% for LSTM on the UWave dataset.
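Position-restricted black-box gradient estimation can be sketched with plain finite differences; this is a hedged simplification in which the tree search that selects the influential positions is not shown.

```python
import numpy as np

def positionwise_black_box_grad(score_fn, x, positions, eps=1e-2):
    """Finite-difference gradient estimate restricted to a few selected
    positions of the series, keeping the perturbation's l0 norm small.
    `score_fn` queries the victim model's score for the target class;
    the search that picks `positions` is assumed to run beforehand."""
    base = score_fn(x)
    grads = {}
    for t in positions:
        x_pert = x.copy()
        x_pert[t] += eps
        grads[t] = (score_fn(x_pert) - base) / eps
    return grads

# Attack-loop sketch: step each selected point along -grad and re-query
# until the predicted label flips while the perturbation stays sparse.
```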

AAAI Conference 2023 Short Paper

Class Incremental Learning for Task-Oriented Dialogue System with Contrastive Distillation on Internal Representations (Student Abstract)

  • Qiancheng Xu
  • Min Yang
  • Binzong Geng

The ability to continually learn over time by grasping new knowledge and remembering previously learned experiences is essential for developing an online task-oriented dialogue system (TDS). In this paper, we work on the class incremental learning scenario where the TDS is evaluated without specifying the dialogue domain. We employ contrastive distillation on the intermediate representations of dialogues to learn transferable representations that suffer less from catastrophic forgetting. Besides, we provide a dynamic update mechanism to explicitly preserve the learned experiences by only updating the parameters related to the new task while keeping other parameters fixed. Extensive experiments demonstrate that our method significantly outperforms the strong baselines.

AAAI Conference 2023 Conference Paper

Effective Open Intent Classification with K-center Contrastive Learning and Adjustable Decision Boundary

  • Xiaokang Liu
  • Jianquan Li
  • Jingjing Mu
  • Min Yang
  • Ruifeng Xu
  • Benyou Wang

Open intent classification, which aims to correctly classify the known intents into their corresponding classes while identifying the new unknown (open) intents, is an essential but challenging task in dialogue systems. In this paper, we introduce novel K-center contrastive learning and adjustable decision boundary learning (CLAB) to improve the effectiveness of open intent classification. First, we pre-train a feature encoder on the labeled training instances, which transfers knowledge from known intents to unknown intents. Specifically, we devise a K-center contrastive learning algorithm to learn discriminative and balanced intent features, improving the generalization of the model for recognizing open intents. Second, we devise an adjustable decision boundary learning method with expanding and shrinking (ADBES) to determine the suitable decision conditions. Concretely, we learn a decision boundary for each known intent class, which consists of a decision center and the radius of the decision boundary. We then expand the radius of the decision boundary to accommodate more in-class instances if the out-of-class instances are far from the decision boundary; otherwise, we shrink the radius of the decision boundary. Extensive experiments on three benchmark datasets clearly demonstrate the effectiveness of our method for open intent classification. For reproducibility, we submit the code at: https://github.com/lxk00/CLAP
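The expand-or-shrink rule paraphrased in the abstract maps directly to a small update step; the step size below is illustrative, and the exact update schedule is an assumption.

```python
import torch

def adjust_boundary(radius, center, out_feats, step=0.05):
    """One expand-or-shrink step per known class, following the ADBES
    rule as stated in the abstract: expand when out-of-class features
    lie well outside the boundary, shrink when they intrude."""
    d_out = torch.norm(out_feats - center, dim=1)
    return radius + step if d_out.min() > radius else radius - step

# At inference, an input counts as an open intent if it falls outside
# every known class boundary: min_k (||f(x) - c_k|| - r_k) > 0.
```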

AAAI Conference 2022 Conference Paper

GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-supervised Learning and Explicit Policy Injection

  • Wanwei He
  • Yinpei Dai
  • Yinhe Zheng
  • Yuchuan Wu
  • Zheng Cao
  • Dermot Liu
  • Peng Jiang
  • Min Yang

Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems, and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ 2.0, and MultiWOZ 2.1, improving their end-to-end combined scores by 2.5, 5.3, and 5.5 points, respectively. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings. For reproducibility, we release the code and data at https://github.com/siat-nlp/GALAXY.
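The abstract names a consistency regularization term without giving its form; a symmetric KL between two stochastic (dropout) forward passes on the same unlabeled dialog is one common instantiation, offered here purely as an assumption.

```python
import torch.nn.functional as F

def consistency_loss(logits_a, logits_b):
    """Symmetric KL between two stochastic forward passes on the same
    unlabeled input; one common way to implement a consistency
    regularization term (the paper's exact form may differ)."""
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    return 0.5 * (F.kl_div(log_p, log_q.exp(), reduction="batchmean")
                  + F.kl_div(log_q, log_p.exp(), reduction="batchmean"))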

NeurIPS Conference 2022 Conference Paper

House of Cans: Covert Transmission of Internal Datasets via Capacity-Aware Neuron Steganography

  • Xudong Pan
  • Shengyao Zhang
  • Mi Zhang
  • Yifan Yan
  • Min Yang

In this paper, we present a capacity-aware neuron steganography scheme (i.e., Cans) to covertly transmit multiple private machine learning (ML) datasets via a scheduled-to-publish deep neural network (DNN) as the carrier model. Unlike existing steganography schemes which treat the DNN parameters as bit strings, Cans for the first time exploits the learning capacity of the carrier model via a novel parameter sharing mechanism. Extensive evaluation shows that Cans is the first working scheme that can covertly transmit over 10,000 real-world data samples within a carrier model which has 220× fewer parameters than the total size of the stolen data, and simultaneously transmit multiple heterogeneous datasets within a single carrier model, under a trivial distortion rate (<10^-5) and with almost no utility loss on the carrier model (<1%). Besides, Cans implements by-design redundancy to be resilient against common post-processing techniques applied to the carrier model before publishing.

NeurIPS Conference 2022 Conference Paper

Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems

  • Guanghu Yuan
  • Fajie Yuan
  • Yudong Li
  • Beibei Kong
  • Shujie Li
  • Lei Chen
  • Min Yang
  • Chenyun Yu

Existing benchmark datasets for recommender systems (RS) either are created at a small scale or involve very limited forms of user feedback. RS models evaluated on such datasets often lack practical values for large-scale real-world applications. In this paper, we describe Tenrec, a novel and publicly available data collection for RS that records various user feedback from four different recommendation scenarios. To be specific, Tenrec has the following five characteristics: (1) it is large-scale, containing around 5 million users and 140 million interactions; (2) it has not only positive user feedback, but also true negative feedback (vs. one-class recommendation); (3) it contains overlapped users and items across four different scenarios; (4) it contains various types of user positive feedback, in forms of clicking, liking, sharing, and following, etc; (5) it contains additional features beyond the user IDs and item IDs. We verify Tenrec on ten diverse recommendation tasks by running several classical baseline models per task. Tenrec has the potential to become a useful benchmark dataset for a majority of popular recommendation tasks. Our source codes and datasets will be included in supplementary materials.

AAAI Conference 2021 Conference Paper

A User-Adaptive Layer Selection Framework for Very Deep Sequential Recommender Models

  • Lei Chen
  • Fajie Yuan
  • Jiaxi Yang
  • Xiang Ao
  • Chengming Li
  • Min Yang

Sequential recommender systems (SRS) have become a research hotspot in recent studies. Because of the need to capture users' dynamic interests, sequential neural-network-based recommender models often need to be stacked with more hidden layers (e.g., up to 100 layers) than standard collaborative filtering methods. However, high network latency has become the main obstacle when deploying very deep recommender models into a production environment. In this paper, we argue that the typical prediction framework that treats all users equally during the inference phase is inefficient in running time, as well as sub-optimal in accuracy. To resolve this issue, we present SkipRec, an adaptive inference framework that learns to skip inactive hidden layers on a per-user basis. Specifically, we devise a policy network to automatically determine which layers should be retained and which are allowed to be skipped, so as to achieve user-specific decisions. To derive the optimal skipping policy, we propose using Gumbel softmax and reinforcement learning to solve the non-differentiable problem during backpropagation. We perform extensive experiments on three real-world recommendation datasets, and demonstrate that SkipRec attains comparable or better accuracy with much less inference time.
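Since the abstract explicitly names Gumbel softmax for the non-differentiable skip decision, here is a minimal sketch of a per-user gating policy; the dimensions and the two-logit parameterization are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerSkipPolicy(nn.Module):
    """Hedged sketch of a per-user skipping policy: emit one keep/skip
    gate per backbone layer, relaxed with Gumbel softmax so the
    discrete choice stays differentiable during training."""
    def __init__(self, user_dim: int, num_layers: int):
        super().__init__()
        self.head = nn.Linear(user_dim, num_layers * 2)  # (skip, keep) logits

    def forward(self, user_emb: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        logits = self.head(user_emb).view(-1, 2)          # num_layers x 2
        gates = F.gumbel_softmax(logits, tau=tau, hard=True)
        return gates[:, 1]                                # 1.0 = keep the layer

policy = LayerSkipPolicy(user_dim=64, num_layers=16)
keep = policy(torch.randn(64))
# Backbone usage: h = g * layer(h) + (1 - g) * h for each gate g.
```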

AAAI Conference 2021 Conference Paper

Exploring Auxiliary Reasoning Tasks for Task-oriented Dialog Systems with Meta Cooperative Learning

  • Bowen Qin
  • Min Yang
  • Lidong Bing
  • Qingshan Jiang
  • Chengming Li
  • Ruifeng Xu

In this paper, we propose a Meta Cooperative Learning (MCL) framework for task-oriented dialog systems (TDSs). Our model consists of an auxiliary KB reasoning task for learning meta KB knowledge, an auxiliary dialogue reasoning task for learning dialogue patterns, and a TDS task (the primary task) that aims at not only retrieving accurate entities from the KB but also generating natural responses; the three are coordinated via meta learning to achieve collective success in both retrieving accurate KB entities and generating human-like responses. Concretely, the dialog generation model amalgamates complementary meta KB and dialog knowledge from the two novel auxiliary reasoning tasks, which together provide integrated guidance for building a high-quality TDS by adding regularization terms that force the primary network to produce results similar to those of the auxiliary networks. MCL automatically learns appropriate labels for the two auxiliary reasoning tasks from the primary task, without requiring access to any further data. The key idea behind MCL is to use the performance of the primary task, which is trained alongside the auxiliary tasks in one iteration, to improve the auxiliary labels for the next iteration with meta learning. Experimental results on three benchmark datasets show that MCL can generate higher quality responses compared to several strong baselines in terms of both automatic and human evaluations. Code to reproduce the results in this paper is available at: https://github.com/siat-nlp/MCL.

AAAI Conference 2021 Conference Paper

Imagine, Reason and Write: Visual Storytelling with Graph Knowledge and Relational Reasoning

  • Chunpu Xu
  • Min Yang
  • Chengming Li
  • Ying Shen
  • Xiang Ao
  • Ruifeng Xu

Visual storytelling is the task of generating a short story to describe an ordered image stream. Different from visual captions, stories contain not only factual descriptions, but also imaginary concepts that do not appear in the images. In this paper, we propose a novel imagine-reason-write generation framework (IRW) for visual storytelling, inspired by the logic of humans when they write a story. First, a multimodal imagining module is leveraged to learn the imaginative storyline explicitly, improving the coherence and reasonability of the generated story. Second, we employ a relational reasoning module to fully exploit the external knowledge (commonsense knowledge base) and task-specific knowledge (scene graph and event graph) with a relational reasoning method based on the storyline. In this way, we can effectively capture the most informative commonsense and visual relationships among objects in images, enhancing the diversity and informativeness of the generated story. Finally, we integrate the visual information and semantic (concept) information to generate human-like stories. Extensive experiments on a benchmark dataset (i.e., VIST) demonstrate that the proposed IRW framework substantially outperforms the state-of-the-art methods across multiple evaluation metrics.

AAAI Conference 2021 Conference Paper

The Style-Content Duality of Attractiveness: Learning to Write Eye-Catching Headlines via Disentanglement

  • Mingzhe Li
  • Xiuying Chen
  • Min Yang
  • Shen Gao
  • Dongyan Zhao
  • Rui Yan

Eye-catching headlines function as the first device to trigger more clicks, bringing reciprocal benefits to producers and viewers: producers can obtain more traffic and profit, and readers gain access to outstanding articles. When generating attractive headlines, it is important not only to capture the attractive content but also to follow an eye-catching written style. In this paper, we propose a Disentanglement-based Attractive Headline Generator (DAHG) that generates headlines capturing the attractive content in an attractive style. Concretely, we first devise a disentanglement module to divide the style and content of an attractive prototype headline into separate latent spaces, with two auxiliary constraints to ensure the two spaces are indeed disentangled. The latent content information is then used to further polish the document representation and help capture the salient part. Finally, the generator takes the polished document as input to generate a headline under the guidance of the attractive style. Extensive experiments on the public Kuaibao dataset show that DAHG achieves state-of-the-art performance. Human evaluation also demonstrates that DAHG triggers 22% more clicks than existing models.

IS Journal 2021 Journal Article

Toward Aspect-Level Sentiment Modification Without Parallel Data

  • Qingnan Jiang
  • Lei Chen
  • Wei Zhao
  • Min Yang

This article is the first to study aspect-level sentiment modification (ALSM) without parallel data. Given a sentence, ALSM must reverse the sentiment with respect to a given aspect while preserving the other content. The main challenge is reversing the sentiment of the given aspect without affecting the sentiments of other aspects in the sentence. To handle this problem, we propose a joint aspect-level sentiment modification (JASM) model. JASM is a multitask system that jointly trains two coupled modules: aspect-specific sentiment word extraction and aspect-level sentiment transformation. Besides, we propose a novel memory mechanism to learn aspect-aware sentiment representations and a gating mechanism to dynamically select aspect-aware sentiment information or content information when generating the next word. Experiments show that the proposed model substantially outperforms the compared methods in both aspect-level sentiment transformation and content preservation. As an application, we perform data augmentation for aspect-based sentiment analysis (ABSA) by generating plausible training data with the trained ALSM model. Experiments show that augmenting with the generated data boosts the performance of a broad range of ABSA models.
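
The gating mechanism described here could look roughly like the following hypothetical sketch (names and dimensions are assumptions), where a sigmoid gate mixes aspect-aware sentiment information with content information at each decoding step:

```python
import torch
import torch.nn as nn

class SentimentContentGate(nn.Module):
    """Hypothetical sketch of the gating idea: a learned sigmoid gate
    decides, per dimension, how much sentiment versus content information
    feeds the next decoding step."""
    def __init__(self, dim=128):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, sentiment_vec, content_vec):
        g = torch.sigmoid(self.gate(torch.cat([sentiment_vec, content_vec], dim=-1)))
        return g * sentiment_vec + (1.0 - g) * content_vec
```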

IJCAI Conference 2020 Conference Paper

AILA: A Question Answering System in the Legal Domain

  • Weiyi Huang
  • Jiahao Jiang
  • Qiang Qu
  • Min Yang

Question answering (QA) in the legal domain has become an increasingly popular way for people to seek legal advice. However, existing QA systems struggle to comprehend the legal context and provide jurisdictionally relevant answers due to the lack of domain expertise. In this paper, we develop an Artificial Intelligence Law Assistant (AILA) for question answering in the domain of Chinese law. AILA automatically comprehends users' natural language queries with the help of a legal knowledge graph (KG) and provides the best matching answers for given queries. In addition, AILA provides visual cues to interpret the input queries and candidate answers based on the legal KG. Experimental results on a large-scale legal QA corpus show the effectiveness of AILA. To the best of our knowledge, AILA is the first Chinese legal QA system that integrates domain knowledge from a legal KG to comprehend questions and answers for ranking QA pairs. AILA is available at http://bmilab.ticp.io:48478/.

AAAI Conference 2020 Conference Paper

Attentive User-Engaged Adversarial Neural Network for Community Question Answering

  • Yuexiang Xie
  • Ying Shen
  • Yaliang Li
  • Min Yang
  • Kai Lei

We study the community question answering (CQA) problem that has emerged with the recent advent of numerous community forums. The task of finding appropriate answers to questions from informative but noisy crowdsourced answers is important yet challenging in practice. We present an Attentive User-engaged Adversarial Neural Network (AUANN), which interactively learns the context information of questions and answers and enhances user engagement with the CQA task. A novel attentive mechanism is incorporated to model the internal and external semantic relations among questions, answers, and user contexts. To handle the noise introduced by user context, we design a two-step denoise mechanism: a coarse-grained selection process based on similarity measurement, followed by a fine-grained selection process based on an adversarial training module. We evaluate the proposed method on the large-scale real-world SemEval-2016 and SemEval-2017 datasets. Experimental results verify the benefits of incorporating user information and show that our proposed model significantly outperforms the state-of-the-art methods.
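
The coarse-grained selection step might, as a hypothetical sketch (function and parameter names are assumptions), rank user-context vectors by similarity to the question and keep only the top few, leaving refinement to the adversarial module:

```python
import torch
import torch.nn.functional as F

def coarse_denoise(context_vecs, question_vec, keep_k=5):
    """Hypothetical sketch of coarse-grained selection: user-context
    vectors are ranked by cosine similarity to the question, and only
    the top-k survive for the fine-grained adversarial stage."""
    sims = F.cosine_similarity(context_vecs, question_vec.unsqueeze(0), dim=-1)
    topk = sims.topk(min(keep_k, context_vecs.shape[0])).indices
    return context_vecs[topk]
```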

AAAI Conference 2020 Conference Paper

Be Relevant, Non-Redundant, and Timely: Deep Reinforcement Learning for Real-Time Event Summarization

  • Min Yang
  • Chengming Li
  • Fei Sun
  • Zhou Zhao
  • Ying Shen
  • Chenglin Wu

Real-time event summarization is an essential task in the natural language processing and information retrieval areas. Despite the progress of previous work, generating relevant, non-redundant, and timely event summaries remains challenging in practice. In this paper, we propose a Deep Reinforcement learning framework for real-time Event Summarization (DRES), which shows promising performance in resolving all three challenges (i.e., relevance, non-redundancy, timeliness) in a unified framework. Specifically, we (i) devise a hierarchical cross-attention network with intra- and inter-document attention to integrate important semantic features within and between the query and the input document for better text matching; relevance prediction is also leveraged as an auxiliary task to strengthen document modeling and help extract relevant documents; (ii) propose a multi-topic dynamic memory network to capture the sequential patterns of the different topics belonging to the event of interest and to temporally memorize the input facts from the evolving document stream, avoiding the extraction of redundant information at each time step; (iii) consider both the historical dependencies and the future uncertainty of the document stream when generating relevant and timely summaries, by exploiting reinforcement learning. Experimental results on two real-world datasets demonstrate the advantages of the DRES model, with significant improvements in generating relevant, non-redundant, and timely event summaries against the state-of-the-art.
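
A minimal sketch of the kind of cross-attention component (i) builds on (this is generic scaled dot-product attention, not the paper's exact hierarchical design; names are assumptions):

```python
import torch
import torch.nn.functional as F

def cross_attention(query_tokens, doc_tokens):
    """Generic sketch: each query token attends over document tokens,
    producing query-aligned document features for text matching."""
    # query_tokens: (Lq, d), doc_tokens: (Ld, d)
    scores = query_tokens @ doc_tokens.T / doc_tokens.shape[-1] ** 0.5
    attn = F.softmax(scores, dim=-1)   # (Lq, Ld) attention weights
    return attn @ doc_tokens           # (Lq, d) query-aligned features
```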

AAAI Conference 2020 Conference Paper

Convolutional Hierarchical Attention Network for Query-Focused Video Summarization

  • Shuwen Xiao
  • Zhou Zhao
  • Zijian Zhang
  • Xiaohui Yan
  • Min Yang

Previous approaches to video summarization mainly concentrate on finding the most diverse and representative visual contents for the video summary without considering the user's preference. This paper addresses the task of query-focused video summarization, which takes a user's query and a long video as inputs and aims to generate a query-focused video summary. We consider the task as a problem of computing the similarity between video shots and the query. To this end, we propose a method named Convolutional Hierarchical Attention Network (CHAN), which consists of two parts: a feature encoding network and a query-relevance computing module. In the encoding network, we employ a convolutional network with a local self-attention mechanism and a query-aware global attention mechanism to learn the visual information of each shot. The encoded features are then sent to the query-relevance computing module to generate the query-focused video summary. Extensive experiments on the benchmark dataset demonstrate the competitive performance and effectiveness of our approach.
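
The query-relevance computing module might, in a hypothetical sketch (names are assumptions, not the paper's exact design), reduce to scoring each encoded shot against the query and ranking shots by that score:

```python
import torch
import torch.nn.functional as F

def shot_query_relevance(shot_features, query_feature):
    """Hypothetical sketch: cosine similarity between each encoded shot
    and the query; the highest-scoring shots would form the
    query-focused summary."""
    # shot_features: (num_shots, d), query_feature: (d,)
    scores = F.cosine_similarity(shot_features, query_feature.unsqueeze(0), dim=-1)
    return scores.argsort(descending=True)  # shot indices ranked by relevance
```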

AAAI Conference 2020 Conference Paper

Improving Knowledge-Aware Dialogue Generation via Knowledge Base Question Answering

  • Jian Wang
  • Junhao Liu
  • Wei Bi
  • Xiaojiang Liu
  • Kejing He
  • Ruifeng Xu
  • Min Yang

Neural network models usually struggle to incorporate commonsense knowledge into open-domain dialogue systems. In this paper, we propose a novel knowledge-aware dialogue generation model (TransDG), which transfers question representation and knowledge matching abilities from the knowledge base question answering (KBQA) task to facilitate utterance understanding and factual knowledge selection for dialogue generation. In addition, we propose a response guiding attention and a multi-step decoding strategy to steer our model to focus on relevant features for response generation. Experiments on two benchmark datasets demonstrate that our model has robust superiority over compared methods in generating informative and fluent dialogues. Our code is available at https://github.com/siat-nlp/TransDG.

AAAI Conference 2020 Conference Paper

Improving the Robustness of Wasserstein Embedding by Adversarial PAC-Bayesian Learning

  • Daizong Ding
  • Mi Zhang
  • Xudong Pan
  • Min Yang
  • Xiangnan He

Node embedding is a crucial task in graph analysis. Recently, several methods have been proposed to embed a node as a distribution rather than a vector in order to capture more information. Although these methods achieve noticeable improvements, their extra complexity brings new challenges. For example, the learned node representations can be sensitive to external noise on the graph and vulnerable to adversarial behaviors. In this paper, we first derive an upper bound on the generalization error of Wasserstein embedding via PAC-Bayesian theory. Based on this, we propose an algorithm called Adversarial PAC-Bayesian Learning (APBL) that minimizes the generalization error bound. Furthermore, we provide a model called Regularized Adversarial Wasserstein Embedding Network (RAWEN) as an implementation of APBL. Besides a comprehensive analysis of the robustness of RAWEN, our work is the first to explore a wider range of embedded distributions. For evaluation, we conduct extensive experiments demonstrating the effectiveness and robustness of our proposed embedding model compared with the state-of-the-art methods.

AAAI Conference 2020 Conference Paper

Interactive Dual Generative Adversarial Networks for Image Captioning

  • Junhao Liu
  • Kai Wang
  • Chunpu Xu
  • Zhou Zhao
  • Ruifeng Xu
  • Ying Shen
  • Min Yang

Image captioning is usually built on either generation-based or retrieval-based approaches. Both have certain strengths but suffer from their own limitations. In this paper, we propose an Interactive Dual Generative Adversarial Network (IDGAN) for image captioning, which combines the retrieval-based and generation-based methods to learn a better image captioning ensemble. IDGAN consists of two generators and two discriminators, where the generation- and retrieval-based generators mutually benefit from each other's complementary targets, learned from two dual adversarial discriminators. Specifically, the generation- and retrieval-based generators provide improved synthetic and retrieved candidate captions using informative feedback signals from the two respective discriminators, which are trained to distinguish generated captions from true captions and to assign top rankings to true captions, thus combining the merits of both retrieval-based and generation-based approaches. Extensive experiments on the MSCOCO dataset demonstrate that the proposed IDGAN model significantly outperforms the compared methods for image captioning.

AAAI Conference 2020 Conference Paper

Joint Learning of Answer Selection and Answer Summary Generation in Community Question Answering

  • Yang Deng
  • Wai Lam
  • Yuexiang Xie
  • Daoyuan Chen
  • Yaliang Li
  • Min Yang
  • Ying Shen

Community question answering (CQA) has recently gained increasing popularity in both academia and industry. However, the redundancy and lengthiness of crowdsourced answers limit the performance of answer selection and lead to reading difficulties and misunderstandings for community users. To solve these problems, we tackle the tasks of answer selection and answer summary generation in CQA with a novel joint learning model. Specifically, we design a question-driven pointer-generator network, which exploits the correlation between question-answer pairs to help attend to essential information when generating answer summaries. Meanwhile, we leverage the answer summaries to alleviate noise in the original lengthy answers when ranking the relevancy of question-answer pairs. In addition, we construct a new large-scale CQA corpus, WikiHowQA, which contains long answers for answer selection as well as reference summaries for answer summarization. The experimental results show that the joint learning method effectively addresses the answer redundancy issue in CQA and achieves state-of-the-art results on both the answer selection and text summarization tasks. Furthermore, the proposed model exhibits strong transferability and applicability to resource-poor CQA tasks that lack reference answer summaries.
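
The question-driven network builds on the standard pointer-generator mixture (See et al., 2017). A minimal sketch of that mixture, with names assumed here: the final word distribution mixes generation from the vocabulary with copying source tokens weighted by attention.

```python
import torch

def pointer_generator_dist(p_gen, vocab_dist, attn_weights, src_ids, vocab_size):
    """Standard pointer-generator mixture (See et al., 2017); names here
    are illustrative, not the paper's code."""
    # vocab_dist: (V,) generation distribution over the vocabulary
    # attn_weights: (L,) attention over source tokens; src_ids: (L,) int64
    copy_dist = torch.zeros(vocab_size).scatter_add(0, src_ids, attn_weights)
    return p_gen * vocab_dist + (1.0 - p_gen) * copy_dist
```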

AAAI Conference 2019 Conference Paper

A Human-Like Semantic Cognition Network for Aspect-Level Sentiment Classification

  • Zeyang Lei
  • Yujiu Yang
  • Min Yang
  • Wei Zhao
  • Jun Guo
  • Yi Liu

In this paper, we propose a novel Human-like Semantic Cognition Network (HSCN) for aspect-level sentiment classification, motivated by the principles of human beings' reading cognitive process (pre-reading, active reading, post-reading). We first design a word-level interactive perception module to capture the correlation between context words and the given target words, which can be regarded as pre-reading. Second, to mimic active reading, we propose a target-aware semantic distillation module to produce the target-specific context representation for aspect-level sentiment prediction. Third, we devise a semantic deviation metric module that measures the semantic deviation between the target-specific context representation and the given target, evaluating the degree to which the target-specific context semantics are understood. The measured semantic deviation is then used to fine-tune the active reading process via feedback regulation. To verify the effectiveness of our approach, we conduct extensive experiments on three widely used datasets. The experiments demonstrate that HSCN achieves impressive results compared to other strong competitors.

AAAI Conference 2019 Short Paper

A Multi-Task Learning Approach for Answer Selection: A Study and a Chinese Law Dataset

  • Wenyu Du
  • Baocheng Li
  • Min Yang
  • Qiang Qu
  • Ying Shen

In this paper, we propose a Multi-Task learning approach for Answer Selection (MTAS), motivated by the fact that humans have no difficulty performing such a task because they possess capabilities spanning multiple domains (tasks). Specifically, MTAS consists of two key components: (i) a category classification model that learns rich category-aware document representations; and (ii) an answer selection model that provides the matching scores of question-answer pairs. These two tasks work on a shared document encoding layer and cooperate to learn a high-quality answer selection system. In addition, a multi-head attention mechanism is proposed to learn important information from different representation subspaces at different positions. We manually annotate the first Chinese question answering dataset in the law domain (denoted LawQA) to evaluate the effectiveness of our model. The experimental results show that MTAS consistently outperforms the compared methods.
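
Multi-head attention of the kind mentioned is now a library primitive; a minimal sketch using PyTorch's built-in module (this only illustrates the idea of attending from multiple representation subspaces, not the paper's exact layer):

```python
import torch
import torch.nn as nn

# Sketch: self-attention over a shared document encoding from four heads,
# i.e., four representation subspaces attended at different positions.
attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
doc = torch.randn(2, 50, 128)            # (batch, tokens, dim) shared encoding
attended, weights = attn(doc, doc, doc)  # self-attention over the document
```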

AAAI Conference 2019 Short Paper

A Multi-Task Learning Framework for Abstractive Text Summarization

  • Yao Lu
  • Linqing Liu
  • Zhile Jiang
  • Min Yang
  • Randy Goebel

We propose a Multi-task learning approach for Abstractive Text Summarization (MATS), motivated by the fact that humans have no difficulty performing such a task because they have capabilities spanning multiple domains. Specifically, MATS consists of three components: (i) a text categorization model that learns rich category-specific text representations using a bi-LSTM encoder; (ii) a syntax labeling model that learns to improve the syntax-aware LSTM decoder; and (iii) an abstractive text summarization model that shares its encoder and decoder with the text categorization and syntax labeling tasks, respectively. In particular, the abstractive text summarization model benefits significantly from the additional text categorization and syntax knowledge. Our experimental results show that MATS outperforms the competitors.

AAAI Conference 2019 Conference Paper

Exploring Human-Like Reading Strategy for Abstractive Text Summarization

  • Min Yang
  • Qiang Qu
  • Wenting Tu
  • Ying Shen
  • Zhou Zhao
  • Xiaojun Chen

Recent artificial intelligence studies have witnessed great interest in abstractive text summarization. Although remarkable progress has been made by deep neural network based methods, generating plausible and high-quality abstractive summaries remains a challenging task. The human-like reading strategy is rarely explored in abstractive text summarization, even though it can improve summarization by modeling the process of reading comprehension and logical thinking. Motivated by the human-like reading strategy, which follows a hierarchical routine, we propose a novel Hybrid learning model for Abstractive Text Summarization (HATS). The model consists of three major components, a knowledge-based attention network, a multi-task encoder-decoder network, and a generative adversarial network, corresponding to the different stages of the human-like reading strategy. To verify the effectiveness of HATS, we conduct extensive experiments on two real-life datasets, the CNN/Daily Mail and Gigaword datasets. The experimental results demonstrate that HATS achieves impressive results on both datasets.

IJCAI Conference 2019 Conference Paper

Knowledge-enhanced Hierarchical Attention for Community Question Answering with Multi-task and Adaptive Learning

  • Min Yang
  • Lei Chen
  • Xiaojun Chen
  • Qingyao Wu
  • Wei Zhou
  • Ying Shen

In this paper, we propose Knowledge-enhanced Hierarchical Attention for community question answering with Multi-task learning and Adaptive learning (KHAMA). First, we propose a hierarchical attention network that fully fuses knowledge from the input documents and a knowledge base (KB) by exploiting the semantic compositionality of the input sequences. The external factual knowledge helps recognize background knowledge (entity mentions and their relationships) and eliminate noise from long documents with sophisticated syntactic and semantic structures. In addition, we build multiple CQA models with adaptive boosting and combine them to learn a more effective and robust CQA system. Furthermore, KHAMA is a multi-task learning model: it treats CQA as the primary task and question categorization as the auxiliary task, aiming to learn a category-aware document encoder and to improve the identification of essential information in long questions. Extensive experiments on two benchmarks demonstrate that KHAMA achieves substantial improvements over the compared methods.

AAAI Conference 2019 Short Paper

Learning Document Embeddings with Crossword Prediction

  • Junyu Luo
  • Min Yang
  • Ying Shen
  • Qiang Qu
  • Haixia Chai

In this paper, we propose a Document Embedding Network (DEN) to learn document embeddings in an unsupervised manner. Our model uses the encoder-decoder architecture as its backbone and tries to reconstruct the input document from an encoded document embedding. Unlike a standard decoder for text reconstruction, we randomly block some words in the input document and use the incomplete context information together with the encoded document embedding to predict the blocked words, inspired by the crossword game. Our decoder can thus balance the known and unknown information, considering both global and partial information when decoding the missing words. We evaluate the learned document embeddings on two tasks: document classification and document retrieval. The experimental results show that our model substantially outperforms the compared methods.
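
The "crossword" corruption resembles masked word prediction. As a hypothetical sketch (function names and the masking rate are assumptions), the blocking step could look like:

```python
import torch

def block_words(token_ids, mask_id, block_prob=0.15):
    """Hypothetical sketch of the crossword-style corruption: randomly
    block some words; the decoder must recover them from the remaining
    context plus the encoded document embedding."""
    mask = torch.rand(token_ids.shape) < block_prob
    corrupted = token_ids.clone()
    corrupted[mask] = mask_id
    return corrupted, mask  # mask marks the positions to predict
```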

AAAI Conference 2019 Conference Paper

Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering

  • Yang Deng
  • Yuexiang Xie
  • Yaliang Li
  • Min Yang
  • Nan Du
  • Wei Fan
  • Kai Lei
  • Ying Shen

Answer selection and knowledge base question answering (KBQA) are two important tasks for question answering (QA) systems. Existing methods solve these two tasks separately, which requires a large amount of repetitive work and neglects the rich correlations between the tasks. In this paper, we tackle answer selection and KBQA simultaneously via multi-task learning (MTL), motivated by two observations. First, both answer selection and KBQA can be regarded as ranking problems, one at the text level and the other at the knowledge level. Second, the two tasks can benefit each other: answer selection can incorporate external knowledge from the knowledge base (KB), while KBQA can be improved by learning contextual information from answer selection. To jointly learn these two tasks, we propose a novel multi-task learning scheme that utilizes multi-view attention learned from various perspectives to enable the tasks to interact with each other and to learn more comprehensive sentence representations. Experiments conducted on several real-world datasets demonstrate the effectiveness of the proposed method: the performance of both answer selection and KBQA improves, and the multi-view attention scheme proves effective in assembling attentive information from different representational perspectives.

AAAI Conference 2019 Short Paper

Transductive Zero-Shot Learning via Visual Center Adaptation

  • Ziyu Wan
  • Yan Li
  • Min Yang
  • Junge Zhang

In this paper, we propose a Visual Center Adaptation Method (VCAM) to address the domain shift problem in zero-shot learning. For the seen classes in the training data, VCAM builds an embedding space by learning the mapping from the semantic space to visual centers. For the unseen classes in the test data, the construction of the embedding space is constrained by a symmetric Chamfer-distance term, which adapts the distribution of the synthetic visual centers to that of the real cluster centers. The learned embedding space therefore generalizes well to unseen classes. Experiments on two widely used datasets demonstrate that our model significantly outperforms state-of-the-art methods.
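
The symmetric Chamfer distance has a standard form; a minimal sketch (variable names assumed) matching each synthetic center to its nearest real cluster center and vice versa:

```python
import torch

def symmetric_chamfer(a, b):
    """Standard symmetric Chamfer distance: each point is matched to its
    nearest neighbour in the other set, in both directions."""
    # a: (n, d) synthetic visual centers, b: (m, d) real cluster centers
    d = torch.cdist(a, b)  # (n, m) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```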

IJCAI Conference 2018 Conference Paper

A Multi-task Learning Approach for Image Captioning

  • Wei Zhao
  • Benyou Wang
  • Jianbo Ye
  • Min Yang
  • Zhou Zhao
  • Ruotian Luo
  • Yu Qiao

In this paper, we propose a Multi-task Learning Approach for Image Captioning (MLAIC), motivated by the fact that humans have no difficulty performing such a task because they possess capabilities spanning multiple domains. Specifically, MLAIC consists of three key components: (i) a multi-object classification model that learns rich category-aware image representations using a CNN image encoder; (ii) a syntax generation model that learns a better syntax-aware LSTM-based decoder; and (iii) an image captioning model that generates image descriptions in text, sharing its CNN encoder and LSTM decoder with the object classification and syntax generation tasks, respectively. In particular, the image captioning model benefits from the additional object categorization and syntax knowledge. To verify the effectiveness of our approach, we conduct extensive experiments on the MS-COCO dataset. The experimental results demonstrate that our model achieves impressive results compared to other strong competitors.

AAAI Conference 2018 Short Paper

A Semi-Supervised Network Embedding Model for Protein Complexes Detection

  • Wei Zhao
  • Jia Zhu
  • Min Yang
  • Danyang Xiao
  • Gabriel Pui Cheong Fung
  • Xiaojun Chen

A protein complex is a group of associated polypeptide chains that plays essential roles in biological processes. Given a graph representing a protein-protein interaction (PPI) network, it is critical but non-trivial to detect protein complexes. In this paper, we propose a semi-supervised network embedding model that adopts graph convolutional networks to effectively detect densely connected subgraphs. We conduct extensive experiments on two popular PPI networks with various data sizes and densities. The experimental results show that our approach achieves state-of-the-art performance.
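
A standard graph-convolution layer of the kind adopted above (Kipf & Welling style propagation; this generic sketch is not the paper's exact architecture): node features are averaged over neighbours via a normalized adjacency matrix, then linearly transformed.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Generic graph-convolution layer: normalized neighbourhood
    aggregation followed by a linear transform and nonlinearity."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        # adj_norm: D^{-1/2} (A + I) D^{-1/2}, precomputed from the PPI graph
        return torch.relu(self.lin(adj_norm @ x))
```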

IJCAI Conference 2018 Conference Paper

Attentional Image Retweet Modeling via Multi-Faceted Ranking Network Learning

  • Zhou Zhao
  • Lingtao Meng
  • Jun Xiao
  • Min Yang
  • Fei Wu
  • Deng Cai
  • Xiaofei He
  • Yueting Zhuang

Retweet prediction is a challenging problem on social media sites (SMS). In this paper, we study the problem of image retweet prediction in social media, i.e., predicting the image sharing behavior in which users repost image tweets from their followees. Unlike previous studies, we learn a user preference ranking model from users' past retweeted image tweets. We first propose a heterogeneous image retweet modeling network (IRM) that exploits users' past retweeted image tweets with associated contexts, their following relations, and the preferences of their followees. We then develop a novel attentional multi-faceted ranking network learning framework with multi-modal neural networks for the proposed heterogeneous IRM network, learning joint image tweet representations and user preference representations for the prediction task. Extensive experiments on a large-scale dataset from Twitter show that our method achieves better performance than other state-of-the-art solutions to the problem.

AAAI Conference 2018 Short Paper

Generative Adversarial Network for Abstractive Text Summarization

  • Linqing Liu
  • Yao Lu
  • Min Yang
  • Qiang Qu
  • Jia Zhu
  • Hongyan Li

In this paper, we propose an adversarial process for abstractive text summarization, in which we simultaneously train a generative model G and a discriminative model D. In particular, we build the generator G as a reinforcement learning agent, which takes the raw text as input and predicts the abstractive summary. We also build a discriminator that attempts to distinguish the generated summary from the ground truth summary. Extensive experiments demonstrate that our model achieves ROUGE scores competitive with the state-of-the-art methods on the CNN/Daily Mail dataset. Qualitatively, we show that our model generates more abstractive, readable, and diverse summaries.
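
The generator-as-RL-agent setup is typically trained with a REINFORCE-style objective whose reward comes from the discriminator; a hypothetical sketch (names and the exact reward shaping are assumptions, not the paper's code):

```python
import torch

def generator_pg_loss(log_probs, disc_score):
    """Hypothetical sketch: the generator is rewarded when the
    discriminator scores its summary as real, via a REINFORCE-style
    objective."""
    # log_probs: (T,) log-probability of each generated token
    # disc_score: scalar in (0, 1), discriminator's 'real' probability
    reward = disc_score.detach()        # treat the reward as fixed
    return -(reward * log_probs.sum())  # minimize negative expected reward
```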

IJCAI Conference 2018 Conference Paper

PLASTIC: Prioritize Long and Short-term Information in Top-n Recommendation using Adversarial Training

  • Wei Zhao
  • Benyou Wang
  • Jianbo Ye
  • Yongqiang Gao
  • Min Yang
  • Xiaojun Chen

Recommender systems provide users with ranked lists of items based on individual preferences and constraints. Two types of models are commonly used to generate ranking results: long-term models and session-based models. While long-term models represent interactions between users and items that are expected to change slowly over time, session-based models encode users' short-term interests and the changing dynamics of items' attributes. In this paper, we propose PLASTIC, a model Prioritizing Long And Short-Term Information in top-n reCommendation using adversarial training. In the adversarial process, we train a generator as a reinforcement learning agent that recommends the next item to a user sequentially. We also train a discriminator that attempts to distinguish the generated list of items from the real recorded list. Extensive experiments show that our model performs significantly better on two widely used real-world datasets.

AAAI Conference 2018 Short Paper

Sentiment Lexicon Enhanced Attention-Based LSTM for Sentiment Classification

  • Zeyang Lei
  • Yujiu Yang
  • Min Yang

Deep neural networks have recently achieved great success in sentiment classification. However, these approaches do not fully exploit linguistic knowledge. In this paper, we propose a novel sentiment lexicon enhanced attention-based LSTM (SLEA-LSTM) model to improve sentence-level sentiment classification. Our method integrates a sentiment lexicon into deep neural networks via single-head or multi-head attention mechanisms. We conduct extensive experiments on the MR and SST datasets. The experimental results show that our model achieves comparable or better performance than the state-of-the-art methods.

AAAI Conference 2017 Short Paper

Attention Based LSTM for Target Dependent Sentiment Classification

  • Min Yang
  • Wenting Tu
  • Jingxuan Wang
  • Fei Xu
  • Xiaojun Chen

We present an attention-based bidirectional LSTM approach to improve target-dependent sentiment classification. Our method learns the alignment between the target entities and the most distinguishing features. We conduct extensive experiments on a real-life dataset. The experimental results show that our model achieves state-of-the-art results.
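
The alignment step can be sketched as target-conditioned attention over the bi-LSTM states (a generic, hypothetical sketch; names are assumptions, not the paper's exact scoring function):

```python
import torch
import torch.nn.functional as F

def target_attention(hidden_states, target_vec):
    """Hypothetical sketch: attention weights score each context word
    against the target representation; the weighted sum becomes the
    target-dependent sentence feature."""
    # hidden_states: (L, d) bi-LSTM outputs, target_vec: (d,)
    scores = hidden_states @ target_vec  # (L,) word-target alignment scores
    weights = F.softmax(scores, dim=0)
    return weights @ hidden_states       # (d,) sentence feature
```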

AAAI Conference 2017 Short Paper

Authorship Attribution with Topic Drift Model

  • Min Yang
  • Dingju Zhu
  • Yong Tang
  • Jingxuan Wang

Authorship attribution is an active research direction due to its legal and financial importance; the goal is to identify the authorship of anonymous texts. In this paper, we propose a Topic Drift Model (TDM) that monitors the dynamics of authors' writing styles and latent topics of interest. Our model is sensitive to temporal information and the ordering of words, and thus extracts more information from texts.

AAAI Conference 2017 Short Paper

Detecting Review Spammer Groups

  • Min Yang
  • Ziyu Lu
  • Xiaojun Chen
  • Fei Xu

With an increasing number of paid writers posting fake reviews to promote or demote target entities on the Internet, review spammer detection has become a crucial and challenging task. In this paper, we propose a three-phase method to identify review spammer groups and individual spammers who are paid to post fake comments. We evaluate the effectiveness and performance of the approach on a real-life online shopping review dataset from amazon.com. The experimental results show that our model achieves comparable or better performance than previous work on spammer detection.

IJCAI Conference 2017 Conference Paper

Scalable Normalized Cut with Improved Spectral Rotation

  • Xiaojun Chen
  • Feiping Nie
  • Joshua Zhexue Huang
  • Min Yang

Many spectral clustering algorithms have been proposed and successfully applied to high-dimensional applications. However, two problems remain: 1) existing methods for obtaining the final clustering assignments may deviate from the true discrete solution, and 2) most of these methods have very high computational complexity. In this paper, we propose a Scalable Normalized Cut method for clustering large-scale data. The new method efficiently constructs a small representation matrix and then performs clustering on that matrix. In the clustering process, an improved spectral rotation method is proposed to obtain the final clustering assignments. A series of experiments was conducted on 14 benchmark data sets, and the results show the superior performance of the new method.

AAAI Conference 2015 Conference Paper

Ordering-Sensitive and Semantic-Aware Topic Modeling

  • Min Yang
  • Tianyi Cui
  • Wenting Tu

Topic modeling of textual corpora is an important and challenging problem. Most previous work makes the "bag-of-words" assumption, which ignores the ordering of words. This assumption simplifies computation, but it unrealistically discards the ordering information and the semantics of words in context. In this paper, we present a Gaussian Mixture Neural Topic Model (GMNTM) that incorporates both the ordering of words and the semantic meaning of sentences into topic modeling. Specifically, we represent each topic as a cluster of multi-dimensional vectors and embed the corpus into a collection of vectors generated by a Gaussian mixture model. Each word is affected not only by its topic but also by the embedding vectors of its surrounding words and context. The Gaussian mixture components and the topics of documents, sentences, and words are learned jointly. Extensive experiments show that our model learns better topics and more accurate per-topic word distributions. Quantitatively, compared to state-of-the-art topic modeling approaches, GMNTM obtains significantly better performance in terms of perplexity, retrieval accuracy, and classification accuracy.

NeurIPS Conference 2010 Conference Paper

Relaxed Clipping: A Global Training Method for Robust Regression and Classification

  • Min Yang
  • Linli Xu
  • Martha White
  • Dale Schuurmans
  • Yao-Liang Yu

Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, this view neglects the possibility of reformulated training methods that yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that exceed a maximum threshold. We demonstrate that a relaxation of this form of "loss clipping" can be made globally solvable and applicable to any standard loss while guaranteeing robustness against outliers. We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems.
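
For intuition, loss clipping itself is just truncation at a threshold, which caps the influence any single outlier can exert; a tiny sketch of that step (the paper's actual contribution, the globally solvable relaxation, is not shown):

```python
import numpy as np

def clipped_loss(loss_values, tau):
    """Loss clipping: values above the threshold tau are truncated,
    bounding the contribution of outliers. The clipped loss is
    non-convex; the paper relaxes it to enable global training."""
    return np.minimum(loss_values, tau)
```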