Arrow Research

Author name cluster

Fei Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

40 papers
2 author rows

Possible papers

40

AAAI Conference 2026 Conference Paper

Generating Attribute-Aware Human Motions from Textual Prompt

  • Xinghan Wang
  • Kun Xu
  • Fei Li
  • Cao Sheng
  • JiaZhong Yu
  • Yadong Mu

Text-driven human motion generation has recently attracted considerable attention, allowing models to generate human motions based on textual descriptions. However, current methods neglect the influence of human attributes—such as age, gender, weight, and height—which are key factors shaping human motion patterns. This work represents a pilot exploration for bridging this gap. We conceptualize each motion as comprising both attribute information and action semantics, where textual descriptions align exclusively with action semantics. To achieve this, a new framework inspired by Structural Causal Models is proposed to decouple action semantics from human attributes, enabling text-to-semantics prediction and attribute-controlled generation. The resulting model is capable of generating attribute-aware motion aligned with the user's text and attribute inputs. For evaluation, we introduce a comprehensive dataset containing attribute annotations for text-motion pairs, setting the first benchmark for attribute-aware motion generation. Extensive experiments validate our model's effectiveness.

AAAI Conference 2026 Conference Paper

Interest-Shift-Aware Logical Reasoning for Efficient Long-Sequence Recommendation

  • Fei Li
  • Qingyun Gao
  • Enneng Yang
  • Jianzhe Zhao
  • Guibing Guo

Logical reasoning-based recommendation methods formulate logical expressions to characterize user-item interaction patterns, incorporating regularization constraints to ensure consistency with logical rules. However, these methods face two critical challenges: (1) As sequence length increases, they cannot effectively capture the dynamic transfer of user interests across subsequences (i.e., subsequence interest drift), thereby degenerating logical expressions to single-subsequence inference. (2) The time complexity of logical reasoning and rule learning scales quadratically with the sequence length, severely constraining computational efficiency in long-sequence recommendation. To address these challenges, we propose ELECTOR, an intErest-shift-aware Logical reasoning method for effiCienT lOng-sequence Recommendation. Specifically, we design a Subsequence Interest Learning Module (SIL) to model cross-subsequence interest drifts in long sequences. SIL employs a local attention mechanism to extract subsequence interests effectively and a global attention mechanism to capture the correlations among subsequence interests. Subsequently, we propose an Interest-aware Logical Reasoning (ILR) mechanism that performs logical reasoning using a limited set of subsequence and short-term interests, rather than reasoning over the entire sequence, significantly reducing time complexity. Additionally, ILR employs an interest logical reasoning contrastive loss to ensure the model simultaneously considers multiple interests. Experiments on four real-world datasets demonstrate that our method significantly outperforms all baselines in both computational efficiency and recommendation accuracy, confirming its effectiveness.
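The quadratic-complexity claim above can be illustrated with a back-of-the-envelope operation count: local attention inside each subsequence plus global attention over subsequence-level interest vectors replaces full pairwise attention. This is a toy sketch; the function names and the subsequence count are illustrative, not taken from the paper.

```python
def full_attention_ops(seq_len):
    """Pairwise attention over the whole sequence: O(L^2) score computations."""
    return seq_len * seq_len

def subsequence_attention_ops(seq_len, n_subseq):
    """Local attention inside each of n_subseq equal chunks, plus global
    attention across the n_subseq subsequence-interest vectors."""
    chunk = seq_len // n_subseq
    local = n_subseq * chunk * chunk   # per-chunk pairwise scores
    global_ = n_subseq * n_subseq      # interest-vector pairwise scores
    return local + global_

L, m = 1024, 16
full = full_attention_ops(L)               # full-sequence cost
reduced = subsequence_attention_ops(L, m)  # local + global cost
```

With L = 1024 and 16 subsequences, the count drops by more than an order of magnitude, which matches the efficiency motivation stated in the abstract.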

AAAI Conference 2026 Conference Paper

KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache

  • Fei Li
  • Song Liu
  • Weiguo Wu
  • Shiqiang Nie
  • Jinyu Wang

The high memory demands of the Key-Value (KV) Cache during the inference of Large Language Models (LLMs) severely restrict their deployment on resource-constrained platforms. Quantization can effectively alleviate the memory pressure caused by the KV Cache. However, existing methods either rely on static one-size-fits-all precision allocation or fail to dynamically prioritize critical KV in long-context tasks, forcing memory-accuracy-throughput tradeoffs. In this work, we propose a novel mixed-precision quantization method for the KV Cache named KVmix. KVmix leverages gradient-based importance analysis to evaluate how individual Key and Value projection matrices affect the model loss, enabling layer-specific bit-width allocation for mixed-precision quantization. It dynamically prioritizes higher precision for important layers while aggressively quantizing less influential ones, achieving a tunable balance between accuracy and efficiency. KVmix also introduces a dynamic long-context optimization strategy that adaptively keeps full-precision KV pairs for recent pivotal tokens and compresses older ones, achieving high-quality sequence generation with low memory usage. Additionally, KVmix provides efficient low-bit quantization and CUDA kernels to optimize computational overhead. On LLMs such as Llama and Mistral, KVmix achieves near-lossless inference performance with an extremely low quantization configuration (2.19-bit Keys, 2.38-bit Values), while delivering a remarkable 4.9× memory compression and a 5.3× speedup in inference throughput.
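The layer-specific bit-width allocation can be sketched as a simple ranking problem: given per-layer importance scores, keep the most important layers at higher precision and quantize the rest aggressively. This is a toy illustration under assumed scores and thresholds, not KVmix's actual gradient analysis or kernels.

```python
def allocate_kv_bits(importance, high_bits=4, low_bits=2, top_frac=0.25):
    """Toy layer-wise bit allocation: layers whose importance score is in
    the top fraction keep higher precision; the rest are quantized more
    aggressively. `importance` is one score per layer (e.g. derived from
    gradients of the loss w.r.t. the layer's KV projections)."""
    n = len(importance)
    k = max(1, int(n * top_frac))
    ranked = sorted(range(n), key=lambda i: importance[i], reverse=True)
    bits = [low_bits] * n
    for i in ranked[:k]:
        bits[i] = high_bits
    return bits

# Hypothetical importance scores for an 8-layer model.
scores = [0.9, 0.1, 0.4, 0.05, 0.7, 0.2, 0.3, 0.15]
bits = allocate_kv_bits(scores)
avg_bits = sum(bits) / len(bits)
```

Averaging the allocated widths gives the kind of fractional effective bit-width (e.g. "2.19 bit") quoted in the abstract.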

AAAI Conference 2026 Conference Paper

PaSE: Prototype-aligned Calibration and Shapley-based Equilibrium for Multimodal Sentiment Analysis

  • Kang He
  • Boyu Chen
  • Yuzhe Ding
  • Fei Li
  • Chong Teng
  • Donghong Ji

Multimodal Sentiment Analysis (MSA) seeks to understand human emotions by integrating textual, acoustic, and visual signals. Although multimodal fusion is designed to leverage cross-modal complementarity, real-world scenarios often exhibit modality competition: dominant modalities tend to overshadow weaker ones, leading to suboptimal performance. In this paper, we propose PaSE, a novel Prototype-aligned Calibration and Shapley-optimized Equilibrium framework, which enhances collaboration while explicitly mitigating modality competition. PaSE first applies Prototype-guided Calibration Learning (PCL) to refine unimodal representations and align them through an Entropic Optimal Transport mechanism that ensures semantic consistency. To further stabilize optimization, we introduce a Dual-Phase Optimization strategy. A prototype-gated fusion module is first used to extract shared representations, followed by Shapley-based Gradient Modulation (SGM), which adaptively adjusts gradients according to the contribution of each modality. Extensive experiments on IEMOCAP, MOSI, and MOSEI confirm that PaSE achieves superior performance and effectively alleviates modality competition.
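Per-modality contribution estimates of the kind SGM builds on can be computed exactly for three modalities via Shapley values: the average marginal gain of adding a modality over all join orders. The sketch below uses hypothetical subset scores and is not the paper's implementation.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution of each
    player (here, a modality) over all join orders of the coalition."""
    perms = list(permutations(players))
    phi = {p: 0.0 for p in players}
    for order in perms:
        coalition = frozenset()
        for p in order:
            phi[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: v / len(perms) for p, v in phi.items()}

# Hypothetical validation scores for every modality subset.
scores = {
    frozenset(): 0.0,
    frozenset({"text"}): 0.70,
    frozenset({"audio"}): 0.40,
    frozenset({"video"}): 0.35,
    frozenset({"text", "audio"}): 0.78,
    frozenset({"text", "video"}): 0.75,
    frozenset({"audio", "video"}): 0.50,
    frozenset({"text", "audio", "video"}): 0.80,
}
phi = shapley_values(["text", "audio", "video"],
                     lambda s: scores[frozenset(s)])
```

The values sum to the full-coalition score (efficiency property), and a dominant text modality gets the largest share, which is exactly the signal a gradient modulator would use to rebalance training.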

NeurIPS Conference 2025 Conference Paper

EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions

  • Xiaorui Wu
  • Fei Li
  • Xiaofeng Mao
  • Xin Zhang
  • Li Zheng
  • Yuxiang Peng
  • Chong Teng
  • Donghong Ji

Large language models (LLMs) frequently refuse to respond to pseudo-malicious instructions: semantically harmless input queries triggering unnecessary LLM refusals due to conservative safety alignment, significantly impairing user experience. Collecting such instructions is crucial for evaluating and mitigating over-refusals, but existing instruction curation methods, like manual creation or instruction rewriting, either lack scalability or fail to produce sufficiently diverse and effective refusal-inducing prompts. To address these limitations, we introduce EVOREFUSE, a prompt optimization approach that generates diverse pseudo-malicious instructions consistently eliciting confident refusals across LLMs. EVOREFUSE employs an evolutionary algorithm that explores the instruction space in more diverse directions than existing methods via mutation strategies and recombination, and iteratively evolves seed instructions to maximize the evidence lower bound on LLM refusal probability. Using EVOREFUSE, we create two novel datasets: EVOREFUSE-TEST, a benchmark of 582 pseudo-malicious instructions that outperforms the next-best benchmark with an 85.34% higher average refusal-triggering rate across 9 LLMs without a safety-prior system prompt, 34.86% greater lexical diversity, and 40.03% improved LLM response confidence scores; and EVOREFUSE-ALIGN, which provides 3,000 pseudo-malicious instructions with responses for supervised and preference-based alignment training. With supervised fine-tuning on EVOREFUSE-ALIGN, LLAMA3.1-8B-INSTRUCT achieves up to 29.85% fewer over-refusals than models trained on the second-best alignment dataset, without compromising safety. Our analysis with EVOREFUSE-TEST reveals that models trigger over-refusals by overly focusing on sensitive keywords while ignoring broader context. Our code and datasets are available at https://github.com/FishT0ucher/EVOREFUSE.

AAAI Conference 2025 Conference Paper

Multi-Granular Multimodal Clue Fusion for Meme Understanding

  • Li Zheng
  • Hao Fei
  • Ting Dai
  • Zuquan Peng
  • Fei Li
  • Huisheng Ma
  • Chong Teng
  • Donghong Ji

With the continuous emergence of various social media platforms frequently used in daily life, the multimodal meme understanding (MMU) task has been garnering increasing attention. MMU aims to explore and comprehend the meanings of memes from various perspectives by performing tasks such as metaphor recognition, sentiment analysis, intention detection, and offensiveness detection. Despite making progress, limitations persist due to the loss of fine-grained metaphorical visual clues and the neglect of weak multimodal text-image correlation. To overcome these limitations, we propose a multi-granular multimodal clue fusion model (MGMCF) to advance MMU. Firstly, we design an object-level semantic mining module to extract object-level image feature clues, achieving fine-grained feature clue extraction and enhancing the model's ability to capture metaphorical details and semantics. Secondly, we propose a brand-new global-local cross-modal interaction model to address the weak correlation between text and images. This model facilitates effective interaction between global multimodal contextual clues and local unimodal feature clues, strengthening their representations through a bidirectional cross-modal attention mechanism. Finally, we devise a dual-semantic guided training strategy to enhance the model's understanding and alignment of multimodal representations in the semantic space. Experiments conducted on the widely-used MET-MEME bilingual dataset demonstrate significant improvements over state-of-the-art baselines. Specifically, there is an 8.14% increase in precision for the offensiveness detection task, and respective accuracy enhancements of 3.53%, 3.89%, and 3.52% for the metaphor recognition, sentiment analysis, and intention detection tasks. These results, underpinned by in-depth analyses, underscore the effectiveness and potential of our approach for advancing MMU.

AAAI Conference 2024 Conference Paper

Compositional Generalization for Multi-Label Text Classification: A Data-Augmentation Approach

  • Yuyang Chai
  • Zhuang Li
  • Jiahui Liu
  • Lei Chen
  • Fei Li
  • Donghong Ji
  • Chong Teng

Despite significant advancements in multi-label text classification, the ability of existing models to generalize to novel and seldom-encountered complex concepts, which are compositions of elementary ones, remains underexplored. This research addresses this gap. By creating unique data splits across three benchmarks, we assess the compositional generalization ability of existing multi-label text classification models. Our results show that these models often fail to generalize to compositional concepts encountered infrequently during training, leading to inferior performance on tests with these new combinations. To address this, we introduce a data augmentation method that leverages two innovative text generation models designed to enhance the classification models' capacity for compositional generalization. Our experiments show that this data augmentation approach significantly improves the compositional generalization capabilities of classification models on our benchmarks, with both generation models surpassing other text generation baselines. Our code is available at https://github.com/yychai74/LD-VAE.
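The idea of splitting data by label composition can be sketched as follows: hold out examples whose exact label combination is rare, so the test set stresses unseen or infrequent compositions. This is a toy heuristic with illustrative names; the paper's benchmark splits are constructed with their own methodology.

```python
from collections import Counter

def compositional_split(examples, max_train_freq=1):
    """Toy compositional split for multi-label data: label combinations
    occurring at most `max_train_freq` times overall are held out for
    testing, so the test set consists of rare label compositions.
    `examples` is a list of (text, label_list) pairs."""
    combo_counts = Counter(frozenset(labels) for _, labels in examples)
    train, test = [], []
    for text, labels in examples:
        if combo_counts[frozenset(labels)] > max_train_freq:
            train.append((text, labels))
        else:
            test.append((text, labels))
    return train, test

# Hypothetical corpus: the combination {sports, politics} appears once.
examples = [
    ("doc1", ["sports"]), ("doc2", ["sports"]), ("doc3", ["sports"]),
    ("doc4", ["sports", "politics"]),
    ("doc5", ["finance"]), ("doc6", ["finance"]),
]
train, held_out = compositional_split(examples)
```

A model trained on `train` never sees the {sports, politics} composition, even though it has seen each label individually, which is precisely the generalization gap the benchmark probes.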

AAAI Conference 2024 Conference Paper

Harnessing Holistic Discourse Features and Triadic Interaction for Sentiment Quadruple Extraction in Dialogues

  • Bobo Li
  • Hao Fei
  • Lizi Liao
  • Yu Zhao
  • Fangfang Su
  • Fei Li
  • Donghong Ji

Dialogue Aspect-based Sentiment Quadruple (DiaASQ) is a newly-emergent task aiming to extract sentiment quadruples (i.e., targets, aspects, opinions, and sentiments) from conversations. While showing promising performance, the prior DiaASQ approach unfortunately falls prey to the key cruxes of DiaASQ: insufficient modeling of discourse features and a lack of intrinsic interaction modeling for quadruple extraction, which hinder further task improvement. To this end, we introduce a novel framework that not only capitalizes on comprehensive discourse feature modeling, but also captures the intrinsic interaction for optimal quadruple extraction. On the one hand, drawing upon multiple discourse features, our approach constructs a token-level heterogeneous graph and enhances token interactions through a heterogeneous attention network. On the other hand, we propose a novel triadic scorer, strengthening weak token relations within a quadruple and thereby enhancing the cohesion of the quadruple extraction. Experimental results on the DiaASQ benchmark showcase that our model significantly outperforms existing baselines across both English and Chinese datasets. Our code is available at https://bit.ly/3v27pqA.

JBHI Journal 2024 Journal Article

Improving Tumor Classification by Reusing Self-Predicted Segmentation of Medical Images as Guiding Knowledge

  • Xiaoyi Lin
  • Mingyu Wang
  • Fei Li
  • Ziyue Xu
  • Jia Chen
  • Xin Chen
  • Chenglang Yuan
  • Songxiong Wu

Differential diagnosis of tumors is important for computer-aided diagnosis. In computer-aided diagnosis systems, expert knowledge from lesion segmentation masks is underutilized, as it is only used during preprocessing or as supervision to guide feature extraction. To improve the utilization of lesion segmentation masks, this study proposes a simple and effective multitask learning network that improves medical image classification using self-predicted segmentation as guiding knowledge; we call this network RS²-net. In RS²-net, the predicted segmentation probability map obtained from the initial segmentation inference is added to the original image to form a new input, which is then re-input to the network for the final classification inference. We validated the proposed RS²-net using three datasets: the pNENs-Grade dataset, which tested the prediction of pancreatic neuroendocrine neoplasm grading; the HCC-MVI dataset, which tested the prediction of microvascular invasion of hepatocellular carcinoma; and the ISIC 2017 public skin lesion dataset. The experimental results indicate that the proposed strategy of reusing self-predicted segmentation is effective, and RS²-net outperforms other popular networks and existing state-of-the-art studies. Interpretive analytics based on feature visualization demonstrates that the improved classification performance of our reuse strategy is due to the semantic information that can be acquired in advance in a shallow network.

AAAI Conference 2024 Conference Paper

MindMap: Constructing Evidence Chains for Multi-Step Reasoning in Large Language Models

  • Yangyu Wu
  • Xu Han
  • Wei Song
  • Miaomiao Cheng
  • Fei Li

Large language models (LLMs) have demonstrated remarkable performance in various natural language processing tasks. However, they still face significant challenges in automated reasoning, particularly in scenarios involving multi-step reasoning. In this paper, we focus on the logical reasoning problem, where the main task is to answer a question based on a set of available facts and rules. Much prior work has focused on guiding LLMs to think logically by generating reasoning paths, ignoring the structure among the available facts. We therefore propose a simple approach, MindMap, that introduces evidence chains to support reasoning. An evidence chain refers to a set of facts that involve the same subject. In this way, we can organize related facts together and avoid missing important information. MindMap can be integrated with existing reasoning frameworks, such as Chain-of-Thought (CoT) and Selection-Inference (SI), by letting the model select relevant evidence chains instead of independent facts. The experimental results on the bAbI and ProofWriter-OWA datasets demonstrate the effectiveness of MindMap. It can significantly improve CoT and SI, especially on multi-step reasoning tasks.
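The chain-building idea described above is simple enough to sketch directly: group (subject, relation, object) facts by subject, then hand the model only the chains whose subjects appear in the question. A toy sketch with illustrative names, not the paper's code.

```python
from collections import defaultdict

def build_evidence_chains(facts):
    """Group facts that share the same subject into evidence chains.
    Each fact is a (subject, relation, object) triple; an evidence chain
    is the set of facts about one subject, kept together so related
    information is not scattered across the context."""
    chains = defaultdict(list)
    for subj, rel, obj in facts:
        chains[subj].append((subj, rel, obj))
    return dict(chains)

def select_chains(chains, question_entities):
    """Keep only the chains whose subject appears in the question,
    mimicking how relevant chains (not isolated facts) are fed to the
    reasoning model."""
    return {s: c for s, c in chains.items() if s in question_entities}

facts = [
    ("lion", "is-a", "animal"),
    ("lion", "eats", "gazelle"),
    ("gazelle", "is-a", "animal"),
    ("rock", "is-a", "mineral"),
]
chains = build_evidence_chains(facts)
relevant = select_chains(chains, {"lion"})
```

A CoT or SI prompt would then be built from `relevant` rather than from the full, unstructured fact list.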

AAAI Conference 2024 Conference Paper

Reverse Multi-Choice Dialogue Commonsense Inference with Graph-of-Thought

  • Li Zheng
  • Hao Fei
  • Fei Li
  • Bobo Li
  • Lizi Liao
  • Donghong Ji
  • Chong Teng

With the proliferation of dialogic data across the Internet, the Dialogue Commonsense Multi-choice Question Answering (DC-MCQ) task has emerged as a response to the challenge of comprehending user queries and intentions. Although prevailing methodologies exhibit effectiveness in addressing single-choice questions, they encounter difficulties in handling multi-choice queries due to the heightened intricacy and informational density. In this paper, inspired by the human cognitive process of progressively excluding options, we propose a three-step Reverse Exclusion Graph-of-Thought (ReX-GoT) framework, comprising Option Exclusion, Error Analysis, and Combine Information. Specifically, our ReX-GoT mimics human reasoning by gradually excluding irrelevant options and learning the reasons for option errors to choose the optimal path of the GoT and ultimately infer the correct answer. By progressively integrating intricate clues, our method effectively reduces the difficulty of multi-choice reasoning and provides a novel solution for DC-MCQ. Extensive experiments on the CICERO and CICERO_v2 datasets validate the significant improvement of our approach on the DC-MCQ task. In the zero-shot setting, our model outperforms the best baseline by 17.67% in terms of F1 score on the multi-choice task. Most strikingly, our GPT-3.5-based ReX-GoT framework achieves a remarkable 39.44% increase in F1 score.

AAAI Conference 2023 Conference Paper

Dialogue State Distillation Network with Inter-slot Contrastive Learning for Dialogue State Tracking

  • Jing Xu
  • Dandan Song
  • Chong Liu
  • Siu Cheung Hui
  • Fei Li
  • Qiang Ju
  • Xiaonan He
  • Jian Xie

In task-oriented dialogue systems, Dialogue State Tracking (DST) aims to extract users' intentions from the dialogue history. Currently, most existing approaches suffer from error propagation and are unable to dynamically select relevant information when utilizing previous dialogue states. Moreover, the relations between the updates of different slots provide vital clues for DST. However, the existing approaches rely only on predefined graphs to indirectly capture these relations. In this paper, we propose a Dialogue State Distillation Network (DSDN) to utilize relevant information from previous dialogue states and mitigate the utilization gap between training and testing. Thus, it can dynamically exploit previous dialogue states while simultaneously avoiding error propagation. Further, we propose an inter-slot contrastive learning loss to effectively capture the slot co-update relations from dialogue context. Experiments are conducted on the widely used MultiWOZ 2.0 and MultiWOZ 2.1 datasets. The experimental results show that our proposed model achieves state-of-the-art performance for DST.

IJCAI Conference 2022 Conference Paper

Global Inference with Explicit Syntactic and Discourse Structures for Dialogue-Level Relation Extraction

  • Hao Fei
  • Jingye Li
  • Shengqiong Wu
  • Chenliang Li
  • Donghong Ji
  • Fei Li

Recent research attention in relation extraction has turned to the dialogue scenario, i.e., dialogue-level relation extraction (DiaRE). Existing DiaRE methods either simply concatenate the utterances in a dialogue into a long piece of text, or employ naive words, sentences or entities to build dialogue graphs, while the structural characteristics of dialogues have not been fully utilized. In this work, we investigate a novel dialogue-level mixed dependency graph (D2G) and an argument reasoning graph (ARG) for DiaRE with a global relation reasoning mechanism. First, we model the entire dialogue into a unified and coherent D2G by explicitly integrating both syntactic and discourse structures, which enables richer semantic and feature learning for relation extraction. Second, we stack an ARG graph on top of D2G to further focus on argument inter-dependency learning and argument representation refinement, for sufficient argument relation inference. In our global reasoning framework, D2G and ARG work collaboratively, iteratively performing lexical, syntactic and semantic information exchange and representation learning over the entire dialogue context. On two DiaRE benchmarks, our framework shows considerable improvements over the current state-of-the-art baselines. Further analyses show that the model effectively solves the long-range dependency issue, and meanwhile gives explainable predictions.

IJCAI Conference 2022 Conference Paper

Inheriting the Wisdom of Predecessors: A Multiplex Cascade Framework for Unified Aspect-based Sentiment Analysis

  • Hao Fei
  • Fei Li
  • Chenliang Li
  • Shengqiong Wu
  • Jingye Li
  • Donghong Ji

So far, aspect-based sentiment analysis (ABSA) has involved seven subtasks in total, whose interactions, however, have been left insufficiently explored. This work presents a novel multiplex cascade framework for unified ABSA that maintains such interactions. First, we model all seven subtasks as a hierarchical dependency in easy-to-hard order, based on which we then propose a multiplex decoding mechanism, transferring the sentiment layouts and clues from lower tasks to upper ones. The multiplex strategy enables highly efficient subtask interflows and avoids repetitive training; meanwhile, it sufficiently utilizes the existing data without requiring any further annotation. Further, based on the characteristics of aspect-opinion term extraction and pairing, we enhance our multiplex framework by integrating POS-tag and syntactic dependency information for term boundary and pairing identification. The proposed Syntax-aware Multiplex (SyMux) framework enhances ABSA performance on 28 subtasks (7×4 datasets) by large margins.

NeurIPS Conference 2022 Conference Paper

LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model

  • Hao Fei
  • Shengqiong Wu
  • Jingye Li
  • Bobo Li
  • Fei Li
  • Libo Qin
  • Meishan Zhang
  • Min Zhang

Universally modeling all typical information extraction tasks (UIE) with one generative language model (GLM) has revealed great potential in the latest studies, where various IE predictions are unified into a linearized hierarchical expression under a GLM. Syntactic structure information, a type of effective feature that has been extensively utilized in the IE community, should also be beneficial to UIE. In this work, we propose a novel structure-aware GLM, fully unleashing the power of syntactic knowledge for UIE. A heterogeneous structure inductor is explored to unsupervisedly induce rich heterogeneous structural representations by post-training an existing GLM. In particular, a structural broadcaster is devised to compact various latent trees into explicit high-order forests, helping to guide better generation during decoding. We finally introduce a task-oriented structure fine-tuning mechanism, further adjusting the learned structures to best coincide with the end task's needs. On 12 IE benchmarks across 7 tasks, our system shows significant improvements over the baseline UIE system. Further in-depth analyses show that our GLM learns rich task-adaptive structural bias that greatly resolves the UIE cruxes: the long-range dependency issue and boundary identification.

AAAI Conference 2022 Conference Paper

Mastering the Explicit Opinion-Role Interaction: Syntax-Aided Neural Transition System for Unified Opinion Role Labeling

  • Shengqiong Wu
  • Hao Fei
  • Fei Li
  • Meishan Zhang
  • Yijiang Liu
  • Chong Teng
  • Donghong Ji

Unified opinion role labeling (ORL) aims to detect all possible opinion structures of 'opinion-holder-target' in one shot, given a text. The existing transition-based unified method, unfortunately, is subject to longer opinion terms and fails to solve the term overlap issue. Current top performance has been achieved by employing a span-based graph model, which however still suffers from both high model complexity and insufficient interaction among opinions and roles. In this work, we investigate a novel solution by revisiting the transition architecture and augmenting it with a pointer network (PointNet). The framework parses out all opinion structures in linear-time complexity, and with PointNet it breaks through the limitation on term length. To achieve explicit opinion-role interactions, we further propose a unified dependency-opinion graph (UDOG), co-modeling the syntactic dependency structure and the partial opinion-role structure. We then devise a relation-centered graph aggregator (RCGA) to encode the multi-relational UDOG, where the resulting high-order representations are used to promote the predictions in the vanilla transition system. Our model achieves new state-of-the-art results on the MPQA benchmark. Analyses further demonstrate the superiority of our methods in both efficacy and efficiency.

AAAI Conference 2022 Conference Paper

Unified Named Entity Recognition as Word-Word Relation Classification

  • Jingye Li
  • Hao Fei
  • Jiang Liu
  • Shengqiong Wu
  • Meishan Zhang
  • Chong Teng
  • Donghong Ji
  • Fei Li

So far, named entity recognition (NER) has involved three major types, including flat, overlapped (aka. nested), and discontinuous NER, which have mostly been studied individually. Recently, growing interest has built up in unified NER, tackling the above three jobs concurrently with one single model. Current best-performing methods mainly include span-based and sequence-to-sequence models, where unfortunately the former merely focuses on boundary identification and the latter may suffer from exposure bias. In this work, we present a novel alternative by modeling unified NER as word-word relation classification, namely W²NER. The architecture resolves the kernel bottleneck of unified NER by effectively modeling the neighboring relations between entity words with Next-Neighboring-Word (NNW) and Tail-Head-Word-* (THW-*) relations. Based on the W²NER scheme, we develop a neural framework in which unified NER is modeled as a 2D grid of word pairs. We then propose multi-granularity 2D convolutions for better refining the grid representations. Finally, a co-predictor is used to sufficiently reason over the word-word relations. We perform extensive experiments on 14 widely-used benchmark datasets for flat, overlapped, and discontinuous NER (8 English and 6 Chinese datasets), where our model beats all the current top-performing baselines, pushing forward the state-of-the-art performance of unified NER.
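The NNW/THW-* scheme described above admits a compact decoding sketch: NNW edges chain each entity word to its next word, and a THW-* edge from the tail back to the head marks one entity of a given type, so walking NNW paths from head to tail recovers (possibly discontinuous) mentions. This is a simplified illustration assuming an acyclic NNW graph, not the paper's decoder.

```python
def decode_entities(nnw, thw):
    """Decode entities from word-word relations (simplified):
    `nnw` is a list of (i, j) next-word edges, `thw` a list of
    (tail, head, type) entity markers. Enumerate NNW paths from
    head to tail; each path is one entity's word-index sequence."""
    succ = {}
    for i, j in nnw:
        succ.setdefault(i, []).append(j)

    def paths(cur, tail, span):
        if cur == tail:
            yield tuple(span)
            return
        for nxt in succ.get(cur, []):
            yield from paths(nxt, tail, span + [nxt])

    entities = []
    for tail, head, etype in thw:
        for span in paths(head, tail, [head]):
            entities.append((span, etype))
    return entities

# Toy sentence "aching in legs and shoulders" (word indices 0..4) with two
# discontinuous mentions sharing "aching in": {0,1,2} and {0,1,4}.
nnw = [(0, 1), (1, 2), (1, 4)]
thw = [(2, 0, "ADE"), (4, 0, "ADE")]
entities = decode_entities(nnw, thw)
```

Note how the two mentions share words 0 and 1 yet are recovered separately, which is exactly the overlap/discontinuity case that pure span-based schemes struggle with.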

AAAI Conference 2021 Conference Paper

Encoder-Decoder Based Unified Semantic Role Labeling with Label-Aware Syntax

  • Hao Fei
  • Fei Li
  • Bobo Li
  • Donghong Ji

Currently, unified semantic role labeling (SRL), which achieves predicate identification and argument role labeling in an end-to-end manner, has received growing interest. Recent works show that leveraging syntax knowledge significantly enhances SRL performance. In this paper, we investigate a novel unified SRL framework based on the sequence-to-sequence architecture with double enhancement on both the encoder and decoder sides. On the encoder side, we propose a novel label-aware graph convolutional network (LA-GCN) to encode both the syntactic dependency arcs and labels into BERT-based word representations. On the decoder side, we creatively design a pointer-network-based model for detecting predicates, arguments and roles jointly. Our pointer-net decoder is able to make decisions by consulting all the input elements in a global view, and meanwhile it is syntax-aware by incorporating the syntax information from LA-GCN. Besides, a high-order interacted attention is introduced into the decoder for leveraging previously recognized triplets to help the current decision. Empirical experiments show that our framework significantly outperforms all existing graph-based methods on the CoNLL09 and Universal Proposition Bank datasets. In-depth analysis demonstrates that our model can effectively capture the correlations between syntactic and SRL structures.

ICRA Conference 2021 Conference Paper

Open-set Intersection Intention Prediction for Autonomous Driving

  • Fei Li
  • Xiangxu Li
  • Jun Luo 0009
  • Shiwei Fan
  • Hongbo Zhang

Intention prediction is a crucial task for Autonomous Driving (AD). Due to the variety of sizes and layouts of intersections, it is challenging to predict the intention of a human driver at different intersections, especially unseen and irregular ones. In this paper, we formulate the prediction of intention at intersections as an open-set prediction problem that requires context-specific matching of the target vehicle state and the diverse intersection configurations, which are in principle unbounded. We capture map-centric features that correspond to intersection structures under a spatial-temporal graph representation, and use two MAAMs (mutually auxiliary attention modules), covering lane-level and exit-level intentions respectively, to predict a target that best matches intersection elements in the map-centric feature space. Under our model, attention scores estimate the probability distribution of the open-set intentions that are contextually defined by the structure of the current intersection. The proposed model is trained and evaluated on a simulated dataset. Furthermore, the model, trained on the simulated dataset and without any fine-tuning, is directly validated on an in-house real-world dataset collected at 98 real-world intersections and exhibits satisfactory performance, demonstrating the practical viability of our approach.

AAAI Conference 2021 Conference Paper

Rethinking Boundaries: End-To-End Recognition of Discontinuous Mentions with Pointer Networks

  • Hao Fei
  • Donghong Ji
  • Bobo Li
  • Yijiang Liu
  • Yafeng Ren
  • Fei Li

A majority of research interest in irregular (e.g., nested or discontinuous) named entity recognition (NER) has been paid to nested entities, while discontinuous entities have received limited attention. Existing work for discontinuous NER, however, either suffers from decoding ambiguity or predicts using token-level local features. In this work, we present an innovative model for discontinuous NER based on pointer networks, where the pointer simultaneously decides whether a token at each decoding frame constitutes an entity mention and where the next constituent token is. Our model has three major merits compared with previous work: (1) The pointer mechanism is memory-augmented, which enhances mention boundary detection and the interactions between the current decision and prior recognized mentions. (2) The encoder-decoder architecture can linearize the complexity of structure prediction, and thus reduce search costs. (3) The model makes every decision using global information, i.e., by consulting all the input, encoder and previous decoder outputs in a global view. Experimental results on the CADEC and ShARe13 datasets show that our model outperforms flat and hypergraph models as well as a state-of-the-art transition-based model for discontinuous NER. Further in-depth analysis demonstrates that our model performs well in recognizing various entities, including flat, overlapping and discontinuous ones. More crucially, our model is effective at boundary detection, which is the kernel challenge of NER.

JBHI Journal 2020 Journal Article

Automatic Segmentation and Visualization of Choroid in OCT with Knowledge Infused Deep Learning

  • Huihong Zhang
  • Jianlong Yang
  • Kang Zhou
  • Fei Li
  • Yan Hu
  • Yitian Zhao
  • Ce Zheng
  • Xiulan Zhang

The choroid provides oxygen and nourishment to the outer retina and is thus related to the pathology of various ocular diseases. Optical coherence tomography (OCT) is advantageous for visualizing and quantifying the choroid in vivo. However, its application in the study of the choroid is still limited for two reasons. (1) The lower boundary of the choroid (the choroid-sclera interface) in OCT is fuzzy, which makes automatic segmentation difficult and inaccurate. (2) The visualization of the choroid is hindered by vessel shadows from the superficial layers of the inner retina. In this paper, we propose to incorporate medical and imaging prior knowledge with deep learning to address these two problems. We propose a biomarker-infused global-to-local network (Bio-Net) for choroid segmentation, which not only regularizes the segmentation via predicted choroid thickness, but also leverages a global-to-local segmentation strategy to provide global structure information and suppress overfitting. For eliminating the retinal vessel shadows, we propose a deep-learning pipeline that first locates the shadows using their projection on the retinal pigment epithelium layer, and then predicts the contents of the choroidal vasculature at the shadow locations with an edge-to-texture generative adversarial inpainting network. The results show that our method outperforms the existing methods on both tasks. We further apply the proposed method in a clinical prospective study on the pathology of glaucoma, demonstrating its capacity to detect structural and vascular changes of the choroid related to elevated intra-ocular pressure.

AAAI Conference 2020 Conference Paper

ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network

  • Fei Li
  • Hong Yu

Automated ICD coding, which assigns International Classification of Diseases codes to patient visits, has attracted much research attention since it can save time and labor in billing. The previous state-of-the-art model utilized a single convolutional layer to build document representations for predicting ICD codes. However, the lengths and grammar of the text fragments closely related to ICD coding vary widely across documents, so a flat, fixed-length convolutional architecture may not be capable of learning good document representations. In this paper, we propose a Multi-Filter Residual Convolutional Neural Network (MultiResCNN) for ICD coding. The innovations of our model are twofold: it utilizes a multi-filter convolutional layer to capture text patterns of different lengths, and a residual convolutional layer to enlarge the receptive field. We evaluated the effectiveness of our model on the widely used MIMIC datasets. On the full code set of MIMIC-III, our model outperformed the state-of-the-art model in 4 out of 6 evaluation metrics. On the top-50 code set of MIMIC-III and the full code set of MIMIC-II, our model outperformed all existing models in all evaluation metrics. The code is available at https://github.com/foxlf823/Multi-Filter-Residual-Convolutional-Neural-Network.
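
The "multi-filter" intuition can be sketched in a few lines: run 1-D filters of several widths over a sequence and pool each, so that patterns of different lengths each get a chance to fire. This is a hypothetical, heavily simplified sketch with hand-set filters; the paper's model learns its filters, stacks residual convolutional layers, and adds per-label attention, none of which is reproduced here:

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution of a scalar sequence with a scalar kernel."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def multi_filter(seq, kernels):
    """Apply several filter widths and max-pool each response map,
    mimicking how different widths capture text patterns of
    different lengths."""
    return [max(conv1d(seq, k)) for k in kernels]

# A toy scalar "embedding" sequence and three filter widths (1, 2, 3).
seq = [0.1, 0.5, 0.2, 0.9, 0.3, 0.4]
kernels = [[1.0], [0.5, 0.5], [0.3, 0.4, 0.3]]
features = multi_filter(seq, kernels)  # one pooled feature per width
```

Concatenating the pooled responses gives a fixed-size document feature regardless of input length, which is why this style of encoder suits variable-length clinical notes.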

IJCAI Conference 2016 Conference Paper

Joint Models for Extracting Adverse Drug Events from Biomedical Text

  • Fei Li
  • Yue Zhang
  • Meishan Zhang
  • Donghong Ji

Extracting adverse drug events has received much research attention in the biomedical community. Previous work adopts pipeline models, first recognizing drug/disease entity mentions and then identifying adverse drug events from drug/disease pairs. In this paper, we investigate joint models for simultaneously extracting drugs, diseases and adverse drug events. Compared with pipeline models, joint models have two main advantages: first, they integrate information across the subtasks to improve performance; second, they avoid the error propagation inherent in pipeline methods. We compare a discrete model and a deep neural model for jointly extracting drugs, diseases and adverse drug events. Experimental results on a standard ADE corpus show that the discrete joint model significantly outperforms a state-of-the-art pipeline baseline. In addition, when discrete features are replaced by neural features, recall is further improved.

NeurIPS Conference 2011 Conference Paper

Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition

  • Jia Deng
  • Sanjeev Satheesh
  • Alexander Berg
  • Fei Li

We present a novel approach to efficiently learn a label tree for large-scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine-grained control over the efficiency-vs-accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large-scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency, with less training time and more balanced trees, compared to the previous state of the art by Bengio et al.

NeurIPS Conference 2011 Conference Paper

Large-Scale Category Structure Aware Image Categorization

  • Bin Zhao
  • Fei Li
  • Eric Xing

Most previous research on image categorization has focused on medium-scale data sets, while large-scale image categorization with millions of images from thousands of categories remains a challenge. With the emergence of structured large-scale datasets such as ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among image categories, becomes available. As human cognition of the complex visual world benefits from underlying semantic relationships between object classes, we believe a machine learning system can and should leverage such information as well for better performance. In this paper, we employ such semantic relatedness among image categories for large-scale image categorization. Specifically, a category hierarchy is utilized to properly define the loss function and to select a common set of features for related categories. An efficient optimization method based on proximal approximation and an accelerated parallel gradient method is introduced. Experimental results on a subset of ImageNet containing 1.2 million images from 1000 categories demonstrate the effectiveness and promise of the proposed approach.