Arrow Research search

Author name cluster

Fei Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

40 papers
2 author rows

Possible papers

40

AAAI Conference 2026 Conference Paper

Generating Attribute-Aware Human Motions from Textual Prompt

  • Xinghan Wang
  • Kun Xu
  • Fei Li
  • Cao Sheng
  • JiaZhong Yu
  • Yadong Mu

Text-driven human motion generation has recently attracted considerable attention, allowing models to generate human motions based on textual descriptions. However, current methods neglect the influence of human attributes—such as age, gender, weight, and height—which are key factors shaping human motion patterns. This work represents a pilot exploration for bridging this gap. We conceptualize each motion as comprising both attribute information and action semantics, where textual descriptions align exclusively with action semantics. To achieve this, a new framework inspired by Structural Causal Models is proposed to decouple action semantics from human attributes, enabling text-to-semantics prediction and attribute-controlled generation. The resulting model is capable of generating attribute-aware motion aligned with the user's text and attribute inputs. For evaluation, we introduce a comprehensive dataset containing attribute annotations for text-motion pairs, setting the first benchmark for attribute-aware motion generation. Extensive experiments validate our model's effectiveness.

AAAI Conference 2026 Conference Paper

Interest-Shift-Aware Logical Reasoning for Efficient Long-Sequence Recommendation

  • Fei Li
  • Qingyun Gao
  • Enneng Yang
  • Jianzhe Zhao
  • Guibing Guo

Logical reasoning-based recommendation methods formulate logical expressions to characterize user-item interaction patterns, incorporating regularization constraints to ensure consistency with logical rules. However, these methods face two critical challenges: (1) As sequence length increases, they cannot effectively capture the dynamic transfer of user interests across subsequences (i.e., subsequence interest drift), thereby degenerating logical expressions to single-subsequence inference. (2) The time complexity of logical reasoning and rule learning scales quadratically with the sequence length, severely constraining computational efficiency in long-sequence recommendation. To address these challenges, we propose ELECTOR, an intErest-shift-aware Logical reasoning method for EffiCienT lOng-sequence Recommendation. Specifically, we design a Subsequence Interest Learning Module (SIL) to model cross-subsequence interest drifts in long sequences. SIL employs a local attention mechanism to extract subsequence interests effectively and a global attention mechanism to capture the correlations among subsequence interests. Subsequently, we propose an Interest-aware Logical Reasoning (ILR) mechanism that performs logical reasoning using a limited set of subsequence and short-term interests, rather than reasoning over the entire sequence, significantly reducing time complexity. Additionally, ILR employs an interest logical reasoning contrastive loss to ensure the model simultaneously considers multiple interests. Experiments on four real-world datasets demonstrate that our method significantly outperforms all baselines in both computational efficiency and recommendation accuracy, confirming its effectiveness.

AAAI Conference 2026 Conference Paper

KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache

  • Fei Li
  • Song Liu
  • Weiguo Wu
  • Shiqiang Nie
  • Jinyu Wang

The high memory demands of the Key-Value (KV) Cache during the inference of Large Language Models (LLMs) severely restrict their deployment on resource-constrained platforms. Quantization can effectively alleviate the memory pressure caused by the KV Cache. However, existing methods either rely on static one-size-fits-all precision allocation or fail to dynamically prioritize critical KV in long-context tasks, forcing memory-accuracy-throughput tradeoffs. In this work, we propose a novel mixed-precision quantization method for the KV Cache named KVmix. KVmix leverages gradient-based importance analysis to evaluate how individual Key and Value projection matrices affect the model loss, enabling layer-specific bit-width allocation for mixed-precision quantization. It dynamically prioritizes higher precision for important layers while aggressively quantizing less influential ones, achieving a tunable balance between accuracy and efficiency. KVmix also introduces a dynamic long-context optimization strategy that adaptively keeps full-precision KV pairs for recent pivotal tokens and compresses older ones, achieving high-quality sequence generation with low memory usage. Additionally, KVmix provides efficient low-bit quantization and CUDA kernels to optimize computational overhead. On LLMs such as Llama and Mistral, KVmix achieves near-lossless inference performance with an extremely low-bit quantization configuration (Key 2.19 bits, Value 2.38 bits), while delivering a remarkable 4.9× memory compression and a 5.3× speedup in inference throughput.
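
The layer-wise allocation step described in this abstract can be sketched as follows. This is a hypothetical illustration, not the authors' code: the importance scores, fraction of high-precision layers, and bit-widths are all illustrative stand-ins for KVmix's gradient-based analysis.

```python
# Hypothetical sketch of gradient-based mixed-precision bit allocation:
# layers whose Key/Value projections have higher importance scores (e.g.
# derived from gradients of the model loss) get more bits; the rest are
# quantized aggressively. Thresholds and bit-widths are illustrative.

def allocate_bits(importance, low_bits=2, high_bits=4, high_frac=0.25):
    """Give high_bits to the top high_frac fraction of layers by
    importance, and low_bits to all remaining layers."""
    n = len(importance)
    n_high = max(1, int(n * high_frac))
    # Rank layer indices by importance, descending.
    ranked = sorted(range(n), key=lambda i: importance[i], reverse=True)
    bits = [low_bits] * n
    for i in ranked[:n_high]:
        bits[i] = high_bits
    return bits

# Toy example: 8 layers with made-up importance scores.
scores = [0.9, 0.1, 0.3, 0.05, 0.7, 0.2, 0.15, 0.4]
bits = allocate_bits(scores)  # layers 0 and 4 get 4 bits, the rest 2
```

Averaging the allocation over layers yields a fractional effective bit-width, which is how figures like "Key 2.19 bits" in the abstract can arise.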

AAAI Conference 2026 Conference Paper

PaSE: Prototype-aligned Calibration and Shapley-based Equilibrium for Multimodal Sentiment Analysis

  • Kang He
  • Boyu Chen
  • Yuzhe Ding
  • Fei Li
  • Chong Teng
  • Donghong Ji

Multimodal Sentiment Analysis (MSA) seeks to understand human emotions by integrating textual, acoustic, and visual signals. Although multimodal fusion is designed to leverage cross-modal complementarity, real-world scenarios often exhibit modality competition: dominant modalities tend to overshadow weaker ones, leading to suboptimal performance. In this paper, we propose PaSE, a novel Prototype-aligned Calibration and Shapley-optimized Equilibrium framework, which enhances collaboration while explicitly mitigating modality competition. PaSE first applies Prototype-guided Calibration Learning (PCL) to refine unimodal representations and align them through an Entropic Optimal Transport mechanism that ensures semantic consistency. To further stabilize optimization, we introduce a Dual-Phase Optimization strategy. A prototype-gated fusion module is first used to extract shared representations, followed by Shapley-based Gradient Modulation (SGM), which adaptively adjusts gradients according to the contribution of each modality. Extensive experiments on IEMOCAP, MOSI, and MOSEI confirm that PaSE achieves superior performance and effectively alleviates modality competition.

EAAI Journal 2025 Journal Article

A lightweight deep learning framework for wild berry detection in complex natural environments

  • Xiaorong Zhang
  • Fei Li
  • XuTing Hu
  • Juan Fang

Wild berries (WildB) play a crucial role in Nordic forest ecosystems’ ecological and economic balance. However, research on wild berry detection has remained scarce in recent years. Driven by recent advancements in Artificial Intelligence (AI), particularly in deep learning-based computer vision applications, we propose a detection model based on the You Only Look Once version 11n (YOLOv11n) architecture (WildB-YOLO). WildB-YOLO integrates multiple innovations: the Frog Feature Pyramid Network (FrogFPN) improves multi-scale feature fusion, facilitating the detection of objects at various scales. The Scale-Aware Context Module (SACM) enhances contextual modeling, improving target discrimination. The Weighted Exponential Moving Average Loss (WEMA Loss) mitigates class imbalance. Additionally, Soft Non-Maximum Suppression (Soft-NMS) refines bounding box selection and reduces false positives, enhancing overall detection performance. The model employs Layer Adaptive Magnitude-Based Pruning (LAMP) to further enhance efficiency, achieving lightweight optimization while maintaining high detection precision. Experimental results demonstrate that WildB-YOLO achieves a mean Average Precision (mAP) of 59.5% at Intersection over Union (IoU) thresholds ranging from 50% to 95% (mAP50-95), outperforming the original YOLOv11n by 1.9%. Furthermore, WildB-YOLO’s optimized model size is 1.5 MB, with 2.5G floating-point operations (FLOPs) and 2.6M parameters, representing reductions of 71.1%, 60.3%, and 76.9%, respectively, compared to the baseline model. This reduction in complexity facilitates deployment on resource-constrained devices, significantly enhancing the applicability of AI-driven berry detection in practical field scenarios. This study pioneers a dedicated solution for wild berry detection and contributes novel strategies for small object detection in complex natural environments. WildB-YOLO is open-source at: https://github.com/zxr0826/WildB-YOLO.
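
Of the components above, Soft-NMS is a standard published technique (Bodla et al.) and can be sketched independently of this paper. The sketch below shows the linear-decay variant: overlapping boxes have their confidence decayed rather than being discarded outright, which is what "refines bounding box selection and reduces false positives" refers to. The thresholds are conventional defaults, not values from this paper.

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, iou_thresh=0.5, score_thresh=0.001):
    """Linear Soft-NMS: decay (rather than discard) overlapping boxes."""
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        # Pick the highest-scoring remaining box.
        m = max(range(len(scores)), key=scores.__getitem__)
        best_box, best_score = boxes.pop(m), scores.pop(m)
        if best_score < score_thresh:
            break
        keep.append((best_box, best_score))
        # Decay the scores of boxes that overlap the chosen one.
        for i, b in enumerate(boxes):
            o = iou(best_box, b)
            if o > iou_thresh:
                scores[i] *= (1.0 - o)
    return keep
```

Unlike hard NMS, a heavily overlapped box survives with a reduced score, which helps when two berries genuinely overlap in the image.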

EAAI Journal 2025 Journal Article

A two-stage model for unified sentence- and document-level biomedical event extraction

  • Fangfang Su
  • Yue Zhang
  • Pengfei Jiao
  • Zhidong Zhao
  • Bobo Li
  • Fei Li
  • Donghong Ji

Biomedical event extraction, a cornerstone of information extraction, has increasingly attracted attention within the biomedical research community. It is also a highly complex task, which not only involves many sub-tasks but also nested events. Currently, research on biomedical event extraction, whether based on pipelined models or joint methods, must handle each sub-task separately, and processing the sub-tasks one by one degrades event extraction performance. In addition, most studies focus on extracting sentence-level events and ignore cross-sentence event information. To solve these problems, we simplify the event extraction process, reduce the processing steps, and merge the two sub-tasks of relation extraction and argument combination into one. We also consider document-level event extraction, which not only extracts cross-sentence events but also incorporates broader context information. Experimental results indicate that our novel approach outperforms prior studies. Additionally, the document-level event extraction model attains the top performance on the BioNLP’11 test data and achieves near-leading performance on the BioNLP’13 test data.

EAAI Journal 2025 Journal Article

An attention-guided multi-scale feature cascade network for underwater fish counting

  • Hanyu Zhang
  • Mengping Dong
  • Fei Li
  • Zhenbo Li
  • Ping Hu

Visual counting is essential for advancing fisheries intelligence, but fish scale variation in open underwater environments has made underwater fish counting a constant challenge. Therefore, we propose an Attention-guided Multi-scale Feature Cascade Network, named AMFCNet, which resolves scale variation and improves the accuracy of fish counting in complex underwater environments. AMFCNet utilizes a multi-scale attention gate for multi-scale feature fusion, and integrates a multi-scale convolution module to capture complex spatial relationships. It also employs a multi-head supervision fusion strategy to mask irrelevant regions, ensuring targeted learning for each scale and generating high-quality multi-scale density maps. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on the proposed dataset with the lowest computational cost, significantly outperforming 11 mainstream counting methods. It also achieves excellent results on other publicly available underwater datasets, with Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Normalized Absolute Error (NAE) values of 1.26, 1.71, and 0.08, respectively. This method shows significant potential for practical applications in aquaculture, such as in marine ranching and pond farming, to assess fish growth conditions and adjust feeding strategies accordingly.
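
The three error metrics quoted above (MAE, RMSE, NAE) are standard for counting tasks and can be computed directly from per-image predicted and ground-truth counts. The sketch below uses their usual definitions; it is an illustration, not code from the paper.

```python
import math

def counting_metrics(pred, true):
    """MAE, RMSE and NAE over per-image object counts.
    NAE normalizes the total absolute error by the total true count."""
    errs = [p - t for p, t in zip(pred, true)]
    mae = sum(abs(e) for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    nae = sum(abs(e) for e in errs) / sum(true)
    return mae, rmse, nae

# Toy example: predicted vs. true fish counts on three images.
mae, rmse, nae = counting_metrics([10, 12, 8], [11, 10, 8])
```

In density-map counting methods like the one described, the predicted count per image is the sum over the predicted density map, after which these metrics apply unchanged.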

EAAI Journal 2025 Journal Article

Contrastive prototype learning with semantic patchmix for few-shot image classification

  • Mengping Dong
  • Fei Li
  • Zhenbo Li
  • Xue Liu

Few-shot image classification aims to learn unseen classes with only a few training samples for each class. However, most existing models still suffer from weak feature representation due to data scarcity. To this end, a novel contrastive learning framework is proposed for few-shot image classification that utilizes patch-wise and class-wise features. Concretely, a semantic patchmix scheme is designed to effectively capture patch-wise features with more discriminative representation. Specifically, a new information noise contrastive estimation loss with a modulating factor is proposed to adjust the weights of samples, which is adaptive and trades off different samples. For class-wise features, contrastive prototype learning on two correlated views is leveraged to enhance the generalization of representations. Experiments demonstrate that our method achieves competitive performance on five popular datasets for few-shot image classification. In particular, our method brings a 1.74% improvement in accuracy over state-of-the-art methods in the 5-way 1-shot setting.

NeurIPS Conference 2025 Conference Paper

EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions

  • Xiaorui Wu
  • Fei Li
  • Xiaofeng Mao
  • Xin Zhang
  • Li Zheng
  • Yuxiang Peng
  • Chong Teng
  • Donghong Ji

Large language models (LLMs) frequently refuse to respond to pseudo-malicious instructions: semantically harmless input queries triggering unnecessary LLM refusals due to conservative safety alignment, significantly impairing user experience. Collecting such instructions is crucial for evaluating and mitigating over-refusals, but existing instruction curation methods, like manual creation or instruction rewriting, either lack scalability or fail to produce sufficiently diverse and effective refusal-inducing prompts. To address these limitations, we introduce EVOREFUSE, a prompt optimization approach that generates diverse pseudo-malicious instructions consistently eliciting confident refusals across LLMs. EVOREFUSE employs an evolutionary algorithm exploring the instruction space in more diverse directions than existing methods via mutation strategies and recombination, and iteratively evolves seed instructions to maximize evidence lower bound on LLM refusal probability. Using EVOREFUSE, we create two novel datasets: EVOREFUSE-TEST, a benchmark of 582 pseudo-malicious instructions that outperforms the next-best benchmark with 85.34% higher average refusal triggering rate across 9 LLMs without a safety-prior system prompt, 34.86% greater lexical diversity, and 40.03% improved LLM response confidence scores; and EVOREFUSE-ALIGN, which provides 3,000 pseudo-malicious instructions with responses for supervised and preference-based alignment training. With supervised fine-tuning on EVOREFUSE-ALIGN, LLAMA3.1-8B-INSTRUCT achieves up to 29.85% fewer over-refusals than models trained on the second-best alignment dataset, without compromising safety. Our analysis with EVOREFUSE-TEST reveals models trigger over-refusals by overly focusing on sensitive keywords while ignoring broader context. Our code and datasets are available at https://github.com/FishT0ucher/EVOREFUSE.
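
The evolutionary loop this abstract describes (mutate and recombine seed instructions, keep the candidates a refusal scorer rates highest) can be sketched generically. Everything here is a toy stand-in: the real system queries LLMs and maximizes an evidence lower bound on refusal probability, whereas this sketch accepts any scoring function and uses illustrative mutation strings.

```python
import random

# Toy evolutionary prompt-optimization loop. HEDGES, the mutation and
# recombination operators, and the scorer are all illustrative; they are
# not EVOREFUSE's actual operators.
HEDGES = ["hypothetically speaking", "for a short story", "in general terms"]

def mutate(prompt, rng):
    # Append a randomly chosen rephrasing fragment.
    return prompt + ", " + rng.choice(HEDGES)

def recombine(a, b):
    # One-point crossover on words.
    wa, wb = a.split(), b.split()
    return " ".join(wa[: len(wa) // 2] + wb[len(wb) // 2 :])

def evolve(seeds, score, generations=5, pop_size=8, seed=0):
    """Keep the pop_size highest-scoring candidates each generation."""
    rng = random.Random(seed)
    pop = list(seeds)
    for _ in range(generations):
        children = [mutate(rng.choice(pop), rng) for _ in range(pop_size)]
        children += [recombine(rng.choice(pop), rng.choice(pop))
                     for _ in range(pop_size)]
        pop = sorted(pop + children, key=score, reverse=True)[:pop_size]
    return pop[0]
```

With `score` replaced by an estimate of an LLM's refusal probability, the same loop selects for instructions that reliably trigger refusals, which is the evaluation-set construction the abstract describes.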

AAAI Conference 2025 Conference Paper

Multi-Granular Multimodal Clue Fusion for Meme Understanding

  • Li Zheng
  • Hao Fei
  • Ting Dai
  • Zuquan Peng
  • Fei Li
  • Huisheng Ma
  • Chong Teng
  • Donghong Ji

With the continuous emergence of various social media platforms frequently used in daily life, the multimodal meme understanding (MMU) task has been garnering increasing attention. MMU aims to explore and comprehend the meanings of memes from various perspectives by performing tasks such as metaphor recognition, sentiment analysis, intention detection, and offensiveness detection. Despite making progress, limitations persist due to the loss of fine-grained metaphorical visual clues and the neglect of weak multimodal text-image correlation. To overcome these limitations, we propose a multi-granular multimodal clue fusion model (MGMCF) to advance MMU. Firstly, we design an object-level semantic mining module to extract object-level image feature clues, achieving fine-grained feature clue extraction and enhancing the model's ability to capture metaphorical details and semantics. Secondly, we propose a brand-new global-local cross-modal interaction model to address the weak correlation between text and images. This model facilitates effective interaction between global multimodal contextual clues and local unimodal feature clues, strengthening their representations through a bidirectional cross-modal attention mechanism. Finally, we devise a dual-semantic guided training strategy to enhance the model's understanding and alignment of multimodal representations in the semantic space. Experiments conducted on the widely-used MET-MEME bilingual dataset demonstrate significant improvements over state-of-the-art baselines. Specifically, there is an 8.14% increase in precision for the offensiveness detection task, and respective accuracy enhancements of 3.53%, 3.89%, and 3.52% for the metaphor recognition, sentiment analysis, and intention detection tasks. These results, underpinned by in-depth analyses, underscore the effectiveness and potential of our approach for advancing MMU.

EAAI Journal 2025 Journal Article

Towards salient object detection via parallel dual-decoder network

  • Chaojun Cen
  • Fei Li
  • Zhenbo Li
  • Yun Wang

Salient object detection, an important preprocessing step in computer vision, segments the most prominent objects in an image. However, existing research in this field utilizes transformer-based methods to capture global context information, failing to effectively obtain local spatial features. To solve this issue, we propose a parallel dual-decoder network, which consists of a novel semantic decoder and a modified salient decoder. Specifically, the proposed semantic decoder is designed to learn the local spatial details, and the salient decoder utilizes the learnable queries to establish global saliency dependencies among objects. Moreover, the two decoders establish correlations between saliency and multi-scale semantic representations through cross-attention interaction, significantly enhancing the performance of salient object detection. In other words, we obtain global context information in the decoder to prevent discriminative features from being diluted during information propagation. Extensive experiments on 15 benchmark datasets demonstrate that our model significantly outperforms other comparison methods and shows promising potential for real-world applications such as challenging optical remote sensing, underwater, low-light, and other open scenarios. In addition, our method shows excellent performance in other downstream tasks such as camouflaged object detection, transparent object detection, shadow detection, and semantic segmentation.

EAAI Journal 2024 Journal Article

A two-level game theoretic approach for task offloading in mobile edge computing

  • Fei Li
  • Erqian Ge
  • Wanyue Hu
  • Rongsheng Xia

In a mobile edge computing system subject to wireless interference, the Edge Server Provider (ESP) aims to offer profitable computing resources to Device Managers (DMs), who make optimal strategies based on the provided prices. However, the existence of mixed variables typically constitutes an NP-hard problem, posing a significant challenge for optimization. To address this issue, we formulate a bi-level optimization problem, where the upper level is devoted to optimizing the pricing of computing resources. The lower level optimizes DM migration strategies and resource allocation at the specified prices. Leveraging game theory, we achieve distributed and efficient computation offloading by formulating the distributed computing offloading strategy among lower-level DMs as a task offloading game. Our analysis identifies Nash equilibrium and finite improvement properties within the game. Based on these insights, we present a bi-level distributed computation offloading algorithm capable of reaching a Nash equilibrium, thus optimizing the profit for both DMs and the ESP. Experiments demonstrate the algorithm’s effectiveness in reducing system costs and maximizing the profits of the ESP and DMs across diverse scenarios.
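
The finite-improvement property mentioned above means that if each DM in turn switches to its own best response, the process cannot cycle and must reach a Nash equilibrium. A minimal sketch of that dynamic, with a deliberately simplified congestion-style cost model (not the paper's actual cost functions or pricing), looks like this:

```python
# Toy best-response dynamics for a task offloading game. Each DM chooses
# 0 (compute locally) or 1 (offload). Offloading costs a base price plus a
# congestion term that grows with the number of offloaders; this cost model
# is illustrative only. Congestion games of this form have a potential
# function, so sequential best responses terminate at a Nash equilibrium.

def best_response_dynamics(n_dms, local_cost, offload_base, congestion):
    """Iterate best responses until no DM wants to switch."""
    choice = [0] * n_dms  # start with everyone computing locally
    changed = True
    while changed:
        changed = False
        for i in range(n_dms):
            others = sum(choice) - choice[i]
            # Offload cost counts this DM itself among the offloaders.
            cost_offload = offload_base + congestion * (others + 1)
            best = 1 if cost_offload < local_cost[i] else 0
            if best != choice[i]:
                choice[i] = best
                changed = True
    return choice

# Example: only the DM with an expensive local computation offloads.
eq = best_response_dynamics(4, [10, 6, 4, 3], offload_base=2, congestion=2)
```

In the paper's bi-level setting, the ESP would additionally adjust `offload_base` (the price) at the upper level to maximize its own profit given the equilibrium the DMs reach.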

AAAI Conference 2024 Conference Paper

Compositional Generalization for Multi-Label Text Classification: A Data-Augmentation Approach

  • Yuyang Chai
  • Zhuang Li
  • Jiahui Liu
  • Lei Chen
  • Fei Li
  • Donghong Ji
  • Chong Teng

Despite significant advancements in multi-label text classification, the ability of existing models to generalize to novel and seldom-encountered complex concepts, which are compositions of elementary ones, remains underexplored. This research addresses this gap. By creating unique data splits across three benchmarks, we assess the compositional generalization ability of existing multi-label text classification models. Our results show that these models often fail to generalize to compositional concepts encountered infrequently during training, leading to inferior performance on tests with these new combinations. To address this, we introduce a data augmentation method that leverages two innovative text generation models designed to enhance the classification models' capacity for compositional generalization. Our experiments show that this data augmentation approach significantly improves the compositional generalization capabilities of classification models on our benchmarks, with both generation models surpassing other text generation baselines. Our code is available at https://github.com/yychai74/LD-VAE.

AAAI Conference 2024 Conference Paper

Harnessing Holistic Discourse Features and Triadic Interaction for Sentiment Quadruple Extraction in Dialogues

  • Bobo Li
  • Hao Fei
  • Lizi Liao
  • Yu Zhao
  • Fangfang Su
  • Fei Li
  • Donghong Ji

Dialogue Aspect-based Sentiment Quadruple (DiaASQ) is a newly-emergent task aiming to extract the sentiment quadruple (i.e., targets, aspects, opinions, and sentiments) from conversations. While showing promising performance, the prior DiaASQ approach unfortunately falls short on the key cruxes of DiaASQ, namely insufficient modeling of discourse features and a lack of interaction within quadruple extraction, which hinders further task improvement. To this end, we introduce a novel framework that not only capitalizes on comprehensive discourse feature modeling, but also captures the intrinsic interaction for optimal quadruple extraction. On the one hand, drawing upon multiple discourse features, our approach constructs a token-level heterogeneous graph and enhances token interactions through a heterogeneous attention network. We further propose a novel triadic scorer, strengthening weak token relations within a quadruple, thereby enhancing the cohesion of the quadruple extraction. Experimental results on the DiaASQ benchmark showcase that our model significantly outperforms existing baselines across both English and Chinese datasets. Our code is available at https://bit.ly/3v27pqA.

JBHI Journal 2024 Journal Article

Improving Tumor Classification by Reusing Self-Predicted Segmentation of Medical Images as Guiding Knowledge

  • Xiaoyi Lin
  • Mingyu Wang
  • Fei Li
  • Ziyue Xu
  • Jia Chen
  • Xin Chen
  • Chenglang Yuan
  • Songxiong Wu

Differential diagnosis of tumors is important for computer-aided diagnosis. In computer-aided diagnosis systems, expert knowledge from lesion segmentation masks is used only in a limited way, during preprocessing or as supervision to guide feature extraction. To improve the utilization of lesion segmentation masks, this study proposes a simple and effective multitask learning network that improves medical image classification using self-predicted segmentation as guiding knowledge; we call this network RS²-net. In RS²-net, the predicted segmentation probability map obtained from the initial segmentation inference is added to the original image to form a new input, which is then fed back into the network for the final classification inference. We validated the proposed RS²-net using three datasets: the pNENs-Grade dataset, which tests the prediction of pancreatic neuroendocrine neoplasm grading; the HCC-MVI dataset, which tests the prediction of microvascular invasion of hepatocellular carcinoma; and the public ISIC 2017 skin lesion dataset. The experimental results indicate that the proposed strategy of reusing self-predicted segmentation is effective, and RS²-net outperforms other popular networks and existing state-of-the-art studies. Interpretive analytics based on feature visualization demonstrate that the improved classification performance of our reuse strategy is due to semantic information that can be acquired in advance in a shallow network.

AAAI Conference 2024 Conference Paper

MindMap: Constructing Evidence Chains for Multi-Step Reasoning in Large Language Models

  • Yangyu Wu
  • Xu Han
  • Wei Song
  • Miaomiao Cheng
  • Fei Li

Large language models (LLMs) have demonstrated remarkable performance in various natural language processing tasks. However, they still face significant challenges in automated reasoning, particularly in scenarios involving multi-step reasoning. In this paper, we focus on the logical reasoning problem, where the main task is to answer a question based on a set of available facts and rules. Much prior work has focused on guiding LLMs to think logically by generating reasoning paths, ignoring the structure among the available facts. In this paper, we propose a simple approach, MindMap, which introduces evidence chains to support reasoning. An evidence chain refers to a set of facts that involve the same subject. In this way, we can organize related facts together to avoid missing important information. MindMap can be integrated with existing reasoning frameworks, such as Chain-of-Thought (CoT) and Selection-Inference (SI), by letting the model select relevant evidence chains instead of independent facts. The experimental results on the bAbI and ProofWriterOWA datasets demonstrate the effectiveness of MindMap. It can significantly improve CoT and SI, especially in multi-step reasoning tasks.
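
The evidence-chain construction defined in this abstract (group all facts sharing a subject) can be sketched in a few lines. The triple representation of facts is an assumption for illustration; the paper works with natural-language facts from bAbI/ProofWriter.

```python
# Minimal sketch of building evidence chains: facts that share a subject
# are grouped so a reasoning framework (e.g. CoT or SI) can select a whole
# chain at once instead of scanning independent facts. The
# (subject, relation, object) fact format is an illustrative assumption.

def build_evidence_chains(facts):
    """Group (subject, relation, object) triples into per-subject chains."""
    chains = {}
    for subj, rel, obj in facts:
        chains.setdefault(subj, []).append((subj, rel, obj))
    return chains

facts = [
    ("Anne", "is", "kind"),
    ("Bob", "likes", "Anne"),
    ("Anne", "visits", "the park"),
]
chains = build_evidence_chains(facts)  # {"Anne": [...2 facts...], "Bob": [...]}
```

A question about Anne can then be answered by feeding the model the "Anne" chain as a unit, which is the mechanism the abstract credits for avoiding missed information in multi-step reasoning.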

AAAI Conference 2024 Conference Paper

Reverse Multi-Choice Dialogue Commonsense Inference with Graph-of-Thought

  • Li Zheng
  • Hao Fei
  • Fei Li
  • Bobo Li
  • Lizi Liao
  • Donghong Ji
  • Chong Teng

With the proliferation of dialogic data across the Internet, the Dialogue Commonsense Multi-choice Question Answering (DC-MCQ) task has emerged as a response to the challenge of comprehending user queries and intentions. Although prevailing methodologies exhibit effectiveness in addressing single-choice questions, they encounter difficulties in handling multi-choice queries due to the heightened intricacy and informational density. In this paper, inspired by the human cognitive process of progressively excluding options, we propose a three-step Reverse Exclusion Graph-of-Thought (ReX-GoT) framework, comprising Option Exclusion, Error Analysis, and Combine Information. Specifically, our ReX-GoT mimics human reasoning by gradually excluding irrelevant options and learning the reasons for option errors to choose the optimal path of the GoT and ultimately infer the correct answer. By progressively integrating intricate clues, our method effectively reduces the difficulty of multi-choice reasoning and provides a novel solution for DC-MCQ. Extensive experiments on the CICERO and CICERO_v2 datasets validate the significant improvement of our approach on the DC-MCQ task. In the zero-shot setting, our model outperforms the best baseline by 17.67% in terms of F1 score on the multi-choice task. Most strikingly, our GPT3.5-based ReX-GoT framework achieves a remarkable 39.44% increase in F1 score.

YNIMG Journal 2024 Journal Article

Structural and functional alterations in MRI-negative drug-resistant epilepsy and associated gene expression features

  • Ting Liu
  • Sheng Wang
  • Yingjie Tang
  • Sisi Jiang
  • Huixia Lin
  • Fei Li
  • Dezhong Yao
  • Xian Zhu

Neuroimaging techniques have been widely used in the study of epilepsy. However, structural and functional changes in MRI-negative drug-resistant epilepsy (DRE) and the genetic mechanisms behind the structural alterations remain poorly understood. Using structural and functional MRI, we analyzed gray matter volume (GMV) and regional homogeneity (ReHo) in DRE, drug-sensitive epilepsy (DSE), and healthy controls. Gene expression data from the Allen Human Brain Atlas and GMV/ReHo were evaluated to obtain drug resistance-related and epilepsy-associated gene expression, which was compared with real transcriptional data in blood. We found structural and functional alterations in the cerebellum of DRE patients, which may be related to the mechanisms of drug resistance in DRE. Our study confirms that changes in brain morphology and regional activity in DRE patients may be associated with abnormal expression of genes related to nervous system development, and that SP1, an important transcription factor, plays a key role in the drug-resistance mechanism.

AAAI Conference 2023 Conference Paper

Dialogue State Distillation Network with Inter-slot Contrastive Learning for Dialogue State Tracking

  • Jing Xu
  • Dandan Song
  • Chong Liu
  • Siu Cheung Hui
  • Fei Li
  • Qiang Ju
  • Xiaonan He
  • Jian Xie

In task-oriented dialogue systems, Dialogue State Tracking (DST) aims to extract users' intentions from the dialogue history. Currently, most existing approaches suffer from error propagation and are unable to dynamically select relevant information when utilizing previous dialogue states. Moreover, the relations between the updates of different slots provide vital clues for DST. However, existing approaches rely only on predefined graphs to indirectly capture these relations. In this paper, we propose a Dialogue State Distillation Network (DSDN) to utilize relevant information from previous dialogue states and bridge the utilization gap between training and testing. Thus, it can dynamically exploit previous dialogue states while avoiding error propagation. Further, we propose an inter-slot contrastive learning loss to effectively capture slot co-update relations from the dialogue context. Experiments are conducted on the widely used MultiWOZ 2.0 and MultiWOZ 2.1 datasets. The experimental results show that our proposed model achieves state-of-the-art performance for DST.

EAAI Journal 2023 Journal Article

MOIT: A Novel task for mining opinions towards implicit targets

  • Jun Zhou
  • Fei Li
  • Chong Teng
  • Yijiang Liu
  • Chunli Xiang
  • Donghong Ji

The extraction of opinions and their corresponding targets has gained significant interest recently, as it offers valuable insights into Opinion Mining (OM) at a granular level. Existing OM tasks require the opinion and target terms to be extracted to appear explicitly in reviews. Targets that are not present but are implied by contextual semantics are neglected by existing OM tasks, even though one investigation reported that about 60% of reviews contain implicit targets. To enable implicit target extraction, a novel task named Mining Opinions towards Implicit Targets (MOIT), under fine-grained OM, is proposed to extract both opinions and their corresponding implicit targets, enabling a more comprehensive analysis of reviews. To set up the basis for follow-up research on MOIT, two large-scale datasets were constructed as resources in two languages, where the Chinese dataset was built from scratch via a standard human annotation process, and the English dataset was built semi-automatically through machine translation and manual checking. Furthermore, three baseline models adapting three representative paradigms of information extraction, namely sequence labeling, question answering, and text generation, were proposed to solve MOIT. Extensive experiments demonstrated the effectiveness of the models. The proposed MOIT task extends the field of OM research, and the datasets and models establish a foundation for future studies in this area.

IJCAI Conference 2022 Conference Paper

Global Inference with Explicit Syntactic and Discourse Structures for Dialogue-Level Relation Extraction

  • Hao Fei
  • Jingye Li
  • Shengqiong Wu
  • Chenliang Li
  • Donghong Ji
  • Fei Li

Recently, research attention in relation extraction has turned to the dialogue scenario, i.e., dialogue-level relation extraction (DiaRE). Existing DiaRE methods either simply concatenate the utterances in a dialogue into a long piece of text, or employ naive words, sentences or entities to build dialogue graphs, while the structural characteristics of dialogues have not been fully utilized. In this work, we investigate a novel dialogue-level mixed dependency graph (D2G) and an argument reasoning graph (ARG) for DiaRE with a global relation reasoning mechanism. First, we model the entire dialogue as a unified and coherent D2G by explicitly integrating both syntactic and discourse structures, which enables richer semantic and feature learning for relation extraction. Second, we stack an ARG on top of the D2G to further focus on argument inter-dependency learning and argument representation refinement, for sufficient argument relation inference. In our global reasoning framework, D2G and ARG work collaboratively, iteratively performing lexical, syntactic and semantic information exchange and representation learning over the entire dialogue context. On two DiaRE benchmarks, our framework shows considerable improvements over the current state-of-the-art baselines. Further analyses show that the model effectively solves the long-range dependency issue while giving explainable predictions.

IJCAI Conference 2022 Conference Paper

Inheriting the Wisdom of Predecessors: A Multiplex Cascade Framework for Unified Aspect-based Sentiment Analysis

  • Hao Fei
  • Fei Li
  • Chenliang Li
  • Shengqiong Wu
  • Jingye Li
  • Donghong Ji

So far, aspect-based sentiment analysis (ABSA) has involved a total of seven subtasks, yet the interactions among them have not been sufficiently explored. This work presents a novel multiplex cascade framework for unified ABSA that maintains such interactions. First, we model all seven subtasks as a hierarchical dependency in easy-to-hard order, based on which we then propose a multiplex decoding mechanism, transferring the sentiment layouts and clues from lower tasks to upper ones. The multiplex strategy enables highly efficient subtask interflows and avoids repetitive training; meanwhile, it sufficiently utilizes the existing data without requiring any further annotation. Further, based on the characteristics of aspect-opinion term extraction and pairing, we enhance our multiplex framework by integrating POS tag and syntactic dependency information for term boundary and pairing identification. The proposed Syntax-aware Multiplex (SyMux) framework enhances ABSA performance on 28 subtasks (7×4 datasets) by large margins.

NeurIPS Conference 2022 Conference Paper

LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model

  • Hao Fei
  • Shengqiong Wu
  • Jingye Li
  • Bobo Li
  • Fei Li
  • Libo Qin
  • Meishan Zhang
  • Min Zhang

Universally modeling all typical information extraction tasks (UIE) with one generative language model (GLM) has revealed great potential in recent work, where various IE predictions are unified into a linearized hierarchical expression under a GLM. Syntactic structure information, an effective type of feature that has been extensively utilized in the IE community, should also be beneficial to UIE. In this work, we propose a novel structure-aware GLM, fully unleashing the power of syntactic knowledge for UIE. A heterogeneous structure inductor is explored to induce, without supervision, rich heterogeneous structural representations by post-training an existing GLM. In particular, a structural broadcaster is devised to compact various latent trees into explicit high-order forests, helping to guide better generation during decoding. We finally introduce a task-oriented structure fine-tuning mechanism, further adjusting the learned structures to best coincide with the end task's needs. Over 12 IE benchmarks across 7 tasks, our system shows significant improvements over the baseline UIE system. Further in-depth analyses show that our GLM learns rich task-adaptive structural bias that greatly alleviates the two cruxes of UIE: the long-range dependency issue and boundary identification.

AAAI Conference 2022 Conference Paper

Mastering the Explicit Opinion-Role Interaction: Syntax-Aided Neural Transition System for Unified Opinion Role Labeling

  • Shengqiong Wu
  • Hao Fei
  • Fei Li
  • Meishan Zhang
  • Yijiang Liu
  • Chong Teng
  • Donghong Ji

Unified opinion role labeling (ORL) aims to detect all possible 'opinion-holder-target' structures in one shot, given a text. The existing transition-based unified method, unfortunately, struggles with longer opinion terms and fails to solve the term-overlap issue. The current top performance has been achieved with a span-based graph model, which, however, still suffers from both high model complexity and insufficient interaction among opinions and roles. In this work, we investigate a novel solution that revisits the transition architecture and augments it with a pointer network (PointNet). The framework parses out all opinion structures in linear-time complexity, while PointNet removes the limitation on term length. To achieve explicit opinion-role interactions, we further propose a unified dependency-opinion graph (UDOG), co-modeling the syntactic dependency structure and the partial opinion-role structure. We then devise a relation-centered graph aggregator (RCGA) to encode the multi-relational UDOG, where the resulting high-order representations are used to promote the predictions in the vanilla transition system. Our model achieves new state-of-the-art results on the MPQA benchmark. Analyses further demonstrate the superiority of our method in both efficacy and efficiency.

EAAI Journal 2022 Journal Article

Towards fusing fuzzy discriminative projection and representation learning for image classification

  • Yun Wang
  • Zhenbo Li
  • Fei Li
  • Pu Yang
  • Jun Yue

Fuzzy, indistinguishable data, affected by complex and variable factors that are hard to avoid during data acquisition, lead to inferior recognition performance. Subspace projection is widely used for extracting low-dimensional important features in image processing tasks. However, many existing methods rarely explore the fuzziness and uncertainty of visual data, and lack sufficient mining of prior knowledge. In this work, we propose a novel fuzzy discriminative projection and representation learning (FDPR) method for image classification. Specifically, a fuzzy weight matrix with label information is designed in the data reconstruction to generate a more specific sparse constraint on the representation coefficients. In addition, low-rank and ℓ2,1-norm constraints are introduced to enhance the robustness of the algorithm. Finally, we combine a classification regression term with the representation coefficients carrying discriminative information for the subspace projection learning, thus fully utilizing data label information and eventually making the subspace more distinguishable. Experimental results on several datasets show that our proposed model performs effectively and robustly, surpassing other state-of-the-art approaches.
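For readers unfamiliar with the notation, the ℓ2,1 norm referenced in this abstract is the sum of the ℓ2 norms of a matrix's rows, a standard regularizer for encouraging row-sparsity. The snippet below is our own plain-Python illustration, not the paper's code.

```python
# The l2,1 norm: take the l2 norm of each row, then sum over rows.
# Penalizing it drives entire rows of the coefficient matrix toward zero.
import math

def l21_norm(M):
    return sum(math.sqrt(sum(x * x for x in row)) for row in M)

M = [[3.0, 4.0],    # row norm 5
     [0.0, 0.0],    # row norm 0 (a "switched-off" row)
     [6.0, 8.0]]    # row norm 10
print(l21_norm(M))  # 15.0
```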

AAAI Conference 2022 Conference Paper

Unified Named Entity Recognition as Word-Word Relation Classification

  • Jingye Li
  • Hao Fei
  • Jiang Liu
  • Shengqiong Wu
  • Meishan Zhang
  • Chong Teng
  • Donghong Ji
  • Fei Li

So far, named entity recognition (NER) has involved three major types, including flat, overlapped (aka. nested), and discontinuous NER, which have mostly been studied individually. Recently, growing interest has been built for unified NER, tackling the above three jobs concurrently with one single model. The current best-performing methods mainly include span-based and sequence-to-sequence models, where unfortunately the former merely focuses on boundary identification and the latter may suffer from exposure bias. In this work, we present a novel alternative by modeling unified NER as word-word relation classification, namely W2NER. The architecture resolves the kernel bottleneck of unified NER by effectively modeling the neighboring relations between entity words with Next-Neighboring-Word (NNW) and Tail-Head-Word-* (THW-*) relations. Based on the W2NER scheme we develop a neural framework, in which unified NER is modeled as a 2D grid of word pairs. We then propose multi-granularity 2D convolutions for better refining the grid representations. Finally, a co-predictor is used to sufficiently reason over the word-word relations. We perform extensive experiments on 14 widely-used benchmark datasets for flat, overlapped, and discontinuous NER (8 English and 6 Chinese datasets), where our model beats all the current top-performing baselines, pushing the state-of-the-art performance of unified NER.
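As a rough illustration of the word-word relation scheme described above, the sketch below decodes entity mentions from NNW and THW-* relations. The data layout and decoding procedure are our own simplification for illustration, not the authors' released framework.

```python
# Toy decoding sketch: NNW(i, j) says word j is the next word of an entity
# containing word i; THW-t(tail, head) says the span from head word to tail
# word bounds an entity of type t. Entities (flat or discontinuous) are
# recovered by walking NNW links from each head to its tail.
def decode_entities(nnw, thw):
    successors = {}
    for i, j in sorted(nnw):
        successors.setdefault(i, []).append(j)

    def walk(word, tail, path):
        # Depth-first search along NNW links until the tail word is reached.
        if word == tail:
            return path
        for nxt in successors.get(word, []):
            if nxt <= tail and nxt not in path:
                found = walk(nxt, tail, path + [nxt])
                if found:
                    return found
        return None

    entities = []
    for tail, head, etype in thw:
        path = walk(head, tail, [head])
        if path is not None:
            entities.append((tuple(path), etype))
    return entities

# Words: "aching in legs and shoulders" -> flat entity "aching in legs" and
# discontinuous entity "aching in shoulders".
nnw = {(0, 1), (1, 2), (1, 4)}
thw = [(2, 0, "symptom"), (4, 0, "symptom")]
print(decode_entities(nnw, thw))
```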

AAAI Conference 2021 Conference Paper

Encoder-Decoder Based Unified Semantic Role Labeling with Label-Aware Syntax

  • Hao Fei
  • Fei Li
  • Bobo Li
  • Donghong Ji

Currently, unified semantic role labeling (SRL), which achieves predicate identification and argument role labeling in an end-to-end manner, has received growing interest. Recent works show that leveraging syntax knowledge significantly enhances SRL performance. In this paper, we investigate a novel unified SRL framework based on the sequence-to-sequence architecture with double enhancement on both the encoder and decoder sides. On the encoder side, we propose a novel label-aware graph convolutional network (LA-GCN) to encode both the syntactic dependency arcs and labels into BERT-based word representations. On the decoder side, we creatively design a pointer-network-based model for detecting predicates, arguments and roles jointly. Our pointer-net decoder is able to make decisions by consulting all the input elements in a global view, and meanwhile it is syntax-aware by incorporating the syntax information from LA-GCN. Besides, a high-order interacted attention is introduced into the decoder for leveraging previously recognized triplets to help the current decision. Empirical experiments show that our framework significantly outperforms all existing graph-based methods on the CoNLL09 and Universal Proposition Bank datasets. In-depth analysis demonstrates that our model can effectively capture the correlations between syntactic and SRL structures.

ICRA Conference 2021 Conference Paper

Open-set Intersection Intention Prediction for Autonomous Driving

  • Fei Li
  • Xiangxu Li
  • Jun Luo 0009
  • Shiwei Fan
  • Hongbo Zhang

Intention prediction is a crucial task for Autonomous Driving (AD). Due to the variety of intersection sizes and layouts, it is challenging to predict the intention of a human driver at different intersections, especially unseen and irregular ones. In this paper, we formulate intention prediction at intersections as an open-set prediction problem that requires context-specific matching of the target vehicle state and the diverse intersection configurations, which are in principle unbounded. We capture map-centric features that correspond to intersection structures under a spatial-temporal graph representation, and use two MAAMs (mutually auxiliary attention modules), covering lane-level and exit-level intentions respectively, to predict a target that best matches intersection elements in map-centric feature space. Under our model, attention scores estimate the probability distribution of the open-set intentions that are contextually defined by the structure of the current intersection. The proposed model is trained and evaluated on a simulated dataset. Furthermore, the model, trained on the simulated dataset without any fine-tuning, is directly validated on an in-house real-world dataset collected at 98 real-world intersections and exhibits satisfactory performance, demonstrating the practical viability of our approach.

AAAI Conference 2021 Conference Paper

Rethinking Boundaries: End-To-End Recognition of Discontinuous Mentions with Pointer Networks

  • Hao Fei
  • Donghong Ji
  • Bobo Li
  • Yijiang Liu
  • Yafeng Ren
  • Fei Li

A majority of research interest in irregular (e.g., nested or discontinuous) named entity recognition (NER) has been paid to nested entities, while discontinuous entities have received limited attention. Existing work on discontinuous NER, however, either suffers from decoding ambiguity or predicts using only token-level local features. In this work, we present an innovative model for discontinuous NER based on pointer networks, where the pointer simultaneously decides whether a token at each decoding frame constitutes an entity mention and where the next constituent token is. Our model has three major merits compared with previous work: (1) The pointer mechanism is memory-augmented, which enhances mention boundary detection and the interactions between the current decision and prior recognized mentions. (2) The encoder-decoder architecture can linearize the complexity of structure prediction, and thus reduce search costs. (3) The model makes every decision using global information, i.e., by consulting all the input, encoder and previous decoder output in a global view. Experimental results on the CADEC and ShARe13 datasets show that our model outperforms flat and hypergraph models as well as a state-of-the-art transition-based model for discontinuous NER. Further in-depth analysis demonstrates that our model performs well in recognizing various entities including flat, overlapping and discontinuous ones. More crucially, our model is effective at boundary detection, which is the kernel issue of NER.

JBHI Journal 2020 Journal Article

Automatic Segmentation and Visualization of Choroid in OCT with Knowledge Infused Deep Learning

  • Huihong Zhang
  • Jianlong Yang
  • Kang Zhou
  • Fei Li
  • Yan Hu
  • Yitian Zhao
  • Ce Zheng
  • Xiulan Zhang

The choroid provides oxygen and nourishment to the outer retina and is thus related to the pathology of various ocular diseases. Optical coherence tomography (OCT) is advantageous in visualizing and quantifying the choroid in vivo. However, its application in the study of the choroid is still limited for two reasons. (1) The lower boundary of the choroid (choroid-sclera interface) in OCT is fuzzy, which makes automatic segmentation difficult and inaccurate. (2) The visualization of the choroid is hindered by the vessel shadows from the superficial layers of the inner retina. In this paper, we propose to incorporate medical and imaging prior knowledge with deep learning to address these two problems. We propose a biomarker-infused global-to-local network (Bio-Net) for choroid segmentation, which not only regularizes the segmentation via the predicted choroid thickness, but also leverages a global-to-local segmentation strategy to provide global structure information and suppress overfitting. For eliminating the retinal vessel shadows, we propose a deep-learning pipeline that first locates the shadows using their projection on the retinal pigment epithelium layer, and then predicts the contents of the choroidal vasculature at the shadow locations with an edge-to-texture generative adversarial inpainting network. The results show our method outperforms the existing methods on both tasks. We further apply the proposed method in a clinical prospective study for understanding the pathology of glaucoma, which demonstrates its capacity for detecting the structural and vascular changes of the choroid related to the elevation of intra-ocular pressure.

AAAI Conference 2020 Conference Paper

ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network

  • Fei Li
  • Hong Yu

Automated ICD coding, which assigns International Classification of Disease codes to patient visits, has attracted much research attention since it can save time and labor for billing. The previous state-of-the-art model utilized one convolutional layer to build document representations for predicting ICD codes. However, the lengths and grammar of the text fragments closely related to ICD coding vary a lot across documents. Therefore, a flat and fixed-length convolutional architecture may not be capable of learning good document representations. In this paper, we propose a Multi-Filter Residual Convolutional Neural Network (MultiResCNN) for ICD coding. The innovations of our model are two-fold: it utilizes a multi-filter convolutional layer to capture various text patterns with different lengths, and a residual convolutional layer to enlarge the receptive field. We evaluated the effectiveness of our model on the widely-used MIMIC dataset. On the full code set of MIMIC-III, our model outperformed the state-of-the-art model in 4 out of 6 evaluation metrics. On the top-50 code set of MIMIC-III and the full code set of MIMIC-II, our model outperformed all the existing and state-of-the-art models in all evaluation metrics. The code is available at https://github.com/foxlf823/Multi-Filter-Residual-Convolutional-Neural-Network.
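To make the multi-filter idea concrete, here is a toy plain-Python sketch (not the authors' released code, which is linked in the abstract): convolutions of several widths run over the same sequence and their pooled outputs are concatenated, so text patterns of different lengths are captured. A learned filter is stood in for by simple window averaging.

```python
# Toy stand-in for a multi-filter convolutional layer: filters of several
# widths slide over one sequence, each feature map is max-pooled, and the
# pooled features are concatenated into the final representation.
def conv1d(seq, width):
    """Average each window of `width` items (a stand-in for a learned filter)."""
    return [sum(seq[i:i + width]) / width for i in range(len(seq) - width + 1)]

def multi_filter(seq, widths=(2, 3, 5)):
    # One feature map per filter width; max-pool each map, then concatenate.
    return [max(conv1d(seq, w)) for w in widths]

scores = [0, 2, 1, 4, 3, 1]    # toy per-token activation scores
print(multi_filter(scores))
```

The residual layer in the paper additionally adds each layer's input back to its output to enlarge the receptive field; that part is omitted here for brevity.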

YNICL Journal 2019 Journal Article

Adolescent binge drinking disrupts normal trajectories of brain functional organization and personality maturation

  • Hongtao Ruan
  • Yunyi Zhou
  • Qiang Luo
  • Gabriel H. Robert
  • Sylvane Desrivières
  • Erin Burke Quinlan
  • ZhaoWen Liu
  • Tobias Banaschewski

Adolescent binge drinking has been associated with higher risks for the development of many health problems throughout the lifespan. Adolescents undergo multiple changes that involve the co-development processes of brain, personality and behavior; therefore, certain behavior, such as alcohol consumption, can have disruptive effects on both brain development and personality maturation. However, these effects remain unclear due to the scarcity of longitudinal studies. In the current study, we used multivariate approaches to explore discriminative features in brain functional architecture, personality traits, and genetic variants in 19-year-old individuals (n = 212). Taking advantage of a longitudinal design, we selected features that were more drastically altered in drinkers with an earlier onset of binge drinking. With the selected features, we trained a hierarchical model of support vector machines using a training sample (n = 139). Using an independent sample (n = 73), we tested the model and achieved a classification accuracy of 71.2%. We demonstrated longitudinally that after the onset of binge drinking the developmental trajectory of improvement in impulsivity slowed down. This study identified the disrupting effects of adolescent binge drinking on the developmental trajectories of both brain and personality.

YNICL Journal 2019 Journal Article

Disturbed neurovascular coupling in type 2 diabetes mellitus patients: Evidence from a comprehensive fMRI analysis

  • Bo Hu
  • Lin-Feng Yan
  • Qian Sun
  • Ying Yu
  • Jin Zhang
  • Yu-Jie Dai
  • Yang Yang
  • Yu-Chuan Hu

BACKGROUND: Previous studies presumed disturbed neurovascular coupling to be a critical risk factor for cognitive impairment in type 2 diabetes mellitus (T2DM), but distinct clinical evidence was lacking. Consequently, we investigated neurovascular coupling in T2DM patients by exploring the MRI relationship between neuronal activity and the corresponding cerebral blood perfusion. METHODS: Degree centrality (DC) maps and amplitude of low-frequency fluctuation (ALFF) maps were used to represent neuronal activity. Cerebral blood flow (CBF) maps were used to represent cerebral blood perfusion. Correlation coefficients were calculated to reflect the relationship between neuronal activity and cerebral blood perfusion. RESULTS: At the whole gray matter level, the manifestation of neurovascular coupling was investigated using 4 neurovascular biomarkers. We compared these biomarkers and found no significant changes. However, at the brain region level, neurovascular biomarkers in T2DM patients were significantly decreased in 10 brain regions. ALFF-CBF in the left hippocampus and fractional ALFF-CBF in the left amygdala were positively associated with executive function, while ALFF-CBF in the right fusiform gyrus was negatively related to executive function. Disease severity was negatively related to memory and executive function. A longer duration of T2DM was related to milder depression, suggesting that T2DM-related depression may be a psychological rather than a physiological condition. CONCLUSION: Correlations between neuronal activity and cerebral perfusion maps may serve as a method for detecting neurovascular coupling abnormalities, which could be used for diagnosis in the future. Trial registry number: This study was registered in ClinicalTrials.gov (NCT02420470) on April 2, 2015 and published on July 29, 2015.

TCS Journal 2019 Journal Article

Online packet scheduling with bounded delay and lookahead

  • Martin Böhm
  • Marek Chrobak
  • Łukasz Jeż
  • Fei Li
  • Jiří Sgall
  • Pavel Veselý

We study the online bounded-delay packet scheduling problem (PacketScheduling), where packets of unit size arrive at a router over time and need to be transmitted over a network link. Each packet has two attributes: a non-negative weight and a deadline for its transmission. The objective is to maximize the total weight of the transmitted packets. This problem has been well studied in the literature; yet currently the best published upper bound is 1.828 [8], still quite far from the best lower bound of ϕ ≈ 1.618 [11, 2, 6]. In the variant of PacketScheduling with s-bounded instances, each packet can be scheduled in at most s consecutive slots, starting at its release time. The lower bound of ϕ applies even to the special case of 2-bounded instances, and a ϕ-competitive algorithm for 3-bounded instances was given in [5]. Improving that result, and addressing a question posed by Goldwasser [9], we present a ϕ-competitive algorithm for 4-bounded instances. We also study a variant of PacketScheduling where an online algorithm has the additional power of 1-lookahead, knowing at time t which packets will arrive at time t + 1. For PacketScheduling with 1-lookahead restricted to 2-bounded instances, we present an online algorithm with competitive ratio (√13 − 1)/2 ≈ 1.303 and we prove a nearly tight lower bound of (1 + √17)/4 ≈ 1.281. In fact, our lower bound result is more general: using only 2-bounded instances, for any integer ℓ ≥ 0 we prove a lower bound of (1 + √(5 + 8ℓ + 4ℓ²))/(2(ℓ + 1)) for online algorithms with ℓ-lookahead, i.e., algorithms that at time t can see all packets arriving by time t + ℓ. Finally, for non-restricted instances we show a lower bound of 1.25 for randomized algorithms with ℓ-lookahead, for any ℓ ≥ 0.

IJCAI Conference 2016 Conference Paper

Joint Models for Extracting Adverse Drug Events from Biomedical Text

  • Fei Li
  • Yue Zhang
  • Meishan Zhang
  • Donghong Ji

Extracting adverse drug events receives much research attention in the biomedical community. Previous work adopts pipeline models, firstly recognizing drug/disease entity mentions and then identifying adverse drug events from drug/disease pairs. In this paper, we investigate joint models for simultaneously extracting drugs, diseases and adverse drug events. Compared with pipeline models, joint models have two main advantages. First, they make use of information integration to facilitate performance improvement; second, they reduce error propagation in pipeline methods. We compare a discrete model and a deep neural model for extracting drugs, diseases and adverse drug events jointly. Experimental results on a standard ADE corpus show that the discrete joint model outperforms a state-of-the-art baseline pipeline significantly. In addition, when discrete features are replaced by neural features, the recall is further improved.

TCS Journal 2013 Journal Article

A comprehensive study of an online packet scheduling algorithm

  • Fei Li

We study the bounded-delay model for Quality-of-Service buffer management. Time is discrete. There is a buffer. Unit-length jobs (also called packets) arrive at the buffer over time. Each packet has an integer release time, an integer deadline, and a positive real value. A packet's characteristics are not known to an online algorithm until the packet actually arrives. In each time step, at most one packet can be sent out of the buffer. The objective is to maximize the total value of the packets sent by their respective deadlines in an online manner. An online algorithm's performance is usually measured in terms of competitive ratio, when this online algorithm is compared with a clairvoyant algorithm achieving the maximum total value. In this paper, we study a simple and intuitive online algorithm. We analyze its performance in terms of competitive ratio for the general model and a few important variants.
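The bounded-delay model described above can be simulated in a few lines. The greedy rule below (send the maximum-value pending packet in each step) is a classic baseline for this model, not necessarily the algorithm analyzed in the paper.

```python
# Minimal simulation of the bounded-delay model: in each time step, among
# pending packets whose deadlines have not passed, send the one with the
# largest value.
def greedy_schedule(packets, horizon):
    """packets: list of (release, deadline, value); returns total value sent."""
    sent, total = set(), 0.0
    for t in range(horizon):
        pending = [(v, i) for i, (r, d, v) in enumerate(packets)
                   if r <= t <= d and i not in sent]
        if pending:
            v, i = max(pending)
            sent.add(i)
            total += v
    return total

# Two packets released at time 0 with deadline 0 force a choice; greedy
# keeps the heavier one and necessarily loses the other.
print(greedy_schedule([(0, 0, 3.0), (0, 0, 1.0), (1, 2, 2.0)], horizon=3))
```

Instances like this, where any algorithm must drop some value, are what competitive analysis of this problem quantifies.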

TCS Journal 2013 Journal Article

A near-optimal memoryless online algorithm for FIFO buffering two packet classes

  • Fei Li

We consider scheduling packets with values in a capacity-bounded buffer in an online setting. In this model, there is a buffer with limited capacity B. At any time, the buffer cannot accommodate more than B packets. Packets arrive over time. Each packet has a non-negative value. Packets leave the buffer only because they are either sent or dropped. Those packets that have left the buffer will not be reconsidered for delivery any more. In each time step, at most one packet in the buffer can be sent. The order in which the packets are sent should comply with the order of their arrival time. The objective is to maximize the total value of the packets sent in an online manner. In this paper, we study a variant of this FIFO buffering model in which a packet's value is either 1 or α > 1. We present a deterministic memoryless 1.304-competitive algorithm. This algorithm has the same competitive ratio as the one presented in Lotker and Patt-Shamir [Z. Lotker, B. Patt-Shamir, Nearly optimal FIFO buffer management for DiffServ, in: Proceedings of the 21st Annual ACM Symposium on Principles of Distributed Computing, PODC, 2002, pp. 134–142; Z. Lotker, B. Patt-Shamir, Nearly optimal FIFO buffer management for DiffServ, Computer Networks 17 (1) (2003) 77–89]. However, our algorithm is simpler and does not employ any marking bits. The idea used in our algorithm is novel and different from all previous approaches that have been applied for the general model and its variants. We do not proactively preempt one packet when a new packet arrives. Instead, we may preempt more than one 1-value packet at the time when the buffer contains sufficiently many α-value packets.
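A minimal simulation of the two-value FIFO model, under our own naive preemption rule. This is illustrative only and is not the paper's 1.304-competitive algorithm.

```python
# Toy FIFO-buffer simulation for the two-value model: the buffer holds at
# most B packets, one head-of-line packet is sent per step, and an arriving
# alpha-packet may preempt (drop) a buffered 1-value packet when full.
from collections import deque

def simulate(arrivals, B, alpha):
    """arrivals: per-step lists of packet values (each 1 or alpha)."""
    buf, sent_value = deque(), 0.0
    for step_packets in arrivals:
        for v in step_packets:
            if len(buf) < B:
                buf.append(v)
            elif v == alpha and 1 in buf:
                buf.remove(1)        # preempt one 1-value packet
                buf.append(v)
        if buf:
            sent_value += buf.popleft()   # FIFO: send the head-of-line packet
    return sent_value

# Two 1-packets arrive, then two alpha-packets (alpha = 4) that displace
# the remaining queued 1-packet.
print(simulate([[1, 1], [4, 4], [], []], B=2, alpha=4))  # 9.0
```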

NeurIPS Conference 2011 Conference Paper

Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition

  • Jia Deng
  • Sanjeev Satheesh
  • Alexander Berg
  • Fei Li

We present a novel approach to efficiently learn a label tree for large scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine grained control over the efficiency vs accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency with less training time and more balanced trees compared to the previous state of the art by Bengio et al.

NeurIPS Conference 2011 Conference Paper

Large-Scale Category Structure Aware Image Categorization

  • Bin Zhao
  • Fei Li
  • Eric Xing

Most previous research on image categorization has focused on medium-scale data sets, while large-scale image categorization with millions of images from thousands of categories remains a challenge. With the emergence of structured large-scale datasets such as ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, becomes available. As human cognition of the complex visual world benefits from underlying semantic relationships between object classes, we believe a machine learning system can and should leverage such information as well for better performance. In this paper, we employ such semantic relatedness among image categories for large-scale image categorization. Specifically, a category hierarchy is utilized to properly define the loss function and select a common set of features for related categories. An efficient optimization method based on proximal approximation and an accelerated parallel gradient method is introduced. Experimental results on a subset of ImageNet containing 1.2 million images from 1000 categories demonstrate the effectiveness and promise of our proposed approach.

YNIMG Journal 2010 Journal Article

Localization of cerebral functional deficits in treatment-naive, first-episode schizophrenia using resting-state fMRI

  • Xiao-Qi Huang
  • Su Lui
  • Wei Deng
  • Raymond C.K. Chan
  • Qi-Zhu Wu
  • Li-Jun Jiang
  • Jun-Ran Zhang
  • Zhi-Yun Jia

Background Spontaneous low-frequency fluctuations (LFF) in the blood oxygen level-dependent (BOLD) functional magnetic resonance imaging (fMRI) signal have been shown to reflect cerebral spontaneous neural activity, and the present study attempts to explore regional functional changes in the brains of patients with schizophrenia using the amplitude of the BOLD signals. Methods A total of 66 treatment-naïve, first-episode schizophrenia (FES) patients and 66 normal age- and sex-matched controls were recruited. Resting-state fMRIs were obtained using a gradient-echo echo-planar imaging sequence. The amplitude of LFF (ALFF) was calculated using REST software. Voxel-based analysis of the ALFF maps between control and patient groups was performed with two-sample t-tests using SPM2. Results Compared to the controls, the FES group showed significantly decreased ALFF in the medial prefrontal cortex (MPFC) and significant increases in ALFF in the left and right putamen. Significant positive correlations were observed between ALFF values in the bilateral putamen in both the patient and control groups. Conclusions The alterations of ALFF in the MPFC and putamen in FES observed in the present study suggest that functional abnormalities of those areas are present at an early stage of the disease.