Arrow Research search

Author name cluster

Feng Jiang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers

17

AAAI Conference 2026 Conference Paper

CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation

  • Rui Ke
  • Jiahui Xu
  • Shenghao Yang
  • Kuang Wang
  • Feng Jiang
  • Haizhou Li

Theme detection is a fundamental task in user-centric dialogue systems, aiming to identify the latent topic of each utterance without relying on predefined schemas. Unlike intent induction, which operates within fixed label spaces, theme detection requires cross-dialogue consistency and alignment with personalized user preferences, posing significant challenges. Existing methods often struggle with sparse, short utterances for accurate topic representation and fail to capture user-level thematic preferences across dialogues. To address these challenges, we propose CATCH (Controllable Theme Detection with Contextualized Clustering and Hierarchical Generation), a unified framework that integrates three core components: (1) context-aware topic representation, which enriches utterance-level semantics using surrounding topic segments; (2) preference-guided topic clustering, which jointly models semantic proximity and personalized feedback to align themes across dialogues; and (3) a hierarchical theme generation mechanism designed to suppress noise and produce robust, coherent topic labels. Experiments on a multi-domain customer dialogue benchmark (DSTC-12) demonstrate the effectiveness of CATCH with an 8B LLM in both theme clustering and topic generation quality.

AAAI Conference 2026 Conference Paper

Learning from Guidelines: Structured Prompt Optimization for Expert Annotation Tasks

  • Wenliang Zhong
  • Haiqing Li
  • Thao M. Dang
  • Feng Jiang
  • Hehuan Ma
  • Yuzhi Guo
  • Jean Gao
  • Junzhou Huang

Deep learning has significantly advanced numerous fields by training on extensive annotated datasets. However, this data-driven paradigm faces limitations such as poor adaptability and high annotation costs, particularly when annotation requires precise adherence to detailed, domain-specific guidelines. This challenge raises a critical question: Can models effectively shift from data-driven learning to autonomously leveraging guidelines with minimal annotated examples? To address this, we propose the Guideline-Driven Prompt (GDP) optimization framework, which shifts the learning paradigm from data-driven training to guideline-driven reasoning. GDP leverages Retrieval Augmented Generation (RAG) to retrieve essential fragments from complex guidelines and synthesize them into structured, executable prompts. A tree-based optimization algorithm systematically constructs and refines these prompts, explicitly capturing the intricate logic embedded in professional guidelines through a latent pipeline structure. Empirical evaluations on four datasets spanning diverse domains and tasks demonstrate that GDP effectively transitions the learning process from data-intensive methods to a guideline-driven approach in tasks requiring detailed and complex guideline adherence, reducing dependence on extensive annotated datasets.

AAAI Conference 2026 System Paper

PHOTONS: Pose-Free Human-Centric Photo-Realistic Real-Time Novel View Synthesis from Sparse Views

  • Yongyang Cheng
  • Boqin Qin
  • Zhao Hui
  • Xu Chen
  • Tao Zhang
  • Shang Sun
  • Haiquan Kang
  • Xiaojie Xu

We present PHOTONS (Pose-Free Human-Centric Photo-Realistic Real-Time Novel View Synthesis from Sparse Views), a real-time framework for novel view synthesis without requiring camera calibration. Our method reconstructs consistent 3D Gaussian point clouds and synthesizes 2K photo-realistic novel views from an arbitrary number (>=2) of freely placed cameras. PHOTONS faithfully renders dynamic human bodies amid complex backgrounds, including interactive object manipulation and fine-grained details (e.g., hair strands), while maintaining 25 FPS throughput on a commodity GPU such as the NVIDIA RTX 4090. By combining pose-free spatial point cloud reconstruction with Gaussian parameter estimation, our method demonstrates strong resilience to occlusions and camera perturbations. Additionally, we develop a 3D stereo system that drastically reduces setup complexity compared to existing solutions. Experiments on public and custom datasets show that PHOTONS outperforms state-of-the-art methods in both efficiency and visual quality.

AAAI Conference 2025 Conference Paper

Aligning Language Models Using Follow-up Likelihood as Reward Signal

  • Chen Zhang
  • Dading Chong
  • Feng Jiang
  • Chengguang Tang
  • Anningzhe Gao
  • Guohua Tang
  • Haizhou Li

In natural human-to-human conversations, participants often receive feedback signals from one another based on their follow-up reactions. These reactions can include verbal responses, facial expressions, changes in emotional state, and other non-verbal cues. Similarly, in human-machine interactions, the machine can leverage the user's follow-up utterances as feedback signals to assess whether it has appropriately addressed the user's request. Therefore, we propose using the likelihood of follow-up utterances as rewards to differentiate preferred responses from less favored ones, without relying on human or commercial LLM-based preference annotations. Our proposed reward mechanism, "Follow-up Likelihood as Reward" (FLR), matches the performance of strong reward models trained on large-scale human or GPT-4 annotated data on 8 pairwise-preference and 4 rating-based benchmarks. Building upon the FLR mechanism, we propose to automatically mine preference data from the online generations of a base policy model. The preference data are subsequently used to boost the helpfulness of the base model through direct alignment from preference (DAP) methods, such as direct preference optimization (DPO). Lastly, we demonstrate that fine-tuning the language model that provides follow-up likelihood with natural language feedback significantly enhances FLR's performance on reward modeling benchmarks and effectiveness in aligning the base policy model's helpfulness.
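The FLR scoring step, ranking candidate responses by how likely a positive follow-up utterance is after each one, can be sketched with a toy stand-in for the language model. Everything below (the unigram scorer, the sample corpus, the follow-up text) is an invented illustration; the paper's method uses an actual LLM's token log-likelihoods conditioned on the dialogue.

```python
import math

# Toy stand-in for a language model: an add-one-smoothed unigram model over
# a tiny corpus of "satisfied" follow-up utterances. Corpus and follow-up
# text are invented for illustration only.
FOLLOWUP_CORPUS = "thanks that helps a lot thanks that is exactly what i needed".split()

def unigram_logprob(token, corpus):
    """Add-one-smoothed unigram log-probability of `token` under `corpus`."""
    vocab_size = len(set(corpus))
    return math.log((corpus.count(token) + 1) / (len(corpus) + vocab_size))

def followup_likelihood_reward(response, followup="thanks that helps"):
    """Score `response` by the log-likelihood of a positive follow-up.

    A unigram model cannot condition on the response, so conditioning is
    crudely emulated by appending the response tokens to the corpus.
    """
    corpus = FOLLOWUP_CORPUS + response.lower().split()
    return sum(unigram_logprob(t, corpus) for t in followup.split())

def prefer(resp_a, resp_b):
    """Return a (chosen, rejected) pair for DPO-style preference mining."""
    if followup_likelihood_reward(resp_a) >= followup_likelihood_reward(resp_b):
        return resp_a, resp_b
    return resp_b, resp_a
```

Pairs mined this way would then feed a DAP method such as DPO, as the abstract describes.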

IJCAI Conference 2025 Conference Paper

FreqLLM: Frequency-Aware Large Language Models for Time Series Forecasting

  • Shunnan Wang
  • Min Gao
  • Zongwei Wang
  • Yibing Bai
  • Feng Jiang
  • Guansong Pang

Large Language Models (LLMs) have recently shown promise in Time Series Forecasting (TSF) by effectively capturing intricate time-domain dependencies. However, our preliminary experiments reveal that standard LLM-based approaches often fail to capture global correlations, limiting predictive performance. We found that embedding frequency-domain signals smooths weight distributions and enhances structured correlations by clearly separating global trends (low-frequency components) from local variations (high-frequency components). Building on these insights, we propose FreqLLM, a novel framework that integrates frequency-domain semantic alignment into LLMs to refine prompts for improved time series analysis. By bridging the gap between frequency signals and textual embeddings, FreqLLM effectively captures multi-scale temporal patterns and provides more robust forecasting results. Extensive experiments on benchmark datasets demonstrate that FreqLLM outperforms state-of-the-art TSF methods in both accuracy and generalization. The code is available at https://github.com/biya0105/FreqLLM.
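The frequency-domain intuition the abstract describes, separating global trends (low-frequency components) from local variations (high-frequency components), can be sketched with an FFT. The cutoff `k` and the synthetic series below are illustrative assumptions, not part of FreqLLM.

```python
import numpy as np

def freq_split(x, k=3):
    """Split 1-D series x into (low, high) components with low + high == x."""
    spec = np.fft.rfft(x)
    low_spec = np.zeros_like(spec)
    low_spec[:k] = spec[:k]                 # keep DC plus the first k-1 harmonics
    low = np.fft.irfft(low_spec, n=len(x))  # low-frequency global trend
    return low, x - low                     # residual is the high-frequency part

# Synthetic example: a slow trend plus a fast ripple.
t = np.linspace(0, 1, 128, endpoint=False)
series = np.sin(2 * np.pi * t) + 0.2 * np.sin(2 * np.pi * 30 * t)
trend, ripple = freq_split(series, k=3)
```

In FreqLLM-style pipelines the two components would then be embedded separately so the model can treat global and local structure differently.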

AAAI Conference 2025 Conference Paper

GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction

  • Yuwei Miao
  • Yuzhi Guo
  • Hehuan Ma
  • Jingquan Yan
  • Feng Jiang
  • Rui Liao
  • Junzhou Huang

Exploring the functions of genes and gene products is crucial to a wide range of fields, including medical research, evolutionary biology, and environmental science. However, discovering new functions largely relies on expensive and exhaustive wet lab experiments. Existing methods of automatic function annotation or prediction mainly focus on protein function prediction with sequence, 3D-structure, or protein family information. In this study, we propose to tackle the gene function prediction problem by exploring the Gene Ontology graph and annotations with BERT (GoBERT) to decipher the underlying relationships among gene functions. Our proposed novel function prediction task utilizes existing functions as inputs and generalizes function prediction to genes and gene products. Specifically, two pre-training tasks are designed to jointly train GoBERT to capture both explicit and implicit relations among functions. Neighborhood prediction is a self-supervised multi-label classification task that captures explicit function relations. A specialized masking-and-recovery task helps GoBERT find implicit patterns among functions. The pre-trained GoBERT can predict novel functions for various genes and gene products based on known functional annotations. Extensive experiments, biological case studies, and ablation studies are conducted to demonstrate the superiority of our proposed GoBERT.

NeurIPS Conference 2025 Conference Paper

TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence

  • Feng Jiang
  • Mangal Prakash
  • Hehuan Ma
  • Jianyuan Deng
  • Yuzhi Guo
  • Maolaaisha Aminanmu
  • Tommaso Mansi
  • Rui Liao

Molecular property prediction aims to learn representations that map chemical structures to functional properties. While multimodal learning has emerged as a powerful paradigm to learn molecular representations, prior works have largely overlooked textual and taxonomic information of molecules for representation learning. We introduce TRIDENT, a novel framework that integrates molecular SMILES, textual descriptions, and taxonomic functional annotations to learn rich molecular representations. To achieve this, we curate a comprehensive dataset of molecule-text pairs with structured, multi-level functional annotations. Instead of relying on conventional contrastive loss, TRIDENT employs a volume-based alignment objective to jointly align tri-modal features at the global level, enabling soft, geometry-aware alignment across modalities. Additionally, TRIDENT introduces a novel local alignment objective that captures detailed relationships between molecular substructures and their corresponding sub-textual descriptions. A momentum-based mechanism dynamically balances global and local alignment, enabling the model to learn both broad functional semantics and fine-grained structure-function mappings. TRIDENT achieves state-of-the-art performance on 18 downstream tasks, demonstrating the value of combining SMILES, textual, and taxonomic functional annotations for molecular property prediction. Our code and data are available at https://github.com/uta-smile/TRIDENT.

JBHI Journal 2024 Journal Article

An Adaptively Weighted Averaging Method for Regional Time Series Extraction of fMRI-Based Brain Decoding

  • Jianfei Zhu
  • Baichun Wei
  • Jiaru Tian
  • Feng Jiang
  • Chunzhi Yi

Brain decoding, which classifies cognitive states from the functional fluctuations of the brain, can provide insightful information for understanding the brain mechanisms of cognitive functions. In the common pipeline for decoding cognitive states from functional magnetic resonance imaging (fMRI), the time series of each brain region is traditionally extracted after brain parcellation by averaging across the voxels within that region. This neglects both the spatial information among the voxels and the needs of the downstream task. In this study, we propose to use a fully connected neural network, jointly trained with the brain decoder, to perform an adaptively weighted average across the voxels within each brain region. We perform extensive evaluations by cognitive state decoding, manifold learning, and interpretability analysis on the Human Connectome Project (HCP) dataset. The performance comparison on cognitive state decoding shows an accuracy increase of up to 5% and stable accuracy improvements under different time window sizes, resampling sizes, and training data sizes. The manifold learning results show that our method achieves considerable separability among cognitive states and largely excludes subject-specific information. The interpretability analysis shows that our method can identify reasonable brain regions corresponding to each cognitive state. Our study could aid the improvement of the basic fMRI processing pipeline.
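The core replacement the abstract proposes, a learned adaptively weighted average over a region's voxels instead of a plain mean, reduces to the following sketch. A single softmax-parameterised weight vector stands in for the jointly trained fully connected network; shapes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def region_timeseries(voxel_ts, logits):
    """Adaptively weighted regional time series.

    voxel_ts: (n_voxels, n_timepoints) BOLD signals for one region.
    logits:   (n_voxels,) learnable scores; softmax turns them into
              non-negative weights summing to 1, so uniform logits
              recover the traditional plain average.
    """
    w = softmax(logits)
    return w @ voxel_ts  # (n_timepoints,)

voxels = rng.normal(size=(20, 100))  # 20 voxels, 100 timepoints (synthetic)
uniform = region_timeseries(voxels, np.zeros(20))
```

In the paper the logits come from a network trained end-to-end with the decoder, so the weights adapt to whatever information the downstream classification needs.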

JBHI Journal 2024 Journal Article

Magnetoencephalography Decoding Transfer Approach: From Deep Learning Models to Intrinsically Interpretable Models

  • Yongdong Fan
  • Qiong Li
  • Haokun Mao
  • Feng Jiang

When decoding neuroelectrophysiological signals represented by Magnetoencephalography (MEG), deep learning models generally achieve high predictive performance but lack the ability to interpret their predicted results. This limitation prevents them from meeting the essential requirements of reliability and ethical-legal considerations in practical applications. In contrast, intrinsically interpretable models, such as decision trees, possess self-evident interpretability while typically sacrificing accuracy. To effectively combine the respective advantages of both deep learning and intrinsically interpretable models, an MEG transfer approach through feature attribution-based knowledge distillation is pioneered, which transforms deep models (teacher) into highly accurate intrinsically interpretable models (student). The resulting models provide not only intrinsic interpretability but also high predictive performance, while also serving as an excellent approximate proxy for understanding the inner workings of deep models. In the proposed approach, post-hoc feature knowledge derived from post-hoc interpretable algorithms, specifically feature attribution maps, is introduced into knowledge distillation for the first time. By guiding intrinsically interpretable models to assimilate this knowledge, the transfer of MEG decoding information from deep models to intrinsically interpretable models is implemented. Experimental results demonstrate that the proposed approach outperforms the benchmark knowledge distillation algorithms. The approach improves the prediction accuracy of the Soft Decision Tree by up to 8.28%, reaching nearly equivalent or even superior performance to the deep teacher models. Furthermore, the model-agnostic nature of this approach offers broad application potential.

ICRA Conference 2023 Conference Paper

Multi-to-Single Knowledge Distillation for Point Cloud Semantic Segmentation

  • Shoumeng Qiu
  • Feng Jiang
  • Haiqiang Zhang
  • Xiangyang Xue 0001
  • Jian Pu

3D point cloud semantic segmentation is one of the fundamental tasks for environmental understanding. Although significant progress has been made in recent years, the performance of classes with few examples or few points is still far from satisfactory. In this paper, we propose a novel multi-to-single knowledge distillation framework for the 3D point cloud semantic segmentation task to boost the performance of those hard classes. Instead of fusing all the points of multi-scans directly, only the instances that belong to the previously defined hard classes are fused. To effectively and sufficiently distill valuable knowledge from multi-scans, we leverage a multilevel distillation framework, i.e., feature representation distillation, logit distillation, and affinity distillation. We further develop a novel instance-aware affinity distillation algorithm for capturing high-level structural knowledge to enhance the distillation efficacy for hard classes. Finally, we conduct experiments on the SemanticKITTI dataset, and the results on both the validation and test sets demonstrate that our method yields substantial improvements compared with the baseline method. The code is available at https://github.com/skyshoumeng/M2SKD.
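Of the three distillation levels listed (feature, logit, affinity), the logit term is the simplest to sketch. The temperature value and KL formulation below follow standard knowledge distillation practice and are assumptions, not the paper's exact code.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def logit_distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened per-point logits.

    student_logits, teacher_logits: (n_points, n_classes) arrays. The T**2
    factor keeps gradient magnitudes comparable across temperatures, as in
    standard knowledge distillation.
    """
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    return T * T * kl.mean()
```

In the multi-to-single setting, the teacher logits would come from the multi-scan network and the student logits from the single-scan network, restricted to the fused hard-class instances.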

AAAI Conference 2021 Conference Paper

Hierarchical Macro Discourse Parsing Based on Topic Segmentation

  • Feng Jiang
  • Yaxin Fan
  • Xiaomin Chu
  • Peifeng Li
  • Qiaoming Zhu
  • Fang Kong

Hierarchically constructing micro (i.e., intra-sentence or inter-sentence) discourse structure trees using explicit boundaries (e.g., sentence and paragraph boundaries) has proven to be an effective strategy. However, it is difficult to apply this strategy to document-level macro (i.e., inter-paragraph) discourse parsing, the more challenging task, due to the lack of explicit boundaries at the higher level. To alleviate this issue, we introduce a topic segmentation mechanism to detect implicit topic boundaries and thereby help the document-level macro discourse parser construct better discourse trees hierarchically. In particular, our parser first splits a document into several sections using the topic boundaries detected by topic segmentation. Then it builds a smaller and more accurate discourse sub-tree in each section and sequentially forms a whole tree for the document. Experimental results on both the Chinese MCDTB and the English RST-DT show that our proposed method significantly outperforms the state-of-the-art baselines.
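The hierarchical strategy (split at detected topic boundaries, build a sub-tree per section, then merge the sub-trees into one document tree) can be sketched as follows. The right-branching sub-tree builder is a deliberate placeholder for the actual discourse parser, and the boundary indices are assumed to come from the topic segmentation model.

```python
def split_by_boundaries(paragraphs, boundaries):
    """Split paragraphs into sections; boundaries[i] is an index where a
    new topic starts before paragraphs[i]."""
    cuts = [0] + sorted(boundaries) + [len(paragraphs)]
    return [paragraphs[a:b] for a, b in zip(cuts, cuts[1:]) if a < b]

def build_subtree(units):
    """Placeholder right-branching combiner standing in for the parser."""
    tree = units[-1]
    for u in reversed(units[:-1]):
        tree = (u, tree)
    return tree

def parse(paragraphs, boundaries):
    """Build a sub-tree per topic section, then merge into one tree."""
    sections = split_by_boundaries(paragraphs, boundaries)
    return build_subtree([build_subtree(s) for s in sections])

doc = ["p1", "p2", "p3", "p4", "p5"]
tree = parse(doc, boundaries=[2, 4])
```

The key design point is that each sub-tree is built over a short, topically coherent span, which is where the paper's accuracy gains come from.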

NeurIPS Conference 2021 Conference Paper

Residual Relaxation for Multi-view Representation Learning

  • Yifei Wang
  • Zhengyang Geng
  • Feng Jiang
  • Chuming Li
  • Yisen Wang
  • Jiansheng Yang
  • Zhouchen Lin

Multi-view methods learn representations by aligning multiple views of the same image and their performance largely depends on the choice of data augmentation. In this paper, we notice that some other useful augmentations, such as image rotation, are harmful for multi-view methods because they cause a semantic shift that is too large to be aligned well. This observation motivates us to relax the exact alignment objective to better cultivate stronger augmentations. Taking image rotation as a case study, we develop a generic approach, Pretext-aware Residual Relaxation (Prelax), that relaxes the exact alignment by allowing an adaptive residual vector between different views and encoding the semantic shift through pretext-aware learning. Extensive experiments on different backbones show that our method can not only improve multi-view methods with existing augmentations, but also benefit from stronger image augmentations like rotation.
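The relaxation the abstract describes, replacing exact alignment of two views' embeddings with alignment up to an adaptive residual vector, can be written in a few lines. The penalty weight `lambda_r` and the exact loss form are illustrative assumptions, not the paper's precise objective.

```python
import numpy as np

def exact_alignment_loss(z1, z2):
    """Standard multi-view objective: force the two embeddings to match."""
    return float(((z1 - z2) ** 2).sum())

def prelax_loss(z1, z2, r, lambda_r=0.1):
    """Residual-relaxed alignment: a residual vector r absorbs the
    semantic shift of a strong augmentation (e.g., rotation); only what
    r cannot explain is penalised, plus a small norm penalty on r."""
    relaxed = ((z1 - (z2 + r)) ** 2).sum()
    return float(relaxed + lambda_r * (r ** 2).sum())
```

With `r = 0` the relaxed loss reduces to exact alignment; a well-chosen residual makes strong augmentations cheap to align, which is the mechanism that lets Prelax benefit from augmentations like rotation.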

AAAI Conference 2020 Conference Paper

Improving Entity Linking by Modeling Latent Entity Type Information

  • Shuang Chen
  • Jinpeng Wang
  • Feng Jiang
  • Chin-Yew Lin

Existing state-of-the-art neural entity linking models employ an attention-based bag-of-words context model and pre-trained entity embeddings bootstrapped from word embeddings to assess topic-level context compatibility. However, the latent entity type information in the immediate context of the mention is neglected, which often causes the models to link mentions to incorrect entities of the wrong type. To tackle this problem, we propose to inject latent entity type information into the entity embeddings based on pre-trained BERT. In addition, we integrate a BERT-based entity similarity score into the local context model of a state-of-the-art model to better capture latent entity type information. Our model significantly outperforms state-of-the-art entity linking models on the standard benchmark (AIDA-CoNLL). Detailed experimental analysis demonstrates that our model corrects most of the type errors produced by the direct baseline.

JMLR Journal 2015 Journal Article

Multi-layered Gesture Recognition with Kinect

  • Feng Jiang
  • Shengping Zhang
  • Shen Wu
  • Yang Gao
  • Debin Zhao

This paper proposes a novel multi-layered gesture recognition method with Kinect. We explore two essential linguistic characters of gestures, the component-concurrent character and the sequential-organization character, in a multi-layered framework, which extracts features from both the segmented semantic units and the whole gesture sequence and then sequentially classifies the motion, location, and shape components. In the first layer, an improved principle motion is applied to model the motion component. In the second layer, a particle-based descriptor and a weighted dynamic time warping are proposed for location component classification. In the last layer, the spatial path warping is further proposed to classify the shape component represented by unclosed shape context. The proposed method achieves relatively high performance for one-shot-learning gesture recognition on the ChaLearn Gesture Dataset, which comprises more than 50,000 gesture sequences recorded with Kinect.
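The weighted dynamic time warping used in the second layer for location classification can be sketched as standard DTW with per-frame weights on the alignment cost. The weighting scheme and the absolute-difference distance here are assumptions for illustration, not the paper's exact formulation.

```python
def weighted_dtw(a, b, weights=None):
    """Weighted DTW distance between 1-D sequences a and b.

    weights[i] scales the cost of aligning frame i of `a`; with uniform
    weights this reduces to standard DTW.
    """
    n, m = len(a), len(b)
    w = weights if weights is not None else [1.0] * n
    INF = float("inf")
    # D[i][j] = minimal cost of aligning a[:i] with b[:j].
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = w[i - 1] * abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # deletion
                                 D[i][j - 1],      # insertion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

Up-weighting discriminative frames lets the distance emphasise the parts of the gesture trajectory that matter most for the location component.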