Arrow Research search

Author name cluster

Jing Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

41 papers
2 author rows

Possible papers (41)

NeurIPS Conference 2025 Conference Paper

Auto-Connect: Connectivity-Preserving RigFormer with Direct Preference Optimization

  • Jingfeng Guo
  • Jian Liu
  • Jinnan Chen
  • Shiwei Mao
  • Changrong Hu
  • Puhua Jiang
  • Junlin Yu
  • Jing Xu

We introduce Auto-Connect, a novel approach for automatic rigging that explicitly preserves skeletal connectivity through a connectivity-preserving tokenization scheme. Unlike previous methods that predict bone positions represented as two joints or first predict points before determining connectivity, our method employs special tokens to define endpoints for each joint's children and for each hierarchical layer, effectively automating connectivity relationships. This approach significantly enhances topological accuracy by integrating connectivity information directly into the prediction framework. To further guarantee high-quality topology, we implement a topology-aware reward function that quantifies topological correctness, which is then utilized in a post-training phase through reward-guided Direct Preference Optimization. Additionally, we incorporate implicit geodesic features for latent top-k bone selection, which substantially improves skinning quality. By leveraging geodesic distance information within the model's latent space, our approach intelligently determines the most influential bones for each vertex, effectively mitigating common skinning artifacts. This combination of connectivity-preserving tokenization, reward-guided fine-tuning, and geodesic-aware bone selection enables our model to consistently generate more anatomically plausible skeletal structures with superior deformation properties.
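The latent top-k bone selection step can be pictured with a small sketch. The following is a minimal illustration, not the paper's implementation: the shapes, the score function, and the geodesic feature `geo` are hypothetical stand-ins for the learned components described in the abstract.

```python
# Hypothetical sketch of geodesic-aware top-k bone selection for skinning.
import numpy as np

def topk_skinning_weights(vert_feat, bone_feat, geo, k=4):
    """vert_feat: (V, D), bone_feat: (B, D), geo: (V, B) geodesic-distance features."""
    scores = vert_feat @ bone_feat.T - geo           # nearer bones score higher (assumed form)
    idx = np.argsort(-scores, axis=1)[:, :k]         # top-k most influential bones per vertex
    top = np.take_along_axis(scores, idx, axis=1)
    w = np.exp(top - top.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # softmax over the selected bones only
    return idx, w

rng = np.random.default_rng(0)
idx, w = topk_skinning_weights(rng.normal(size=(100, 32)),
                               rng.normal(size=(20, 32)),
                               rng.random((100, 20)))
assert np.allclose(w.sum(axis=1), 1.0)               # valid skinning weights per vertex
```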

AAAI Conference 2025 Conference Paper

CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures for Protein-RNA Binding Affinity Prediction

  • Rong Han
  • Xiaohong Liu
  • Tong Pan
  • Jing Xu
  • Xiaoyu Wang
  • Wuyang Lan
  • Zhenyu Li
  • Zixuan Wang

Accurately measuring protein-RNA binding affinity is crucial in many biological processes and drug design. Previous computational methods for protein-RNA binding affinity prediction rely on either sequence or structure features and are thus unable to capture the binding mechanisms comprehensively. Recently emerging pre-trained language models, trained on massive unsupervised protein and RNA sequences, have shown strong representation ability for various in-domain downstream tasks, including binding site prediction. However, applying language models from different domains collaboratively for complex-level tasks remains unexplored. In this paper, we propose CoPRA to bridge pre-trained language models from different biological domains via Complex structure for Protein-RNA binding Affinity prediction. We demonstrate for the first time that cross-biological-modal language models can collaborate to improve binding affinity prediction. We propose a Co-Former to combine the cross-modal sequence and structure information and a bi-scope pre-training strategy to improve Co-Former's interaction understanding. Meanwhile, we build the largest protein-RNA binding affinity dataset, PRA310, for performance evaluation. We also test our model on a public dataset for mutation effect prediction. CoPRA reaches state-of-the-art performance on all the datasets. We provide extensive analyses and verify that CoPRA can (1) accurately predict the protein-RNA binding affinity; (2) understand the binding affinity change caused by mutations; and (3) benefit from scaling data and model size.

IJCAI Conference 2025 Conference Paper

Detecting Hallucination in Large Language Models Through Deep Internal Representation Analysis

  • Luan Zhang
  • Dandan Song
  • Zhijing Wu
  • Yuhang Tian
  • Changzhi Zhou
  • Jing Xu
  • Ziyi Yang
  • Shuhao Zhang

Large language models (LLMs) have shown exceptional performance across various domains. However, LLMs are prone to hallucinate facts and generate non-factual responses, which can undermine their reliability in real-world applications. Current hallucination detection methods suffer from external resource demands, substantial time overhead, difficulty overcoming LLMs' intrinsic limitations, and insufficient modeling. In this paper, we propose MHAD, a novel internal-representation-based hallucination detection method. MHAD utilizes linear probing to select neurons and layers within LLMs; the selected neurons and layers are shown to exhibit significant awareness of hallucinations at the initial and final generation steps. By concatenating the outputs of these selected neurons from the selected layers at the initial and final generation steps, a hallucination-awareness vector is formed, enabling precise hallucination detection via an MLP. Additionally, we introduce SOQHD, a novel benchmark for evaluating hallucination detection in Open-Domain QA (ODQA). Extensive experiments show that MHAD outperforms existing hallucination detection methods across multiple LLMs, demonstrating superior effectiveness.
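The detection head described above lends itself to a compact sketch. Below is a minimal, hypothetical version: the hidden-state shapes, the probe-selected indices, and the MLP width are illustrative, and collecting the LLM's hidden states at the initial and final generation steps is assumed to happen elsewhere.

```python
# Hypothetical sketch of an MHAD-style detection head (not the released code).
import torch
import torch.nn as nn

class HallucinationMLP(nn.Module):
    def __init__(self, n_selected):
        super().__init__()
        # input = selected neurons at the initial step + at the final step
        self.net = nn.Sequential(nn.Linear(2 * n_selected, 256),
                                 nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, h_init, h_final, neuron_idx):
        # h_init, h_final: (batch, hidden) hidden states from the selected layer
        x = torch.cat([h_init[:, neuron_idx], h_final[:, neuron_idx]], dim=-1)
        return self.net(x).squeeze(-1)   # logit: higher = more likely hallucinated

neuron_idx = torch.arange(64)            # stand-in for probe-selected neurons
head = HallucinationMLP(len(neuron_idx))
logits = head(torch.randn(8, 4096), torch.randn(8, 4096), neuron_idx)
```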

ICML Conference 2025 Conference Paper

Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs

  • Xun Wang
  • Jing Xu
  • Franziska Boenisch
  • Michael Backes 0001
  • Christopher A. Choquette-Choo
  • Adam Dziedzic

Prompting has become a dominant paradigm for adapting large language models (LLMs). While discrete (textual) prompts are widely used for their interpretability, soft (parameter) prompts have recently gained traction in APIs. This is because they can encode information from more training samples while minimizing the user's token usage, leaving more space in the context window for task-specific input. However, soft prompts are tightly coupled to the LLM they are tuned on, limiting their generalization to other LLMs. This constraint is particularly problematic for efficiency and privacy: (1) tuning prompts on each LLM incurs high computational costs, especially as LLMs continue to grow in size. Additionally, (2) when the LLM is hosted externally, soft prompt tuning often requires sharing private data with the LLM provider. For instance, this is the case with the NVIDIA NeMo API. To address these issues, we propose POST (Privacy Of Soft prompt Transfer), a framework that enables private tuning of soft prompts on a small model and subsequently transfers these prompts to a larger LLM. POST uses knowledge distillation to derive a small model directly from the large LLM to improve prompt transferability, tunes the soft prompt locally, optionally with differential privacy guarantees, and transfers it back to the larger LLM using a small public dataset. Our experiments show that POST reduces computational costs, preserves privacy, and effectively transfers high-utility soft prompts.

JBHI Journal 2025 Journal Article

Hypergraph-based Audio-Visual Fusion for Obstructive Sleep Apnea Severity Estimation During Wakefulness

  • Biao Xue
  • Yanting Shao
  • Zhichao Wang
  • Chang-Hong Fu
  • Xiaohua Zhu
  • Heng Zhao
  • Jing Xu
  • Hong Hong

Obstructive sleep apnea (OSA) is associated with psychophysiological impairments, and recent studies have shown the feasibility of using speech and craniofacial images during wakefulness for severity estimation. However, the inherent limitations of unimodal data constrain the performance of current methods. To address this, we propose a novel hypergraph-based multimodal fusion framework (HMFusion) that integrates psychophysiological information from audio-visual data. Specifically, we employ long short-term memory (LSTM)-based encoders to extract modality-specific temporal dynamics from pre-trained audio-visual embeddings and remote photoplethysmography (rPPG)-derived heart rate sequences. A hypergraph neural network is then utilized to capture critical cross-modal interactions for OSA severity estimation. Evaluation on a dataset of 159 participants from a clinical sleep center demonstrates that the proposed model achieves areas under the receiver operating characteristic curve (AUCs) of 88.26%, 86.07%, and 85.29%, with corresponding F1-scores of 92.91%, 85.50%, and 85.30% at Apnea-Hypopnea Index (AHI) thresholds of 5, 15, and 30 events/hour, respectively, outperforming state-of-the-art approaches. This study highlights the potential of psychophysiological data in enhancing OSA severity estimation during wakefulness, offering new avenues for clinical research in this field.

JBHI Journal 2025 Journal Article

Knowledge Guided Articulatory and Spectrum Information Fusion for Obstructive Sleep Apnea Severity Estimation

  • Biao Xue
  • Zhichao Wang
  • Yanting Shao
  • Xiaohua Zhu
  • Heng Zhao
  • Chang-Hong Fu
  • Jing Xu
  • Ning Ding

Numerous studies have demonstrated that speech analysis during wakefulness is a non-invasive and convenient method for obstructive sleep apnea (OSA) screening. However, the inherent differences in upper airway structure and function between wakefulness and sleep limit the effectiveness of OSA assessments based on the vowels and phonemes employed in existing studies. To address this challenge, we propose the design of controlled articulations that more accurately simulate upper airway obstruction during sleep, offering a more comprehensive reflection of the pathological changes in upper airway anatomy and function in individuals with suspected OSA. Specifically, we constructed a Mandarin Chinese controlled articulation dataset, consisting of speech recordings from 301 male adult participants who underwent polysomnography (PSG) monitoring at a sleep center. Drawing on domain knowledge, we thoroughly investigated articulations associated with upper airway collapse, including vowels, pharyngeals, and nasals, and identified interpretable optimal articulations using SHapley Additive exPlanations (SHAP). Furthermore, we introduced a dual-stream fusion model, PTF-Net, which employs the Paralinguistic Acoustic Feature stream (PAF-Stream) to extract the physical attributes of speech and the Transfer Learning-based Spectrogram Feature stream (TLE-Stream) to capture the nonlinear features of upper airway dynamics. The Swin Transformer is utilized to integrate both local and global information from the various articulations. Experimental results demonstrate that the knowledge-guided PTF-Net model outperforms existing methods in OSA severity assessment, by 5.1% in Area Under the Curve (AUC) and 5.8% in Unweighted Average Recall (UAR). In addition, we revealed that the proposed deep embedding of controlled articulation can differentiate between the types of obstruction sites identified by drug-induced sleep endoscopy (DISE), suggesting its potential as a novel digital biomarker for upper airway assessment in OSA patients. This study enhances the understanding of speech-based OSA screening and paves the way for its broad clinical application.

AAAI Conference 2025 Conference Paper

LLM Agents Can Be Choice-Supportive Biased Evaluators: An Empirical Study

  • Nan Zhuang
  • Boyu Cao
  • Yi Yang
  • Jing Xu
  • Mingda Xu
  • Yuxiao Wang
  • Qi Liu

With Large Language Model (LLM) agents taking on more evaluation responsibilities in decision-making, it is essential to recognize their possible biases to guarantee fair and trustworthy AI-supported decisions. This study is the first to thoroughly examine choice-supportive bias in LLM agents, a cognitive bias known to impact human decision-making and evaluation. We conduct experiments across 19 open- and closed-source LLMs in up to five scenarios, employing both memory-based and evaluation-based tasks adapted and redesigned from human cognitive studies. Our findings show that LLM agents may exhibit biased attribution or evaluation that supports their initial choices, and such bias may persist even when contextual hallucination is not observable. Key findings show that bias manifestation can differ greatly depending on prompt construction and context preservation, and that the bias may be mitigated in larger models. Significantly, we observe that the bias increases when the agents perceive they are in control. Our extensive study involving 284 well-educated humans shows that, despite bias, certain LLM agents can still perform better than humans in similar evaluation tasks. This research contributes to the growing area of AI psychology, and the findings underscore the importance of addressing cognitive biases in LLM agent systems, with wide-ranging implications spanning from improving AI-assisted decision-making to advancing AI safety and ethics.

NeurIPS Conference 2025 Conference Paper

Memorization in Graph Neural Networks

  • Adarsh Jamadandi
  • Jing Xu
  • Adam Dziedzic
  • Franziska Boenisch

Deep neural networks (DNNs) have been shown to memorize their training data, but similar analyses for graph neural networks (GNNs) remain under-explored. We introduce NCMemo (Node Classification Memorization), the first framework to quantify label memorization in semi-supervised node classification. We establish an inverse relationship between memorization and graph homophily, i.e., the tendency of connected nodes to share labels or features. Lower homophily significantly increases memorization, indicating that GNNs rely on label memorization when learning less homophilic graphs. We then analyze GNN training dynamics and find that increased memorization in low-homophily graphs is tightly coupled to GNNs' implicit bias toward using graph structure. When structure is less informative, models instead memorize node labels to minimize training loss. Finally, we show that nodes with higher label inconsistency in their feature-space neighborhood are more prone to memorization. Based on these insights, we investigate graph rewiring as a mitigation strategy. Our results show that rewiring reduces memorization without harming model performance, while also lowering the privacy risk for previously memorized data points. Thus, our work advances understanding of GNN learning and supports more privacy-preserving GNN deployment.
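The abstract does not state NCMemo's exact estimator; a common definition of label memorization that such frameworks build on (Feldman, 2020) is the drop in the model's accuracy on a point when that point's label is held out of training. A schematic sketch, with a user-supplied `train_fn` standing in for GNN training:

```python
# Schematic leave-one-out memorization score (generic, not NCMemo itself).
import numpy as np

def memorization_score(train_fn, graph, labels, i, n_runs=10):
    """train_fn(graph, labels, train_mask) -> per-node predicted labels."""
    mask_in = np.ones(len(labels), dtype=bool)
    mask_out = mask_in.copy()
    mask_out[i] = False                      # hold node i's label out of training
    p_in = np.mean([train_fn(graph, labels, mask_in)[i] == labels[i]
                    for _ in range(n_runs)])
    p_out = np.mean([train_fn(graph, labels, mask_out)[i] == labels[i]
                     for _ in range(n_runs)])
    return p_in - p_out                      # near 1: the label is memorized, not inferred
```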

NeurIPS Conference 2025 Conference Paper

Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning

  • Jian Liu
  • Jing Xu
  • Song Guo
  • Jing Li
  • Jingfeng Guo
  • Jiaao Yu
  • Haohan Weng
  • Biwen Lei

Existing pretrained models for 3D mesh generation often suffer from data biases and produce low-quality results, while global reinforcement learning (RL) methods rely on object-level rewards that struggle to capture local structure details. To address these challenges, we present Mesh-RFT, a novel fine-grained reinforcement fine-tuning framework that employs Masked Direct Preference Optimization (M-DPO) to enable localized refinement via quality-aware face masking. To facilitate efficient quality evaluation, we introduce an objective topology-aware scoring system to evaluate geometric integrity and topological regularity at both object and face levels through two metrics: Boundary Edge Ratio (BER) and Topology Score (TS). By integrating these metrics into a fine-grained RL strategy, Mesh-RFT becomes the first method to optimize mesh quality at the granularity of individual faces, resolving localized errors while preserving global coherence. Experimental results show that our M-DPO approach reduces Hausdorff Distance (HD) by 24.6% and improves Topology Score (TS) by 3.8% over pre-trained models, while outperforming global DPO methods with a 17.4% HD reduction and 4.9% TS gain. These results demonstrate Mesh-RFT's ability to improve geometric integrity and topological regularity, achieving new state-of-the-art performance in production-ready mesh generation.
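Of the two metrics, the Boundary Edge Ratio is simple enough to sketch. A minimal version under the standard convention that a boundary edge belongs to exactly one face; the paper's precise definition may differ:

```python
# Sketch of a Boundary Edge Ratio for a triangle mesh (assumed definition).
from collections import Counter

def boundary_edge_ratio(faces):
    """faces: iterable of (i, j, k) vertex-index triangles."""
    edge_count = Counter()
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            edge_count[tuple(sorted((a, b)))] += 1
    n_boundary = sum(1 for c in edge_count.values() if c == 1)
    return n_boundary / len(edge_count)

# Two triangles sharing one edge: 4 of the 5 edges are boundary edges.
print(boundary_edge_ratio([(0, 1, 2), (1, 3, 2)]))   # 0.8
```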

ICRA Conference 2025 Conference Paper

PRIDEV: A Plug-and-Play Refinement for Improved Depth Estimation in Videos

  • Jing Xu
  • Hong Liu 0008
  • Jianbing Wu
  • Xinhua Xu

Monocular video depth estimation is a key challenge in computer vision and is central to visual understanding. Monocular depth estimation models trained on single images achieve impressive results on individual frames but often lack temporal consistency when applied to videos, leading to flickering and artifacts. Current video depth estimation methods often rely on additional optical flow or camera poses, which are limited by their accuracy, complex design, and lack of robustness. To address this, we propose a plug-and-play method that seamlessly transfers the robustness of image depth estimation to video depth estimation. By leveraging powerful priors from image depth estimation, our method enhances the performance of video depth estimation without requiring additional conditional inputs or extensive pretraining on large and expensive video datasets. We introduce the Temporal Depth Stabilization Module (TDSM), which can seamlessly inflate an image monocular depth estimation model into a video depth estimation model, enabling unified modeling of depth across video sequences and capturing the temporal cues in video. We validate the effectiveness and efficiency of our method across various datasets (e.g., normal and challenging conditions) and different backbones. Extensive experiments demonstrate that our simple and effective method significantly improves monocular depth estimation networks, achieving new state-of-the-art accuracy in both spatial and temporal dimensions.

JBHI Journal 2024 Journal Article

Automatically Extracting and Utilizing EEG Channel Importance Based on Graph Convolutional Network for Emotion Recognition

  • Kun Yang
  • Zhenning Yao
  • Keze Zhang
  • Jing Xu
  • Li Zhu
  • Shichao Cheng
  • Jianhai Zhang

Graph convolutional networks (GCNs) based on brain networks have been widely used for EEG emotion recognition. However, most studies train their models directly without considering network dimensionality reduction beforehand. In fact, some nodes and edges carry information that is invalid or even interfering for the current task, so it is necessary to reduce the network dimension and extract the core network. To address the problem of extracting and utilizing the core network, we propose a core network extraction model (CWGCN) based on channel weighting and graph convolutional networks, and a graph convolutional network model (CCSR-GCN) based on channel convolution and style-based recalibration for emotion recognition. The CWGCN model automatically extracts the core network and the channel importance parameter in a data-driven manner. The CCSR-GCN model innovatively uses the output information of the CWGCN model to identify the emotion state. Experimental results on SEED show that: 1) core network extraction helps improve the performance of the GCN model; 2) the CWGCN and CCSR-GCN models achieve better results than currently popular methods. The idea and its implementation in this paper provide a novel perspective for applying GCNs to brain network analysis in other specific tasks.

NeurIPS Conference 2024 Conference Paper

Con4m: Context-aware Consistency Learning Framework for Segmented Time Series Classification

  • Junru Chen
  • Tianyu Cao
  • Jing Xu
  • Jiahe Li
  • Zhilong Chen
  • Tao Xiao
  • Yang Yang

Time Series Classification (TSC) encompasses two settings: classifying entire sequences or classifying segmented subsequences. The raw time series for segmented TSC usually contain Multiple classes with Varying Duration of each class (MVD). Therefore, the characteristics of MVD pose unique challenges for segmented TSC, yet have been largely overlooked by existing works. Specifically, there exists a natural temporal dependency between consecutive instances (segments) to be classified within MVD. However, mainstream TSC models rely on the assumption of independent and identically distributed (i.i.d.) data, focusing on independently modeling each segment. Additionally, annotators with varying expertise may provide inconsistent boundary labels, leading to unstable performance of noise-free TSC models. To address these challenges, we first formally demonstrate that valuable contextual information enhances the discriminative power of classification instances. Leveraging the contextual priors of MVD at both the data and label levels, we propose a novel consistency learning framework, Con4m, which effectively utilizes contextual information more conducive to discriminating consecutive segments in segmented TSC tasks, while harmonizing inconsistent boundary labels for training. Extensive experiments across multiple datasets validate the effectiveness of Con4m in handling segmented TSC tasks on MVD. The source code is available at https://github.com/MrNobodyCali/Con4m.

JBHI Journal 2024 Journal Article

DSFE: Decoding EEG-Based Finger Motor Imagery Using Feature-Dependent Frequency, Feature Fusion and Ensemble Learning

  • Kun Yang
  • Ruochen Li
  • Jing Xu
  • Li Zhu
  • Wanzeng Kong
  • Jianhai Zhang

Accurately decoding finger motor imagery is essential for fine motor control using EEG signals. However, decoding finger motor imagery is particularly challenging compared with ordinary motor imagery. This paper proposed a novel EEG decoding method based on feature-dependent frequency band selection, feature fusion, and ensemble learning (DSFE) for finger motor imagery. First, a feature-dependent frequency band selection method based on the correlation coefficient (FDCC) was proposed to select feature-specific effective bands. Second, a feature fusion method was proposed to fuse different types of candidate features to produce multiple refined sets of decoding features. Finally, an ensemble model using a weighted voting strategy was proposed to make full use of these diverse sets of final features. The results on a public EEG dataset of five-finger motor imagery showed that the DSFE method is effective and achieves the highest decoding accuracy of 50.64%, which is 7.64% higher than existing studies using exactly the same data. The experiments further revealed that both the effective frequency bands of different subjects and the effective frequency bands of different types of features differ in finger motor imagery. Furthermore, compared with two-hand motor imagery, the effective decoding information of finger motor imagery is shifted to lower frequencies. The idea and findings in this paper provide a valuable perspective for understanding fine motor imagery in depth.
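The final ensembling step admits a compact sketch. The weighting scheme below (normalized per-model weights over class probabilities) is a hypothetical stand-in; the abstract does not detail how DSFE derives its voting weights:

```python
# Sketch of weighted soft voting over an ensemble (illustrative weights).
import numpy as np

def weighted_vote(probas, weights):
    """probas: (n_models, n_samples, n_classes); weights: (n_models,)."""
    w = np.asarray(weights, dtype=float)
    combined = np.tensordot(w / w.sum(), probas, axes=1)   # (n_samples, n_classes)
    return combined.argmax(axis=1)

p = np.array([[[0.6, 0.4]],      # model 1
              [[0.3, 0.7]],      # model 2
              [[0.2, 0.8]]])     # model 3; one sample, two classes
print(weighted_vote(p, [0.5, 0.3, 0.2]))   # [1]: the weighted average favors class 1
```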

NeurIPS Conference 2024 Conference Paper

Functionally Constrained Algorithm Solves Convex Simple Bilevel Problem

  • Huaqing Zhang
  • Lesi Chen
  • Jing Xu
  • Jingzhao Zhang

This paper studies simple bilevel problems, where a convex upper-level function is minimized over the optimal solutions of a convex lower-level problem. We first show a fundamental difficulty of simple bilevel problems: their approximate optimal value is not obtainable by first-order zero-respecting algorithms. We then follow recent works in pursuing weak approximate solutions. To this end, we propose a novel method that reformulates these problems as functionally constrained problems. Our method achieves near-optimal rates for both smooth and nonsmooth problems. To the best of our knowledge, this is the first near-optimal algorithm that works under standard assumptions of smoothness or Lipschitz continuity for the objective functions.
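The reformulation can be sketched in one line of math. Writing g* for the lower-level optimal value, the simple bilevel problem and a functionally constrained surrogate of the kind described above read as follows (a sketch; the paper's exact construction and accuracy parameters may differ):

```latex
\begin{align*}
\text{(bilevel)}\quad
  & \min_{x}\; f(x) \quad \text{s.t.}\quad x \in \operatorname*{arg\,min}_{z}\, g(z), \\
\text{(constrained)}\quad
  & \min_{x}\; f(x) \quad \text{s.t.}\quad g(x) - g^{*} \le \epsilon_g,
  \qquad g^{*} := \min_{z}\, g(z).
\end{align*}
```

A weak approximate solution in this sense is a point x with f(x) − f* ≤ ε_f and g(x) − g* ≤ ε_g, which is exactly what the value-function constraint targets.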

AAAI Conference 2024 Conference Paper

MultiSum: A Multi-Facet Approach for Extractive Social Summarization Utilizing Semantic and Sociological Relationships

  • Tanglong Zhao
  • Ruifang He
  • Jing Xu
  • Bo Wang

Social summarization aims to provide summaries for a large number of social texts (called posts) about a single topic. To extract a summary, both the post representation and the summary selection method are crucial. Previous methods introduce social relations to enhance post embeddings, mitigating the sparse representations caused by posts' brief and informal expression. However, they ignore that there are multiple relations between posts. Besides, existing graph-based centrality calculation approaches tend to select posts from one aspect, which leads to facet bias, especially when there are multiple viewpoints. In this paper, we propose a model named MultiSum to improve social summarization. Specifically, 1) we use graph convolutional networks to fuse text content with social and semantic relations to improve post representation; 2) the similarity between the summary and all aspects is incorporated into the centrality score during the selection phase, encouraging the model to pay attention to different facets. Experimental results on English and Chinese corpora support the effectiveness of this model. Furthermore, external evaluations by human experts and large language models demonstrate the validity of MultiSum in facet coverage and redundancy reduction.

JBHI Journal 2024 Journal Article

Unsupervised Joint Domain Adaptation for Decoding Brain Cognitive States From tfMRI Images

  • Yameng Zhang
  • Yufei Gao
  • Jing Xu
  • Guohua Zhao
  • Lei Shi
  • Lingfei Kong

Recent advances in large models and neuroscience have enabled exploration of the mechanisms of brain activity using neuroimaging data. Brain decoding is one of the most promising lines of research for further understanding human cognitive function. However, current methods depend excessively on high-quality labeled data, which makes the collection and annotation of neural images by experts enormously expensive. Besides, the performance of cross-individual decoding suffers from inconsistency in data distribution caused by individual variation and different collection equipment. To address the above issues, a Joint Domain Adaptive Decoding (JDAD) framework is proposed for unsupervised decoding of specific brain cognitive states related to behavioral tasks. Based on volumetric feature extraction from task-based functional Magnetic Resonance Imaging (tfMRI) data, a novel objective loss function is designed around a joint distribution regularizer, which aims to restrict the distance between both the conditional and marginal probability distributions of labeled and unlabeled samples. Experimental results on the public Human Connectome Project (HCP) S1200 dataset show that JDAD achieves superior performance over other prevalent methods, especially for fine-grained tasks, with 11.5%-21.6% improvements in decoding accuracy. The learned 3D features are visualized by Grad-CAM to build a connection with brain functional regions, which provides a novel path for learning the function of brain cortex regions related to specific cognitive tasks at the group level.

ICML Conference 2023 Conference Paper

A Closer Look at Few-shot Classification Again

  • Xu Luo 0003
  • Hao Wu 0070
  • Ji Zhang 0012
  • Lianli Gao
  • Jing Xu
  • Jingkuan Song

Few-shot classification consists of a training phase where a model is learned on a relatively large dataset and an adaptation phase where the learned model is adapted to previously-unseen tasks with limited labeled samples. In this paper, we empirically prove that the training algorithm and the adaptation algorithm can be completely disentangled, which allows algorithm analysis and design to be done individually for each phase. Our meta-analysis for each phase reveals several interesting insights that may help better understand key aspects of few-shot classification and connections with other fields such as visual representation learning and transfer learning. We hope the insights and research challenges revealed in this paper can inspire future work in related directions. Code and pre-trained models (in PyTorch) are available at https://github.com/Frankluox/CloserLookAgainFewShot.

AAAI Conference 2023 Conference Paper

Dialogue State Distillation Network with Inter-slot Contrastive Learning for Dialogue State Tracking

  • Jing Xu
  • Dandan Song
  • Chong Liu
  • Siu Cheung Hui
  • Fei Li
  • Qiang Ju
  • Xiaonan He
  • Jian Xie

In task-oriented dialogue systems, Dialogue State Tracking (DST) aims to extract users' intentions from the dialogue history. Currently, most existing approaches suffer from error propagation and are unable to dynamically select relevant information when utilizing previous dialogue states. Moreover, the relations between the updates of different slots provide vital clues for DST. However, the existing approaches rely only on predefined graphs to indirectly capture the relations. In this paper, we propose a Dialogue State Distillation Network (DSDN) to utilize relevant information of previous dialogue states and mitigate the gap in utilization between training and testing. Thus, it can dynamically exploit previous dialogue states and avoid introducing error propagation simultaneously. Further, we propose an inter-slot contrastive learning loss to effectively capture the slot co-update relations from dialogue context. Experiments are conducted on the widely used MultiWOZ 2.0 and MultiWOZ 2.1 datasets. The experimental results show that our proposed model achieves the state-of-the-art performance for DST.

NeurIPS Conference 2023 Conference Paper

Towards Data-Algorithm Dependent Generalization: a Case Study on Overparameterized Linear Regression

  • Jing Xu
  • Jiaye Teng
  • Yang Yuan
  • Andrew Yao

One of the major open problems in machine learning is to characterize generalization in the overparameterized regime, where most traditional generalization bounds become inconsistent even for overparameterized linear regression. In many scenarios, this failure can be attributed to obscuring the crucial interplay between the training algorithm and the underlying data distribution. This paper demonstrates that the generalization behavior of overparameterized models should be analyzed in both a data-relevant and an algorithm-relevant manner. To make a formal characterization, we introduce a notion called data-algorithm compatibility, which considers the generalization behavior of the entire data-dependent training trajectory, instead of the traditional last-iterate analysis. We validate our claim by studying the setting of solving overparameterized linear regression with gradient descent. Specifically, we perform a data-dependent trajectory analysis and derive a sufficient condition for compatibility in such a setting. Our theoretical results demonstrate that if we take early stopping iterates into consideration, generalization can hold with significantly weaker restrictions on the problem instance than in previous last-iterate analyses.

NeurIPS Conference 2022 Conference Paper

Alleviating the Sample Selection Bias in Few-shot Learning by Removing Projection to the Centroid

  • Jing Xu
  • Xu Luo
  • Xinglin Pan
  • Yanan Li
  • Wenjie Pei
  • Zenglin Xu

Few-shot learning (FSL) targets generalization of vision models to unseen tasks without sufficient annotations. Despite the emergence of a number of few-shot learning methods, the sample selection bias problem, i.e., the sensitivity to the limited amount of support data, has not been well understood. In this paper, we find that this problem usually occurs when the positions of support samples are in the vicinity of the task centroid, the mean of all class centroids in the task. This motivates us to propose an extremely simple feature transformation to alleviate this problem, dubbed Task Centroid Projection Removing (TCPR). TCPR is applied directly to all image features in a given task, aiming at removing the component of features along the direction of the task centroid. While the exact task centroid cannot be accurately obtained from limited data, we estimate it using base features that are each similar to one of the support features. Our method effectively prevents features from being too close to the task centroid. Extensive experiments over ten datasets from different domains show that TCPR can reliably improve classification accuracy across various feature extractors, training algorithms, and datasets. The code has been made available at https://github.com/KikimorMay/FSL-TCBR.
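The transformation itself is a one-line projection removal. A minimal sketch, simplifying the centroid estimate (the paper estimates it from similar base features rather than from the task's own features, as done here):

```python
# Sketch of Task Centroid Projection Removing (simplified centroid estimate).
import numpy as np

def tcpr(features, centroid):
    """Remove each feature's component along the task-centroid direction."""
    c = centroid / np.linalg.norm(centroid)
    return features - (features @ c)[:, None] * c[None, :]

rng = np.random.default_rng(0)
feats = rng.normal(size=(25, 64)) + 3.0        # shared offset plays the task centroid
out = tcpr(feats, feats.mean(axis=0))

c = feats.mean(axis=0); c /= np.linalg.norm(c)
assert np.allclose(out @ c, 0.0, atol=1e-6)    # features now orthogonal to the centroid
```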

ICML Conference 2022 Conference Paper

Channel Importance Matters in Few-Shot Image Classification

  • Xu Luo 0003
  • Jing Xu
  • Zenglin Xu

Few-Shot Learning (FSL) requires vision models to quickly adapt to brand-new classification tasks with a shift in task distribution. Understanding the difficulties posed by this task-distribution shift is central to FSL. In this paper, we show that a simple channel-wise feature transformation may be the key to unraveling this secret from a channel perspective. When facing novel few-shot tasks in the test-time datasets, this transformation can greatly improve the generalization ability of learned image representations, while being agnostic to the choice of datasets and training algorithms. Through an in-depth analysis of this transformation, we find that the difficulty of representation transfer in FSL stems from the severe channel bias problem of image representations: channels may have different importance in different tasks, while convolutional neural networks are likely to be insensitive, or respond incorrectly, to such a shift. This points to a core problem in the generalization ability of modern vision systems that needs further attention.

ICLR Conference 2022 Conference Paper

ToM2C: Target-oriented Multi-agent Communication and Cooperation with Theory of Mind

  • Yuanfei Wang
  • Fangwei Zhong
  • Jing Xu
  • Yizhou Wang 0001

Being able to predict the mental states of others is a key factor in effective social interaction. It is also crucial for distributed multi-agent systems, where agents are required to communicate and cooperate. In this paper, we introduce such an important social-cognitive skill, i.e., Theory of Mind (ToM), to build socially intelligent agents who are able to communicate and cooperate effectively to accomplish challenging tasks. With ToM, each agent is capable of inferring the mental states and intentions of others according to its (local) observation. Based on the inferred states, the agents decide "when" and with "whom" to share their intentions. With the information observed, inferred, and received, the agents decide their sub-goals and reach a consensus among the team. In the end, the low-level executors independently take primitive actions to accomplish the sub-goals. We demonstrate the idea in two typical target-oriented multi-agent tasks: cooperative navigation and multi-sensor target coverage. The experiments show that the proposed model not only outperforms the state-of-the-art methods on reward and communication efficiency, but also shows good generalization across different scales of the environment.

AAAI Conference 2021 Conference Paper

MiniSeg: An Extremely Minimum Network for Efficient COVID-19 Segmentation

  • Yu Qiu
  • Yun Liu
  • Shijie Li
  • Jing Xu

The rapid spread of the new pandemic, i.e., COVID-19, has severely threatened global health. Deep-learning-based computer-aided screening, e.g., segmentation of COVID-19 infected CT areas, has attracted much attention. However, the publicly available COVID-19 training data are limited, easily causing overfitting for traditional deep learning methods, which are usually data-hungry with millions of parameters. On the other hand, fast training/testing and low computational cost are also necessary for quick deployment and development of COVID-19 screening systems, but traditional deep learning methods are usually computationally intensive. To address the above problems, we propose MiniSeg, a lightweight deep learning model for efficient COVID-19 segmentation. Compared with traditional segmentation methods, MiniSeg has several significant strengths: i) it has only 83K parameters and is thus not prone to overfitting; ii) it has high computational efficiency and is thus convenient for practical deployment; iii) it can be quickly retrained by other users on their private COVID-19 data to further improve performance. In addition, we build a comprehensive COVID-19 segmentation benchmark for comparing MiniSeg to traditional methods.

NeurIPS Conference 2020 Conference Paper

Learning Multi-Agent Coordination for Enhancing Target Coverage in Directional Sensor Networks

  • Jing Xu
  • Fangwei Zhong
  • Yizhou Wang

Maximizing target coverage by adjusting the orientation of distributed sensors is an important problem in directional sensor networks (DSNs). This problem is challenging as the targets usually move randomly while the coverage range of sensors is limited in angle and distance. Thus, sensors must be coordinated to achieve ideal target coverage with low power consumption, e.g., no missing targets and reduced redundant coverage. To realize this, we propose a Hierarchical Target-oriented Multi-Agent Coordination (HiT-MAC) framework, which decomposes the target coverage problem into two-level tasks: target assignment by a coordinator and tracking of assigned targets by executors. Specifically, the coordinator periodically monitors the environment globally and allocates targets to each executor. In turn, each executor only needs to track its assigned targets. To effectively learn HiT-MAC by reinforcement learning, we further introduce a set of practical methods, including a self-attention module, marginal contribution approximation for the coordinator, and a goal-conditional observation filter for the executor. Empirical results demonstrate the advantage of HiT-MAC in coverage rate, learning efficiency, and scalability compared to baselines. We also conduct an ablative analysis of the effectiveness of the introduced components in the framework.

AAAI Conference 2020 Conference Paper

Pose-Assisted Multi-Camera Collaboration for Active Object Tracking

  • Jing Li
  • Jing Xu
  • Fangwei Zhong
  • Xiangyu Kong
  • Yu Qiao
  • Yizhou Wang

Active Object Tracking (AOT) is crucial to many vision-based applications, e.g., mobile robots and intelligent surveillance. However, there are a number of challenges when deploying active tracking in complex scenarios, e.g., the target is frequently occluded by obstacles. In this paper, we extend single-camera AOT to a multi-camera setting, where cameras track a target in a collaborative fashion. To achieve effective collaboration among cameras, we propose a novel Pose-Assisted Multi-Camera Collaboration System, which enables a camera to cooperate with the others by sharing camera poses for active object tracking. In the system, each camera is equipped with two controllers and a switcher: the vision-based controller tracks targets based on observed images, while the pose-based controller moves the camera in accordance with the poses of the other cameras. At each step, the switcher decides which of the two controllers' actions to take according to the visibility of the target. The experimental results demonstrate that our system outperforms all the baselines and is capable of generalizing to unseen environments. The code and demo videos are available at https://sites.google.com/view/pose-assistedcollaboration.
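The switcher amounts to a small decision rule. A minimal sketch with hypothetical controller interfaces, mirroring the visibility-based switching described above:

```python
# Sketch of the two-controller switcher (hypothetical interfaces).
def switch_action(target_visible, vision_ctrl, pose_ctrl, obs, peer_poses):
    if target_visible:
        return vision_ctrl(obs)        # track from the observed image
    return pose_ctrl(peer_poses)       # reorient using the other cameras' poses

action = switch_action(target_visible=False,
                       vision_ctrl=lambda o: "track",
                       pose_ctrl=lambda p: "turn_toward_peer",
                       obs=None, peer_poses=[(0.0, 1.2)])
```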

AAAI Conference 2019 Conference Paper

Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition

  • Yu Pan
  • Jing Xu
  • Maolin Wang
  • Jinmian Ye
  • Fei Wang
  • Kun Bai
  • Zenglin Xu

Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Unit (GRU) networks, have achieved promising performance in sequential data modeling. The hidden layers in RNNs can be regarded as memory units, which help store information in sequential contexts. However, when dealing with high-dimensional input data, such as video and text, the input-to-hidden linear transformation in RNNs incurs high memory usage and huge computational cost, making the training of RNNs very difficult. To address this challenge, we propose a novel compact LSTM model, named TR-LSTM, which utilizes the low-rank tensor ring decomposition (TRD) to reformulate the input-to-hidden transformation. Compared with other tensor decomposition methods, TR-LSTM is more stable. In addition, TR-LSTM can be trained end-to-end and provides a fundamental building block for RNNs handling large input data. Experiments on real-world action recognition datasets have demonstrated the promising performance of the proposed TR-LSTM compared with the tensor-train LSTM and other state-of-the-art competitors.
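The core idea, expressing each entry of the input-to-hidden weight matrix as the trace of a product of small core slices, can be sketched directly. The dimensions and ranks below are tiny and hypothetical, and a real TR-LSTM would contract the cores with the input rather than materialize W:

```python
# Sketch of a tensor-ring (TR) factorized weight matrix (toy dimensions).
import numpy as np
from itertools import product
from functools import reduce

def tr_matrix(cores, m_dims, n_dims):
    """cores[k]: (r, m_k, n_k, r); W[i, j] = trace of the product of core slices."""
    W = np.zeros((int(np.prod(m_dims)), int(np.prod(n_dims))))
    for idx in product(*(range(m) for m in m_dims)):
        for jdx in product(*(range(n) for n in n_dims)):
            slices = [G[:, i, j, :] for G, i, j in zip(cores, idx, jdx)]
            W[np.ravel_multi_index(idx, m_dims),
              np.ravel_multi_index(jdx, n_dims)] = np.trace(reduce(np.matmul, slices))
    return W

rng = np.random.default_rng(0)
m_dims, n_dims, r = (4, 4), (4, 4), 2
cores = [rng.normal(size=(r, m_dims[k], n_dims[k], r)) for k in range(2)]
W = tr_matrix(cores, m_dims, n_dims)            # dense 16x16 matrix, for inspection only
print(W.shape, sum(G.size for G in cores))      # (16, 16) from only 128 core parameters
```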

JMLR Journal 2010 Journal Article

Continuous Time Bayesian Network Reasoning and Learning Engine

  • Christian R. Shelton
  • Yu Fan
  • William Lam
  • Joon Lee
  • Jing Xu

We present a continuous time Bayesian network reasoning and learning engine (CTBN-RLE). A continuous time Bayesian network (CTBN) provides a compact (factored) description of a continuous-time Markov process. This software provides libraries and programs for most of the algorithms developed for CTBNs. For learning, CTBN-RLE implements structure and parameter learning for both complete and partial data. For inference, it implements exact inference and Gibbs and importance sampling approximate inference for any type of evidence pattern. Additionally, the library supplies visualization methods for graphically displaying CTBNs or trajectories of evidence.

JMLR Journal 2010 Journal Article

Importance Sampling for Continuous Time Bayesian Networks

  • Yu Fan
  • Jing Xu
  • Christian R. Shelton

A continuous time Bayesian network (CTBN) uses a structured representation to describe a dynamic system with a finite number of states which evolves in continuous time. Exact inference in a CTBN is often intractable as the state space of the dynamic system grows exponentially with the number of variables. In this paper, we first present an approximate inference algorithm based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. These three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the time line. We present experimental results on both synthetic networks and a network learned from a real data set on people's life history events. We show the accuracy as well as the time efficiency of our algorithms, and compare them to other approximate algorithms: expectation propagation and Gibbs sampling.
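For intuition, the self-normalized importance-sampling estimator underlying such methods can be shown on a toy Gaussian target; this illustrates only the estimator, not the CTBN trajectory sampler itself:

```python
# Generic self-normalized importance sampling (toy Gaussian example).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=2.0, size=100_000)   # samples from proposal q = N(2, 2^2)
w = norm.pdf(x, 0, 1) / norm.pdf(x, 2, 2)          # weights p/q for target p = N(0, 1)
print((w * x**2).sum() / w.sum())                  # estimates E_p[X^2] = 1
```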

NeurIPS Conference 2008 Conference Paper

How memory biases affect information transmission: A rational analysis of serial reproduction

  • Jing Xu
  • Thomas Griffiths

Many human interactions involve pieces of information being passed from one person to another, raising the question of how this process of information transmission is affected by the capacities of the agents involved. In the 1930s, Sir Frederic Bartlett explored the influence of memory biases in “serial reproduction” of information, in which one person’s reconstruction of a stimulus from memory becomes the stimulus seen by the next person. These experiments were done using relatively uncontrolled stimuli such as pictures and stories, but suggested that serial reproduction would transform information in a way that reflected the biases inherent in memory. We formally analyze serial reproduction using a Bayesian model of reconstruction from memory, giving a general result characterizing the effect of memory biases on information transmission. We then test the predictions of this account in two experiments using simple one-dimensional stimuli. Our results provide theoretical and empirical justification for the idea that serial reproduction reflects memory biases.
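The Gaussian case of this kind of analysis is easy to simulate: each agent stores a noisy memory trace of the stimulus and reconstructs it with the posterior mean under a shared Gaussian prior, so successive reproductions drift toward the prior mean. Parameters below are illustrative, not taken from the paper:

```python
# Toy serial-reproduction chain with a Gaussian prior and memory noise.
import numpy as np

mu0, s0, sn = 0.0, 1.0, 0.5          # prior N(mu0, s0^2); memory noise sd sn
rng = np.random.default_rng(0)

x = 5.0                              # initial stimulus, far from the prior mean
for step in range(10):
    y = x + rng.normal(0.0, sn)                        # noisy memory trace
    x = (s0**2 * y + sn**2 * mu0) / (s0**2 + sn**2)    # posterior-mean reconstruction
    print(step, round(x, 3))         # successive reproductions drift toward mu0
```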