Arrow Research search

Author name cluster

Jing Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

41 papers
2 author rows

Possible papers (41)

NeurIPS Conference 2025 Conference Paper

Auto-Connect: Connectivity-Preserving RigFormer with Direct Preference Optimization

  • Jingfeng Guo
  • Jian Liu
  • Jinnan Chen
  • Shiwei Mao
  • Changrong Hu
  • Puhua Jiang
  • Junlin Yu
  • Jing Xu

We introduce Auto-Connect, a novel approach for automatic rigging that explicitly preserves skeletal connectivity through a connectivity-preserving tokenization scheme. Unlike previous methods that predict bone positions represented as two joints or first predict points before determining connectivity, our method employs special tokens to define endpoints for each joint's children and for each hierarchical layer, effectively automating connectivity relationships. This approach significantly enhances topological accuracy by integrating connectivity information directly into the prediction framework. To further guarantee high-quality topology, we implement a topology-aware reward function that quantifies topological correctness, which is then utilized in a post-training phase through reward-guided Direct Preference Optimization. Additionally, we incorporate implicit geodesic features for latent top-k bone selection, which substantially improves skinning quality. By leveraging geodesic distance information within the model's latent space, our approach intelligently determines the most influential bones for each vertex, effectively mitigating common skinning artifacts. This combination of connectivity-preserving tokenization, reward-guided fine-tuning, and geodesic-aware bone selection enables our model to consistently generate more anatomically plausible skeletal structures with superior deformation properties.
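The latent top-k bone selection step can be pictured with a small sketch. The following is a minimal illustration, not the paper's implementation: the shapes, the score function, and the geodesic feature `geo` are hypothetical stand-ins for the learned components described in the abstract.

```python
# Hypothetical sketch of geodesic-aware top-k bone selection for skinning.
import numpy as np

def topk_skinning_weights(vert_feat, bone_feat, geo, k=4):
    """vert_feat: (V, D), bone_feat: (B, D), geo: (V, B) geodesic-distance features."""
    scores = vert_feat @ bone_feat.T - geo           # nearer bones score higher (assumed form)
    idx = np.argsort(-scores, axis=1)[:, :k]         # top-k most influential bones per vertex
    top = np.take_along_axis(scores, idx, axis=1)
    w = np.exp(top - top.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # softmax over the selected bones only
    return idx, w

rng = np.random.default_rng(0)
idx, w = topk_skinning_weights(rng.normal(size=(100, 32)),
                               rng.normal(size=(20, 32)),
                               rng.random((100, 20)))
assert np.allclose(w.sum(axis=1), 1.0)               # valid skinning weights per vertex
```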

AAAI Conference 2025 Conference Paper

CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures for Protein-RNA Binding Affinity Prediction

  • Rong Han
  • Xiaohong Liu
  • Tong Pan
  • Jing Xu
  • Xiaoyu Wang
  • Wuyang Lan
  • Zhenyu Li
  • Zixuan Wang

Accurately measuring protein-RNA binding affinity is crucial in many biological processes and drug design. Previous computational methods for protein-RNA binding affinity prediction rely on either sequence or structure features and are thus unable to capture the binding mechanisms comprehensively. Recently emerging pre-trained language models, trained on massive unsupervised protein and RNA sequences, have shown strong representation ability for various in-domain downstream tasks, including binding site prediction. However, applying language models from different domains collaboratively for complex-level tasks remains unexplored. In this paper, we propose CoPRA to bridge pre-trained language models from different biological domains via Complex structure for Protein-RNA binding Affinity prediction. We demonstrate for the first time that cross-biological-modal language models can collaborate to improve binding affinity prediction. We propose a Co-Former to combine the cross-modal sequence and structure information and a bi-scope pre-training strategy to improve Co-Former's interaction understanding. Meanwhile, we build the largest protein-RNA binding affinity dataset, PRA310, for performance evaluation. We also test our model on a public dataset for mutation effect prediction. CoPRA reaches state-of-the-art performance on all the datasets. We provide extensive analyses and verify that CoPRA can (1) accurately predict the protein-RNA binding affinity; (2) understand the binding affinity change caused by mutations; and (3) benefit from scaling data and model size.

IJCAI Conference 2025 Conference Paper

Detecting Hallucination in Large Language Models Through Deep Internal Representation Analysis

  • Luan Zhang
  • Dandan Song
  • Zhijing Wu
  • Yuhang Tian
  • Changzhi Zhou
  • Jing Xu
  • Ziyi Yang
  • Shuhao Zhang

Large language models (LLMs) have shown exceptional performance across various domains. However, LLMs are prone to hallucinate facts and generate non-factual responses, which can undermine their reliability in real-world applications. Current hallucination detection methods suffer from external resource demands, substantial time overhead, difficulty overcoming LLMs' intrinsic limitations, and insufficient modeling. In this paper, we propose MHAD, a novel internal-representation-based hallucination detection method. MHAD utilizes linear probing to select neurons and layers within LLMs; the selected neurons and layers are shown to exhibit significant awareness of hallucinations at the initial and final generation steps. By concatenating the outputs of these selected neurons from the selected layers at the initial and final generation steps, a hallucination-awareness vector is formed, enabling precise hallucination detection via an MLP. Additionally, we introduce SOQHD, a novel benchmark for evaluating hallucination detection in Open-Domain QA (ODQA). Extensive experiments show that MHAD outperforms existing hallucination detection methods across multiple LLMs, demonstrating superior effectiveness.
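The detection head described above lends itself to a compact sketch. Below is a minimal, hypothetical version: the hidden-state shapes, the probe-selected indices, and the MLP width are illustrative, and collecting the LLM's hidden states at the initial and final generation steps is assumed to happen elsewhere.

```python
# Hypothetical sketch of an MHAD-style detection head (not the released code).
import torch
import torch.nn as nn

class HallucinationMLP(nn.Module):
    def __init__(self, n_selected):
        super().__init__()
        # input = selected neurons at the initial step + at the final step
        self.net = nn.Sequential(nn.Linear(2 * n_selected, 256),
                                 nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, h_init, h_final, neuron_idx):
        # h_init, h_final: (batch, hidden) hidden states from the selected layer
        x = torch.cat([h_init[:, neuron_idx], h_final[:, neuron_idx]], dim=-1)
        return self.net(x).squeeze(-1)   # logit: higher = more likely hallucinated

neuron_idx = torch.arange(64)            # stand-in for probe-selected neurons
head = HallucinationMLP(len(neuron_idx))
logits = head(torch.randn(8, 4096), torch.randn(8, 4096), neuron_idx)
```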

ICML Conference 2025 Conference Paper

Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs

  • Xun Wang
  • Jing Xu
  • Franziska Boenisch
  • Michael Backes 0001
  • Christopher A. Choquette-Choo
  • Adam Dziedzic

Prompting has become a dominant paradigm for adapting large language models (LLMs). While discrete (textual) prompts are widely used for their interpretability, soft (parameter) prompts have recently gained traction in APIs. This is because they can encode information from more training samples while minimizing the user's token usage, leaving more space in the context window for task-specific input. However, soft prompts are tightly coupled to the LLM they are tuned on, limiting their generalization to other LLMs. This constraint is particularly problematic for efficiency and privacy: (1) tuning prompts on each LLM incurs high computational costs, especially as LLMs continue to grow in size. Additionally, (2) when the LLM is hosted externally, soft prompt tuning often requires sharing private data with the LLM provider. For instance, this is the case with the NVIDIA NeMo API. To address these issues, we propose POST (Privacy Of Soft prompt Transfer), a framework that enables private tuning of soft prompts on a small model and subsequently transfers these prompts to a larger LLM. POST uses knowledge distillation to derive a small model directly from the large LLM to improve prompt transferability, tunes the soft prompt locally, optionally with differential privacy guarantees, and transfers it back to the larger LLM using a small public dataset. Our experiments show that POST reduces computational costs, preserves privacy, and effectively transfers high-utility soft prompts.

JBHI Journal 2025 Journal Article

Hypergraph-based Audio-Visual Fusion for Obstructive Sleep Apnea Severity Estimation During Wakefulness

  • Biao Xue
  • Yanting Shao
  • Zhichao Wang
  • Chang-Hong Fu
  • Xiaohua Zhu
  • Heng Zhao
  • Jing Xu
  • Hong Hong

Obstructive sleep apnea (OSA) is associated with psychophysiological impairments, and recent studies have shown the feasibility of using speech and craniofacial images during wakefulness for severity estimation. However, the inherent limitations of unimodal data constrain the performance of current methods. To address this, we propose a novel hypergraph-based multimodal fusion framework (HMFusion) that integrates psychophysiological information from audio-visual data. Specifically, we employ long short-term memory (LSTM)-based encoders to extract modality-specific temporal dynamics from pre-trained audio-visual embeddings and remote photoplethysmography (rPPG)-derived heart rate sequences. A hypergraph neural network is then utilized to capture critical cross-modal interactions for OSA severity estimation. Evaluation on a dataset of 159 participants from a clinical sleep center demonstrates that the proposed model achieves areas under the receiver operating characteristic curve (AUCs) of 88.26%, 86.07%, and 85.29%, with corresponding F1-scores of 92.91%, 85.50%, and 85.30% at Apnea-Hypopnea Index (AHI) thresholds of 5, 15, and 30 events/hour, respectively, outperforming state-of-the-art approaches. This study highlights the potential of psychophysiological data in enhancing OSA severity estimation during wakefulness, offering new avenues for clinical research in this field.

JBHI Journal 2025 Journal Article

Knowledge Guided Articulatory and Spectrum Information Fusion for Obstructive Sleep Apnea Severity Estimation

  • Biao Xue
  • Zhichao Wang
  • Yanting Shao
  • Xiaohua Zhu
  • Heng Zhao
  • Chang-Hong Fu
  • Jing Xu
  • Ning Ding

Numerous studies have demonstrated that speech analysis during wakefulness is a non-invasive and convenient method for obstructive sleep apnea (OSA) screening. However, the inherent differences in upper airway structure and function between wakefulness and sleep limit the effectiveness of OSA assessments based on the vowels and phonemes employed in existing studies. To address this challenge, we propose the design of controlled articulations that more accurately simulate upper airway obstruction during sleep, offering a more comprehensive reflection of the pathological changes in upper airway anatomy and function in individuals with suspected OSA. Specifically, we constructed a Mandarin Chinese controlled articulation dataset, consisting of speech recordings from 301 male adult participants who underwent polysomnography (PSG) monitoring at a sleep center. Drawing on domain knowledge, we thoroughly investigated articulations associated with upper airway collapse, including vowels, pharyngeals, and nasals, and identified interpretable optimal articulations using SHapley Additive exPlanations (SHAP). Furthermore, we introduced a dual-stream fusion model, PTF-Net, which employs the Paralinguistic Acoustic Feature stream (PAF-Stream) to extract the physical attributes of speech and the Transfer Learning-based Spectrogram Feature stream (TLE-Stream) to capture the nonlinear features of upper airway dynamics. The Swin Transformer is utilized to integrate both local and global information from the various articulations. Experimental results demonstrate that the knowledge-guided PTF-Net model outperforms existing methods in OSA severity assessment, by 5.1% in Area Under the Curve (AUC) and 5.8% in Unweighted Average Recall (UAR). In addition, we revealed that the proposed deep embedding of controlled articulation can differentiate between the types of obstruction sites identified by drug-induced sleep endoscopy (DISE), suggesting its potential as a novel digital biomarker for upper airway assessment in OSA patients. This study enhances the understanding of speech-based OSA screening and paves the way for its broad clinical application.

AAAI Conference 2025 Conference Paper

LLM Agents Can Be Choice-Supportive Biased Evaluators: An Empirical Study

  • Nan Zhuang
  • Boyu Cao
  • Yi Yang
  • Jing Xu
  • Mingda Xu
  • Yuxiao Wang
  • Qi Liu

With Large Language Model (LLM) agents taking on more evaluation responsibilities in decision-making, it is essential to recognize their possible biases to guarantee fair and trustworthy AI-supported decisions. This study is the first to thoroughly examine choice-supportive bias in LLM agents, a cognitive bias known to impact human decision-making and evaluation. We conduct experiments across 19 open- and closed-source LLMs in up to five scenarios, employing both memory-based and evaluation-based tasks adapted and redesigned from human cognitive studies. Our findings show that LLM agents may exhibit biased attribution or evaluation that supports their initial choices, and such bias may persist even when contextual hallucination is not observable. Key findings show that bias manifestation can differ greatly depending on prompt construction and context preservation, and that the bias may be mitigated in larger models. Significantly, we observe that the bias increases when the agents perceive they are in control. Our extensive study involving 284 well-educated humans shows that, despite bias, certain LLM agents can still perform better than humans in similar evaluation tasks. This research contributes to the growing area of AI psychology, and the findings underscore the importance of addressing cognitive biases in LLM agent systems, with wide-ranging implications spanning from improving AI-assisted decision-making to advancing AI safety and ethics.

NeurIPS Conference 2025 Conference Paper

Memorization in Graph Neural Networks

  • Adarsh Jamadandi
  • Jing Xu
  • Adam Dziedzic
  • Franziska Boenisch

Deep neural networks (DNNs) have been shown to memorize their training data, but similar analyses for graph neural networks (GNNs) remain under-explored. We introduce NCMemo (Node Classification Memorization), the first framework to quantify label memorization in semi-supervised node classification. We establish an inverse relationship between memorization and graph homophily, i.e., the tendency of connected nodes to share labels or features. Lower homophily significantly increases memorization, indicating that GNNs rely on label memorization when learning less homophilic graphs. We then analyze GNN training dynamics and find that increased memorization in low-homophily graphs is tightly coupled to GNNs' implicit bias toward using graph structure. When structure is less informative, models instead memorize node labels to minimize training loss. Finally, we show that nodes with higher label inconsistency in their feature-space neighborhood are more prone to memorization. Based on these insights, we investigate graph rewiring as a mitigation strategy. Our results show that rewiring reduces memorization without harming model performance, while also lowering the privacy risk for previously memorized data points. Thus, our work advances understanding of GNN learning and supports more privacy-preserving GNN deployment.
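The abstract does not state NCMemo's exact estimator; a common definition of label memorization that such frameworks build on (Feldman, 2020) is the drop in the model's accuracy on a point when that point's label is held out of training. A schematic sketch, with a user-supplied `train_fn` standing in for GNN training:

```python
# Schematic leave-one-out memorization score (generic, not NCMemo itself).
import numpy as np

def memorization_score(train_fn, graph, labels, i, n_runs=10):
    """train_fn(graph, labels, train_mask) -> per-node predicted labels."""
    mask_in = np.ones(len(labels), dtype=bool)
    mask_out = mask_in.copy()
    mask_out[i] = False                      # hold node i's label out of training
    p_in = np.mean([train_fn(graph, labels, mask_in)[i] == labels[i]
                    for _ in range(n_runs)])
    p_out = np.mean([train_fn(graph, labels, mask_out)[i] == labels[i]
                     for _ in range(n_runs)])
    return p_in - p_out                      # near 1: the label is memorized, not inferred
```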

NeurIPS Conference 2025 Conference Paper

Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning

  • Jian Liu
  • Jing Xu
  • Song Guo
  • Jing Li
  • Jingfeng Guo
  • Jiaao Yu
  • Haohan Weng
  • Biwen Lei

Existing pretrained models for 3D mesh generation often suffer from data biases and produce low-quality results, while global reinforcement learning (RL) methods rely on object-level rewards that struggle to capture local structure details. To address these challenges, we present Mesh-RFT, a novel fine-grained reinforcement fine-tuning framework that employs Masked Direct Preference Optimization (M-DPO) to enable localized refinement via quality-aware face masking. To facilitate efficient quality evaluation, we introduce an objective topology-aware scoring system to evaluate geometric integrity and topological regularity at both object and face levels through two metrics: Boundary Edge Ratio (BER) and Topology Score (TS). By integrating these metrics into a fine-grained RL strategy, Mesh-RFT becomes the first method to optimize mesh quality at the granularity of individual faces, resolving localized errors while preserving global coherence. Experimental results show that our M-DPO approach reduces Hausdorff Distance (HD) by 24.6% and improves Topology Score (TS) by 3.8% over pre-trained models, while outperforming global DPO methods with a 17.4% HD reduction and 4.9% TS gain. These results demonstrate Mesh-RFT's ability to improve geometric integrity and topological regularity, achieving new state-of-the-art performance in production-ready mesh generation.
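Of the two metrics, the Boundary Edge Ratio is simple enough to sketch. A minimal version under the standard convention that a boundary edge belongs to exactly one face; the paper's precise definition may differ:

```python
# Sketch of a Boundary Edge Ratio for a triangle mesh (assumed definition).
from collections import Counter

def boundary_edge_ratio(faces):
    """faces: iterable of (i, j, k) vertex-index triangles."""
    edge_count = Counter()
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            edge_count[tuple(sorted((a, b)))] += 1
    n_boundary = sum(1 for c in edge_count.values() if c == 1)
    return n_boundary / len(edge_count)

# Two triangles sharing one edge: 4 of the 5 edges are boundary edges.
print(boundary_edge_ratio([(0, 1, 2), (1, 3, 2)]))   # 0.8
```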

ICRA Conference 2025 Conference Paper

PRIDEV: A Plug-and-Play Refinement for Improved Depth Estimation in Videos

  • Jing Xu
  • Hong Liu 0008
  • Jianbing Wu
  • Xinhua Xu

Monocular video depth estimation is a key challenge in computer vision and is central to visual understanding. Monocular depth estimation models trained on single images achieve impressive results on individual frames but often lack temporal consistency when applied to videos, leading to flickering and artifacts. Current video depth estimation methods often rely on additional optical flow or camera poses, which are limited by their accuracy, complex design, and lack of robustness. To address this, we propose a plug-and-play method that seamlessly transfers the robustness of image depth estimation to video depth estimation. By leveraging powerful priors from image depth estimation, our method enhances the performance of video depth estimation without requiring additional conditional inputs or extensive pretraining on large and expensive video datasets. We introduce the Temporal Depth Stabilization Module (TDSM), which can seamlessly inflate an image monocular depth estimation model into a video depth estimation model, enabling unified modeling of depth across video sequences and capturing the temporal cues in video. We validate the effectiveness and efficiency of our method across various datasets (e.g., normal and challenging conditions) and different backbones. Extensive experiments demonstrate that our simple and effective method significantly improves monocular depth estimation networks, achieving new state-of-the-art accuracy in both spatial and temporal dimensions.

JBHI Journal 2024 Journal Article

Automatically Extracting and Utilizing EEG Channel Importance Based on Graph Convolutional Network for Emotion Recognition

  • Kun Yang
  • Zhenning Yao
  • Keze Zhang
  • Jing Xu
  • Li Zhu
  • Shichao Cheng
  • Jianhai Zhang

Graph convolutional networks (GCNs) based on brain networks have been widely used for EEG emotion recognition. However, most studies train their models directly without considering network dimensionality reduction beforehand. In fact, some nodes and edges carry information that is invalid or even interfering for the current task, so it is necessary to reduce the network dimension and extract the core network. To address the problem of extracting and utilizing the core network, we propose a core network extraction model (CWGCN) based on channel weighting and graph convolutional networks, and a graph convolutional network model (CCSR-GCN) based on channel convolution and style-based recalibration for emotion recognition. The CWGCN model automatically extracts the core network and the channel importance parameter in a data-driven manner. The CCSR-GCN model innovatively uses the output information of the CWGCN model to identify the emotion state. Experimental results on SEED show that: 1) core network extraction helps improve the performance of the GCN model; 2) the CWGCN and CCSR-GCN models achieve better results than currently popular methods. The idea and its implementation in this paper provide a novel perspective for applying GCNs to brain network analysis in other specific tasks.

NeurIPS Conference 2024 Conference Paper

Con4m: Context-aware Consistency Learning Framework for Segmented Time Series Classification

  • Junru Chen
  • Tianyu Cao
  • Jing Xu
  • Jiahe Li
  • Zhilong Chen
  • Tao Xiao
  • Yang Yang

Time Series Classification (TSC) encompasses two settings: classifying entire sequences or classifying segmented subsequences. The raw time series for segmented TSC usually contain Multiple classes with Varying Duration of each class (MVD). Therefore, the characteristics of MVD pose unique challenges for segmented TSC, yet have been largely overlooked by existing works. Specifically, there exists a natural temporal dependency between consecutive instances (segments) to be classified within MVD. However, mainstream TSC models rely on the assumption of independent and identically distributed (i.i.d.) data, focusing on independently modeling each segment. Additionally, annotators with varying expertise may provide inconsistent boundary labels, leading to unstable performance of noise-free TSC models. To address these challenges, we first formally demonstrate that valuable contextual information enhances the discriminative power of classification instances. Leveraging the contextual priors of MVD at both the data and label levels, we propose a novel consistency learning framework, Con4m, which effectively utilizes contextual information more conducive to discriminating consecutive segments in segmented TSC tasks, while harmonizing inconsistent boundary labels for training. Extensive experiments across multiple datasets validate the effectiveness of Con4m in handling segmented TSC tasks on MVD. The source code is available at https://github.com/MrNobodyCali/Con4m.

JBHI Journal 2024 Journal Article

DSFE: Decoding EEG-Based Finger Motor Imagery Using Feature-Dependent Frequency, Feature Fusion and Ensemble Learning

  • Kun Yang
  • Ruochen Li
  • Jing Xu
  • Li Zhu
  • Wanzeng Kong
  • Jianhai Zhang

Accurately decoding finger motor imagery is essential for fine motor control using EEG signals. However, decoding finger motor imagery is particularly challenging compared with ordinary motor imagery. This paper proposed a novel EEG decoding method based on feature-dependent frequency band selection, feature fusion, and ensemble learning (DSFE) for finger motor imagery. First, a feature-dependent frequency band selection method based on the correlation coefficient (FDCC) was proposed to select feature-specific effective bands. Second, a feature fusion method was proposed to fuse different types of candidate features to produce multiple refined sets of decoding features. Finally, an ensemble model using a weighted voting strategy was proposed to make full use of these diverse sets of final features. The results on a public EEG dataset of five-finger motor imagery showed that the DSFE method is effective and achieves the highest decoding accuracy of 50.64%, which is 7.64% higher than existing studies using exactly the same data. The experiments further revealed that both the effective frequency bands of different subjects and the effective frequency bands of different types of features differ in finger motor imagery. Furthermore, compared with two-hand motor imagery, the effective decoding information of finger motor imagery is shifted to lower frequencies. The idea and findings in this paper provide a valuable perspective for understanding fine motor imagery in depth.
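The final ensembling step admits a compact sketch. The weighting scheme below (normalized per-model weights over class probabilities) is a hypothetical stand-in; the abstract does not detail how DSFE derives its voting weights:

```python
# Sketch of weighted soft voting over an ensemble (illustrative weights).
import numpy as np

def weighted_vote(probas, weights):
    """probas: (n_models, n_samples, n_classes); weights: (n_models,)."""
    w = np.asarray(weights, dtype=float)
    combined = np.tensordot(w / w.sum(), probas, axes=1)   # (n_samples, n_classes)
    return combined.argmax(axis=1)

p = np.array([[[0.6, 0.4]],      # model 1
              [[0.3, 0.7]],      # model 2
              [[0.2, 0.8]]])     # model 3; one sample, two classes
print(weighted_vote(p, [0.5, 0.3, 0.2]))   # [1]: the weighted average favors class 1
```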

NeurIPS Conference 2024 Conference Paper

Functionally Constrained Algorithm Solves Convex Simple Bilevel Problem

  • Huaqing Zhang
  • Lesi Chen
  • Jing Xu
  • Jingzhao Zhang

This paper studies simple bilevel problems, where a convex upper-level function is minimized over the optimal solutions of a convex lower-level problem. We first show a fundamental difficulty of simple bilevel problems: their approximate optimal value is not obtainable by first-order zero-respecting algorithms. We then follow recent works in pursuing weak approximate solutions. To this end, we propose a novel method that reformulates these problems as functionally constrained problems. Our method achieves near-optimal rates for both smooth and nonsmooth problems. To the best of our knowledge, this is the first near-optimal algorithm that works under standard assumptions of smoothness or Lipschitz continuity for the objective functions.
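The reformulation can be sketched in one line of math. Writing g* for the lower-level optimal value, the simple bilevel problem and a functionally constrained surrogate of the kind described above read as follows (a sketch; the paper's exact construction and accuracy parameters may differ):

```latex
\begin{align*}
\text{(bilevel)}\quad
  & \min_{x}\; f(x) \quad \text{s.t.}\quad x \in \operatorname*{arg\,min}_{z}\, g(z), \\
\text{(constrained)}\quad
  & \min_{x}\; f(x) \quad \text{s.t.}\quad g(x) - g^{*} \le \epsilon_g,
  \qquad g^{*} := \min_{z}\, g(z).
\end{align*}
```

A weak approximate solution in this sense is a point x with f(x) − f* ≤ ε_f and g(x) − g* ≤ ε_g, which is exactly what the value-function constraint targets.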

AAAI Conference 2024 Conference Paper

MultiSum: A Multi-Facet Approach for Extractive Social Summarization Utilizing Semantic and Sociological Relationships

  • Tanglong Zhao
  • Ruifang He
  • Jing Xu
  • Bo Wang

Social summarization aims to provide summaries for a large number of social texts (called posts) about a single topic. To extract a summary, both the post representation and the summary selection method are crucial. Previous methods introduce social relations to enhance post embeddings, mitigating the sparse representations caused by posts' brief and informal expression. However, they ignore that there are multiple relations between posts. Besides, existing graph-based centrality calculation approaches tend to select posts from one aspect, which leads to facet bias, especially when there are multiple viewpoints. In this paper, we propose a model named MultiSum to improve social summarization. Specifically, 1) we use graph convolutional networks to fuse text content with social and semantic relations to improve post representation; 2) the similarity between the summary and all aspects is incorporated into the centrality score during the selection phase, encouraging the model to pay attention to different facets. Experimental results on English and Chinese corpora support the effectiveness of this model. Furthermore, external evaluations by human experts and large language models demonstrate the validity of MultiSum in facet coverage and redundancy reduction.

JBHI Journal 2024 Journal Article

Unsupervised Joint Domain Adaptation for Decoding Brain Cognitive States From tfMRI Images

  • Yameng Zhang
  • Yufei Gao
  • Jing Xu
  • Guohua Zhao
  • Lei Shi
  • Lingfei Kong

Recent advances in large models and neuroscience have enabled exploration of the mechanisms of brain activity using neuroimaging data. Brain decoding is one of the most promising lines of research for further understanding human cognitive function. However, current methods depend excessively on high-quality labeled data, which makes the collection and annotation of neural images by experts enormously expensive. Besides, the performance of cross-individual decoding suffers from inconsistency in data distribution caused by individual variation and different collection equipment. To address the above issues, a Joint Domain Adaptive Decoding (JDAD) framework is proposed for unsupervised decoding of specific brain cognitive states related to behavioral tasks. Based on volumetric feature extraction from task-based functional Magnetic Resonance Imaging (tfMRI) data, a novel objective loss function is designed around a joint distribution regularizer, which aims to restrict the distance between both the conditional and marginal probability distributions of labeled and unlabeled samples. Experimental results on the public Human Connectome Project (HCP) S1200 dataset show that JDAD achieves superior performance over other prevalent methods, especially for fine-grained tasks, with 11.5%-21.6% improvements in decoding accuracy. The learned 3D features are visualized by Grad-CAM to build a connection with brain functional regions, which provides a novel path for learning the function of brain cortex regions related to specific cognitive tasks at the group level.

ICML Conference 2023 Conference Paper

A Closer Look at Few-shot Classification Again

  • Xu Luo 0003
  • Hao Wu 0070
  • Ji Zhang 0012
  • Lianli Gao
  • Jing Xu
  • Jingkuan Song

Few-shot classification consists of a training phase where a model is learned on a relatively large dataset and an adaptation phase where the learned model is adapted to previously-unseen tasks with limited labeled samples. In this paper, we empirically prove that the training algorithm and the adaptation algorithm can be completely disentangled, which allows algorithm analysis and design to be done individually for each phase. Our meta-analysis for each phase reveals several interesting insights that may help better understand key aspects of few-shot classification and connections with other fields such as visual representation learning and transfer learning. We hope the insights and research challenges revealed in this paper can inspire future work in related directions. Code and pre-trained models (in PyTorch) are available at https://github.com/Frankluox/CloserLookAgainFewShot.

AAAI Conference 2023 Conference Paper

Dialogue State Distillation Network with Inter-slot Contrastive Learning for Dialogue State Tracking

  • Jing Xu
  • Dandan Song
  • Chong Liu
  • Siu Cheung Hui
  • Fei Li
  • Qiang Ju
  • Xiaonan He
  • Jian Xie

In task-oriented dialogue systems, Dialogue State Tracking (DST) aims to extract users' intentions from the dialogue history. Currently, most existing approaches suffer from error propagation and are unable to dynamically select relevant information when utilizing previous dialogue states. Moreover, the relations between the updates of different slots provide vital clues for DST. However, the existing approaches rely only on predefined graphs to indirectly capture the relations. In this paper, we propose a Dialogue State Distillation Network (DSDN) to utilize relevant information of previous dialogue states and mitigate the gap in utilization between training and testing. Thus, it can dynamically exploit previous dialogue states and avoid introducing error propagation simultaneously. Further, we propose an inter-slot contrastive learning loss to effectively capture the slot co-update relations from dialogue context. Experiments are conducted on the widely used MultiWOZ 2.0 and MultiWOZ 2.1 datasets. The experimental results show that our proposed model achieves the state-of-the-art performance for DST.

NeurIPS Conference 2023 Conference Paper

Towards Data-Algorithm Dependent Generalization: a Case Study on Overparameterized Linear Regression

  • Jing Xu
  • Jiaye Teng
  • Yang Yuan
  • Andrew Yao

One of the major open problems in machine learning is to characterize generalization in the overparameterized regime, where most traditional generalization bounds become inconsistent even for overparameterized linear regression. In many scenarios, this failure can be attributed to obscuring the crucial interplay between the training algorithm and the underlying data distribution. This paper demonstrates that the generalization behavior of overparameterized models should be analyzed in both a data-relevant and an algorithm-relevant manner. To make a formal characterization, we introduce a notion called data-algorithm compatibility, which considers the generalization behavior of the entire data-dependent training trajectory, instead of the traditional last-iterate analysis. We validate our claim by studying the setting of solving overparameterized linear regression with gradient descent. Specifically, we perform a data-dependent trajectory analysis and derive a sufficient condition for compatibility in such a setting. Our theoretical results demonstrate that if we take early stopping iterates into consideration, generalization can hold with significantly weaker restrictions on the problem instance than in previous last-iterate analyses.

NeurIPS Conference 2022 Conference Paper

Alleviating the Sample Selection Bias in Few-shot Learning by Removing Projection to the Centroid

  • Jing Xu
  • Xu Luo
  • Xinglin Pan
  • Yanan Li
  • Wenjie Pei
  • Zenglin Xu

Few-shot learning (FSL) targets generalization of vision models to unseen tasks without sufficient annotations. Despite the emergence of a number of few-shot learning methods, the sample selection bias problem, i.e., the sensitivity to the limited amount of support data, has not been well understood. In this paper, we find that this problem usually occurs when the positions of support samples are in the vicinity of the task centroid, the mean of all class centroids in the task. This motivates us to propose an extremely simple feature transformation to alleviate this problem, dubbed Task Centroid Projection Removing (TCPR). TCPR is applied directly to all image features in a given task, aiming at removing the component of features along the direction of the task centroid. While the exact task centroid cannot be accurately obtained from limited data, we estimate it using base features that are each similar to one of the support features. Our method effectively prevents features from being too close to the task centroid. Extensive experiments over ten datasets from different domains show that TCPR can reliably improve classification accuracy across various feature extractors, training algorithms, and datasets. The code has been made available at https://github.com/KikimorMay/FSL-TCBR.
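The transformation itself is a one-line projection removal. A minimal sketch, simplifying the centroid estimate (the paper estimates it from similar base features rather than from the task's own features, as done here):

```python
# Sketch of Task Centroid Projection Removing (simplified centroid estimate).
import numpy as np

def tcpr(features, centroid):
    """Remove each feature's component along the task-centroid direction."""
    c = centroid / np.linalg.norm(centroid)
    return features - (features @ c)[:, None] * c[None, :]

rng = np.random.default_rng(0)
feats = rng.normal(size=(25, 64)) + 3.0        # shared offset plays the task centroid
out = tcpr(feats, feats.mean(axis=0))

c = feats.mean(axis=0); c /= np.linalg.norm(c)
assert np.allclose(out @ c, 0.0, atol=1e-6)    # features now orthogonal to the centroid
```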

ICML Conference 2022 Conference Paper

Channel Importance Matters in Few-Shot Image Classification

  • Xu Luo 0003
  • Jing Xu
  • Zenglin Xu

Few-Shot Learning (FSL) requires vision models to quickly adapt to brand-new classification tasks with a shift in task distribution. Understanding the difficulties posed by this task-distribution shift is central to FSL. In this paper, we show that a simple channel-wise feature transformation may be the key to unraveling this secret from a channel perspective. When facing novel few-shot tasks in the test-time datasets, this transformation can greatly improve the generalization ability of learned image representations, while being agnostic to the choice of datasets and training algorithms. Through an in-depth analysis of this transformation, we find that the difficulty of representation transfer in FSL stems from the severe channel bias problem of image representations: channels may have different importance in different tasks, while convolutional neural networks are likely to be insensitive, or respond incorrectly, to such a shift. This points to a core problem in the generalization ability of modern vision systems that needs further attention.

ICLR Conference 2022 Conference Paper

ToM2C: Target-oriented Multi-agent Communication and Cooperation with Theory of Mind

  • Yuanfei Wang
  • Fangwei Zhong
  • Jing Xu
  • Yizhou Wang 0001

Being able to predict the mental states of others is a key factor in effective social interaction. It is also crucial for distributed multi-agent systems, where agents are required to communicate and cooperate. In this paper, we introduce such an important social-cognitive skill, i.e., Theory of Mind (ToM), to build socially intelligent agents who are able to communicate and cooperate effectively to accomplish challenging tasks. With ToM, each agent is capable of inferring the mental states and intentions of others according to its (local) observation. Based on the inferred states, the agents decide "when" and with "whom" to share their intentions. With the information observed, inferred, and received, the agents decide their sub-goals and reach a consensus among the team. In the end, the low-level executors independently take primitive actions to accomplish the sub-goals. We demonstrate the idea in two typical target-oriented multi-agent tasks: cooperative navigation and multi-sensor target coverage. The experiments show that the proposed model not only outperforms the state-of-the-art methods on reward and communication efficiency, but also shows good generalization across different scales of the environment.

AAAI Conference 2021 Conference Paper

MiniSeg: An Extremely Minimum Network for Efficient COVID-19 Segmentation

  • Yu Qiu
  • Yun Liu
  • Shijie Li
  • Jing Xu

The rapid spread of the new pandemic, i.e., COVID-19, has severely threatened global health. Deep-learning-based computer-aided screening, e.g., segmentation of COVID-19 infected CT areas, has attracted much attention. However, the publicly available COVID-19 training data are limited, easily causing overfitting for traditional deep learning methods, which are usually data-hungry with millions of parameters. On the other hand, fast training/testing and low computational cost are also necessary for quick deployment and development of COVID-19 screening systems, but traditional deep learning methods are usually computationally intensive. To address the above problems, we propose MiniSeg, a lightweight deep learning model for efficient COVID-19 segmentation. Compared with traditional segmentation methods, MiniSeg has several significant strengths: i) it has only 83K parameters and is thus not prone to overfitting; ii) it has high computational efficiency and is thus convenient for practical deployment; iii) it can be quickly retrained by other users on their private COVID-19 data to further improve performance. In addition, we build a comprehensive COVID-19 segmentation benchmark for comparing MiniSeg to traditional methods.

NeurIPS Conference 2020 Conference Paper

Learning Multi-Agent Coordination for Enhancing Target Coverage in Directional Sensor Networks

  • Jing Xu
  • Fangwei Zhong
  • Yizhou Wang

Maximizing target coverage by adjusting the orientation of distributed sensors is an important problem in directional sensor networks (DSNs). This problem is challenging as the targets usually move randomly while the coverage range of sensors is limited in angle and distance. Thus, sensors must be coordinated to achieve ideal target coverage with low power consumption, e.g., no missing targets and reduced redundant coverage. To realize this, we propose a Hierarchical Target-oriented Multi-Agent Coordination (HiT-MAC) framework, which decomposes the target coverage problem into two-level tasks: target assignment by a coordinator and tracking of assigned targets by executors. Specifically, the coordinator periodically monitors the environment globally and allocates targets to each executor. In turn, each executor only needs to track its assigned targets. To effectively learn HiT-MAC by reinforcement learning, we further introduce a set of practical methods, including a self-attention module, marginal contribution approximation for the coordinator, and a goal-conditional observation filter for the executor. Empirical results demonstrate the advantage of HiT-MAC in coverage rate, learning efficiency, and scalability compared to baselines. We also conduct an ablative analysis of the effectiveness of the introduced components in the framework.

AAAI Conference 2020 Conference Paper

Pose-Assisted Multi-Camera Collaboration for Active Object Tracking

  • Jing Li
  • Jing Xu
  • Fangwei Zhong
  • Xiangyu Kong
  • Yu Qiao
  • Yizhou Wang

Active Object Tracking (AOT) is crucial to many vision-based applications, e.g., mobile robots and intelligent surveillance. However, there are a number of challenges when deploying active tracking in complex scenarios, e.g., the target is frequently occluded by obstacles. In this paper, we extend single-camera AOT to a multi-camera setting, where cameras track a target in a collaborative fashion. To achieve effective collaboration among cameras, we propose a novel Pose-Assisted Multi-Camera Collaboration System, which enables a camera to cooperate with the others by sharing camera poses for active object tracking. In the system, each camera is equipped with two controllers and a switcher: the vision-based controller tracks targets based on observed images, while the pose-based controller moves the camera in accordance with the poses of the other cameras. At each step, the switcher decides which of the two controllers' actions to take according to the visibility of the target. The experimental results demonstrate that our system outperforms all the baselines and is capable of generalizing to unseen environments. The code and demo videos are available at https://sites.google.com/view/pose-assistedcollaboration.
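The switcher amounts to a small decision rule. A minimal sketch with hypothetical controller interfaces, mirroring the visibility-based switching described above:

```python
# Sketch of the two-controller switcher (hypothetical interfaces).
def switch_action(target_visible, vision_ctrl, pose_ctrl, obs, peer_poses):
    if target_visible:
        return vision_ctrl(obs)        # track from the observed image
    return pose_ctrl(peer_poses)       # reorient using the other cameras' poses

action = switch_action(target_visible=False,
                       vision_ctrl=lambda o: "track",
                       pose_ctrl=lambda p: "turn_toward_peer",
                       obs=None, peer_poses=[(0.0, 1.2)])
```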

AAAI Conference 2019 Conference Paper

Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition

  • Yu Pan
  • Jing Xu
  • Maolin Wang
  • Jinmian Ye
  • Fei Wang
  • Kun Bai
  • Zenglin Xu

Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Unit (GRU) networks, have achieved promising performance in sequential data modeling. The hidden layers in RNNs can be regarded as memory units, which help store information in sequential contexts. However, when dealing with high-dimensional input data, such as video and text, the input-to-hidden linear transformation in RNNs incurs high memory usage and huge computational cost, making the training of RNNs very difficult. To address this challenge, we propose a novel compact LSTM model, named TR-LSTM, which utilizes the low-rank tensor ring decomposition (TRD) to reformulate the input-to-hidden transformation. Compared with other tensor decomposition methods, TR-LSTM is more stable. In addition, TR-LSTM can be trained end-to-end and provides a fundamental building block for RNNs handling large input data. Experiments on real-world action recognition datasets have demonstrated the promising performance of the proposed TR-LSTM compared with the tensor-train LSTM and other state-of-the-art competitors.
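The core idea, expressing each entry of the input-to-hidden weight matrix as the trace of a product of small core slices, can be sketched directly. The dimensions and ranks below are tiny and hypothetical, and a real TR-LSTM would contract the cores with the input rather than materialize W:

```python
# Sketch of a tensor-ring (TR) factorized weight matrix (toy dimensions).
import numpy as np
from itertools import product
from functools import reduce

def tr_matrix(cores, m_dims, n_dims):
    """cores[k]: (r, m_k, n_k, r); W[i, j] = trace of the product of core slices."""
    W = np.zeros((int(np.prod(m_dims)), int(np.prod(n_dims))))
    for idx in product(*(range(m) for m in m_dims)):
        for jdx in product(*(range(n) for n in n_dims)):
            slices = [G[:, i, j, :] for G, i, j in zip(cores, idx, jdx)]
            W[np.ravel_multi_index(idx, m_dims),
              np.ravel_multi_index(jdx, n_dims)] = np.trace(reduce(np.matmul, slices))
    return W

rng = np.random.default_rng(0)
m_dims, n_dims, r = (4, 4), (4, 4), 2
cores = [rng.normal(size=(r, m_dims[k], n_dims[k], r)) for k in range(2)]
W = tr_matrix(cores, m_dims, n_dims)            # dense 16x16 matrix, for inspection only
print(W.shape, sum(G.size for G in cores))      # (16, 16) from only 128 core parameters
```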

JMLR Journal 2010 Journal Article

Continuous Time Bayesian Network Reasoning and Learning Engine

  • Christian R. Shelton
  • Yu Fan
  • William Lam
  • Joon Lee
  • Jing Xu

We present a continuous time Bayesian network reasoning and learning engine (CTBN-RLE). A continuous time Bayesian network (CTBN) provides a compact (factored) description of a continuous-time Markov process. This software provides libraries and programs for most of the algorithms developed for CTBNs. For learning, CTBN-RLE implements structure and parameter learning for both complete and partial data. For inference, it implements exact inference and Gibbs and importance sampling approximate inference for any type of evidence pattern. Additionally, the library supplies visualization methods for graphically displaying CTBNs or trajectories of evidence.

JMLR Journal 2010 Journal Article

Importance Sampling for Continuous Time Bayesian Networks

  • Yu Fan
  • Jing Xu
  • Christian R. Shelton

A continuous time Bayesian network (CTBN) uses a structured representation to describe a dynamic system with a finite number of states which evolves in continuous time. Exact inference in a CTBN is often intractable as the state space of the dynamic system grows exponentially with the number of variables. In this paper, we first present an approximate inference algorithm based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. These three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the time line. We present experimental results on both synthetic networks and a network learned from a real data set on people's life history events. We show the accuracy as well as the time efficiency of our algorithms, and compare them to other approximate algorithms: expectation propagation and Gibbs sampling.
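For intuition, the self-normalized importance-sampling estimator underlying such methods can be shown on a toy Gaussian target; this illustrates only the estimator, not the CTBN trajectory sampler itself:

```python
# Generic self-normalized importance sampling (toy Gaussian example).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=2.0, size=100_000)   # samples from proposal q = N(2, 2^2)
w = norm.pdf(x, 0, 1) / norm.pdf(x, 2, 2)          # weights p/q for target p = N(0, 1)
print((w * x**2).sum() / w.sum())                  # estimates E_p[X^2] = 1
```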

NeurIPS Conference 2008 Conference Paper

How memory biases affect information transmission: A rational analysis of serial reproduction

  • Jing Xu
  • Thomas Griffiths

Many human interactions involve pieces of information being passed from one person to another, raising the question of how this process of information transmission is affected by the capacities of the agents involved. In the 1930s, Sir Frederic Bartlett explored the influence of memory biases in “serial reproduction” of information, in which one person’s reconstruction of a stimulus from memory becomes the stimulus seen by the next person. These experiments were done using relatively uncontrolled stimuli such as pictures and stories, but suggested that serial reproduction would transform information in a way that reflected the biases inherent in memory. We formally analyze serial reproduction using a Bayesian model of reconstruction from memory, giving a general result characterizing the effect of memory biases on information transmission. We then test the predictions of this account in two experiments using simple one-dimensional stimuli. Our results provide theoretical and empirical justification for the idea that serial reproduction reflects memory biases.
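The Gaussian case of this kind of analysis is easy to simulate: each agent stores a noisy memory trace of the stimulus and reconstructs it with the posterior mean under a shared Gaussian prior, so successive reproductions drift toward the prior mean. Parameters below are illustrative, not taken from the paper:

```python
# Toy serial-reproduction chain with a Gaussian prior and memory noise.
import numpy as np

mu0, s0, sn = 0.0, 1.0, 0.5          # prior N(mu0, s0^2); memory noise sd sn
rng = np.random.default_rng(0)

x = 5.0                              # initial stimulus, far from the prior mean
for step in range(10):
    y = x + rng.normal(0.0, sn)                        # noisy memory trace
    x = (s0**2 * y + sn**2 * mu0) / (s0**2 + sn**2)    # posterior-mean reconstruction
    print(step, round(x, 3))         # successive reproductions drift toward mu0
```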