Arrow Research search

Author name cluster

Xia Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
1 author row

Possible papers

13

AAAI Conference 2025 Conference Paper

KAES: Multi-aspect Shared Knowledge Finding and Aligning for Cross-prompt Automated Scoring of Essay Traits

  • Xia Li
  • Wenjing Pan

Cross-prompt automated essay scoring (AES) aims to train models on essays from different source prompts and test them on essays from a new target prompt. A core challenge of the task is to learn as much shared knowledge as possible between essays from different prompts in order to better represent new-prompt essays. Previous studies primarily focus on learning this knowledge at a general, coarse-grained level, overlooking that the shared knowledge among prompts is highly detailed and spans a more comprehensive range of information that has not been fully investigated. In this paper, we propose a novel multi-aspect knowledge finding and aligning optimization strategy to better acquire this detailed and varied shared knowledge. We also introduce an LLM to extract explicit, interpretable knowledge from the implicit, multi-aspect shared knowledge and use it to improve the representation and evaluation of new-prompt essays. We conduct extensive experiments on public datasets. The results show that our approach outperforms current state-of-the-art models and is effective for cross-prompt AES.

IJCAI Conference 2025 Conference Paper

One-shot Federated Learning Methods: A Practical Guide

  • Xiang Liu
  • Zhenheng Tang
  • Xia Li
  • Yijun Song
  • Sijie Ji
  • Zemin Liu
  • Bo Han
  • Linshan Jiang

One-shot Federated Learning (OFL) is a distributed machine learning paradigm that constrains client-server communication to a single round, addressing the privacy and communication overhead issues associated with multiple rounds of data exchange in traditional Federated Learning (FL). OFL demonstrates practical potential for integration with future approaches that require collaborative model training, such as large language models (LLMs). However, current OFL methods face two major challenges: data heterogeneity and model heterogeneity, which result in subpar performance compared to conventional FL methods. Worse still, despite numerous studies addressing these limitations, a comprehensive summary is still lacking. To address these gaps, this paper presents a systematic analysis of the challenges faced by OFL and thoroughly reviews the current methods. We also offer an innovative categorization method and analyze the trade-offs of various techniques. Additionally, we discuss the most promising future directions and the technologies that should be integrated into the OFL field. This work aims to provide guidance and insights for future research.

AAAI Conference 2024 Conference Paper

A Unified Environmental Network for Pedestrian Trajectory Prediction

  • Yuchao Su
  • Yuanman Li
  • Wei Wang
  • Jiantao Zhou
  • Xia Li

Accurately predicting pedestrian movements in complex environments is challenging due to social interactions, scene constraints, and pedestrians' multimodal behaviors. Sequential models such as long short-term memory fail to effectively integrate scene features, so their predicted trajectories do not comply with scene constraints, owing to the disparate feature modalities of scene and trajectory. Although existing convolutional neural network (CNN) models can extract scene features, they are ineffective at mapping these features into scene constraints for pedestrians and struggle to model pedestrian interactions due to the loss of target-pedestrian information. To address these issues, we propose a unified environmental network based on CNNs for pedestrian trajectory prediction. We introduce a polar-based method to reflect the distance and direction relationship between any position in the environment and the target pedestrian. This enables us to simultaneously model scene constraints and pedestrian social interactions in the form of feature maps. Additionally, we capture essential local features in the feature map, characterizing the potential multimodal movements of pedestrians at each time step to prevent redundant predicted trajectories. We verify the performance of our proposed model on four trajectory prediction datasets, encompassing both short-term and long-term predictions. The experimental results demonstrate the superiority of our approach over existing methods.
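The polar-based idea in the abstract — encoding every environment position by its distance and direction relative to the target pedestrian — can be sketched as a target-centered feature map. This is an illustrative reading, not the paper's exact encoding; `polar_map` and its grid layout are hypothetical:

```python
import numpy as np

def polar_map(h, w, target_xy):
    """Distance and direction of every grid cell relative to the target pedestrian.

    Returns a (2, h, w) array: channel 0 holds the Euclidean distance from each
    cell to the target, channel 1 the angle of the cell as seen from the target,
    so scene constraints and neighbors share one target-centered feature map.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - target_xy[0], ys - target_xy[1]
    return np.stack([np.hypot(dx, dy), np.arctan2(dy, dx)])

m = polar_map(4, 4, (1, 1))
# m[0, 1, 1] is 0.0: the target cell is at zero distance from itself
```

Because both channels are ordinary 2D maps, scene features and social-interaction features can then be processed by the same CNN.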

NeurIPS Conference 2024 Conference Paper

FedLPA: One-shot Federated Learning with Layer-Wise Posterior Aggregation

  • Xiang Liu
  • Liangxi Liu
  • Feiyang Ye
  • Yunheng Shen
  • Xia Li
  • Linshan Jiang
  • Jialin Li

Efficiently aggregating trained neural networks from local clients into a global model on a server is a widely researched topic in federated learning. Recently, motivated by reducing privacy risks, mitigating potential attacks, and lowering communication overhead, one-shot federated learning (i.e., limiting client-server communication to a single round) has gained popularity among researchers. However, one-shot aggregation performance is sensitive to non-identical training data distributions, which exhibit high statistical heterogeneity in some real-world scenarios. To address this issue, we propose a novel one-shot aggregation method with layer-wise posterior aggregation, named FedLPA. FedLPA aggregates local models to obtain a more accurate global model without requiring extra auxiliary datasets or exposing any private label information, e.g., label distributions. To effectively capture the statistics maintained in the biased local datasets in the practical non-IID scenario, we efficiently infer the posteriors of each layer in each local model using layer-wise Laplace approximation and aggregate them to train the global parameters. Extensive experimental results demonstrate that FedLPA significantly improves learning performance over state-of-the-art methods across several metrics.
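The layer-wise posterior aggregation can be illustrated with diagonal Gaussians: if each client's Laplace approximation yields a per-layer posterior N(mu_k, prec_k^-1), the product of those Gaussians has summed precision and a precision-weighted mean. The sketch below is a simplification (FedLPA's actual Laplace approximation is richer than a diagonal one), and `aggregate_layer` is a hypothetical name:

```python
import numpy as np

def aggregate_layer(means, precisions):
    """Fuse per-client Gaussian posteriors N(mu_k, 1/prec_k) for one layer.

    The product of Gaussians is again Gaussian: the fused precision is the sum
    of client precisions, and the fused mean is the precision-weighted average
    of the client means, so confident clients pull each weight harder.
    """
    total_prec = np.sum(precisions, axis=0)
    fused_mean = np.sum([p * m for p, m in zip(precisions, means)], axis=0) / total_prec
    return fused_mean, total_prec

# two clients, one layer of 3 weights
means = [np.array([1.0, 0.0, 2.0]), np.array([3.0, 0.0, 0.0])]
precs = [np.array([1.0, 1.0, 1.0]), np.array([1.0, 1.0, 3.0])]
mu, prec = aggregate_layer(means, precs)
# mu -> [2.0, 0.0, 0.5], prec -> [2.0, 2.0, 4.0]
```

Note how the third weight is pulled toward the second client's value because that client's precision (confidence) is three times higher.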

AAAI Conference 2024 Conference Paper

SGFormer: Semantic Graph Transformer for Point Cloud-Based 3D Scene Graph Generation

  • Changsheng Lv
  • Mengshi Qi
  • Xia Li
  • Zhengyuan Yang
  • Huadong Ma

In this paper, we propose a novel model called SGFormer, Semantic Graph TransFormer for point cloud-based 3D scene graph generation. The task aims to parse a point cloud-based scene into a semantic structural graph, with the core challenge of modeling the complex global structure. Existing methods based on graph convolutional networks (GCNs) suffer from the over-smoothing dilemma and can only propagate information from limited neighboring nodes. In contrast, SGFormer uses Transformer layers as the base building block to allow global information passing, with two types of newly designed layers tailored for the 3D scene graph generation task. Specifically, we introduce the graph embedding layer to best utilize the global information in graph edges while maintaining comparable computation costs. Furthermore, we propose the semantic injection layer to leverage linguistic knowledge from a large-scale language model (i.e., ChatGPT) to enhance objects' visual features. We benchmark our SGFormer on the established 3DSSG dataset and achieve a 40.94% absolute improvement in relationship prediction's R@50 and an 88.36% boost on the subset with complex scenes over the state-of-the-art. Our analyses further show SGFormer's superiority in the long-tail and zero-shot scenarios. Our source code is available at https://github.com/Andy20178/SGFormer.

AAAI Conference 2024 Conference Paper

You Only Read Once: Constituency-Oriented Relational Graph Convolutional Network for Multi-Aspect Multi-Sentiment Classification

  • Yongqiang Zheng
  • Xia Li

Most existing aspect-based sentiment analysis (ABSA) models predict the sentiment polarity of only a single aspect at a time, focusing primarily on enhancing the representation of this single aspect based on the other contexts or aspects. This one-to-one paradigm ignores the fact that multi-aspect, multi-sentiment sentences contain not only distinct specific descriptions for distinct specific aspects, but also shared global context information for multiple aspects. To fully consider these issues, we propose a one-to-many ABSA framework, called You Only Read Once (YORO), that can simultaneously model representations of all aspects based on their specific descriptions and better fuse their relationships using globally shared contextual information in the sentence. Predicting the sentiment polarity of multiple aspects simultaneously improves both computational efficiency and prediction performance. Extensive experiments are conducted on three public datasets (MAMS, Rest14, and Lap14). Experimental results demonstrate the effectiveness of YORO in handling multi-aspect, multi-sentiment scenarios and highlight the promise of one-to-many ABSA in balancing efficiency and accuracy.

NeurIPS Conference 2023 Conference Paper

Explore In-Context Learning for 3D Point Cloud Understanding

  • Zhongbin Fang
  • Xiangtai Li
  • Xia Li
  • Joachim M Buhmann
  • Chen Change Loy
  • Mengyuan Liu

With the rise of large-scale models trained on broad data, in-context learning has become a new learning paradigm that has demonstrated significant potential in natural language processing and computer vision tasks. Meanwhile, in-context learning is still largely unexplored in the 3D point cloud domain. Although masked modeling has been successfully applied for in-context learning in 2D vision, directly extending it to 3D point clouds remains a formidable challenge. In the case of point clouds, the tokens themselves are the point cloud positions (coordinates) that are masked during inference. Moreover, position embedding in previous works may inadvertently introduce information leakage. To address these challenges, we introduce a novel framework, named Point-In-Context, designed especially for in-context learning in 3D point clouds, where both inputs and outputs are modeled as coordinates for each task. Additionally, we propose the Joint Sampling module, carefully designed to work in tandem with the general point sampling operator, effectively resolving the aforementioned technical issues. We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks. Furthermore, with a more effective prompt selection strategy, our framework surpasses the results of individually trained models.

EAAI Journal 2023 Journal Article

MIVAE: Multiple Imputation based on Variational Auto-Encoder

  • Qian Ma
  • Xia Li
  • Mei Bai
  • Xite Wang
  • Bo Ning
  • Guanyu Li

Nowadays, missing value (MV) imputation has become one of the research hotspots in the field of data quality, since MVs are prevalent in real-world datasets and pose challenges to advanced data analytics algorithms. To impute the MVs, most existing approaches directly derive one estimate for each MV, a strategy categorized as single imputation (SI). However, SI ignores the uncertainty of the MVs and thereby usually derives unsatisfactory imputation results compared to multiple imputation (MI). To capture the uncertainty of the MVs, MI algorithms derive multiple candidate estimates for each MV. Nevertheless, existing MI approaches are few due to the complicated data-handling process involved. Accordingly, in this paper, by exploring the Variational Auto-Encoder (VAE) model, we propose a new MI approach, namely MIVAE (Multiple Imputation based on Variational Auto-Encoder), to impute MVs in tabular data. In MIVAE, we first add a corrupted input layer (where synthetic MVs are introduced) adjacent to the original input layer to make the model capable of handling the MV issue. Then, we obtain multiple rather than single candidate estimates for each data sample from the posterior distribution of the latent variables learned by our designed model. In this way, multiple imputation is effectively implemented and the uncertainty of the MVs is captured. Next, to obtain satisfactory imputation results, we add a data analysis layer at the end of the network to intelligently integrate the multiple candidate estimates. Finally, experimental results on four real-world datasets demonstrate that MIVAE achieves significantly higher imputation accuracy than existing solutions and is capable of handling both numerical and categorical tabular data.
For example, the imputation accuracy of MIVAE improves by up to about 40% and 30% over PMM and MIWAE (the state-of-the-art MI approaches), respectively, on the CropMapping dataset. Moreover, we train a MIVAE model on each of three datasets containing MVs; leveraging the trained MIVAE, classification performance on the imputed data is similar to that on the complete data.
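The multiple-imputation step described above can be sketched as sampling the latent posterior several times and decoding each draw into a candidate completion. This is a toy illustration, not MIVAE's architecture; `multiple_impute`, `encode`, and `decode` are hypothetical stand-ins for a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def multiple_impute(x, mask, encode, decode, m=5):
    """Draw m candidate completions by sampling the VAE latent posterior.

    x:    data row with missing entries zero-filled; mask: 1 = observed, 0 = MV.
    encode(x) -> (mu, log_var) of q(z|x); decode(z) -> reconstruction.
    Observed entries are always kept; only the MVs are replaced, and the spread
    across the m draws reflects the uncertainty of each MV.
    """
    mu, log_var = encode(x)
    draws = []
    for _ in range(m):
        z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
        draws.append(mask * x + (1 - mask) * decode(z))
    return np.stack(draws)  # (m, d): one candidate completion per row

# toy stand-ins for a trained encoder/decoder (identity latent space)
encode = lambda x: (x, np.full_like(x, -2.0))  # small posterior variance
decode = lambda z: z
x = np.array([1.0, 0.0, 3.0])       # second entry is missing (zero-filled)
mask = np.array([1.0, 0.0, 1.0])
draws = multiple_impute(x, mask, encode, decode)
```

A downstream layer (MIVAE's "data analysis layer" plays this role) would then combine the `m` draws into a final estimate rather than averaging them naively.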

NeurIPS Conference 2021 Conference Paper

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

  • Lei Ke
  • Xia Li
  • Martin Danelljan
  • Yu-Wing Tai
  • Chi-Keung Tang
  • Fisher Yu

Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single-frame predictions for the segmentation mask itself. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation. PCAN first distills a space-time memory into a set of prototypes and then employs cross-attention to retrieve rich information from the past frames. To segment each object, PCAN adopts a prototypical appearance module to learn a set of contrastive foreground and background prototypes, which are then propagated over time. Extensive experiments demonstrate that PCAN outperforms current video instance tracking and segmentation competition winners on both the YouTube-VIS and BDD100K datasets, and shows efficacy on both one-stage and two-stage segmentation frameworks. Code and video resources are available at http://vis.xyz/pub/pcan.
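The prototype-retrieval step can be sketched as plain cross-attention against a small prototype set: distilling the space-time memory into a few prototypes keeps attention cost independent of how many past pixels the memory holds. This is a generic illustration of the idea, not PCAN's implementation, and the function names are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prototype_cross_attention(queries, prototypes):
    """Attend over n << frames * pixels prototype vectors instead of full memory.

    queries: (q, d) current-frame features; prototypes: (n, d) distilled
    space-time memory. Cost per query is O(n), not O(frames * pixels).
    """
    attn = softmax(queries @ prototypes.T / np.sqrt(queries.shape[1]))
    return attn @ prototypes  # (q, d) retrieved temporal context

# sanity check: if every prototype is identical, each query retrieves it exactly
queries = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]])
protos = np.ones((3, 4))
out = prototype_cross_attention(queries, protos)
```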

AAAI Conference 2021 Conference Paper

Temporal Pyramid Network for Pedestrian Trajectory Prediction with Multi-Supervision

  • Rongqin Liang
  • Yuanman Li
  • Xia Li
  • Yi Tang
  • Jiantao Zhou
  • Wenbin Zou

Predicting human motion behavior in a crowd is important for many applications, ranging from the natural navigation of autonomous vehicles to intelligent video surveillance security systems. Previous works model and predict the trajectory at a single resolution, which is relatively ineffective and makes it difficult to simultaneously exploit the long-range information (e.g., the destination of the trajectory) and the short-range information (e.g., the walking direction and speed at a given time) of the motion behavior. In this paper, we propose a temporal pyramid network for pedestrian trajectory prediction through a squeeze modulation and a dilation modulation. Our hierarchical framework builds a feature pyramid with increasingly richer temporal information from top to bottom, which can better capture the motion behavior at various tempos. Furthermore, we propose a coarse-to-fine fusion strategy with multi-supervision. By progressively merging the top coarse features of global context into the bottom fine features of rich local context, our method can fully exploit both the long-range and short-range information of the trajectory. Experimental results on two benchmarks demonstrate the superiority of our method. Our code and models will be available upon acceptance.

AAAI Conference 2020 Conference Paper

Dynamical System Inspired Adaptive Time Stepping Controller for Residual Network Families

  • Yibo Yang
  • Jianlong Wu
  • Hongyang Li
  • Xia Li
  • Tiancheng Shen
  • Zhouchen Lin

The correspondence between residual networks and dynamical systems motivates researchers to unravel the physics of ResNets with well-developed tools from numerical methods for ODE systems. The Runge-Kutta-Fehlberg method is an adaptive time-stepping scheme that renders a good trade-off between stability and efficiency. Can we also have adaptive time stepping for ResNets to ensure both stability and performance? In this study, we analyze the effects of time stepping on the Euler method and ResNets. We establish a stability condition for ResNets in terms of step sizes and weight parameters, and point out the effects of step sizes on stability and performance. Inspired by our analyses, we develop an adaptive time stepping controller that is dependent on the parameters of the current step and aware of previous steps. The controller is jointly optimized with the network training so that variable step sizes and evolution time can be adaptively adjusted. We conduct experiments on ImageNet and CIFAR to demonstrate the effectiveness. It is shown that our proposed method is able to improve both stability and accuracy without introducing additional overhead in the inference phase.
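The ResNet-ODE correspondence the abstract builds on: a residual block x + f(x) is one forward-Euler step of dx/dt = f(x) with step size h = 1, and h is exactly the quantity an adaptive controller would adjust per block. A minimal sketch (hypothetical names, not the paper's controller):

```python
import numpy as np

def euler_resnet(x, residual_fns, step_sizes):
    """Forward-Euler view of a residual network: x_{t+1} = x_t + h_t * f_t(x_t).

    A plain ResNet fixes h_t = 1; making h_t a learned per-block step size is
    the knob an adaptive time-stepping controller turns during training.
    """
    for f, h in zip(residual_fns, step_sizes):
        x = x + h * f(x)
    return x

# toy check: two blocks integrating the linear ODE dx/dt = -x
f = lambda x: -x
x1 = euler_resnet(np.array([1.0]), [f, f], [0.5, 0.5])
# (1 - 0.5) * (1 - 0.5) = 0.25
```

For this linear test equation the Euler update is stable only when |1 - h| <= 1, i.e. h <= 2 — the kind of step-size/stability coupling the abstract's stability condition formalizes for weight parameters.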

AAAI Conference 2020 Conference Paper

SOGNet: Scene Overlap Graph Network for Panoptic Segmentation

  • Yibo Yang
  • Hongyang Li
  • Xia Li
  • Qijie Zhao
  • Jianlong Wu
  • Zhouchen Lin

The panoptic segmentation task requires a unified result from semantic and instance segmentation outputs that may contain overlaps. However, current studies widely ignore modeling overlaps. In this study, we aim to model overlap relations among instances and resolve them for panoptic segmentation. Inspired by scene graph representation, we formulate the overlapping problem as a simplified case, named scene overlap graph. We leverage each object's category, geometry, and appearance features to perform relational embedding, and output a relation matrix that encodes overlap relations. In order to overcome the lack of supervision, we introduce a differentiable module to resolve the overlap between any pair of instances. The mask logits after removing overlaps are fed into per-pixel instance id classification, which leverages the panoptic supervision to assist in the modeling of overlap relations. Besides, we generate an approximate ground truth of overlap relations as weak supervision, to quantify the accuracy of the overlap relations predicted by our method. Experiments on COCO and Cityscapes demonstrate that our method is able to accurately predict overlap relations and outperforms the state of the art for panoptic segmentation. Our method also won the Innovation Award in the COCO 2019 challenge.

YNIMG Journal 2018 Journal Article

Tests of cortical parcellation based on white matter connectivity using diffusion tensor imaging

  • Yurui Gao
  • Kurt G. Schilling
  • Iwona Stepniewska
  • Andrew J. Plassard
  • Ann S. Choe
  • Xia Li
  • Bennett A. Landman
  • Adam W. Anderson

The cerebral cortex is conventionally divided into a number of domains based on cytoarchitectural features. Diffusion tensor imaging (DTI) enables noninvasive parcellation of the cortex based on white matter connectivity patterns. However, the correspondence between DTI-connectivity-based and cytoarchitectural parcellation has not been systematically established. In this study, we compared histological parcellation of New World monkey neocortex to DTI-connectivity-based classification and clustering in the same brains. First, we used supervised classification to parcellate parieto-frontal cortex based on DTI tractograms and the cytoarchitectural prior (obtained using Nissl staining). We performed both within- and across-sample classification, showing reasonable classification performance in both conditions. Second, we used unsupervised clustering to parcellate the cortex and compared the clusters to the cytoarchitectonic standard. We then explored the similarities and differences with several post-hoc analyses, highlighting underlying principles that drive the DTI-connectivity-based parcellation. The differences in parcellation between DTI connectivity and Nissl histology probably represent both DTI's bias toward easily-tracked bundles and true differences between cytoarchitectural and connectivity-defined domains. DTI tractograms appear to cluster more according to functional networks, rather than mapping directly onto cytoarchitectonic domains. Our results show that caution should be used when DTI-tractography classification, based on data from another brain, is used as a surrogate for cytoarchitectural parcellation.