Arrow Research search

Author name cluster

Hao Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

33 papers
2 author rows

Possible papers

33

AAAI Conference 2026 Conference Paper

AncientBench: Towards Comprehensive Evaluation on Excavated and Transmitted Chinese Corpora

  • Zhihan Zhou
  • Daqian Shi
  • Rui Song
  • Lida Shi
  • Xiaolei Diao
  • Hao Xu

Comprehension of ancient texts plays an important role in archaeology and in understanding Chinese history and civilization. The rapid development of large language models calls for benchmarks that can evaluate their comprehension of ancient characters. Existing Chinese benchmarks mostly target modern Chinese and transmitted documents in ancient Chinese, while excavated documents in ancient Chinese are not covered. To meet this need, we propose AncientBench, which aims to evaluate the comprehension of ancient characters, especially in the scenario of excavated documents. The AncientBench is divided into four dimensions, corresponding to the four competencies of ancient character comprehension: glyph comprehension, pronunciation comprehension, meaning comprehension, and contextual comprehension. The benchmark also contains ten tasks, including radical, phonetic radical, homophone, cloze, translation, and more, providing a comprehensive framework for evaluation. We convened archaeological researchers to conduct experimental evaluations, proposed an ancient-text model as a baseline, and conducted extensive experiments on the currently best-performing large language models. The experimental results reveal the great potential of large language models in ancient textual scenarios, as well as the gap that remains relative to humans. Our research aims to promote the development and application of large language models in the field of archaeology and ancient Chinese language.

AAAI Conference 2026 Conference Paper

InteChar: A Unified Oracle Bone Character List for Ancient Chinese Language Modeling

  • Xiaolei Diao
  • Zhihan Zhou
  • Lida Shi
  • Ting Wang
  • Ruihua Qi
  • Daqian Shi
  • Hao Xu

Constructing historical language models (LMs) plays a crucial role in aiding archaeological provenance studies and understanding ancient cultures. However, existing resources present major challenges for training effective LMs on historical texts. First, the scarcity of historical language samples renders unsupervised learning approaches based on large text corpora highly inefficient, hindering effective pre-training. Moreover, due to the considerable temporal gap and complex evolution of ancient scripts, the absence of comprehensive character encoding schemes limits the digitization and computational processing of ancient texts, particularly in early Chinese writing. To address these challenges, we introduce InteChar, a unified and extensible character list that integrates unencoded oracle bone characters with traditional and modern Chinese. InteChar enables consistent digitization and representation of historical texts, providing a foundation for robust modeling of ancient scripts. To evaluate the effectiveness of InteChar, we construct the Oracle Corpus Set (OracleCS), an ancient Chinese corpus that combines expert-annotated samples with LLM-assisted data augmentation, centered on Chinese oracle bone inscriptions. Extensive experiments show that models trained with InteChar on OracleCS achieve substantial improvements across various historical language understanding tasks, confirming the effectiveness of our approach and establishing a solid foundation for future research in ancient Chinese NLP.

TMLR Journal 2026 Journal Article

Prompt-based Adaptation in Large-scale Vision Models: A Survey

  • Xi Xiao
  • Yunbei Zhang
  • Lin Zhao
  • Yiyang Liu
  • Xiaoying Liao
  • Zheda Mai
  • Xingjian Li
  • Xiao Wang

In computer vision, Visual Prompting (VP) and Visual Prompt Tuning (VPT) have recently emerged as lightweight and effective alternatives to full fine-tuning for adapting large-scale vision models within the "pretrain-then-finetune" paradigm. However, despite rapid progress, their conceptual boundaries remain blurred, as VP and VPT are frequently used interchangeably in current research, reflecting a lack of systematic distinction between these techniques and their respective applications. In this survey, we revisit the designs of VP and VPT from first principles, and conceptualize them within a unified framework termed Prompt-based Adaptation (PA). Within this framework, we distinguish methods based on their injection granularity: VP operates at the pixel level, while VPT injects prompts at the token level. We further categorize these methods by their generation mechanism into fixed, learnable, and generated prompts. Beyond the core methodologies, we examine PA's integration across diverse domains, including medical imaging, 3D point clouds, and vision-language tasks, as well as its role in test-time adaptation and trustworthy AI. We also summarize current benchmarks and identify key challenges and future directions. To the best of our knowledge, this is the first comprehensive survey dedicated to PA's methodologies and applications in light of their distinct characteristics. Our survey aims to provide a clear roadmap for researchers and practitioners in all areas to understand and explore the evolving landscape of PA-related research.

AAAI Conference 2025 Conference Paper

How Do Position Encodings Affect Length Generalization? Case Studies On In-Context Function Learning

  • Di-Nan Lin
  • Yao Jui-Feng
  • Kun-Da Wu
  • Hao Xu
  • Chen-Hsi Huang
  • Hung-Yu Kao

The capability of In-Context Learning (ICL) is crucial for large language models to generalize across a wide range of tasks. By utilizing prompts, these models can accurately predict outcomes for previously unseen tasks without necessitating retraining. However, this generalization ability does not extend to the length of the inputs; the effectiveness of ICL likely diminishes with excessively long inputs, resulting in errors in the generated text. To investigate this issue, we propose a study using a dataset of in-context functions to understand the operational mechanisms of Transformer models in ICL and length generalization. We generated data using regression and Boolean functions and employed meta-learning techniques to endow the model with ICL capabilities. Our experimental results indicate that position encodings can significantly mitigate length generalization issues, with the most effective encoding extending the maximum input length to over eight times that of the original training length. However, further analysis revealed that while position encoding enhances length generalization, it compromises the model's inherent capabilities, such as its ability to generalize across different data types. Overall, our research illustrates that position encodings have a pronounced positive effect on length generalization, though this necessitates a careful trade-off with data generalization performance.
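For readers outside this subfield, the in-context function-learning setup described above can be made concrete with a toy data generator: each prompt is a fresh linear-regression task whose weights the model must infer from the context pairs alone. This is a minimal sketch under our own assumptions about the data format; the paper's actual pipeline may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_icl_regression_prompt(n_context=8, dim=4):
    """Sample one in-context linear-regression task: n_context
    (x_i, f(x_i)) pairs followed by a query point whose label the
    model must predict from the context alone (no weight updates)."""
    w = rng.standard_normal(dim)                  # task-specific weights
    xs = rng.standard_normal((n_context + 1, dim))
    ys = xs @ w                                   # noiseless linear targets
    context = list(zip(xs[:-1], ys[:-1]))         # in-context examples
    return context, xs[-1], ys[-1]                # plus the held-out query

context, query_x, query_y = make_icl_regression_prompt()
```

Length generalization is then probed by training on prompts up to some `n_context` and evaluating on much longer ones.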

AAAI Conference 2025 Conference Paper

HybridReg: Robust 3D Point Cloud Registration with Hybrid Motions

  • Keyu Du
  • Hao Xu
  • Haipeng Li
  • Hong Qu
  • Chi-Wing Fu
  • Shuaicheng Liu

Scene-level point cloud registration is very challenging when considering dynamic foregrounds. Existing indoor datasets mostly assume rigid motions, so the trained models cannot robustly handle scenes with non-rigid motions. On the other hand, non-rigid datasets are mainly object-level, so the trained models cannot generalize well to complex scenes. This paper presents HybridReg, a new approach to 3D point cloud registration, learning an uncertainty mask to account for hybrid motions: rigid for backgrounds and non-rigid/rigid for instance-level foregrounds. First, we build a scene-level 3D registration dataset, namely HybridMatch, designed specifically with strategies to arrange diverse deforming foregrounds in a controllable manner. Second, we account for different motion types and formulate a mask-learning module to alleviate the interference of deforming outliers. Third, we exploit a simple yet effective negative log-likelihood loss to adopt uncertainty to guide the feature extraction and correlation computation. To the best of our knowledge, HybridReg is the first work that exploits hybrid motions for robust point cloud registration. Extensive experiments show HybridReg's strengths, leading it to achieve state-of-the-art performance on both widely-used indoor and outdoor datasets.
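The negative log-likelihood loss with uncertainty mentioned in the abstract is, in its common Gaussian form, a variance-weighted squared error: points with high predicted variance (likely deforming outliers) are down-weighted, while the log-variance term keeps the model from inflating uncertainty everywhere. This is a generic sketch of that standard form, not HybridReg's actual implementation; the function name and shapes are our assumptions.

```python
import numpy as np

def uncertainty_nll(residuals, log_var):
    """Per-point Gaussian negative log-likelihood with a predicted
    log-variance: 0.5 * (exp(-log_var) * r^2 + log_var).
    High log_var shrinks the squared-error term but pays a penalty
    through the additive log_var term."""
    return 0.5 * (np.exp(-log_var) * residuals**2 + log_var)

res = np.array([0.1, 2.0])   # small inlier error vs. large outlier error
lv = np.array([-2.0, 1.0])   # low vs. high predicted uncertainty
loss = uncertainty_nll(res, lv)
```

With these toy numbers, the confident inlier contributes a much lower loss than the uncertain outlier, which is the behavior that lets uncertainty gate the correlation computation.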

NeurIPS Conference 2025 Conference Paper

Learning to Factorize Spatio-Temporal Foundation Models

  • Siru Zhong
  • Junjie Qiu
  • Yangyu Wu
  • Xingchen Zou
  • Zhongwen Rao
  • Bin Yang
  • Chenjuan Guo
  • Hao Xu

Spatio-Temporal Foundation Models (STFMs) promise zero/few-shot generalization across various datasets, yet joint spatio-temporal pretraining is computationally prohibitive and struggles with domain-specific spatial correlations. To this end, we introduce FactoST, a factorized STFM that decouples universal temporal pretraining from spatio-temporal adaptation. The first stage pretrains a space-agnostic backbone with multi-frequency reconstruction and domain-aware prompting, capturing cross-domain temporal regularities at low computational cost. The second stage freezes or further fine-tunes the backbone and attaches an adapter that fuses spatial metadata, sparsifies interactions, and aligns domains with continual memory replay. Extensive forecasting experiments reveal that, in the few-shot setting, FactoST reduces MAE by up to 46.4% versus UniST, uses 46.2% fewer parameters, and achieves 68% faster inference than OpenCity, while remaining competitive with expert models. We believe this factorized view offers a practical and scalable path toward truly universal STFMs. The code will be released upon notification.

IJCAI Conference 2025 Conference Paper

Mat-Instructions: A Large-Scale Inorganic Material Instruction Dataset for Large Language Models

  • Ke Liu
  • Shangde Gao
  • Yichao Fu
  • Xiaoliang Wu
  • Shuo Tong
  • Ajitha Rajan
  • Hao Xu

Recent advancements in large language models (LLMs) have revolutionized research discovery across various scientific disciplines, including materials science. The discovery of novel materials, particularly crystal materials, is essential for achieving sustainable development goals (SDGs), as they drive breakthroughs in climate change mitigation, clean and affordable energy, and the promotion of industrial innovation. However, unlocking the full potential of LLMs in materials research remains challenging due to the lack of high-quality, diverse, and instruction-based datasets. Such datasets are crucial for guiding these models in understanding and predicting the structure, property, and function of materials across various tasks. To address this limitation, we introduce Mat-Instructions, a large-scale inorganic material instruction dataset, specifically designed to unlock the potential of LLMs in materials science. Extensive experiments on fine-tuning LLaMA with our Mat-Instructions dataset demonstrate its effectiveness in advancing progress for materials science. The code and dataset are available at https://github.com/zjuKeLiu/Mat-Instructions

NeurIPS Conference 2025 Conference Paper

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

  • Bingquan Dai
  • Luo Li
  • Qihong Tang
  • Jie Wang
  • Xinyu Lian
  • Hao Xu
  • Minghan Qin
  • Xudong XU

Reconstructing 3D objects into editable programs is pivotal for applications like reverse engineering and shape editing. However, existing methods often rely on limited domain-specific languages (DSLs) and small-scale datasets, restricting their ability to model complex geometries and structures. To address these challenges, we introduce MeshCoder, a novel framework that reconstructs complex 3D objects from point clouds into editable Blender Python scripts. We develop a comprehensive set of expressive Blender Python APIs capable of synthesizing intricate geometries. Leveraging these APIs, we construct a large-scale paired object-code dataset, where the code for each object is decomposed into distinct semantic parts. Subsequently, we train a multimodal large language model (LLM) that translates 3D point clouds into executable Blender Python scripts. Our approach not only achieves superior performance in shape-to-code reconstruction tasks but also facilitates intuitive geometric and topological editing through convenient code modifications. Furthermore, our code-based representation enhances the reasoning capabilities of LLMs in 3D shape understanding tasks. Together, these contributions establish MeshCoder as a powerful and flexible solution for programmatic 3D shape reconstruction and understanding.

AAAI Conference 2025 Conference Paper

UN-DETR: Promoting Objectness Learning via Joint Supervision for Unknown Object Detection

  • HaoMiao Liu
  • Hao Xu
  • Chuhuai Yue
  • Bo Ma

Unknown Object Detection (UOD) aims to identify objects of unseen categories, differing from the traditional detection paradigm limited by the closed-world assumption. A key component of UOD is learning a generalized representation, i.e., objectness for both known and unknown categories, to distinguish and localize objects from the background in a class-agnostic manner. However, previous methods obtain supervision signals for learning objectness in isolation from either localization or classification information, leading to poor performance for UOD. To address this issue, we propose a transformer-based UOD framework, UN-DETR. Based on this, we craft an Instance Presence Score (IPS) to represent the probability of an object's presence. For the purpose of information complementarity, IPS employs a strategy of joint supervised learning, integrating attributes representing general objectness from the positional and the categorical latent space as supervision signals. To enhance IPS learning, we introduce a one-to-many assignment strategy to incorporate more supervision. Then, we propose Unbiased Query Selection to provide premium initial query vectors for the decoder. Additionally, we propose an IPS-guided post-processing strategy to filter redundant boxes and correct classification predictions for known and unknown objects. Finally, we pretrain the entire UN-DETR in an unsupervised manner, in order to obtain an objectness prior. Our UN-DETR is comprehensively evaluated on multiple UOD and known detection benchmarks, demonstrating its effectiveness and achieving state-of-the-art performance.

AAAI Conference 2024 Conference Paper

Deciphering Compatibility Relationships with Textual Descriptions via Extraction and Explanation

  • Yu Wang
  • Zexue He
  • Zhankui He
  • Hao Xu
  • Julian McAuley

Understanding and accurately explaining compatibility relationships between fashion items is a challenging problem in the burgeoning domain of AI-driven outfit recommendations. Present models, while making strides in this area, still occasionally fall short, offering explanations that can be elementary and repetitive. This work aims to address these shortcomings by introducing the Pair Fashion Explanation (PFE) dataset, a unique resource that has been curated to illuminate these compatibility relationships. Furthermore, we propose an innovative two-stage pipeline model that leverages this dataset for fine-tuning. This fine-tuning allows the model to generate explanations that convey the compatibility relationships between items. Our experiments showcase the model's potential in crafting descriptions that are knowledgeable, aligned with ground-truth matching correlations, and that produce understandable and informative descriptions, as assessed by both automatic metrics and human evaluation. Our code and data are released at https://github.com/wangyu-ustc/PairFashionExplanation.

NeurIPS Conference 2024 Conference Paper

HHD-GP: Incorporating Helmholtz-Hodge Decomposition into Gaussian Processes for Learning Dynamical Systems

  • Hao Xu
  • Jia Pan

Machine learning models provide alternatives for efficiently recognizing complex patterns from data, but the main concern in applying them to modeling physical systems stems from their physics-agnostic design, leading to learning methods that lack interpretability, robustness, and data efficiency. This paper mitigates this concern by incorporating the Helmholtz-Hodge decomposition into a Gaussian process model, leading to a versatile framework that simultaneously learns the curl-free and divergence-free components of a dynamical system. Learning a predictive model in this form facilitates the exploitation of symmetry priors. In addition to improving predictive power, these priors make the model identifiable, so the identified features can be linked to comprehensible scientific properties of the system. We show that compared to baseline models, our model achieves better predictive performance on several benchmark dynamical systems while allowing physically meaningful decomposition of the systems from noisy and sparse data.
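For context, the Helmholtz-Hodge decomposition referenced in the abstract splits a sufficiently smooth vector field into a curl-free and a divergence-free part; a standard statement (our rendering, not the paper's exact formulation) is:

```latex
\mathbf{F}(\mathbf{x}) = -\nabla \phi(\mathbf{x}) + \nabla \times \mathbf{A}(\mathbf{x}),
\qquad
\nabla \times \bigl(\nabla \phi\bigr) = \mathbf{0},
\qquad
\nabla \cdot \bigl(\nabla \times \mathbf{A}\bigr) = 0,
```

where $\phi$ is a scalar potential (its gradient is curl-free) and $\mathbf{A}$ is a vector potential (its curl is divergence-free). Placing independent Gaussian-process priors on the two components is what lets the model learn them simultaneously.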

AAAI Conference 2024 Conference Paper

SiMA-Hand: Boosting 3D Hand-Mesh Reconstruction by Single-to-Multi-View Adaptation

  • Yinqiao Wang
  • Hao Xu
  • Pheng Ann Heng
  • Chi-Wing Fu

Estimating 3D hand mesh from RGB images is a longstanding track, in which occlusion is one of the most challenging problems. Existing attempts towards this task often fail when the occlusion dominates the image space. In this paper, we propose SiMA-Hand, aiming to boost the mesh reconstruction performance by Single-to-Multi-view Adaptation. First, we design a multi-view hand reconstructor to fuse information across multiple views by holistically adopting feature fusion at image, joint, and vertex levels. Then, we introduce a single-view hand reconstructor equipped with SiMA. Though taking only one view as input at inference, the shape and orientation features in the single-view reconstructor can be enriched by learning non-occluded knowledge from the extra views at training, enhancing the reconstruction precision on the occluded regions. We conduct experiments on the Dex-YCB and HanCo benchmarks with challenging object- and self-caused occlusion cases, manifesting that SiMA-Hand consistently achieves superior performance over the state of the arts. Code will be released on https://github.com/JoyboyWang/SiMA-Hand Pytorch.

AAAI Conference 2024 Conference Paper

SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

  • Youhong Wang
  • Yunji Liang
  • Hao Xu
  • Shaohui Jiao
  • Hongkai Yu

Recently, self-supervised monocular depth estimation has gained popularity with numerous applications in autonomous driving and robotics. However, existing solutions primarily seek to estimate depth from immediate visual features, and struggle to recover fine-grained scene details. In this paper, we introduce SQLdepth, a novel approach that can effectively learn fine-grained scene structure priors from ego-motion. In SQLdepth, we propose a novel Self Query Layer (SQL) to build a self-cost volume and infer depth from it, rather than inferring depth from feature maps. We show that the self-cost volume is an effective inductive bias for geometry learning, which implicitly models the single-frame scene geometry, with each slice of it indicating a relative distance map between points and objects in a latent space. Experimental results on KITTI and Cityscapes show that our method attains remarkable state-of-the-art performance, and showcases computational efficiency, reduced training complexity, and the ability to recover fine-grained scene details. Moreover, the self-matching-oriented relative distance querying in SQL improves the robustness and zero-shot generalization capability of SQLdepth. Code is available at https://github.com/hisfog/SfMNeXt-Impl.

AAAI Conference 2024 Conference Paper

TACIT: A Target-Agnostic Feature Disentanglement Framework for Cross-Domain Text Classification

  • Rui Song
  • Fausto Giunchiglia
  • Yingji Li
  • Mingjie Tian
  • Hao Xu

Cross-domain text classification aims to transfer models from label-rich source domains to label-poor target domains, giving it a wide range of practical applications. Many approaches promote cross-domain generalization by capturing domain-invariant features. However, these methods rely on unlabeled samples provided by the target domains, which renders the model ineffective when the target domain is agnostic. Furthermore, the models are easily disturbed by shortcut learning in the source domain, which also hinders the improvement of domain generalization ability. To solve the aforementioned issues, this paper proposes TACIT, a target-domain-agnostic feature disentanglement framework which adaptively decouples robust and unrobust features by Variational Auto-Encoders. Additionally, to encourage the separation of unrobust features from robust features, we design a feature distillation task that compels unrobust features to approximate the output of the teacher. The teacher model is trained with a few easy samples that are likely to carry potential unknown shortcuts. Experimental results verify that our framework achieves comparable results to state-of-the-art baselines while utilizing only source domain data.

AAAI Conference 2023 Conference Paper

Differentiable Meta Multigraph Search with Partial Message Propagation on Heterogeneous Information Networks

  • Chao Li
  • Hao Xu
  • Kun He

Heterogeneous information networks (HINs) are widely employed for describing real-world data with intricate entities and relationships. To automatically utilize their semantic information, graph neural architecture search has recently been developed for various tasks of HINs. Existing works, on the other hand, show weaknesses in instability and inflexibility. To address these issues, we propose a novel method called Partial Message Meta Multigraph search (PMMM) to automatically optimize the neural architecture design on HINs. Specifically, to learn how graph neural networks (GNNs) propagate messages along various types of edges, PMMM adopts an efficient differentiable framework to search for a meaningful meta multigraph, which can capture more flexible and complex semantic relations than a meta graph. The differentiable search typically suffers from performance instability, so we further propose a stable algorithm called partial message search to ensure that the searched meta multigraph consistently surpasses the manually designed meta-structures, i.e., meta-paths. Extensive experiments on six benchmark datasets over two representative tasks, including node classification and recommendation, demonstrate the effectiveness of the proposed method. Our approach outperforms the state-of-the-art heterogeneous GNNs, finds out meaningful meta multigraphs, and is significantly more stable. Our code is available at https://github.com/JHL-HUST/PMMM.

ICRA Conference 2023 Conference Paper

Planning Assembly Sequence with Graph Transformer

  • Lin Ma 0002
  • Jiangtao Gong
  • Hao Xu
  • Hao Chen 0062
  • Hao Zhao 0002
  • Wenbing Huang 0001
  • Guyue Zhou

Assembly Sequence Planning (ASP) is an essential process for modern manufacturing, proven to be NP-complete, so its effective and efficient solution has been a challenge for researchers in the field. In this paper, we present a graph-transformer based framework for the ASP problem which is trained and demonstrated on a self-collected ASP database. The ASP database contains a self-collected set of LEGO models. The LEGO model is abstracted to a heterogeneous graph structure after a thorough analysis of the original structure and feature extraction. The ground truth assembly sequence is first generated by brute-force search and then adjusted manually to be in line with human rational habits. Based on this self-collected ASP dataset, we propose a heterogeneous graph-transformer framework to learn the latent rules for assembly planning. We evaluated the proposed framework in a series of experiments. The results show that the similarity of the predicted and ground truth sequences can reach 0.44, a medium correlation measured by Kendall's τ. Meanwhile, we compared the different effects of node features and edge features and generated a feasible and reasonable assembly sequence as a benchmark for further research. Our dataset and code are available at: https://github.com/AIR-DISCOVER/ICRA_ASP.

IJCAI Conference 2023 Conference Paper

RZCR: Zero-shot Character Recognition via Radical-based Reasoning

  • Xiaolei Diao
  • Daqian Shi
  • Hao Tang
  • Qiang Shen
  • Yanzeng Li
  • Lei Wu
  • Hao Xu

The long-tail effect is a common issue that limits the performance of deep learning models on real-world datasets. Character image datasets are also affected by such unbalanced data distribution due to differences in character usage frequency. Thus, current character recognition methods are limited when applied in the real world, especially for the categories in the tail that lack training samples, e.g., uncommon characters. In this paper, we propose a zero-shot character recognition framework via radical-based reasoning, called RZCR, to improve the recognition performance of few-sample character categories in the tail. Specifically, we exploit radicals, the graphical units of characters, by decomposing and reconstructing characters according to orthography. RZCR consists of a visual semantic fusion-based radical information extractor (RIE) and a knowledge graph character reasoner (KGR). RIE aims to recognize candidate radicals and their possible structural relations from character images in parallel. The results are then fed into KGR to recognize the target character by reasoning with a knowledge graph. We validate our method on multiple datasets, and RZCR shows promising experimental results, especially on few-sample character datasets.

ICRA Conference 2022 Conference Paper

Brick Yourself within 3 Minutes

  • Guyue Zhou
  • Liyi Luo
  • Hao Xu
  • Xinliang Zhang
  • Haole Guo
  • Hao Zhao 0002

This paper presents an intelligent machine which can automatically convert a captured portrait into a physical gadget made up of LEGO bricks. In contrast to synthesizing a 2D image or a virtual 3D object, generating a physical 3D assembly object needs to take physical properties and the assembly process into consideration, leading to more challenges. To generate brick models for arbitrary portraits, we formulate the transformation between the attribute space (extracted from 2D images) and the brick model space as a constrained integer programming problem which can be solved with a heuristic search method. Furthermore, as the bricks are physically scattered, we propose an algorithm to generate corresponding assembly instructions for customized figure-featured bricks to facilitate users' assembly. Meanwhile, we deploy the proposed algorithms on an automatic machine which integrates a camera, a printer, a laptop, and a brick operation unit. Finally, the generated brick models and assembly instructions are evaluated by a large number of users. It is worth noting that the whole system works as an intelligent vending machine, producing a 150-brick model within 3 minutes.

IJCAI Conference 2022 Conference Paper

Ensemble Multi-Relational Graph Neural Networks

  • Yuling Wang
  • Hao Xu
  • Yanhua Yu
  • Mengdi Zhang
  • Zhenhao Li
  • Yuji Yang
  • Wei Wu

It is well established that graph neural networks (GNNs) can be interpreted and designed from the perspective of an optimization objective. With this clear optimization objective, the deduced GNN architecture has a sound theoretical foundation, which is able to flexibly remedy the weaknesses of GNNs. However, this optimization objective is only proved for GNNs with single-relational graphs. Can we infer a new type of GNN for multi-relational graphs by extending this optimization objective, so as to simultaneously solve the issues in previous multi-relational GNNs, e.g., over-parameterization? In this paper, we propose a novel ensemble multi-relational GNN by designing an ensemble multi-relational (EMR) optimization objective. This EMR optimization objective is able to derive an iterative updating rule, which can be formalized as an ensemble message passing (EnMP) layer with multi-relations. We further analyze the nice properties of the EnMP layer, e.g., its relationship with multi-relational personalized PageRank. Finally, a new multi-relational GNN that alleviates the over-smoothing and over-parameterization issues is proposed. Extensive experiments conducted on four benchmark datasets well demonstrate the effectiveness of the proposed model.
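For context, the link between message passing and personalized PageRank noted in the abstract is standard in the single-relational case: an optimization objective trading feature fidelity against graph smoothness yields an iterative propagation rule of the APPNP form (our rendering of the well-known single-relational rule, not the paper's multi-relational EnMP layer):

```latex
\mathbf{H}^{(k+1)} = (1-\alpha)\,\tilde{\mathbf{A}}\,\mathbf{H}^{(k)} + \alpha\,\mathbf{X},
```

where $\tilde{\mathbf{A}}$ is the normalized adjacency matrix, $\mathbf{X}$ the input features, and $\alpha$ the teleport probability of personalized PageRank. The EMR objective generalizes this kind of update to an ensemble over multiple relation types.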

IJCAI Conference 2022 Conference Paper

Federated Multi-Task Attention for Cross-Individual Human Activity Recognition

  • Qiang Shen
  • Haotian Feng
  • Rui Song
  • Stefano Teso
  • Fausto Giunchiglia
  • Hao Xu

Federated Learning (FL) is an emerging privacy-aware machine learning technique that applies successfully to the collaborative learning of global models for Human Activity Recognition (HAR). As of now, the applications of FL for HAR assume that the data associated with diverse individuals follow the same distribution. However, this assumption is impractical in real-world scenarios where the same activity is frequently performed differently by different individuals. To tackle this issue, we propose FedMAT, a Federated Multi-task ATtention framework for HAR, which extracts and fuses shared as well as individual-specific multi-modal sensor data features. Specifically, we treat the HAR problem associated with each individual as a different task and train a federated multi-task model, composed of a shared feature representation network in a central server plus multiple individual-specific networks with attention modules stored in decentralized nodes. In this architecture, the attention module operates as a mask that allows to learn individual-specific features from the global model, whilst simultaneously allowing for features to be shared among different individuals. We conduct extensive experiments based on publicly available HAR datasets, which are collected in both controlled environments and real-world scenarios. Numeric results verify that our proposed FedMAT significantly outperforms baselines not only in generalizing to existing individuals but also in adapting to new individuals.

AAAI Conference 2022 Conference Paper

FINet: Dual Branches Feature Interaction for Partial-to-Partial Point Cloud Registration

  • Hao Xu
  • Nianjin Ye
  • Guanghui Liu
  • Bing Zeng
  • Shuaicheng Liu

Data association is important in point cloud registration. In this work, we propose to solve partial-to-partial registration from a new perspective, by introducing multi-level feature interactions between the source and the reference clouds at the feature extraction stage, such that the registration can be realized without the attentions or explicit mask estimation for overlapping detection as adopted previously. Specifically, we present FINet, a feature-interaction-based structure with the capability to enable and strengthen the information association between the inputs at multiple stages. To achieve this, we first split the features into two components, one for rotation and one for translation, based on the fact that they belong to different solution spaces, yielding a dual-branch structure. Second, we insert several interaction modules at the feature extractor for the data association. Third, we propose a transformation sensitivity loss to obtain rotation-attentive and translation-attentive features. Experiments demonstrate that our method achieves higher precision and robustness compared to the state-of-the-art traditional and learning-based methods. Code is available at https://github.com/megvii-research/FINet.

IJCAI Conference 2022 Conference Paper

Improving Few-Shot Text-to-SQL with Meta Self-Training via Column Specificity

  • Xinnan Guo
  • Yongrui Chen
  • Guilin Qi
  • Tianxing Wu
  • Hao Xu

The few-shot problem is an urgent challenge for single-table text-to-SQL. Existing methods ignore the potential value of unlabeled data and rely merely on a coarse-grained Meta-Learning (ML) algorithm that neglects the differing contributions of columns to the optimization objective. This paper proposes a Meta Self-Training text-to-SQL (MST-SQL) method to solve the problem. Specifically, MST-SQL is based on the column-wise HydraNet and adopts self-training as an effective mechanism to learn from readily available unlabeled samples. During each training epoch, it first predicts pseudo-labels for unlabeled samples and then leverages them to update the parameters. A fine-grained ML algorithm is used in the update, which weighs the contribution of columns by their specificity, in order to further improve generalizability. Extensive experimental results on both open-domain and domain-specific benchmarks reveal that MST-SQL has significant advantages in few-shot scenarios and is also competitive in standard supervised settings.
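The self-training loop and the column-specificity weighting can be sketched as below. The confidence threshold, the specificity measure (distinct-value ratio), and the toy predictor are all assumptions for illustration, not the paper's exact definitions.

```python
def specificity(column_values):
    # Column specificity as a distinct-value ratio: columns whose values
    # are more distinctive get higher weight in the fine-grained update.
    return len(set(column_values)) / len(column_values)

def self_train(labeled, unlabeled, predict, min_conf=0.9):
    # One self-training round: predict pseudo-labels for unlabeled
    # samples, keep only confident ones, and fold them into the
    # training set used for the next parameter update.
    pseudo = []
    for x in unlabeled:
        label, conf = predict(x)
        if conf >= min_conf:
            pseudo.append((x, label))
    return labeled + pseudo

# Hypothetical predictor standing in for the column-wise model.
def toy_predict(question):
    return "SELECT *", 0.95 if question == "q1" else 0.5

train_set = self_train([("q0", "SELECT 1")], ["q1", "q2"], toy_predict)
```

Only the confident pseudo-label ("q1") survives the round; "q2" is held back until the model becomes more certain about it.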

IROS Conference 2021 Conference Paper

Design and Analysis of a Bi-directional Transformable Wheel Robot Trimode

  • Qiwei Xu
  • Hao Xu
  • Kun Xiong
  • Qinqin Zhou 0002
  • Weizhong Guo

This article presents a novel transformable wheeled robot with three motion modes. Based on a four-bar mechanism design, the transformable wheel can transform from a circular wheel into a CW (clockwise) legged-wheel mode and a CCW (counterclockwise) legged-wheel mode. Both legged-wheel modes achieve good obstacle-climbing performance when the robot overcomes obstacles such as steps in the forward and backward directions. A mathematical expression for the shape of the transformable wheel is established to guide the selection of design variables and improve the obstacle-overcoming ability of the legged-wheel modes. A dynamics model of the robot in the step-overcoming scenario is analyzed to determine and reduce the torque load on the motor. Finally, two prototypes of the robot are developed. The early-stage prototype, made by 3D printing, verifies the step-overcoming ability of the transformable wheel. The structural strength of the robot chassis and the robustness of the hardware are then improved in the second prototype, which can serve in search-and-rescue (SAR) and indoor service tasks. In general, the prototype can overcome steps higher than 2 times the wheel radius in CW/CCW legged-wheel mode and move at a maximum speed of 3.15 m/s in circular wheel mode (about 5.83 body lengths per second).
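The two reported speed figures imply a chassis length, which is worth a quick sanity check (the derived length is our inference, not a number stated in the abstract):

```python
# Back out the implied chassis length from the reported top speed:
# 3.15 m/s is stated to be about 5.83 body lengths per second.
top_speed_mps = 3.15
lengths_per_second = 5.83
body_length_m = top_speed_mps / lengths_per_second  # roughly 0.54 m
```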

JBHI Journal 2021 Journal Article

Left Ventricle Quantification Challenge: A Comprehensive Comparison and Evaluation of Segmentation and Regression for Mid-Ventricular Short-Axis Cardiac MR Data

  • Wufeng Xue
  • Jiahui Li
  • Zhiqiang Hu
  • Eric Kerfoot
  • James Clough
  • Ilkay Oksuz
  • Hao Xu
  • Vicente Grau

Automatic quantification of the left ventricle (LV) from cardiac magnetic resonance (CMR) images plays an important role in making the diagnosis procedure efficient and reliable, and in alleviating the laborious reading work for physicians. Considerable efforts have been devoted to LV quantification using different strategies that include segmentation-based (SG) methods and the recent direct regression (DR) methods. Although both SG and DR methods have obtained great success for the task, a systematic platform to benchmark them remains absent because of differences in label information during model learning. In this paper, we conducted an unbiased evaluation and comparison of cardiac LV quantification methods that were submitted to the Left Ventricle Quantification (LVQuan) challenge, which was held in conjunction with the Statistical Atlases and Computational Modeling of the Heart (STACOM) workshop at MICCAI 2018. The challenge was targeted at the quantification of 1) areas of the LV cavity and myocardium, 2) dimensions of the LV cavity, 3) regional wall thicknesses (RWT), and 4) the cardiac phase, from mid-ventricle short-axis CMR images. First, we constructed a public quantification dataset, Cardiac-DIG, with ground-truth labels for both the myocardium mask and these quantification targets across the entire cardiac cycle. Then, the key techniques employed by each submission were described. Next, quantitative validation of these submissions was conducted with the constructed dataset. The evaluation results revealed that both SG and DR methods can offer good LV quantification performance, even though DR methods do not require densely labeled masks for supervision. Among the 12 submissions, the DR method LDAMT offered the best performance, with a mean estimation error of 301 mm$^2$ for the two areas, 2.15 mm for the cavity dimensions, 2.03 mm for RWTs, and a 9.5% error rate for cardiac phase classification.
Three of the SG methods also delivered comparable performance. Finally, we discussed the advantages and disadvantages of SG and DR methods, as well as the unsolved problems in automatic cardiac quantification for clinical practice applications.
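The two error measures the challenge reports (mean estimation error for continuous targets, error rate for the phase label) are straightforward to state in code. This is a generic sketch of those metrics, not the challenge's official evaluation script.

```python
def mean_abs_error(pred, truth):
    # Mean absolute estimation error, as used for the area (mm^2),
    # cavity-dimension (mm), and wall-thickness (mm) targets.
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

def phase_error_rate(pred, truth):
    # Fraction of frames whose cardiac-phase label is wrong.
    wrong = sum(1 for p, t in zip(pred, truth) if p != t)
    return wrong / len(pred)
```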

IS Journal 2019 Journal Article

Automatic Vehicle Tracking With Roadside LiDAR Data for the Connected-Vehicles System

  • Yuepeng Cui
  • Hao Xu
  • Jianqing Wu
  • Yuan Sun
  • Junxuan Zhao

Existing connected-vehicle deployments obtain the real-time status of connected vehicles but know nothing about the unconnected traffic around them. An approach for collecting the high-resolution, real-time status of unconnected road users is therefore urgently needed. This paper introduces a new generation of light detection and ranging (LiDAR)-enhanced connected infrastructure that actively senses the high-resolution status of surrounding traffic participants with roadside LiDAR sensors and broadcasts connected-vehicle messages through DSRC roadside units. The LiDAR data processing procedure, including background filtering, object clustering, vehicle recognition, lane identification, and vehicle tracking, is presented in this paper. The performance of the proposed data processing procedure is evaluated with field-collected data.
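The first stage of the pipeline, background filtering, can be sketched with a simple occupancy heuristic: grid cells occupied in nearly every frame are static background (road surface, poles), while transient cells belong to moving road users. The grid representation and the 80% threshold are illustrative assumptions, not the paper's exact method.

```python
def filter_background(frames, threshold=0.8):
    # frames: list of frames, each a list of occupied grid cells.
    # Count how often each cell is occupied across frames.
    counts = {}
    for frame in frames:
        for cell in frame:
            counts[cell] = counts.get(cell, 0) + 1
    n = len(frames)
    # Cells occupied in >= threshold of frames are static background.
    background = {c for c, k in counts.items() if k / n >= threshold}
    # Keep only transient (moving) cells in every frame.
    return [[c for c in frame if c not in background] for frame in frames]

frames = [[(0, 0), (1, 1)], [(0, 0)], [(0, 0), (2, 2)]]
moving = filter_background(frames)
```

Cell (0, 0) is occupied in all three frames and is removed; the transient cells remain for the clustering and tracking stages.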

AAAI Conference 2018 Conference Paper

HAN: Hierarchical Association Network for Computing Semantic Relatedness

  • Xiaolong Gong
  • Hao Xu
  • Linpeng Huang

Measuring semantic relatedness between two words is a significant problem in many areas such as natural language processing. Existing approaches to the semantic relatedness problem mainly adopt the co-occurrence principle and regard two words as highly related if they appear in the same sentence frequently. However, such solutions suffer from low coverage and low precision because i) two highly related words may not appear close to each other in sentences, e.g., synonyms; and ii) the co-occurrence of words may happen by chance rather than implying closeness in their semantics. In this paper, we explore the latent semantics (i.e., concepts) of words to identify highly related word pairs. We propose a hierarchical association network to specify the complex relationships among words and concepts, and quantify each relationship with appropriate measurements. Extensive experiments are conducted on real datasets, and the results show that our proposed method improves correlation precision compared with state-of-the-art approaches.
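The core intuition, that words sharing latent concepts can be related even if they never co-occur, can be sketched with concept-weight vectors and cosine similarity. The dictionary-of-weights representation and the cosine score are a generic stand-in, not the paper's hierarchical network.

```python
import math

def relatedness(word_a, word_b, concept_weights):
    # Represent each word by its weights over latent concepts and score
    # relatedness as cosine similarity, so synonyms that never co-occur
    # can still score highly through shared concepts.
    concepts = set(concept_weights[word_a]) | set(concept_weights[word_b])
    va = [concept_weights[word_a].get(c, 0.0) for c in concepts]
    vb = [concept_weights[word_b].get(c, 0.0) for c in concepts]
    dot = sum(a * b for a, b in zip(va, vb))
    na = math.sqrt(sum(a * a for a in va))
    nb = math.sqrt(sum(b * b for b in vb))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical concept weights for illustration.
weights = {"car": {"vehicle": 1.0}, "automobile": {"vehicle": 1.0}}
score = relatedness("car", "automobile", weights)
```

"car" and "automobile" score maximally here despite having no co-occurrence evidence at all, which is exactly the failure case of pure co-occurrence methods.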

NeurIPS Conference 2011 Conference Paper

Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries

  • Zhen Xiang
  • Hao Xu
  • Peter Ramadge

Learning sparse representations on data adaptive dictionaries is a state-of-the-art method for modeling data. But when the dictionary is large and the data dimension is high, it is a computationally challenging problem. We explore three aspects of the problem. First, we derive new, greatly improved screening tests that quickly identify codewords that are guaranteed to have zero weights. Second, we study the properties of random projections in the context of learning sparse representations. Finally, we develop a hierarchical framework that uses incremental random projections and screening to learn, in small stages, a hierarchically structured dictionary for sparse representations. Empirical results show that our framework can learn informative hierarchical sparse representations more efficiently.
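A screening test of the kind this abstract describes can be sketched for the lasso: assuming unit-norm codewords, an atom whose correlation with the signal falls below a bound that tightens as the regularization level approaches its maximum is guaranteed to receive zero weight and can be dropped before solving. The bound below is a simple sphere-test form in the spirit of the paper's tests, not its exact expression.

```python
import math

def screen(dictionary, y, lam):
    # Sphere-style screening test for the lasso. Assumes each codeword
    # d in the dictionary has unit norm. Atoms with |d.y| below the
    # bound provably have zero weight at regularization level lam.
    corrs = [abs(sum(di * yi for di, yi in zip(d, y))) for d in dictionary]
    lam_max = max(corrs)  # smallest lam for which all weights are zero
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    bound = lam - norm_y * (lam_max - lam) / lam_max
    # Keep only the indices of codewords that survive the test.
    return [i for i, c in enumerate(corrs) if c >= bound]

kept = screen([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], [1.0, 0.0], 0.9)
```

Two of the three codewords are discarded without ever running the sparse solver, which is where the computational savings on large dictionaries come from.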