Arrow Research

Author name cluster

Kun Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
2 author rows

Possible papers (15)

AAAI Conference 2026 Conference Paper

Generating Attribute-Aware Human Motions from Textual Prompt

  • Xinghan Wang
  • Kun Xu
  • Fei Li
  • Cao Sheng
  • JiaZhong Yu
  • Yadong Mu

Text-driven human motion generation has recently attracted considerable attention, allowing models to generate human motions based on textual descriptions. However, current methods neglect the influence of human attributes, such as age, gender, weight, and height, which are key factors shaping human motion patterns. This work is a pilot exploration toward bridging this gap. We conceptualize each motion as comprising both attribute information and action semantics, where textual descriptions align exclusively with action semantics. To achieve this, a new framework inspired by Structural Causal Models is proposed to decouple action semantics from human attributes, enabling text-to-semantics prediction and attribute-controlled generation. The resulting model is capable of generating attribute-aware motion aligned with the user's text and attribute inputs. For evaluation, we introduce a comprehensive dataset containing attribute annotations for text-motion pairs, setting the first benchmark for attribute-aware motion generation. Extensive experiments validate our model's effectiveness.

AAAI Conference 2025 Conference Paper

Granularity-Adaptive Spatial Evidence Tokenization for Video Question Answering

  • Hao Jiang
  • Yang Jin
  • Zhicheng Sun
  • Kun Xu
  • Liwei Chen
  • Yang Song
  • Kun Gai
  • Yadong Mu

Video question answering plays a vital role in computer vision, and recent advances in large language models have further propelled the development of this field. However, existing video question answering techniques often face limitations in grasping fine-grained video content in spatial dimensions. This limitation mainly stems from the fixed, low-resolution input of video frames. While some approaches using high-resolution inputs partially alleviate this problem, they introduce excessive computational burdens by encoding the entire high-resolution image. In this work, we propose a granularity-adaptive spatial evidence tokenization model for video question answering. Our method introduces multi-granular visual tokenization in the spatial dimension to produce video tokens at various granularities based on the question. It highlights spatially activated patches at low resolutions through a granularity weighting module and then adaptively encodes these activated patches at high resolution for detail supplementation. To mitigate the computational overhead associated with high-resolution frame encoding, a masking and acceleration module is developed for efficient visual tokenization. Moreover, a granularity compression module is designed to dynamically select and compress visual tokens of varying granularities based on questions. We conduct extensive experiments on 11 mainstream video question answering datasets, and the results demonstrate the effectiveness of our proposed method.
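The selection step this abstract describes (score patches cheaply at low resolution, then re-encode only the most activated ones at high resolution) can be illustrated with a toy selector. The top-fraction rule and function name below are invented for illustration; they are not the paper's granularity weighting module:

```python
def select_activated_patches(patch_scores, keep_ratio=0.25):
    """Keep the top fraction of patches by activation score for
    high-resolution re-encoding; everything else stays low-resolution.
    Returns a boolean mask over the patches."""
    flat = [(score, idx) for idx, score in enumerate(patch_scores)]
    n_keep = max(1, int(keep_ratio * len(flat)))
    # Sort by score (descending) and keep the indices of the top patches.
    keep = {idx for _, idx in sorted(flat, reverse=True)[:n_keep]}
    return [idx in keep for idx in range(len(patch_scores))]
```

In the paper, the kept patches would then be encoded at high resolution and further compressed by the granularity compression module.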

NeurIPS Conference 2024 Conference Paper

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

  • Zhicheng Sun
  • Zhenhao Yang
  • Yang Jin
  • Haozhe Chi
  • Kun Xu
  • Liwei Chen
  • Hao Jiang
  • Yang Song

Customizing diffusion models to generate identity-preserving images from user-provided reference images is an intriguing new problem. The prevalent approaches typically require training on extensive domain-specific images to achieve identity preservation, which lacks flexibility across different use cases. To address this issue, we exploit classifier guidance, a training-free technique that steers diffusion models using an existing classifier, for personalized image generation. Our study shows that based on a recent rectified flow framework, the major limitation of vanilla classifier guidance in requiring a special classifier can be resolved with a simple fixed-point solution, allowing flexible personalization with off-the-shelf image discriminators. Moreover, its solving procedure proves to be stable when anchored to a reference flow trajectory, with a convergence guarantee. The derived method is implemented on rectified flow with different off-the-shelf image discriminators, delivering advantageous personalization results for human faces, live subjects, and certain objects. Code is available at https://github.com/feifeiobama/RectifID.

JAIR Journal 2022 Journal Article

CASA: Conversational Aspect Sentiment Analysis for Dialogue Understanding

  • Linfeng Song
  • Chunlei Xin
  • Shaopeng Lai
  • Ante Wang
  • Jinsong Su
  • Kun Xu

Dialogue understanding has always been a bottleneck for many conversational tasks, such as dialogue response generation and conversational question answering. To expedite the progress in this area, we introduce the task of conversational aspect sentiment analysis (CASA) that can provide useful fine-grained sentiment information for dialogue understanding and planning. Overall, this task extends the standard aspect-based sentiment analysis to the conversational scenario with several major adaptations. To aid the training and evaluation of data-driven methods, we annotate 3,000 chit-chat dialogues (27,198 sentences) with fine-grained sentiment information, including all sentiment expressions, their polarities and the corresponding target mentions. We also annotate an out-of-domain test set of 200 dialogues for robustness evaluation. In addition, we develop multiple baselines based on either pretrained BERT or self-attention for a preliminary study. Experimental results show that our BERT-based model performs strongly on both the in-domain and out-of-domain datasets, and thorough analysis indicates several potential directions for further improvements.

NeurIPS Conference 2021 Conference Paper

Rethinking and Reweighting the Univariate Losses for Multi-Label Ranking: Consistency and Generalization

  • Guoqiang Wu
  • Chongxuan Li
  • Kun Xu
  • Jun Zhu

The (partial) ranking loss is a commonly used evaluation measure for multi-label classification, which is usually optimized with convex surrogates for computational efficiency. Prior theoretical efforts on multi-label ranking mainly focus on (Fisher) consistency analyses. However, there is a gap between existing theory and practice: some inconsistent pairwise losses can lead to promising performance, while some consistent univariate losses usually have no clear superiority in practice. To take a step toward filling this gap, this paper presents a systematic study from two complementary perspectives of consistency and generalization error bounds of learning algorithms. We theoretically find two key factors of the distribution (or dataset) that affect the learning guarantees of algorithms: the instance-wise class imbalance and the label size $c$. Specifically, in an extremely imbalanced case, the algorithm with the consistent univariate loss has an error bound of $O(c)$, while the one with the inconsistent pairwise loss has a bound of $O(\sqrt{c})$, as shown in prior work. This may shed light on the superior performance of pairwise methods in practice, where real datasets are usually highly imbalanced. Moreover, we present an inconsistent reweighted univariate loss-based algorithm that enjoys an error bound of $O(\sqrt{c})$ for promising performance as well as the computational efficiency of univariate losses. Finally, experimental results confirm our theoretical findings.
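The univariate-versus-reweighted contrast in this abstract can be made concrete with a toy logistic surrogate. The inverse-count weights below are a hypothetical sketch of the reweighting idea, not the paper's actual scheme:

```python
import math

def univariate_logistic_loss(scores, labels):
    """Sum of per-label binary logistic losses, O(c) work per instance.
    labels are 0/1; the sign y_j = 2*label_j - 1."""
    return sum(math.log1p(math.exp(-(2 * l - 1) * s))
               for s, l in zip(scores, labels))

def reweighted_univariate_loss(scores, labels):
    """Hypothetical reweighting: scale each label's loss by the inverse
    count of its class (relevant vs. irrelevant), so an extremely
    imbalanced instance no longer contributes an O(c) factor."""
    pos = max(sum(labels), 1)
    neg = max(len(labels) - sum(labels), 1)
    return sum((1.0 / pos if l == 1 else 1.0 / neg)
               * math.log1p(math.exp(-(2 * l - 1) * s))
               for s, l in zip(scores, labels))
```

Because every weight is at most 1, the reweighted loss never exceeds the plain univariate sum on the same instance.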

NeurIPS Conference 2020 Conference Paper

Bi-level Score Matching for Learning Energy-based Latent Variable Models

  • Fan Bao
  • Chongxuan Li
  • Kun Xu
  • Hang Su
  • Jun Zhu
  • Bo Zhang

Score matching (SM) provides a compelling approach to learn energy-based models (EBMs) by avoiding the calculation of the partition function. However, learning energy-based latent variable models (EBLVMs) remains largely open, except in some special cases. This paper presents a bi-level score matching (BiSM) method to learn EBLVMs with general structures by reformulating SM as a bi-level optimization problem. The higher level introduces a variational posterior of the latent variables and optimizes a modified SM objective, and the lower level optimizes the variational posterior to fit the true posterior. To solve BiSM efficiently, we develop a stochastic optimization algorithm with gradient unrolling. Theoretically, we analyze the consistency of BiSM and the convergence of the stochastic algorithm. Empirically, we show the promise of BiSM in Gaussian restricted Boltzmann machines and highly nonstructural EBLVMs parameterized by deep convolutional neural networks. BiSM is comparable to the widely adopted contrastive divergence and SM methods when they are applicable, and can learn complex EBLVMs with intractable posteriors to generate natural images.

NeurIPS Conference 2020 Conference Paper

Boosting Adversarial Training with Hypersphere Embedding

  • Tianyu Pang
  • Xiao Yang
  • Yinpeng Dong
  • Kun Xu
  • Jun Zhu
  • Hang Su

Adversarial training (AT) is one of the most effective defenses against adversarial attacks for deep learning models. In this work, we advocate incorporating the hypersphere embedding (HE) mechanism into the AT procedure by regularizing the features onto compact manifolds, which constitutes a lightweight yet effective module to blend in the strength of representation learning. Our extensive analyses reveal that AT and HE are well coupled to benefit the robustness of the adversarially trained models from several aspects. We validate the effectiveness and adaptability of HE by embedding it into the popular AT frameworks including PGD-AT, ALP, and TRADES, as well as the FreeAT and FastAT strategies. In the experiments, we evaluate our methods under a wide range of adversarial attacks on the CIFAR-10 and ImageNet datasets, which verifies that integrating HE can consistently enhance the model robustness for each AT framework with little extra computation.
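The hypersphere embedding mechanism described above amounts to normalizing both features and per-class weight vectors before the inner product, so logits become scaled cosines. A minimal sketch (the scale value and shapes here are assumptions, not the paper's settings):

```python
import math

def hypersphere_logits(feature, class_weights, scale=10.0):
    """Scaled cosine logits: map the feature vector and each class weight
    vector onto the unit hypersphere, then take inner products. This is
    the normalization step HE adds to a linear classifier head."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    f = unit(feature)
    return [scale * sum(fi * wi for fi, wi in zip(f, unit(w)))
            for w in class_weights]

# Example: a 2-D feature scored against two classes.
logits = hypersphere_logits([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
```

Since every logit is a cosine times the scale, the output range is bounded regardless of feature magnitude, which is what makes the features live on a compact manifold.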

AAAI Conference 2020 Conference Paper

Coordinated Reasoning for Cross-Lingual Knowledge Graph Alignment

  • Kun Xu
  • Linfeng Song
  • Yansong Feng
  • Yan Song
  • Dong Yu

Existing entity alignment methods mainly vary on the choices of encoding the knowledge graph, but they typically use the same decoding method, which independently chooses the local optimal match for each source entity. This decoding method may not only cause the “many-to-one” problem but also neglect the coordinated nature of this task, that is, each alignment decision may highly correlate to the other decisions. In this paper, we introduce two coordinated reasoning methods, i.e., the Easy-to-Hard decoding strategy and joint entity alignment algorithm. Specifically, the Easy-to-Hard strategy first retrieves the model-confident alignments from the predicted results and then incorporates them as additional knowledge to resolve the remaining model-uncertain alignments. To achieve this, we further propose an enhanced alignment model that is built on the current state-of-the-art baseline. In addition, to address the many-to-one problem, we propose to jointly predict entity alignments so that the one-to-one constraint can be naturally incorporated into the alignment prediction. Experimental results show that our model achieves state-of-the-art performance and our reasoning methods can also significantly improve existing baselines.
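The coordinated decoding idea (commit confident alignments first, and enforce a one-to-one constraint) can be sketched as a greedy matcher. This toy decoder is illustrative only; it omits the paper's enhanced alignment model and joint prediction algorithm:

```python
def easy_to_hard_decode(sim):
    """Confident-first, one-to-one decoding sketch for entity alignment.
    sim[i][j] scores source entity i against target entity j. The most
    confident unmatched pair is committed first (the "easy" alignments),
    and its row and column are removed from play, so the one-to-one
    constraint holds and "hard" cases are resolved last."""
    n_src, n_tgt = len(sim), len(sim[0])
    free_src, free_tgt = set(range(n_src)), set(range(n_tgt))
    result = {}
    while free_src and free_tgt:
        i, j = max(((i, j) for i in free_src for j in free_tgt),
                   key=lambda p: sim[p[0]][p[1]])
        result[i] = j
        free_src.remove(i)
        free_tgt.remove(j)
    return result

# Two source entities both prefer target 0; one-to-one decoding resolves it.
result = easy_to_hard_decode([[0.9, 0.1], [0.8, 0.2]])
```

Independently taking each row's argmax on this matrix would map both source entities to target 0, exactly the many-to-one failure the abstract describes.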

NeurIPS Conference 2020 Conference Paper

Efficient Learning of Generative Models via Finite-Difference Score Matching

  • Tianyu Pang
  • Kun Xu
  • Chongxuan Li
  • Yang Song
  • Stefano Ermon
  • Jun Zhu

Several machine learning applications involve the optimization of higher-order derivatives (e.g., gradients of gradients) during training, which can be expensive with respect to memory and computation even with automatic differentiation. As a typical example in generative modeling, score matching (SM) involves the optimization of the trace of a Hessian. To improve computing efficiency, we rewrite the SM objective and its variants in terms of directional derivatives, and present a generic strategy to efficiently approximate any-order directional derivative with finite difference (FD). Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations. Thus, it reduces the total computational cost while also improving numerical stability. We provide two instantiations by reformulating variants of SM objectives into the FD forms. Empirically, we demonstrate that our methods produce results comparable to the gradient-based counterparts while being much more computationally efficient.
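The core trick in this abstract, approximating a directional derivative with function evaluations only, is easy to show for the first order. A central-difference sketch (the step size and test function are illustrative):

```python
def directional_derivative_fd(f, x, v, eps=1e-4):
    """Central finite-difference estimate of the directional derivative
    v . grad f(x). Only two function evaluations are needed, they are
    independent (so they could run in parallel), and no autodiff is used."""
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    return (f(x_plus) - f(x_minus)) / (2 * eps)

# Check on f(x) = ||x||^2 / 2, whose gradient is x, so v . grad f(x) = v . x.
f = lambda z: 0.5 * sum(zi * zi for zi in z)
x, v = [1.0, -2.0, 0.5], [0.3, 0.1, 0.2]
approx = directional_derivative_fd(f, x, v)
exact = sum(vi * xi for vi, xi in zip(v, x))
```

The same pattern extends to higher-order directional derivatives with more evaluation points, which is what lets the FD forms of the SM objectives avoid Hessian traces.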

AAAI Conference 2020 Conference Paper

Relation Extraction Exploiting Full Dependency Forests

  • Lifeng Jin
  • Linfeng Song
  • Yue Zhang
  • Kun Xu
  • Wei-Yun Ma
  • Dong Yu

Dependency syntax has long been recognized as a crucial source of features for relation extraction. Previous work considers 1-best trees produced by a parser during preprocessing. However, error propagation from the out-of-domain parser may impact the relation extraction performance. We propose to leverage full dependency forests for this task, where a full dependency forest encodes all possible trees. Such representations of full dependency forests provide a differentiable connection between a parser and a relation extraction model, and thus we are also able to study adjusting the parser parameters based on end-task loss. Experiments on three datasets show that full dependency forests and parser adjustment give significant improvements over carefully designed baselines, showing state-of-the-art or competitive performances on biomedical or newswire benchmarks.

AAAI Conference 2019 Conference Paper

Lattice CNNs for Matching Based Chinese Question Answering

  • Yuxuan Lai
  • Yansong Feng
  • Xiaohan Yu
  • Zheng Wang
  • Kun Xu
  • Dongyan Zhao

Short text matching often faces the challenge that there is substantial word mismatch and expression diversity between the two texts, which would be further aggravated in languages like Chinese where there is no natural space to segment words explicitly. In this paper, we propose a novel lattice based CNN model (LCNs) to utilize multi-granularity information inherent in the word lattice while maintaining strong ability to deal with the introduced noisy information for matching based question answering in Chinese. We conduct extensive experiments on both document based question answering and knowledge based question answering tasks, and experimental results show that the LCNs can significantly outperform the state-of-the-art matching models and strong baselines by taking advantage of a better ability to distill rich but discriminative information from the word lattice input.

AAAI Conference 2015 Conference Paper

What Is the Longest River in the USA? Semantic Parsing for Aggregation Questions

  • Kun Xu
  • Sheng Zhang
  • Yansong Feng
  • Songfang Huang
  • Dongyan Zhao

Answering natural language questions against structured knowledge bases (KB) has been attracting increasing attention in both IR and NLP communities. The task involves two main challenges: recognizing a question's meaning and grounding it to a given KB. Targeting simple factoid questions, many existing open domain semantic parsers jointly solve these two subtasks, but are usually expensive in complexity and resources. In this paper, we propose a simple pipeline framework to efficiently answer more complicated questions, especially those implying aggregation operations, e.g., argmax, argmin. We first develop a transition-based parsing model to recognize the KB-independent meaning representation of the user's intention inherent in the question. Secondly, we apply a probabilistic model to map the meaning representation, including those aggregation functions, to a structured query. The experimental results show that our method can better understand aggregation questions, outperforming the state-of-the-art methods on the Free917 dataset while still maintaining promising performance on a more challenging dataset, WebQuestions, without extra training.

IROS Conference 2010 Conference Paper

Energy management for four-wheel independent driving vehicle

  • Huihuan Qian
  • Guoqing Xu
  • Jingyu Yan 0001
  • Tin Lun Lam
  • Yangsheng Xu
  • Kun Xu

The promising electric vehicle (EV) technology is a direction to tackle the global non-renewable energy problem. However, how to use the electric energy efficiently still needs deliberate research. A traditional EV has no way to manage its energy flow, because it has only one traction motor. With robotic research on four-wheel independent drive (4WID), the driving task of the single traction motor can be shared by four independent in-wheel motors. By exploring the motor efficiency map, we propose an energy management strategy based on optimal driving torque distribution (ODTD). The total input power of the four motors can be minimized while the driving performance is still maintained, and electric energy consumption can be reduced compared with a traditional single-motor EV. Simulation results validate the proposed strategy. The energy management strategy can also be applied to multi-driving-wheel mobile robots.
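The ODTD idea, splitting a demanded torque across motors so that total input power is minimized over the efficiency map, can be sketched as a small grid search. The wheel speed, motor model, and all numbers below are invented for illustration; the paper works from measured efficiency maps:

```python
import math

def optimal_torque_split(total_torque, efficiency, omega=50.0, steps=100):
    """Grid-search sketch of optimal driving torque distribution (ODTD)
    between front and rear axles; each axle's two wheels share equally.
    Input electrical power = mechanical power / efficiency(torque).
    Returns (front-axle fraction, minimal total input power)."""
    best = None
    for k in range(steps + 1):
        alpha = k / steps
        t_f = alpha * total_torque / 2.0          # torque per front wheel
        t_r = (1.0 - alpha) * total_torque / 2.0  # torque per rear wheel
        power = 2.0 * omega * (t_f / efficiency(t_f) + t_r / efficiency(t_r))
        if best is None or power < best[1]:
            best = (alpha, power)
    return best

# Toy efficiency map peaking near 40 N*m (purely illustrative).
eff = lambda t: 0.55 + 0.35 * math.exp(-((t - 40.0) ** 2) / 800.0)
alpha, power = optimal_torque_split(160.0, eff)
```

With this symmetric toy map, the search settles on an even front/rear split, since 40 N*m per wheel sits at the efficiency peak; an asymmetric map would shift the optimum accordingly.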