Arrow Research

Author name cluster

Kun Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
2 author rows

Possible papers (15)

AAAI Conference 2026 Conference Paper

Generating Attribute-Aware Human Motions from Textual Prompt

  • Xinghan Wang
  • Kun Xu
  • Fei Li
  • Cao Sheng
  • JiaZhong Yu
  • Yadong Mu

Text-driven human motion generation has recently attracted considerable attention, allowing models to generate human motions based on textual descriptions. However, current methods neglect the influence of human attributes, such as age, gender, weight, and height, which are key factors shaping human motion patterns. This work is a pilot exploration toward bridging this gap. We conceptualize each motion as comprising both attribute information and action semantics, where textual descriptions align exclusively with action semantics. To achieve this, a new framework inspired by Structural Causal Models is proposed to decouple action semantics from human attributes, enabling text-to-semantics prediction and attribute-controlled generation. The resulting model is capable of generating attribute-aware motion aligned with the user's text and attribute inputs. For evaluation, we introduce a comprehensive dataset containing attribute annotations for text-motion pairs, setting the first benchmark for attribute-aware motion generation. Extensive experiments validate our model's effectiveness.

AAAI Conference 2025 Conference Paper

Granularity-Adaptive Spatial Evidence Tokenization for Video Question Answering

  • Hao Jiang
  • Yang Jin
  • Zhicheng Sun
  • Kun Xu
  • Liwei Chen
  • Yang Song
  • Kun Gai
  • Yadong Mu

Video question answering plays a vital role in computer vision, and recent advances in large language models have further propelled the development of this field. However, existing video question answering techniques often face limitations in grasping fine-grained video content in spatial dimensions. This limitation mainly stems from the fixed, low-resolution input of video frames. While some approaches using high-resolution inputs partially alleviate this problem, they introduce excessive computational burdens by encoding the entire high-resolution image. In this work, we propose a granularity-adaptive spatial evidence tokenization model for video question answering. Our method introduces multi-granular visual tokenization in the spatial dimension to produce video tokens at various granularities based on the question. It highlights spatially activated patches at low resolutions through a granularity weighting module and then adaptively encodes these activated patches at high resolution for detail supplementation. To mitigate the computational overhead associated with high-resolution frame encoding, a masking and acceleration module is developed for efficient visual tokenization. Moreover, a granularity compression module is designed to dynamically select and compress visual tokens of varying granularities based on questions. We conduct extensive experiments on 11 mainstream video question answering datasets, and the results demonstrate the effectiveness of our proposed method.
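The selection step this abstract describes (score patches cheaply at low resolution, then re-encode only the most activated ones at high resolution) can be illustrated with a toy selector. The top-fraction rule and function name below are invented for illustration; they are not the paper's granularity weighting module:

```python
def select_activated_patches(patch_scores, keep_ratio=0.25):
    """Keep the top fraction of patches by activation score for
    high-resolution re-encoding; everything else stays low-resolution.
    Returns a boolean mask over the patches."""
    flat = [(score, idx) for idx, score in enumerate(patch_scores)]
    n_keep = max(1, int(keep_ratio * len(flat)))
    # Sort by score (descending) and keep the indices of the top patches.
    keep = {idx for _, idx in sorted(flat, reverse=True)[:n_keep]}
    return [idx in keep for idx in range(len(patch_scores))]
```

In the paper, the kept patches would then be encoded at high resolution and further compressed by the granularity compression module.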

NeurIPS Conference 2024 Conference Paper

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

  • Zhicheng Sun
  • Zhenhao Yang
  • Yang Jin
  • Haozhe Chi
  • Kun Xu
  • Liwei Chen
  • Hao Jiang
  • Yang Song

Customizing diffusion models to generate identity-preserving images from user-provided reference images is an intriguing new problem. The prevalent approaches typically require training on extensive domain-specific images to achieve identity preservation, which lacks flexibility across different use cases. To address this issue, we exploit classifier guidance, a training-free technique that steers diffusion models using an existing classifier, for personalized image generation. Our study shows that based on a recent rectified flow framework, the major limitation of vanilla classifier guidance in requiring a special classifier can be resolved with a simple fixed-point solution, allowing flexible personalization with off-the-shelf image discriminators. Moreover, its solving procedure proves to be stable when anchored to a reference flow trajectory, with a convergence guarantee. The derived method is implemented on rectified flow with different off-the-shelf image discriminators, delivering advantageous personalization results for human faces, live subjects, and certain objects. Code is available at https://github.com/feifeiobama/RectifID.

JAIR Journal 2022 Journal Article

CASA: Conversational Aspect Sentiment Analysis for Dialogue Understanding

  • Linfeng Song
  • Chunlei Xin
  • Shaopeng Lai
  • Ante Wang
  • Jinsong Su
  • Kun Xu

Dialogue understanding has always been a bottleneck for many conversational tasks, such as dialogue response generation and conversational question answering. To expedite the progress in this area, we introduce the task of conversational aspect sentiment analysis (CASA) that can provide useful fine-grained sentiment information for dialogue understanding and planning. Overall, this task extends the standard aspect-based sentiment analysis to the conversational scenario with several major adaptations. To aid the training and evaluation of data-driven methods, we annotate 3,000 chit-chat dialogues (27,198 sentences) with fine-grained sentiment information, including all sentiment expressions, their polarities and the corresponding target mentions. We also annotate an out-of-domain test set of 200 dialogues for robustness evaluation. In addition, we develop multiple baselines based on either pretrained BERT or self-attention for a preliminary study. Experimental results show that our BERT-based model performs strongly on both the in-domain and out-of-domain datasets, and thorough analysis indicates several potential directions for further improvements.

NeurIPS Conference 2021 Conference Paper

Rethinking and Reweighting the Univariate Losses for Multi-Label Ranking: Consistency and Generalization

  • Guoqiang Wu
  • Chongxuan Li
  • Kun Xu
  • Jun Zhu

The (partial) ranking loss is a commonly used evaluation measure for multi-label classification, which is usually optimized with convex surrogates for computational efficiency. Prior theoretical efforts on multi-label ranking mainly focus on (Fisher) consistency analyses. However, there is a gap between existing theory and practice: some inconsistent pairwise losses can lead to promising performance, while some consistent univariate losses usually have no clear superiority in practice. To take a step toward filling this gap, this paper presents a systematic study from two complementary perspectives of consistency and generalization error bounds of learning algorithms. We theoretically find two key factors of the distribution (or dataset) that affect the learning guarantees of algorithms: the instance-wise class imbalance and the label size $c$. Specifically, in an extremely imbalanced case, the algorithm with the consistent univariate loss has an error bound of $O(c)$, while the one with the inconsistent pairwise loss has a bound of $O(\sqrt{c})$, as shown in prior work. This may shed light on the superior performance of pairwise methods in practice, where real datasets are usually highly imbalanced. Moreover, we present an inconsistent reweighted univariate loss-based algorithm that enjoys an error bound of $O(\sqrt{c})$ for promising performance as well as the computational efficiency of univariate losses. Finally, experimental results confirm our theoretical findings.
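The univariate-versus-reweighted contrast in this abstract can be made concrete with a toy logistic surrogate. The inverse-count weights below are a hypothetical sketch of the reweighting idea, not the paper's actual scheme:

```python
import math

def univariate_logistic_loss(scores, labels):
    """Sum of per-label binary logistic losses, O(c) work per instance.
    labels are 0/1; the sign y_j = 2*label_j - 1."""
    return sum(math.log1p(math.exp(-(2 * l - 1) * s))
               for s, l in zip(scores, labels))

def reweighted_univariate_loss(scores, labels):
    """Hypothetical reweighting: scale each label's loss by the inverse
    count of its class (relevant vs. irrelevant), so an extremely
    imbalanced instance no longer contributes an O(c) factor."""
    pos = max(sum(labels), 1)
    neg = max(len(labels) - sum(labels), 1)
    return sum((1.0 / pos if l == 1 else 1.0 / neg)
               * math.log1p(math.exp(-(2 * l - 1) * s))
               for s, l in zip(scores, labels))
```

Because every weight is at most 1, the reweighted loss never exceeds the plain univariate sum on the same instance.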

NeurIPS Conference 2020 Conference Paper

Bi-level Score Matching for Learning Energy-based Latent Variable Models

  • Fan Bao
  • Chongxuan Li
  • Kun Xu
  • Hang Su
  • Jun Zhu
  • Bo Zhang

Score matching (SM) provides a compelling approach to learn energy-based models (EBMs) by avoiding the calculation of the partition function. However, learning energy-based latent variable models (EBLVMs) remains largely open, except in some special cases. This paper presents a bi-level score matching (BiSM) method to learn EBLVMs with general structures by reformulating SM as a bi-level optimization problem. The higher level introduces a variational posterior of the latent variables and optimizes a modified SM objective, and the lower level optimizes the variational posterior to fit the true posterior. To solve BiSM efficiently, we develop a stochastic optimization algorithm with gradient unrolling. Theoretically, we analyze the consistency of BiSM and the convergence of the stochastic algorithm. Empirically, we show the promise of BiSM in Gaussian restricted Boltzmann machines and highly nonstructural EBLVMs parameterized by deep convolutional neural networks. BiSM is comparable to the widely adopted contrastive divergence and SM methods when they are applicable, and can learn complex EBLVMs with intractable posteriors to generate natural images.

NeurIPS Conference 2020 Conference Paper

Boosting Adversarial Training with Hypersphere Embedding

  • Tianyu Pang
  • Xiao Yang
  • Yinpeng Dong
  • Kun Xu
  • Jun Zhu
  • Hang Su

Adversarial training (AT) is one of the most effective defenses against adversarial attacks for deep learning models. In this work, we advocate incorporating the hypersphere embedding (HE) mechanism into the AT procedure by regularizing the features onto compact manifolds, which constitutes a lightweight yet effective module to blend in the strength of representation learning. Our extensive analyses reveal that AT and HE are well coupled to benefit the robustness of the adversarially trained models from several aspects. We validate the effectiveness and adaptability of HE by embedding it into the popular AT frameworks including PGD-AT, ALP, and TRADES, as well as the FreeAT and FastAT strategies. In the experiments, we evaluate our methods under a wide range of adversarial attacks on the CIFAR-10 and ImageNet datasets, which verifies that integrating HE can consistently enhance the model robustness for each AT framework with little extra computation.
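The hypersphere embedding mechanism described above amounts to normalizing both features and per-class weight vectors before the inner product, so logits become scaled cosines. A minimal sketch (the scale value and shapes here are assumptions, not the paper's settings):

```python
import math

def hypersphere_logits(feature, class_weights, scale=10.0):
    """Scaled cosine logits: map the feature vector and each class weight
    vector onto the unit hypersphere, then take inner products. This is
    the normalization step HE adds to a linear classifier head."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    f = unit(feature)
    return [scale * sum(fi * wi for fi, wi in zip(f, unit(w)))
            for w in class_weights]

# Example: a 2-D feature scored against two classes.
logits = hypersphere_logits([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
```

Since every logit is a cosine times the scale, the output range is bounded regardless of feature magnitude, which is what makes the features live on a compact manifold.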

AAAI Conference 2020 Conference Paper

Coordinated Reasoning for Cross-Lingual Knowledge Graph Alignment

  • Kun Xu
  • Linfeng Song
  • Yansong Feng
  • Yan Song
  • Dong Yu

Existing entity alignment methods mainly vary on the choices of encoding the knowledge graph, but they typically use the same decoding method, which independently chooses the local optimal match for each source entity. This decoding method may not only cause the “many-to-one” problem but also neglect the coordinated nature of this task, that is, each alignment decision may highly correlate to the other decisions. In this paper, we introduce two coordinated reasoning methods, i.e., the Easy-to-Hard decoding strategy and joint entity alignment algorithm. Specifically, the Easy-to-Hard strategy first retrieves the model-confident alignments from the predicted results and then incorporates them as additional knowledge to resolve the remaining model-uncertain alignments. To achieve this, we further propose an enhanced alignment model that is built on the current state-of-the-art baseline. In addition, to address the many-to-one problem, we propose to jointly predict entity alignments so that the one-to-one constraint can be naturally incorporated into the alignment prediction. Experimental results show that our model achieves state-of-the-art performance and our reasoning methods can also significantly improve existing baselines.
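The coordinated decoding idea (commit confident alignments first, and enforce a one-to-one constraint) can be sketched as a greedy matcher. This toy decoder is illustrative only; it omits the paper's enhanced alignment model and joint prediction algorithm:

```python
def easy_to_hard_decode(sim):
    """Confident-first, one-to-one decoding sketch for entity alignment.
    sim[i][j] scores source entity i against target entity j. The most
    confident unmatched pair is committed first (the "easy" alignments),
    and its row and column are removed from play, so the one-to-one
    constraint holds and "hard" cases are resolved last."""
    n_src, n_tgt = len(sim), len(sim[0])
    free_src, free_tgt = set(range(n_src)), set(range(n_tgt))
    result = {}
    while free_src and free_tgt:
        i, j = max(((i, j) for i in free_src for j in free_tgt),
                   key=lambda p: sim[p[0]][p[1]])
        result[i] = j
        free_src.remove(i)
        free_tgt.remove(j)
    return result

# Two source entities both prefer target 0; one-to-one decoding resolves it.
result = easy_to_hard_decode([[0.9, 0.1], [0.8, 0.2]])
```

Independently taking each row's argmax on this matrix would map both source entities to target 0, exactly the many-to-one failure the abstract describes.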

NeurIPS Conference 2020 Conference Paper

Efficient Learning of Generative Models via Finite-Difference Score Matching

  • Tianyu Pang
  • Kun Xu
  • Chongxuan Li
  • Yang Song
  • Stefano Ermon
  • Jun Zhu

Several machine learning applications involve the optimization of higher-order derivatives (e.g., gradients of gradients) during training, which can be expensive with respect to memory and computation even with automatic differentiation. As a typical example in generative modeling, score matching (SM) involves the optimization of the trace of a Hessian. To improve computing efficiency, we rewrite the SM objective and its variants in terms of directional derivatives, and present a generic strategy to efficiently approximate any-order directional derivative with finite difference (FD). Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations. Thus, it reduces the total computational cost while also improving numerical stability. We provide two instantiations by reformulating variants of SM objectives into the FD forms. Empirically, we demonstrate that our methods produce results comparable to the gradient-based counterparts while being much more computationally efficient.
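The core trick in this abstract, approximating a directional derivative with function evaluations only, is easy to show for the first order. A central-difference sketch (the step size and test function are illustrative):

```python
def directional_derivative_fd(f, x, v, eps=1e-4):
    """Central finite-difference estimate of the directional derivative
    v . grad f(x). Only two function evaluations are needed, they are
    independent (so they could run in parallel), and no autodiff is used."""
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    return (f(x_plus) - f(x_minus)) / (2 * eps)

# Check on f(x) = ||x||^2 / 2, whose gradient is x, so v . grad f(x) = v . x.
f = lambda z: 0.5 * sum(zi * zi for zi in z)
x, v = [1.0, -2.0, 0.5], [0.3, 0.1, 0.2]
approx = directional_derivative_fd(f, x, v)
exact = sum(vi * xi for vi, xi in zip(v, x))
```

The same pattern extends to higher-order directional derivatives with more evaluation points, which is what lets the FD forms of the SM objectives avoid Hessian traces.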

AAAI Conference 2020 Conference Paper

Relation Extraction Exploiting Full Dependency Forests

  • Lifeng Jin
  • Linfeng Song
  • Yue Zhang
  • Kun Xu
  • Wei-Yun Ma
  • Dong Yu

Dependency syntax has long been recognized as a crucial source of features for relation extraction. Previous work considers 1-best trees produced by a parser during preprocessing. However, error propagation from the out-of-domain parser may impact the relation extraction performance. We propose to leverage full dependency forests for this task, where a full dependency forest encodes all possible trees. Such representations of full dependency forests provide a differentiable connection between a parser and a relation extraction model, and thus we are also able to study adjusting the parser parameters based on end-task loss. Experiments on three datasets show that full dependency forests and parser adjustment give significant improvements over carefully designed baselines, showing state-of-the-art or competitive performances on biomedical or newswire benchmarks.

AAAI Conference 2019 Conference Paper

Lattice CNNs for Matching Based Chinese Question Answering

  • Yuxuan Lai
  • Yansong Feng
  • Xiaohan Yu
  • Zheng Wang
  • Kun Xu
  • Dongyan Zhao

Short text matching often faces the challenge that there is substantial word mismatch and expression diversity between the two texts, which would be further aggravated in languages like Chinese where there is no natural space to segment words explicitly. In this paper, we propose a novel lattice based CNN model (LCNs) to utilize multi-granularity information inherent in the word lattice while maintaining strong ability to deal with the introduced noisy information for matching based question answering in Chinese. We conduct extensive experiments on both document based question answering and knowledge based question answering tasks, and experimental results show that the LCNs can significantly outperform the state-of-the-art matching models and strong baselines by taking advantage of a better ability to distill rich but discriminative information from the word lattice input.

AAAI Conference 2015 Conference Paper

What Is the Longest River in the USA? Semantic Parsing for Aggregation Questions

  • Kun Xu
  • Sheng Zhang
  • Yansong Feng
  • Songfang Huang
  • Dongyan Zhao

Answering natural language questions against structured knowledge bases (KB) has been attracting increasing attention in both IR and NLP communities. The task involves two main challenges: recognizing a question's meaning and grounding it to a given KB. Targeting simple factoid questions, many existing open domain semantic parsers jointly solve these two subtasks, but are usually expensive in complexity and resources. In this paper, we propose a simple pipeline framework to efficiently answer more complicated questions, especially those implying aggregation operations, e.g., argmax, argmin. We first develop a transition-based parsing model to recognize the KB-independent meaning representation of the user's intention inherent in the question. Secondly, we apply a probabilistic model to map the meaning representation, including those aggregation functions, to a structured query. The experimental results show that our method can better understand aggregation questions, outperforming the state-of-the-art methods on the Free917 dataset while still maintaining promising performance on a more challenging dataset, WebQuestions, without extra training.

IROS Conference 2010 Conference Paper

Energy management for four-wheel independent driving vehicle

  • Huihuan Qian
  • Guoqing Xu
  • Jingyu Yan 0001
  • Tin Lun Lam
  • Yangsheng Xu
  • Kun Xu

The promising electric vehicle (EV) technology is a direction to tackle the global non-renewable energy problem. However, how to use the electric energy efficiently still needs deliberate research. A traditional EV has no way to manage its energy flow, because it has only one traction motor. With robotic research on four-wheel independent drive (4WID), the driving task of the single traction motor can be shared by four independent in-wheel motors. By exploring the motor efficiency map, we propose an energy management strategy based on optimal driving torque distribution (ODTD). The total input power of the four motors can be minimized while the driving performance is still maintained, and electric energy consumption can be reduced compared with a traditional single-motor EV. Simulation results validate the proposed strategy. The energy management strategy can also be applied to multi-driving-wheel mobile robots.
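The ODTD idea, splitting a demanded torque across motors so that total input power is minimized over the efficiency map, can be sketched as a small grid search. The wheel speed, motor model, and all numbers below are invented for illustration; the paper works from measured efficiency maps:

```python
import math

def optimal_torque_split(total_torque, efficiency, omega=50.0, steps=100):
    """Grid-search sketch of optimal driving torque distribution (ODTD)
    between front and rear axles; each axle's two wheels share equally.
    Input electrical power = mechanical power / efficiency(torque).
    Returns (front-axle fraction, minimal total input power)."""
    best = None
    for k in range(steps + 1):
        alpha = k / steps
        t_f = alpha * total_torque / 2.0          # torque per front wheel
        t_r = (1.0 - alpha) * total_torque / 2.0  # torque per rear wheel
        power = 2.0 * omega * (t_f / efficiency(t_f) + t_r / efficiency(t_r))
        if best is None or power < best[1]:
            best = (alpha, power)
    return best

# Toy efficiency map peaking near 40 N*m (purely illustrative).
eff = lambda t: 0.55 + 0.35 * math.exp(-((t - 40.0) ** 2) / 800.0)
alpha, power = optimal_torque_split(160.0, eff)
```

With this symmetric toy map, the search settles on an even front/rear split, since 40 N*m per wheel sits at the efficiency peak; an asymmetric map would shift the optimum accordingly.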