Arrow Research search

Author name cluster

Ning Xie

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
1 author row

Possible papers

10

AAAI Conference 2026 · Conference Paper

MM-R1: Unleashing the Power of Unified Multimodal Large Language Models for Personalized Image Generation

  • Qian Liang
  • Yujia Wu
  • Kuncheng Li
  • Jiwei Wei
  • Shiyuan He
  • Jinyu Guo
  • Ning Xie

Multimodal Large Language Models (MLLMs) with unified architectures excel across a wide range of vision-language tasks, yet aligning them with personalized image generation remains a significant challenge. Existing methods for MLLMs are frequently subject-specific, demanding a data-intensive fine-tuning process for every new subject, which limits their scalability. In this paper, we introduce MM-R1, a framework that integrates a cross-modal Chain-of-Thought (X-CoT) reasoning strategy to unlock the inherent potential of unified MLLMs for personalized image generation. Specifically, we structure personalization as an integrated visual reasoning and generation process: (1) grounding subject concepts by interpreting and understanding user-provided images and contextual cues, and (2) generating personalized images conditioned on both the extracted subject representations and user prompts. To further enhance the reasoning capability, we adopt Grouped Reward Proximal Policy Optimization (GRPO) to explicitly align the generation. Experiments demonstrate that MM-R1 unleashes the personalization capability of unified MLLMs to generate images with high subject fidelity and strong text alignment in a zero-shot manner.

AAAI Conference 2026 · Conference Paper

Training-Free ANN-to-SNN Conversion for High-Performance Spiking Transformers

  • Jingya Wang
  • Xin Deng
  • Wenjie Wei
  • Dehao Zhang
  • Shuai Wang
  • Qian Sun
  • Jieyuan Zhang
  • Hanwen Liu

Leveraging the event-driven paradigm, Spiking Neural Networks (SNNs) offer a promising approach for constructing energy-efficient Transformer architectures. Compared to directly trained Spiking Transformers, ANN-to-SNN conversion methods bypass the high training costs. However, existing methods still suffer from notable limitations, failing to effectively handle nonlinear operations in Transformer architectures and requiring additional fine-tuning processes for pre-trained ANNs. To address these issues, we propose a high-performance and training-free ANN-to-SNN conversion framework tailored for Transformer architectures. Specifically, we introduce a Multi-basis Exponential Decay (MBE) neuron, which employs an exponential decay strategy and multi-basis encoding method to efficiently approximate various nonlinear operations. It removes the requirement for weight modifications in pre-trained ANNs. Extensive experiments across diverse tasks (CV, NLU, NLG) and mainstream Transformer architectures (ViT, RoBERTa, GPT-2) demonstrate that our method achieves near-lossless conversion accuracy with significantly lower latency. This provides a promising pathway for the efficient and scalable deployment of Spiking Transformers in real-world applications.

IJCAI Conference 2025 · Conference Paper

SyncGaussian: Stable 3D Gaussian-Based Talking Head Generation with Enhanced Lip Sync via Discriminative Speech Features

  • Ke Liu
  • Jiwei Wei
  • Shiyuan He
  • Zeyu Ma
  • Chaoning Zhang
  • Ning Xie
  • Yang Yang

Generating high-fidelity talking heads that maintain stable head poses and achieve robust lip sync remains a significant challenge. Although methods based on 3D Gaussian Splatting (3DGS) offer a promising solution via point-based deformation, they suffer from inconsistent head dynamics and mismatched mouth movements due to unstable Gaussian initialization and incomplete speech features. To overcome these limitations, we introduce SyncGaussian, a 3DGS-based framework that ensures stable head poses, enhanced lip sync, and realistic appearances with real-time rendering. SyncGaussian employs a stable head Gaussian initialization strategy to mitigate head jitter by optimizing commonly used rough head pose parameters. To enhance lip sync, we propose a sync-enhanced encoder that leverages audio-to-text and audio-to-visual speech features. Guided by a tailored cosine similarity loss function, the encoder integrates discriminative speech features through a multi-level sync adaptation mechanism, enabling the learning of an adaptive speech feature space. Extensive experiments demonstrate that SyncGaussian outperforms state-of-the-art methods in image quality, dynamic motion, and lip sync, with the potential for real-time applications.

AAAI Conference 2024 · Conference Paper

CDPNet: Cross-Modal Dual Phases Network for Point Cloud Completion

  • Zhenjiang Du
  • Jiale Dou
  • Zhitao Liu
  • Jiwei Wei
  • Guan Wang
  • Ning Xie
  • Yang Yang

Point cloud completion aims to complete shapes from their partial observations. Most existing methods use shape prior information for point cloud completion, for example by feeding the partial shape into an encoder-decoder deep learning structure to obtain the complete one. However, because the missing areas are invisible, such generation processes easily lose information. Unlike most existing methods, which directly infer the missing points using shape priors, we address completion as a cross-modality task. We propose a new Cross-modal Dual Phases Network (CDPNet) for shape completion. Our key idea is that the global information of the shape is obtained from an extra single-view image, while the partial point cloud provides the geometric information. The multi-modal features then jointly guide the specific structural information. To learn the geometric details of the shape, we choose to use patches to preserve local geometric features. In this way, we can generate shapes with enough geometric details. Experimental results show that our method achieves state-of-the-art performance on point cloud completion.

AAAI Conference 2023 · Conference Paper

Denoising Pre-training for Machine Translation Quality Estimation with Curriculum Learning

  • Xiang Geng
  • Yu Zhang
  • Jiahuan Li
  • Shujian Huang
  • Hao Yang
  • Shimin Tao
  • Yimeng Chen
  • Ning Xie

Quality estimation (QE) aims to assess the quality of machine translations when reference translations are unavailable. QE plays a crucial role in many real-world applications of machine translation. Because labeled QE data are usually limited in scale, recent research, such as DirectQE, pre-trains QE models with pseudo QE data and obtains remarkable performance. However, there tends to be inevitable noise in the pseudo data, hindering models from learning QE accurately. Our study shows that the noise mainly comes from the differences between pseudo and real translation outputs. To handle this problem, we propose CLQE, a denoising pre-training framework for QE based on curriculum learning. More specifically, we propose to measure the degree of noise in the pseudo QE data with some metrics based on statistical or distributional features. With the guidance of these metrics, CLQE gradually pre-trains the QE model using data from cleaner to noisier. Experiments on various benchmarks reveal that CLQE outperforms DirectQE and other strong baselines. We also show that with our framework, pre-training converges faster than directly using the pseudo data. We make our CLQE code available (https://github.com/NJUNLP/njuqe).

JAIR Journal 2022 · Journal Article

Explainable Deep Learning: A Field Guide for the Uninitiated

  • Gabrielle Ras
  • Ning Xie
  • Marcel van Gerven
  • Derek Doran

Deep neural networks (DNNs) are an indispensable machine learning tool despite the difficulty of diagnosing what aspects of a model’s input drive its decisions. In countless real-world domains, from legislation and law enforcement to healthcare, such diagnosis is essential to ensure that DNN decisions are driven by aspects appropriate in the context of its use. The development of methods and studies enabling the explanation of a DNN’s decisions has thus blossomed into an active and broad area of research. The field’s complexity is exacerbated by competing definitions of what it means “to explain” the actions of a DNN and to evaluate an approach’s “ability to explain”. This article offers a field guide to explore the space of explainable deep learning for those in the AI/ML field who are uninitiated. The field guide: i) Introduces three simple dimensions defining the space of foundational methods that contribute to explainable deep learning, ii) discusses the evaluations for model explanations, iii) places explainability in the context of other related deep learning research areas, and iv) discusses user-oriented explanation design and future directions. We hope the guide is seen as a starting point for those embarking on this research field.

IJCAI Conference 2015 · Conference Paper

Stroke-Based Stylization Learning and Rendering with Inverse Reinforcement Learning

  • Ning Xie
  • Tingting Zhao
  • Feng Tian
  • Xiao Hua Zhang
  • Masashi Sugiyama

Among various traditional art forms, brush stroke drawing is one of the most widely used styles in modern computer graphic tools such as GIMP, Photoshop and Painter. In this paper, we develop an AI-aided art authoring (A4) system for nonphotorealistic rendering that allows users to automatically generate brush stroke paintings in a specific artist’s style. Within the reinforcement learning framework of brush stroke generation proposed by Xie et al. [Xie et al., 2012], our contribution in this paper is to learn artists’ drawing styles from video-captured stroke data by inverse reinforcement learning. Through experiments, we demonstrate that our system can successfully learn artists’ styles and render pictures with consistent and smooth brush strokes.