Arrow Research

Author name cluster

Junsheng Luan

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact matches on the name; it is not a full author-identity disambiguation profile.

6 papers
1 author row

Possible papers (6)

AAAI 2026 · Conference Paper

Inpaint-Anywhere: Zero-Shot Multi-Identity Inpainting with Efficient Diffusion Transformer

  • Junsheng Luan
  • Lei Zhao
  • Wei Xing

Subject-driven generation, which aims to synthesize visual content for a given identity V* with specific attributes, has garnered increasing attention in recent years. While existing methods demonstrate impressive identity consistency for both single and multiple identities, they often lack user-specified spatial control. Recent approaches, such as OminiControl-2 and EasyControl, enable inpainting conditioned on a single identity but fall short in multi-identity scenarios. In this paper, we introduce BoundID, a dataset synthesis pipeline for generating multi-identity images with bounding-box annotations, and Inpaint-Anywhere, a diffusion transformer framework for multi-identity inpainting. Given multiple identity references and corresponding masks, our method simultaneously generates all desired identities at precise locations while achieving both high identity fidelity and high prompt fidelity. Extensive experiments show that Inpaint-Anywhere achieves state-of-the-art performance in multi-identity inpainting.
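
As a rough illustration only: the abstract does not spell out how masked multi-region generation works, so the sketch below shows generic RePaint-style mask-blended denoising, not the paper's Inpaint-Anywhere method. The toy denoiser and all names are hypothetical stand-ins; a real system would use a trained diffusion transformer conditioned on the identity references.

```python
# Generic mask-blended latent inpainting sketch (RePaint-style blending).
# NOT the paper's method; toy_denoiser is a hypothetical stand-in.
import torch

def toy_denoiser(x_t: torch.Tensor, t: int) -> torch.Tensor:
    """Stand-in for a trained model; here it just damps the latent."""
    return x_t * 0.95

def blended_inpaint(background: torch.Tensor,
                    masks: list[torch.Tensor],
                    steps: int = 50) -> torch.Tensor:
    """Denoise only inside the union of per-identity masks.

    background: (C, H, W) latent of the scene to preserve.
    masks: list of (1, H, W) binary masks, one per identity region.
    """
    union = torch.zeros_like(masks[0])
    for m in masks:
        union = torch.maximum(union, m)
    x = torch.randn_like(background)           # start from pure noise
    for t in reversed(range(steps)):
        x = toy_denoiser(x, t)                 # model prediction (toy)
        noise_level = t / steps
        noisy_bg = background + noise_level * torch.randn_like(background)
        # keep the model's output inside the masks, and the (re-noised)
        # background everywhere else
        x = union * x + (1 - union) * noisy_bg
    return x

bg = torch.randn(4, 32, 32)                    # fake background latent
m1 = torch.zeros(1, 32, 32); m1[:, :16, :16] = 1.0
m2 = torch.zeros(1, 32, 32); m2[:, 16:, 16:] = 1.0
print(blended_inpaint(bg, [m1, m2]).shape)     # torch.Size([4, 32, 32])
```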

AAAI 2025 · Conference Paper

Cascaded Diffusion Models for Virtual Try-On: Improving Control and Resolution

  • Guangyuan Li
  • Yongkang Wang
  • Junsheng Luan
  • Lei Zhao
  • Wei Xing
  • Huaizhong Lin
  • Binkai Ou

Previous virtual try-on methods have employed the ControlNet architecture in exemplar-based inpainting diffusion models to guide the generation of try-on images, preserving the garment's features and enhancing the realism of the generated images. While these methods maintain the identity of the garment and improve the naturalness of the generated images, they still face the following limitations: (1) for garments with complex features, such as intricate text, patterns, and uncommon styles, they struggle to retain these detailed features in the generated try-on images; (2) they are limited to generating try-on images at a maximum resolution of 1K, which may not meet the demands of real-world scenarios where higher resolutions are required. To address these issues, we propose a Cascaded Diffusion Model for virtual try-on, CDM-VTON, to enhance both image controllability and resolution. Specifically, we design two diffusion models: the Multi-Conditioned Diffusion Model (MC-DM) and the Super-Resolution Diffusion Model (SR-DM). The former generates low-resolution try-on images while preserving the garment's complex features, and the latter enhances the resolution of these images. Additionally, we incorporate a multi-control integration module in the MC-DM, which injects multiple control conditions into a frozen denoising U-Net to ensure that the generated try-on images retain complex garment features. Our experimental results demonstrate that our method outperforms previous approaches in preserving garment details and generating authentic virtual try-on images, both qualitatively and quantitatively.
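
For orientation only, the sketch below shows the generic shape of a two-stage cascade (a base model at low resolution feeding a conditional upsampler), in the spirit of cascaded diffusion. It is not CDM-VTON: both stage functions are hypothetical toys standing in for trained diffusion models, and only the MC-DM/SR-DM names come from the abstract.

```python
# Generic two-stage cascade sketch: base stage at low resolution, then a
# super-resolution stage conditioned on the base output. Both stages are
# toy stand-ins, not the paper's trained models.
import torch
import torch.nn.functional as F

def base_stage(condition: torch.Tensor) -> torch.Tensor:
    """Toy 'MC-DM': emit a low-resolution image from the condition."""
    return torch.tanh(F.avg_pool2d(condition, kernel_size=4))

def sr_stage(low_res: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """Toy 'SR-DM': upsample, conditioned on the low-res output."""
    up = F.interpolate(low_res, scale_factor=scale, mode="bilinear",
                       align_corners=False)
    return up + 0.01 * torch.randn_like(up)    # pretend refinement

cond = torch.randn(1, 3, 256, 256)             # garment/person conditions
low = base_stage(cond)                         # (1, 3, 64, 64)
high = sr_stage(low)                           # (1, 3, 256, 256)
print(low.shape, high.shape)
```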

EAAI 2025 · Journal Article

Personalized text-to-image generation with Large Language and Vision Assistant enhanced training

  • Junsheng Luan
  • Zhanjie Zhang
  • Wei Xing
  • Lei Zhao

Personalized image generation aims to synthesize images of a specific identity. The identity, denoted V*, refers to an entity with distinctive visual attributes, such as a dog-shaped backpack. However, existing methods like DreamBooth and Custom Diffusion often struggle to generate images of V* that accurately match the input prompts. In this work, we analyze two key issues underlying this limitation: (1) the overbinding problem, where the prompt tokens used to represent V* unintentionally bind to irrelevant visual details from the reference image during training; and (2) the low language prior problem, where insufficient use of the pre-trained language prior limits the model's ability to faithfully render all the prompt words. To overcome these challenges, we propose LLaVA-Booth, a novel personalization method for diverse, identity-preserving image generation based on Large Language and Vision Assistant (LLaVA) enhanced training. Our method alleviates the overbinding problem by disentangling background information and addresses the low language prior problem by enriching the language context. Additionally, we introduce two auxiliary objectives: (1) an identity (ID) binding loss to strengthen identity binding and (2) a prior preservation loss to prevent language drift and encourage generation diversity. Experiments demonstrate that LLaVA-Booth effectively mitigates overbinding and enhances language priors to improve prompt fidelity, generating diverse, high-quality, and identity-preserving images of V*.
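
To make the auxiliary objectives concrete, here is a minimal generic sketch of combining a standard denoising loss with a DreamBooth-style prior-preservation term and an extra identity-binding term. The specific loss weights and the cosine-based id_binding term are hypothetical illustrations, not LLaVA-Booth's actual formulation.

```python
# Generic composite training loss sketch: denoising MSE + DreamBooth-style
# prior preservation + a hypothetical identity-binding term.
import torch
import torch.nn.functional as F

def combined_loss(pred_noise, true_noise,
                  prior_pred, prior_noise,
                  id_embed, ref_embed,
                  lambda_prior=1.0, lambda_id=0.1):
    denoise = F.mse_loss(pred_noise, true_noise)   # usual diffusion objective
    # prior preservation: also denoise class images generated by the frozen
    # model, so the class prior (e.g. "a backpack") is not forgotten
    prior = F.mse_loss(prior_pred, prior_noise)
    # hypothetical ID binding: pull the V* token embedding toward a
    # reference identity embedding
    id_bind = 1.0 - F.cosine_similarity(id_embed, ref_embed, dim=-1).mean()
    return denoise + lambda_prior * prior + lambda_id * id_bind

loss = combined_loss(torch.randn(2, 4, 8, 8), torch.randn(2, 4, 8, 8),
                     torch.randn(2, 4, 8, 8), torch.randn(2, 4, 8, 8),
                     torch.randn(2, 768), torch.randn(2, 768))
print(float(loss))
```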

EAAI 2025 · Journal Article

VectorSketcher: Learning to create a vector-based free-hand sketch

  • Zhanjie Zhang
  • Quanwei Zhang
  • Junsheng Luan
  • Mengyuan Yang
  • Yun Wang
  • Lei Zhao

Sketch synthesis refers to converting a given content image into a sketch. Existing sketch synthesis methods are generally divided into pixel-based and vector-based approaches. Pixel-based methods often introduce obvious artifacts and disharmonious patterns and cannot generate sketches at different levels of abstraction. Vector-based methods, in turn, are limited in describing the structure and semantics of the content image. To tackle these problems, we propose a novel framework called VectorSketcher, which creates vectorized sketches that accurately describe the structure and semantics of content images without introducing obvious artifacts or disharmonious patterns. Specifically, we propose Multi-scale Feature-based Stroke Initialization (MFSI) to speed up optimization and capture the essential visual details of the given image, and we introduce a Controllable Score Distillation Sampling (CSDS) loss to further learn the content image's details. Extensive quantitative and qualitative experiments show that VectorSketcher generates more accurate vector-based sketches than existing state-of-the-art (SOTA) sketch synthesis methods.
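
As a loose illustration of feature-guided stroke initialization: the toy below places initial stroke anchors where image gradients are strongest. It mirrors the idea of MFSI only at arm's length; the paper's method uses multi-scale network features, and everything here, including the anchor-picking heuristic, is a hypothetical sketch.

```python
# Generic stroke-initialization sketch: anchor strokes at high-gradient
# pixels of a grayscale image. A hypothetical toy, not the paper's MFSI.
import torch

def init_strokes(img: torch.Tensor, n_strokes: int = 16) -> torch.Tensor:
    """img: (H, W) grayscale. Returns (n_strokes, 2) anchor coordinates."""
    gy = img[1:, :] - img[:-1, :]              # vertical finite difference
    gx = img[:, 1:] - img[:, :-1]              # horizontal finite difference
    mag = gy[:, :-1].abs() + gx[:-1, :].abs()  # crop to a common shape
    idx = torch.topk(mag.flatten(), n_strokes).indices
    w = mag.shape[1]
    ys, xs = idx // w, idx % w                 # unflatten to (row, col)
    return torch.stack([ys, xs], dim=-1).float()

img = torch.rand(64, 64)
print(init_strokes(img).shape)                 # torch.Size([16, 2])
```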

AAAI 2024 · Conference Paper

ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank

  • Zhanjie Zhang
  • Quanwei Zhang
  • Wei Xing
  • Guangyuan Li
  • Lei Zhao
  • Jiakai Sun
  • Zehua Lan
  • Junsheng Luan

Artistic style transfer aims to repaint a content image with a learned artistic style. Existing artistic style transfer methods can be divided into two categories: small model-based approaches and pre-trained large-scale model-based approaches. Small model-based approaches can preserve the content structure but fail to produce highly realistic stylized images and often introduce artifacts and disharmonious patterns; pre-trained large-scale model-based approaches can generate highly realistic stylized images but struggle to preserve the content structure. To address these issues, we propose ArtBank, a novel artistic style transfer framework that generates highly realistic stylized images while preserving the content structure of the content images. Specifically, to fully exploit the knowledge embedded in pre-trained large-scale models, we design an Implicit Style Prompt Bank (ISPB), a set of trainable parameter matrices that learns and stores knowledge from a collection of artworks and acts as a visual prompt, guiding pre-trained large-scale models to generate highly realistic stylized images while preserving content structure. To accelerate the training of the ISPB, we further propose a Spatial-Statistical-based self-Attention Module (SSAM). Qualitative and quantitative experiments demonstrate the superiority of our proposed method over state-of-the-art artistic style transfer methods. Code is available at https://github.com/Jamie-Cheung/ArtBank.
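
For intuition about what "a set of trainable parameter matrices acting as a visual prompt" can mean, here is a generic prompt-tuning sketch: a bank of learnable soft tokens, one matrix per artwork collection, prepended to a frozen text encoder's output. This illustrates the general prompt-bank idea only; it is not ArtBank's actual ISPB or its SSAM module.

```python
# Generic prompt-bank sketch: trainable soft tokens conditioning a frozen
# backbone. Illustrative only; not the paper's ISPB.
import torch
import torch.nn as nn

class PromptBank(nn.Module):
    def __init__(self, n_styles: int, n_tokens: int, dim: int):
        super().__init__()
        # one trainable matrix of soft tokens per artwork collection
        self.bank = nn.Parameter(torch.randn(n_styles, n_tokens, dim) * 0.02)

    def forward(self, text_emb: torch.Tensor, style_id: int) -> torch.Tensor:
        """Prepend the chosen style's soft tokens to the text embedding."""
        b = text_emb.shape[0]
        prompt = self.bank[style_id].unsqueeze(0).expand(b, -1, -1)
        return torch.cat([prompt, text_emb], dim=1)

bank = PromptBank(n_styles=4, n_tokens=8, dim=768)
text = torch.randn(2, 77, 768)                 # frozen text-encoder output
print(bank(text, style_id=1).shape)            # torch.Size([2, 85, 768])
```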

IJCAI 2024 · Conference Paper

Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt

  • Zhanjie Zhang
  • Quanwei Zhang
  • Huaizhong Lin
  • Wei Xing
  • Juncheng Mo
  • Shuaicheng Huang
  • Jinheng Xie
  • Guangyuan Li

Artistic style transfer aims to transfer a learned artistic style onto an arbitrary content image, generating artistic stylized images. Existing generative adversarial network-based methods fail to generate highly realistic stylized images and often introduce obvious artifacts and disharmonious patterns. Recently, large-scale pre-trained diffusion models have opened up a new way of generating highly realistic artistic stylized images. However, diffusion model-based methods generally fail to preserve the content structure of input content images well, introducing undesired changes to content structure and style patterns. To address these problems, we propose a novel pre-trained diffusion-based artistic style transfer method, called LSAST, which can generate highly realistic artistic stylized images while preserving the content structure of input content images well, without introducing obvious artifacts or disharmonious style patterns. Specifically, we introduce a Step-aware and Layer-aware Prompt Space, a set of learnable prompts that learns style information from a collection of artworks and dynamically adjusts the input images' content structure and style patterns. To train our prompt space, we propose a novel inversion method, called Step-aware and Layer-aware Prompt Inversion, which allows the prompt space to learn the style information of the artworks collection. In addition, we inject a pre-trained conditional branch of ControlNet into LSAST, which further improves our framework's ability to maintain content structure. Extensive experiments demonstrate that our method generates more highly realistic artistic stylized images than state-of-the-art artistic style transfer methods. Code is available at https://github.com/Jamie-Cheung/LSAST.
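
As a rough sketch of the "step-aware and layer-aware" idea: different denoising steps and different network layers can index different learnable prompts. The bucketing scheme below is a hypothetical illustration, not LSAST's actual prompt space or inversion method.

```python
# Generic step-aware prompt lookup sketch: map a timestep to a bucket and
# return that (bucket, layer) prompt. Hypothetical; not the paper's design.
import torch
import torch.nn as nn

class StepAwarePrompts(nn.Module):
    def __init__(self, n_buckets: int, n_layers: int, n_tokens: int, dim: int):
        super().__init__()
        self.n_buckets = n_buckets
        self.prompts = nn.Parameter(
            torch.randn(n_buckets, n_layers, n_tokens, dim) * 0.02)

    def forward(self, t: int, total_steps: int, layer: int) -> torch.Tensor:
        """Pick the learnable prompt for this timestep bucket and layer."""
        bucket = min(t * self.n_buckets // total_steps, self.n_buckets - 1)
        return self.prompts[bucket, layer]

prompts = StepAwarePrompts(n_buckets=3, n_layers=4, n_tokens=8, dim=768)
print(prompts(t=700, total_steps=1000, layer=2).shape)  # torch.Size([8, 768])
```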