
Author name cluster

Xianhui Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers (4)

NeurIPS 2025 · Conference Paper

GenColor: Generative and Expressive Color Enhancement with Pixel-Perfect Texture Preservation

  • Yi Dong
  • Yuxi Wang
  • Xianhui Lin
  • Wenqi Ouyang
  • Zhiqi Shen
  • Peiran Ren
  • Ruoxi Fan
  • Rynson Lau

Color enhancement is a crucial yet challenging task in digital photography. It demands methods that are (i) expressive enough for fine-grained adjustments, (ii) adaptable to diverse inputs, and (iii) able to preserve texture. Existing approaches typically fall short in at least one of these aspects, yielding unsatisfactory results. We propose GenColor, a novel diffusion-based framework for sophisticated, texture-preserving color enhancement. GenColor reframes the task as conditional image generation. Leveraging ControlNet and a tailored training scheme, it learns advanced color transformations that adapt to diverse lighting and content. We train GenColor on ARTISAN, our newly collected large-scale dataset of 1.2M high-quality photographs specifically curated for enhancement tasks. To overcome texture preservation limitations inherent in diffusion models, we introduce a color-transfer network with a novel degradation scheme that simulates texture–color relationships. This network achieves pixel-perfect texture preservation while enabling fine-grained color matching with the diffusion-generated reference images. Extensive experiments show that GenColor produces visually compelling results comparable to those of expert colorists and surpasses state-of-the-art methods in both subjective and objective evaluations. We have released the code and dataset.
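To make the two-stage design above concrete, here is a minimal sketch based only on this abstract: a diffusion stage proposes a color-enhanced reference, and a color-transfer stage matches the original image to that reference's colors without altering its pixel structure. All function names are hypothetical stand-ins, and the stage bodies are toy substitutes for the actual ControlNet-conditioned diffusion model and color-transfer network.

```python
# Hypothetical sketch of GenColor's two-stage inference (abstract only).
import numpy as np

def generate_reference(image: np.ndarray) -> np.ndarray:
    """Stage 1 (stand-in): a ControlNet-conditioned diffusion model would
    synthesize a color-enhanced reference. A simple saturation boost
    substitutes here so the sketch runs end to end."""
    mean = image.mean(axis=-1, keepdims=True)
    return np.clip(mean + 1.3 * (image - mean), 0.0, 1.0)

def transfer_color(original: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Stage 2 (stand-in): the color-transfer network matches the reference's
    per-channel color statistics while the output pixels remain an affine map
    of the originals, so texture is preserved exactly."""
    out = np.empty_like(original)
    for c in range(3):
        src, ref = original[..., c], reference[..., c]
        out[..., c] = (src - src.mean()) / (src.std() + 1e-8) * ref.std() + ref.mean()
    return np.clip(out, 0.0, 1.0)

image = np.random.rand(64, 64, 3)            # placeholder input photograph
reference = generate_reference(image)        # diffusion-generated color target
enhanced = transfer_color(image, reference)  # texture-preserving color match
```

The split is the key design point: any artifact introduced by the generative stage stays in the reference, while the final pixels always come from a deterministic, texture-preserving mapping of the original.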

ICLR 2025 · Conference Paper

MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

  • Jun-Yan He
  • Zhi-Qi Cheng
  • Chenyang Li 0007
  • Jingdong Sun
  • Qi He 0007
  • Wangmeng Xiang
  • Hanyuan Chen
  • Jin-Peng Lan

MetaDesigner introduces a transformative framework for artistic typography synthesis, powered by Large Language Models (LLMs) and grounded in a user-centric design paradigm. Its foundation is a multi-agent system comprising the Pipeline, Glyph, and Texture agents, which collectively orchestrate the creation of customizable WordArt, ranging from semantic enhancements to intricate textural elements. A central feedback mechanism leverages insights from both multimodal models and user evaluations, enabling iterative refinement of design parameters. Through this iterative process, MetaDesigner dynamically adjusts hyperparameters to align with user-defined stylistic and thematic preferences, consistently delivering WordArt that excels in visual quality and contextual resonance. Empirical evaluations underscore the system's versatility and effectiveness across diverse WordArt applications, yielding outputs that are both aesthetically compelling and context-sensitive.
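A minimal sketch of the multi-agent refinement loop described above, inferred from the abstract alone: three agents act in sequence, a feedback score stands in for the multimodal-model and user evaluations, and hyperparameters are adjusted between rounds. The class and function names are hypothetical; the real system drives LLMs rather than these toy stubs.

```python
# Hypothetical sketch of MetaDesigner's feedback-driven agent loop.
from dataclasses import dataclass, field

@dataclass
class DesignState:
    prompt: str
    style_strength: float = 0.5   # example hyperparameter tuned by feedback
    layers: list = field(default_factory=list)

def pipeline_agent(state: DesignState) -> DesignState:
    # Stand-in Pipeline agent: plans the overall composition.
    state.layers.append("layout")
    return state

def glyph_agent(state: DesignState) -> DesignState:
    # Stand-in Glyph agent: reshapes letterforms toward the prompt semantics.
    state.layers.append("glyph")
    return state

def texture_agent(state: DesignState) -> DesignState:
    # Stand-in Texture agent: renders surface detail at the current strength.
    state.layers.append(f"texture@{state.style_strength:.2f}")
    return state

def score_design(state: DesignState) -> float:
    # Stand-in for multimodal-model plus user feedback; a real score is learned.
    return min(1.0, 0.3 * len(state.layers) * state.style_strength)

state = DesignState(prompt="WordArt: 'Aurora' in frosty neon")
for _ in range(3):                      # iterative, feedback-driven refinement
    for agent in (pipeline_agent, glyph_agent, texture_agent):
        state = agent(state)
    if score_design(state) >= 0.8:      # good enough for the user's target
        break
    state.style_strength += 0.2         # adjust hyperparameters and retry
print(state.layers)
```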

AAAI 2025 · Conference Paper

VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

  • Yabo Zhang
  • Yuxiang Wei
  • Xianhui Lin
  • Zheng Hui
  • Peiran Ren
  • Xuansong Xie
  • Wangmeng Zuo

Text-to-image diffusion models (T2I) have demonstrated unprecedented capabilities in creating realistic and aesthetic images. In contrast, text-to-video diffusion models (T2V) still lag far behind in frame quality and text alignment, owing to the insufficient quality and quantity of training videos. In this paper, we introduce VideoElevator, a training-free and plug-and-play method that elevates the performance of T2V using the superior capabilities of T2I. Unlike conventional T2V sampling (i.e., joint temporal and spatial modeling), VideoElevator explicitly decomposes each sampling step into temporal motion refining and spatial quality elevating. Specifically, temporal motion refining uses the encapsulated T2V model to enhance temporal consistency, then inverts the result to the noise distribution required by T2I. Spatial quality elevating then harnesses the inflated T2I model to directly predict a less noisy latent, adding more photo-realistic details. We conducted experiments on an extensive set of prompts with various combinations of T2V and T2I models. The results show that VideoElevator not only improves the performance of T2V baselines with foundational T2I, but also facilitates stylistic video synthesis with personalized T2I. Please watch the videos in the supplementary material for a better viewing experience.
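The decomposed sampling step lends itself to a short sketch. The following illustrates the loop structure only, based on the abstract: `t2v_denoise`, `add_noise`, and `t2i_denoise` are hypothetical stand-ins for the encapsulated T2V model, the inversion to the T2I noise level, and the inflated T2I model, respectively.

```python
# Hypothetical sketch of one VideoElevator sampling step (abstract only).
import numpy as np

rng = np.random.default_rng(0)

def t2v_denoise(latents, t):
    """Stand-in for the encapsulated T2V model: smooths across frames to
    mimic temporal-consistency refinement."""
    return 0.5 * latents + 0.5 * latents.mean(axis=0, keepdims=True)

def add_noise(latents, t):
    """Stand-in for inverting refined latents back to the noise
    distribution required by the T2I sampler."""
    return latents + 0.1 * t * rng.standard_normal(latents.shape)

def t2i_denoise(frame, t):
    """Stand-in for the inflated T2I model applied per frame: shrinks the
    latent toward a clean estimate, i.e. predicts a less noisy latent."""
    return 0.9 * frame

latents = rng.standard_normal((8, 4, 16, 16))  # (frames, channels, h, w)
for t in np.linspace(1.0, 0.0, num=10):
    latents = t2v_denoise(latents, t)                         # temporal motion refining
    latents = add_noise(latents, t)                           # invert to T2I noise level
    latents = np.stack([t2i_denoise(f, t) for f in latents])  # spatial quality elevating
```

The ordering is the point: each step first restores temporal consistency across frames, then lets the stronger image model sharpen every frame at the matching noise level.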

AAAI 2024 · Conference Paper

VQ-FONT: Few-Shot Font Generation with Structure-Aware Enhancement and Quantization

  • Mingshuai Yao
  • Yabo Zhang
  • Xianhui Lin
  • Xiaoming Li
  • Wangmeng Zuo

Few-shot font generation is challenging, as it needs to capture the fine-grained stroke styles from a limited set of reference glyphs and then transfer them to other characters, which are expected to share similar styles. However, due to the diversity and complexity of Chinese font styles, the glyphs synthesized by existing methods usually exhibit visible artifacts, such as missing details and distorted strokes. In this paper, we propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement. Specifically, we pre-train a VQGAN to encapsulate a font token prior within a codebook. Subsequently, VQ-Font refines the synthesized glyphs with the codebook to eliminate the domain gap between synthesized and real-world strokes. Furthermore, VQ-Font leverages the inherent design of Chinese characters, in which structural elements such as radicals and character components are combined in specific arrangements, to recalibrate fine-grained styles based on the references. This process improves the matching and fusion of styles at the structure level. Both modules collaborate to enhance the fidelity of the generated fonts. Experiments on a collected font dataset show that VQ-Font outperforms competing methods both quantitatively and qualitatively, especially in generating challenging styles. Our code is available at https://github.com/Yaomingshuai/VQ-Font.
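The token prior refinement step reduces to a vector-quantization lookup, sketched below under the assumption of a learned codebook of glyph tokens. The codebook here is random and the shapes are arbitrary; in the paper, the codebook comes from a pre-trained VQGAN.

```python
# Hypothetical sketch of VQ-Font's token prior refinement (abstract only).
import numpy as np

rng = np.random.default_rng(1)
codebook = rng.standard_normal((256, 32))  # 256 codewords, 32-dim glyph tokens

def quantize(features: np.ndarray) -> np.ndarray:
    """Snap each synthesized token to its nearest codeword (L2), pulling
    generated strokes toward the real-glyph domain captured by the prior."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook[d.argmin(axis=1)]

synthesized = rng.standard_normal((64, 32))  # glyph tokens from the generator
refined = quantize(synthesized)              # token prior refinement step
```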