Arrow Research · Search

Author name cluster

Yuxi Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers (6)

AAAI Conference 2026 · Conference Paper

4D Point Cloud Segmentation via Active Test-Time Adaptation

  • Mingrong Gong
  • Chaoqi Chen
  • Luyao Tang
  • Yuxi Wang
  • Sergio Escalera

4D point cloud segmentation is crucial for autonomous driving with continuous LiDAR streams. While test-time adaptation (TTA) is the standard approach for handling dynamic environments, current methods suffer from catastrophic error accumulation due to over-reliance on pseudo-labels. Active learning could provide reliable annotations for critical samples, but combining it with TTA faces severe challenges: real-time processing requirements and expensive 3D labeling costs. In this paper, we propose ATTA-4DSeg, the first framework to achieve efficient active test-time adaptation for 4D point cloud segmentation under extreme budget constraints. Our key insight is a self-reinforcing loop: oracle annotations refine adaptation prototypes, which then guide the selection of subsequent high-value samples from regions with severe distribution shifts, maximizing each annotation’s impact. Specifically, we propose three key innovations: (1) dual-prototype comparison that precisely localizes distribution-shift boundaries to narrow the annotation scope, (2) Class-Inverse Budget Allocation (CIBA), which ensures balanced adaptation across all categories, coupled with hybrid uncertainty scoring that combines voxel-level geometry and point-wise variance for optimal sample selection, and (3) a refinement strategy that leverages sparse oracle annotations to improve predictions on unlabeled points, maximizing annotation utility. Extensive experiments show that ATTA-4DSeg improves mIoU by 18.87%, 19.92%, and 3.6% on three domain adaptation benchmarks using only a 1% annotation budget. Our method operates 2.28× faster than state-of-the-art methods. Remarkably, our approach reaches 90% of fully-supervised performance using only a 5% annotation budget.
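The class-inverse budget idea lends itself to a compact illustration. Below is a minimal NumPy sketch of an inverse-frequency budget split, assuming rarer classes should receive a larger share of the labels; the function name and allocation rule are invented for illustration and are not the paper's actual CIBA implementation.

```python
# Hypothetical sketch of class-inverse budget allocation (not the paper's code):
# rarer classes receive a proportionally larger share of the annotation budget.
import numpy as np

def class_inverse_budget(class_counts: np.ndarray, total_budget: int) -> np.ndarray:
    """Split `total_budget` labels across classes inversely to their frequency."""
    freq = class_counts / class_counts.sum()
    inv = 1.0 / np.maximum(freq, 1e-8)            # rare classes -> large weight
    weights = inv / inv.sum()                     # normalize to a distribution
    budget = np.floor(weights * total_budget).astype(int)
    remainder = total_budget - int(budget.sum())  # floor() leaves a few labels over
    for cls in np.argsort(freq)[:remainder]:      # hand leftovers to the rarest classes
        budget[cls] += 1
    return budget

counts = np.array([120_000, 30_000, 5_000, 500])  # toy per-class point counts
print(class_inverse_budget(counts, total_budget=100))  # e.g. [ 0  1  9 90]
```

The head class gets almost nothing while the tail class absorbs most of the budget, which is the balancing behaviour the abstract attributes to CIBA.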

NeurIPS Conference 2025 · Conference Paper

GenColor: Generative and Expressive Color Enhancement with Pixel-Perfect Texture Preservation

  • Yi Dong
  • Yuxi Wang
  • Xianhui Lin
  • Wenqi Ouyang
  • Zhiqi Shen
  • Peiran Ren
  • Ruoxi Fan
  • Rynson Lau

Color enhancement is a crucial yet challenging task in digital photography. It demands methods that are (i) expressive enough for fine-grained adjustments, (ii) adaptable to diverse inputs, and (iii) able to preserve texture. Existing approaches typically fall short in at least one of these aspects, yielding unsatisfactory results. We propose GenColor, a novel diffusion-based framework for sophisticated, texture-preserving color enhancement. GenColor reframes the task as conditional image generation. Leveraging ControlNet and a tailored training scheme, it learns advanced color transformations that adapt to diverse lighting and content. We train GenColor on ARTISAN, our newly collected large-scale dataset of 1.2M high-quality photographs specifically curated for enhancement tasks. To overcome texture preservation limitations inherent in diffusion models, we introduce a color-transfer network with a novel degradation scheme that simulates texture–color relationships. This network achieves pixel-perfect texture preservation while enabling fine-grained color matching with the diffusion-generated reference images. Extensive experiments show that GenColor produces visually compelling results comparable to those of expert colorists and surpasses state-of-the-art methods in both subjective and objective evaluations. We have released the code and dataset.
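The texture-preservation claim rests on separating structure from color. The snippet below illustrates that separation in its simplest form: luminance (a texture proxy) is left untouched while chroma statistics are matched to a reference image. This is a generic moment-matching baseline for intuition only, not GenColor's color-transfer network or degradation scheme.

```python
# Illustrative only: keep the input's luminance (texture), borrow chroma
# statistics from a generated reference. Not GenColor's actual network.
import numpy as np

def match_chroma(src_lab: np.ndarray, ref_lab: np.ndarray) -> np.ndarray:
    """src_lab, ref_lab: (H, W, 3) float arrays in a Lab-like color space."""
    out = src_lab.copy()
    for c in (1, 2):  # chroma channels only; channel 0 (L) is never modified
        s_mu, s_sd = src_lab[..., c].mean(), src_lab[..., c].std() + 1e-8
        r_mu, r_sd = ref_lab[..., c].mean(), ref_lab[..., c].std() + 1e-8
        out[..., c] = (src_lab[..., c] - s_mu) / s_sd * r_sd + r_mu
    return out  # texture preserved pixel-for-pixel, colors shifted toward ref

src, ref = np.random.rand(8, 8, 3), np.random.rand(8, 8, 3)
out = match_chroma(src, ref)
print(np.allclose(out[..., 0], src[..., 0]))  # True: luminance untouched
```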

AAAI Conference 2025 · Conference Paper

SceneX: Procedural Controllable Large-Scale Scene Generation

  • Mengqi Zhou
  • Yuxi Wang
  • Jun Hou
  • Shougao Zhang
  • Yiwei Li
  • Chuanchen Luo
  • Junran Peng
  • Zhaoxiang Zhang

Developing comprehensive explicit world models is crucial for understanding and simulating real-world scenarios. Recently, Procedural Controllable Generation (PCG) has gained significant attention in large-scale scene generation by enabling the creation of scalable, high-quality assets. However, PCG faces challenges such as limited modular diversity, high expertise requirements, and difficulty in managing the diverse elements and structures of complex scenes. In this paper, we introduce a large-scale scene generation framework, SceneX, which can automatically produce high-quality procedural models according to designers' textual descriptions. Specifically, the proposed method comprises two components, PCGHub and PCGPlanner. The former encompasses an extensive collection of accessible procedural assets and thousands of hand-crafted API documents that serve as a standard protocol for the PCG controller. The latter generates executable actions for Blender to produce controllable and precise 3D assets guided by the user's instructions. Extensive experiments demonstrate the capability of our method in controllable large-scale scene generation, including natural scenes and unbounded cities, as well as scene editing such as asset placement and season translation.
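The PCGHub/PCGPlanner split amounts to a registry of documented procedural APIs plus a planner that emits executable calls against it. The Python sketch below is hypothetical (the `place_asset` API and all names are invented); a real executor would invoke Blender's bpy operators instead of printing.

```python
# Hypothetical PCG action protocol: a planner emits named API calls that a
# Blender-side executor runs. All API names here are invented for illustration.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class Action:
    api: str               # name of a documented PCG API, e.g. "place_asset"
    kwargs: Dict[str, Any]

REGISTRY: Dict[str, Callable[..., None]] = {}

def register(name: str):
    """Expose a function as a callable PCG API."""
    def deco(fn):
        REGISTRY[name] = fn
        return fn
    return deco

@register("place_asset")
def place_asset(asset_id: str, x: float, y: float) -> None:
    print(f"placing {asset_id} at ({x}, {y})")  # stand-in for a bpy call

def execute(plan: List[Action]) -> None:
    for act in plan:                     # the plan is plain data, so it can
        REGISTRY[act.api](**act.kwargs)  # be audited before execution

execute([Action("place_asset", {"asset_id": "pine_tree_03", "x": 12.0, "y": -4.5})])
```

Keeping the plan as plain data before execution is what makes the "standard protocol" framing useful: the planner can be an LLM, but the executor only runs documented calls.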

AAAI Conference 2024 · Conference Paper

ChromaFusionNet (CFNet): Natural Fusion of Fine-Grained Color Editing

  • Yi Dong
  • Yuxi Wang
  • Ruoxi Fan
  • Wenqi Ouyang
  • Zhiqi Shen
  • Peiran Ren
  • Xuansong Xie

Digital image enhancement aims to deliver visually striking, pleasing images that align with human perception. While global techniques can elevate the image's overall aesthetics, fine-grained color enhancement can further boost visual appeal and expressiveness. However, colorists frequently face challenges in achieving accurate, localized color adjustments. Direct composition of these local edits can result in spatial color inconsistencies. Existing methods, including color style transfer and image harmonization, exhibit inconsistencies, especially at boundary regions. Addressing this, we present ChromaFusionNet (CFNet), a novel approach that views the color fusion problem through the lens of image color inpainting. Built on the Vision Transformer architecture, CFNet captures global context and delivers high-fidelity outputs, seamlessly blending colors while preserving boundary integrity. Empirical studies on ImageNet and COCO datasets demonstrate CFNet's superiority over existing methods in maintaining color harmony and color fidelity. Robustness evaluations and user studies have further validated the effectiveness of CFNet. In conclusion, CFNet introduces an innovative approach to seamless, fine-grained color fusion, paving the way for advancements in the domain of fine-grained color editing. Code and pretrained models are available at our project page: https://yidong.pro/projects/cfnet.
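The "color fusion as inpainting" framing can be made concrete with a toy model: a transformer consumes patch tokens of the composited RGB image plus a boundary mask and regenerates a color-harmonized image with global context. The PyTorch module below is an illustrative sketch with made-up layer sizes, not CFNet's actual architecture.

```python
# Toy 'color fusion as inpainting' model; layer sizes are illustrative only.
import torch
import torch.nn as nn

class ColorInpaintViT(nn.Module):
    def __init__(self, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.embed = nn.Conv2d(4, dim, patch, stride=patch)  # RGB + boundary mask
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.ConvTranspose2d(dim, 3, patch, stride=patch)  # fused RGB

    def forward(self, rgb, mask):
        x = self.embed(torch.cat([rgb, mask], dim=1))        # (B, D, H/p, W/p)
        b, d, h, w = x.shape
        tokens = self.encoder(x.flatten(2).transpose(1, 2))  # global context
        return self.head(tokens.transpose(1, 2).reshape(b, d, h, w))

rgb = torch.rand(1, 3, 224, 224)   # composited image with local color edits
mask = torch.rand(1, 1, 224, 224)  # soft mask marking edit boundaries
print(ColorInpaintViT()(rgb, mask).shape)  # torch.Size([1, 3, 224, 224])
```

The mask channel tells the encoder where colors are unreliable, mirroring how inpainting models are told which pixels to re-synthesize.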

NeurIPS Conference 2023 · Conference Paper

DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions

  • Haochen Wang
  • Junsong Fan
  • Yuxi Wang
  • Kaiyou Song
  • Tong Wang
  • Zhao-Xiang Zhang

Vision Transformers (ViTs) are empirically observed to be quite insensitive to the order of input tokens, which makes the need for a self-supervised pretext task that enhances the location awareness of ViTs evident. To address this, we present DropPos, a novel pretext task designed to reconstruct Dropped Positions. The formulation of DropPos is simple: we first drop a large random subset of positional embeddings, and the model then classifies the actual position of each non-overlapping patch among all possible positions based solely on its visual appearance. To avoid trivial solutions, we increase the difficulty of this task by keeping only a subset of patches visible. Additionally, considering that different patches may have similar visual appearances, we propose position smoothing and attentive reconstruction strategies to relax this classification problem, since it is not necessary to reconstruct exact positions in these cases. Empirical evaluations show that DropPos outperforms supervised pre-training and achieves competitive results compared with state-of-the-art self-supervised alternatives on a wide range of downstream benchmarks. This suggests that explicitly encouraging spatial reasoning abilities, as DropPos does, indeed contributes to the improved location awareness of ViTs. The code is publicly available at https://github.com/Haochen-Wang409/DropPos.
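The pretext task is simple enough to sketch end to end. In the toy PyTorch snippet below, positional embeddings are zeroed for a random ~75% of patches and a linear head classifies each dropped patch's true index; the real method additionally masks patches and applies the position smoothing and attentive reconstruction described above, both omitted here. All sizes are illustrative.

```python
# Toy DropPos-style pretext: drop most positional embeddings, then classify
# each dropped patch's position from appearance alone. Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

B, num_patches, dim = 8, 196, 256
patch_tokens = torch.randn(B, num_patches, dim)        # stand-in for ViT patch embeds
pos_embed = torch.randn(1, num_patches, dim)

keep_pos = torch.rand(B, num_patches) > 0.75           # keep only ~25% of positions
x = patch_tokens + pos_embed * keep_pos.unsqueeze(-1)  # dropped patches lose position info

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, 8, dim * 4, batch_first=True), num_layers=2)
logits = nn.Linear(dim, num_patches)(encoder(x))       # per-patch position logits

targets = torch.arange(num_patches).expand(B, -1)      # ground-truth patch indices
loss = F.cross_entropy(logits[~keep_pos], targets[~keep_pos])  # score dropped ones only
print(loss.item())
```

Supervision comes for free from the patch grid itself, which is why this pretext task needs no labels.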

ECAI Conference 2023 · Conference Paper

WeaGAN: Weather-Aware Graph Attention Network for Traffic Prediction

  • Yuxi Wang
  • Yuan Luo

In recent years, traffic congestion in centralised cities has become more severe. To optimise public resources and reduce congestion, transportation departments rely on traffic prediction. However, unexpected events such as rainfall can impact traffic conditions, which necessitates incorporating weather information to improve prediction results. Moreover, most existing works characterise the relationship between weather and traffic by simply combining the two data sources. Without a carefully designed structure that captures the inter-dependency between weather and traffic data, it is impossible to produce accurate predictions within a reasonable computational time. To address this issue, we propose a Weather-Aware Graph Attention Network (WeaGAN) that adopts an encoder-decoder architecture with weather attention mechanisms and a gate to adaptively model the complex spatial-temporal inter-dependency between weather and traffic. We further design a self-attention mechanism to improve prediction accuracy. Our experiments on a standard real-world dataset show that, compared to the state-of-the-art, WeaGAN (i) improves prediction accuracy by up to 29%; and (ii) saves up to 61% of the computation time.
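The weather-traffic coupling the abstract describes can be sketched as cross-attention from traffic nodes onto weather observations, followed by a learned gate that decides how much weather context to mix in. The module below is speculative (all dimensions and names are invented) and is meant only to show the shape of such an inter-dependency block, not WeaGAN itself.

```python
# Speculative weather-aware gating block; not WeaGAN's actual architecture.
import torch
import torch.nn as nn

class WeatherGate(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, traffic, weather):
        # traffic: (B, N_nodes, dim) node features; weather: (B, T_obs, dim)
        ctx, _ = self.attn(traffic, weather, weather)     # weather-attended context
        g = self.gate(torch.cat([traffic, ctx], dim=-1))  # per-node mixing weight
        return g * ctx + (1 - g) * traffic                # gated fusion

fused = WeatherGate()(torch.randn(2, 50, 64), torch.randn(2, 12, 64))
print(fused.shape)  # torch.Size([2, 50, 64])
```

The gate lets the network fall back to pure traffic features when weather is uninformative, which is one way to capture the inter-dependency without naively concatenating the two inputs.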