Arrow Research search

Author name cluster

Yu-Kun Lai

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

AAAI Conference 2026 Conference Paper

InterCoser: Interactive 3D Character Creation with Disentangled Fine-Grained Features

  • Yi Wang
  • Jian Ma
  • Zhuo Su
  • Guidong Wang
  • Jingyu Yang
  • Yu-Kun Lai
  • Kun Li

This paper aims to interactively generate and edit disentangled 3D characters based on precise user instructions. Existing methods generate and edit 3D characters via rough and simple editing guidance and entangled representations, making it difficult to achieve precise and comprehensive control over fine-grained local editing and free clothing transfer for characters. To enable accurate and intuitive control over the generation and editing of high-quality 3D characters with freely interchangeable clothing, we propose a novel user-interactive approach for disentangled 3D character creation. Specifically, to achieve precise control over 3D character generation and editing, we introduce two user-friendly interaction approaches: a sketch-based layered character generation/editing method, which supports clothing transfer; and a 3D-proxy-based part-level editing method, enabling fine-grained disentangled editing. To enhance 3D character quality, we propose a 3D Gaussian reconstruction strategy guided by geometric priors, ensuring that 3D characters exhibit detailed local geometry and smooth global surfaces. Extensive experiments on both public datasets and in-the-wild data demonstrate that our approach not only generates high-quality disentangled 3D characters but also supports precise and fine-grained editing through user interaction.

AAAI Conference 2026 Conference Paper

Temporal Inconsistency Guidance for Super-resolution Video Quality Assessment

  • Yixiao Li
  • Xiaoyuan Yang
  • Weide Liu
  • Xin Jin
  • Xu Jia
  • Yu-Kun Lai
  • Paul L. Rosin
  • Hantao Liu

As super-resolution (SR) techniques introduce unique distortions that fundamentally differ from those caused by traditional degradation processes (e.g., compression), there is an increasing demand for specialized video quality assessment (VQA) methods tailored to SR-generated content. One critical factor affecting perceived quality is temporal inconsistency, which refers to irregularities between consecutive frames. However, existing VQA approaches rarely quantify this phenomenon or explicitly investigate its relationship with human perception. Moreover, SR videos exhibit amplified inconsistency levels as a result of enhancement processes. In this paper, we propose Temporal Inconsistency Guidance for Super-resolution Video Quality Assessment (TIG-SVQA) that underscores the critical role of temporal inconsistency in guiding the quality assessment of SR videos. We first design a perception-oriented approach to quantify frame-wise temporal inconsistency. Based on this, we introduce the Inconsistency Highlighted Spatial Module, which localizes inconsistent regions at both coarse and fine scales. Inspired by the human visual system, we further develop an Inconsistency Guided Temporal Module that performs progressive temporal feature aggregation: (1) a consistency-aware fusion stage in which a visual memory capacity block adaptively determines the information load of each temporal segment based on inconsistency levels, and (2) an informative filtering stage for emphasizing quality-related features. Extensive experiments on both single-frame and multi-frame SR video scenarios demonstrate that our method significantly outperforms state-of-the-art VQA approaches.
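
The abstract does not specify how frame-wise temporal inconsistency is computed, so the sketch below is only an assumed, simplified measure (a weighted difference between consecutive frames) to illustrate the kind of per-transition signal such a module could consume; the function name and weighting are hypothetical.

```python
import numpy as np

def temporal_inconsistency(frames: np.ndarray) -> np.ndarray:
    """Toy frame-wise inconsistency score (assumed formulation, not the paper's).

    frames: (T, H, W) grayscale video with values in [0, 1].
    Returns one score per frame transition, shape (T - 1,).
    """
    diffs = np.abs(np.diff(frames, axis=0))                      # (T-1, H, W) frame differences
    # Emphasise large, localised changes over uniform global shifts by
    # weighting each pixel difference by its deviation from the frame mean.
    weights = np.abs(diffs - diffs.mean(axis=(1, 2), keepdims=True))
    return (diffs * weights).mean(axis=(1, 2))

# Example: 10 frames made of two repeated stills -> inconsistency is zero
# everywhere except at the single transition between the two stills.
video = np.random.rand(2, 64, 64).repeat(5, axis=0)
print(temporal_inconsistency(video))
```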

IROS Conference 2025 Conference Paper

Celebi's Choice: Causality-Guided Skill Optimisation for Granular Manipulation via Differentiable Simulation

  • Minglun Wei
  • Xintong Yang
  • Junyu Yan
  • Yu-Kun Lai
  • Ze Ji

Robotic soil manipulation is essential for automated farming, particularly in excavation and levelling tasks. However, the nonlinear dynamics of granular materials challenge traditional control methods, limiting stability and efficiency. We propose Celebi, a causality-enhanced optimisation method that integrates differentiable physics simulation with adaptive step-size adjustments based on causal inference. To enable gradient-based optimisation, we construct a differentiable simulation environment for granular material interactions. We further define skill parameters with a differentiable mapping to end-effector motions, facilitating efficient trajectory optimisation. By modelling causal effects between task-relevant features extracted from point cloud observations and skill parameters, Celebi selectively adjusts update step sizes to enhance optimisation stability and convergence efficiency. Experiments in both simulated and real-world environments validate Celebi’s effectiveness, demonstrating robust and reliable performance in robotic excavation and levelling tasks.
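
As a rough illustration of scaling optimisation steps by estimated effects, the hedged sketch below uses a finite-difference sensitivity as a crude stand-in for the causal effects the paper models from point-cloud features; it is not Celebi's algorithm, and all names and constants are hypothetical.

```python
import numpy as np

def causal_step_update(params, loss_fn, base_lr=0.1, eps=1e-2):
    """Hypothetical sketch: scale each parameter's step by an estimated effect
    size, standing in for the causal weighting described in the abstract."""
    grad = np.zeros_like(params)
    effect = np.zeros_like(params)
    base = loss_fn(params)
    for i in range(params.size):
        p = params.copy()
        p[i] += eps
        delta = loss_fn(p) - base
        grad[i] = delta / eps                  # finite-difference gradient
        effect[i] = abs(delta)                 # crude "effect" of intervening on param i
    scale = effect / (effect.max() + 1e-8)     # larger estimated effect -> larger step
    return params - base_lr * scale * grad

params = np.array([1.0, -2.0, 0.5])            # toy skill parameters
loss = lambda p: (p ** 2).sum()                # toy task loss
for _ in range(50):
    params = causal_step_update(params, loss)
print(params)                                  # approaches the origin
```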

IROS Conference 2025 Conference Paper

Skeleton-Guided Rolling-Contact Kinematics for Arbitrary Point Clouds via Locally Controllable Parameterized Curve Fitting

  • Qingmeng Wen
  • Ze Ji
  • Yu-Kun Lai
  • Mikhail M. Svinin
  • Seyed Amir Tafrishi

Rolling contact kinematics plays a vital role in dexterous manipulation and rolling-based locomotion. Yet, in practical applications, the environments and objects involved are often captured as discrete point clouds, creating substantial difficulties for traditional motion control and planning frameworks that rely on continuous surface representations. In this work, we propose a differential geometry-based framework that models point cloud data for continuous rolling contact using locally parameterized representations. Our approach leverages skeletonization to define a rotational reference structure for rolling interactions and applies a Fourier-based curve fitting technique to extract and represent meaningful controllable local geometric structure. We further introduce a novel 2D manifold coordinate system tailored to arbitrary surface curves, enabling local parameterization of complex shapes. The governing kinematic equations for rolling contact are then derived, and we demonstrate the effectiveness of our method through simulations on various object examples.
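
The Fourier-based curve fitting can be illustrated with a minimal least-squares fit of a truncated Fourier series to an ordered, closed planar curve; the harmonic count and parameterisation below are assumptions, not the paper's formulation.

```python
import numpy as np

def fit_fourier_curve(points: np.ndarray, n_harmonics: int = 8):
    """Fit truncated Fourier series x(t), y(t) to an ordered closed 2D curve
    (illustrative stand-in for locally controllable parameterized curve fitting)."""
    t = np.linspace(0.0, 2 * np.pi, len(points), endpoint=False)
    cols = [np.ones_like(t)]                                  # design matrix [1, cos kt, sin kt, ...]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(k * t), np.sin(k * t)]
    A = np.stack(cols, axis=1)
    coeffs, *_ = np.linalg.lstsq(A, points, rcond=None)       # (2k+1, 2) Fourier coefficients
    return lambda s: np.stack(
        [np.ones_like(s)] +
        [f(k * s) for k in range(1, n_harmonics + 1) for f in (np.cos, np.sin)],
        axis=1) @ coeffs

# Example: noisy circle samples reconstructed as a smooth closed curve.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1) + 0.01 * np.random.randn(200, 2)
curve = fit_fourier_curve(pts)
print(curve(np.array([0.0, np.pi / 2])))                      # roughly (1, 0) and (0, 1)
```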

ICML Conference 2024 Conference Paper

Efficient Precision and Recall Metrics for Assessing Generative Models using Hubness-aware Sampling

  • Yuanbang Liang
  • Jing Wu 0004
  • Yu-Kun Lai
  • Yipeng Qin

Despite impressive results, deep generative models require massive datasets for training, and as dataset size increases, effective evaluation metrics like precision and recall (P&R) become computationally infeasible on commodity hardware. In this paper, we address this challenge by proposing efficient P&R (eP&R) metrics that give almost identical results to the original P&R but at much lower computational cost. Specifically, we identify two redundancies in the original P&R: i) redundancy in ratio computation and ii) redundancy in manifold inside/outside identification. We find both can be effectively removed via hubness-aware sampling, which extracts representative elements from synthetic/real image samples based on their hubness values, i.e., the number of times a sample becomes a k-nearest neighbor to others in the feature space. Thanks to the insensitivity of hubness-aware sampling to exact k-nearest neighbor (k-NN) results, we further improve the efficiency of our eP&R metrics by using approximate k-NN methods. Extensive experiments show that our eP&R matches the original P&R but is far more efficient in both time and space. Our code is available at: https://github.com/Byronliang8/Hubness_Precision_Recall
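
The hubness value defined in the abstract (how often a sample appears in other samples' k-NN lists) and its use for picking manifold representatives can be sketched as follows; the subsampling ratio and the simplified precision estimate are assumptions, not the eP&R implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hubness_scores(feats: np.ndarray, k: int = 5) -> np.ndarray:
    """Hubness of each sample: how often it appears in the k-NN lists of the
    other samples (the sample itself is excluded from its own list)."""
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(feats).kneighbors(feats)
    return np.bincount(idx[:, 1:].ravel(), minlength=len(feats))

def precision_with_hubs(real: np.ndarray, fake: np.ndarray,
                        k: int = 5, keep: float = 0.5) -> float:
    """Simplified precision estimate: keep only the most 'hubby' real samples
    as manifold representatives, then test whether each fake sample falls
    inside any representative's k-NN ball. A rough sketch of the idea only."""
    hubs = np.argsort(-hubness_scores(real, k))[: int(keep * len(real))]
    reps = real[hubs]
    radii = NearestNeighbors(n_neighbors=k + 1).fit(reps).kneighbors(reps)[0][:, -1]
    d = np.linalg.norm(fake[:, None, :] - reps[None, :, :], axis=-1)   # (n_fake, n_reps)
    return float((d <= radii[None, :]).any(axis=1).mean())

real = np.random.randn(500, 16)
fake = np.random.randn(300, 16) * 1.5          # broader than the real distribution
print(precision_with_hubs(real, fake))          # < 1: some fakes fall off-manifold
```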

IJCAI Conference 2024 Conference Paper

SceneDiff: Generative Scene-Level Image Retrieval with Text and Sketch Using Diffusion Models

  • Ran Zuo
  • Haoxiang Hu
  • Xiaoming Deng
  • Cangjun Gao
  • Zhengming Zhang
  • Yu-Kun Lai
  • Cuixia Ma
  • Yong-Jin Liu

Jointly using text and sketch for scene-level image retrieval exploits the complementarity of the two modalities to describe fine-grained scene content and retrieve the target image, which plays a pivotal role in accurate image retrieval. Existing methods directly fuse sketch and text features and thus suffer from limited utilization of crucial semantic and structural information, leading to inaccurate matching with images. In this paper, we propose SceneDiff, a novel retrieval network that leverages a pre-trained diffusion model to establish a shared generative latent space, enabling joint latent representation learning for sketch and text features and precise alignment with the corresponding image. Specifically, we encode text, sketch and image features and project them into the diffusion-based shared space, conditioning the denoising process on sketch and text features to generate latent fusion features, while employing the pre-trained autoencoder for latent image features. Within this space, we introduce a content-aware feature transformation module to reconcile encoded sketch and image features with the dimensional requirements of the diffusion latent space while preserving their visual content information. We then augment the representation capability of the generated latent fusion features by integrating multiple samplings with partition attention, and use contrastive learning to align both the direct fusion features and the generated latent fusion features with the corresponding image representations. Extensive experiments show that our method outperforms state-of-the-art approaches, providing new insight into this retrieval field.
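
The contrastive alignment between fused (sketch+text) features and image features mentioned at the end of the abstract can be illustrated with a generic symmetric InfoNCE objective; the temperature and exact formulation are assumptions, not SceneDiff's objective.

```python
import torch
import torch.nn.functional as F

def alignment_loss(fusion: torch.Tensor, image: torch.Tensor, tau: float = 0.07):
    """Symmetric InfoNCE between fused (sketch+text) features and image
    features: matched pairs on the diagonal are pulled together, all other
    pairs in the batch are pushed apart. A generic sketch only."""
    f = F.normalize(fusion, dim=-1)
    g = F.normalize(image, dim=-1)
    logits = f @ g.t() / tau                            # (B, B) cosine similarities
    targets = torch.arange(len(f), device=f.device)     # i-th fusion matches i-th image
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

fusion = torch.randn(8, 256)
image = torch.randn(8, 256)
print(alignment_loss(fusion, image).item())
```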

AAAI Conference 2023 Conference Paper

FEditNet: Few-Shot Editing of Latent Semantics in GAN Spaces

  • Mengfei Xia
  • Yezhi Shu
  • Yuji Wang
  • Yu-Kun Lai
  • Qiang Li
  • Pengfei Wan
  • Zhongyuan Wang
  • Yong-Jin Liu

Generative Adversarial Networks (GANs) have demonstrated a powerful capability for synthesizing high-resolution images, and great efforts have been made to interpret the semantics in their latent spaces. However, existing works still have the following limitations: (1) the majority rely on either pretrained attribute predictors or large-scale labeled datasets, which are difficult to collect in most cases, and (2) other methods are only suitable for restricted cases, such as interpreting human facial images using prior facial semantics. In this paper, we propose a GAN-based method called FEditNet, aiming to discover latent semantics using very few labeled samples without any pretrained predictors or prior knowledge. Specifically, we reuse the knowledge from pretrained GANs and thereby avoid overfitting during the few-shot training of FEditNet. Moreover, our layer-wise objectives, which take content consistency into account, also ensure disentanglement between attributes. Qualitative and quantitative results demonstrate that our method outperforms state-of-the-art methods on various datasets. The code is available at https://github.com/THU-LYJ-Lab/FEditNet.
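
For context, the simplest baseline for few-shot latent-direction discovery is the difference of class means over a handful of labelled latents; the sketch below shows only that baseline, not FEditNet's trained boundary and layer-wise objectives.

```python
import numpy as np

def few_shot_direction(pos_latents: np.ndarray, neg_latents: np.ndarray) -> np.ndarray:
    """Baseline illustration of few-shot attribute-direction discovery in a GAN
    latent space: the normalised difference of class means over a handful of
    labelled latents. FEditNet instead learns its direction with pretrained-GAN
    knowledge reuse and layer-wise objectives; this is only the simplest stand-in."""
    d = pos_latents.mean(axis=0) - neg_latents.mean(axis=0)
    return d / np.linalg.norm(d)

def edit(latent: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    return latent + strength * direction       # the edited latent is fed to the generator

pos = np.random.randn(8, 512) + 0.3            # e.g. 8 latents labelled with the attribute
neg = np.random.randn(8, 512)                  # 8 latents labelled without it
direction = few_shot_direction(pos, neg)
print(edit(np.random.randn(512), direction, 2.0).shape)
```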

ICML Conference 2023 Conference Paper

NeuralSlice: Neural 3D Triangle Mesh Reconstruction via Slicing 4D Tetrahedral Meshes

  • Chenbo Jiang
  • Jie Yang 0038
  • Shwai He
  • Yu-Kun Lai
  • Lin Gao 0004

Learning-based high-fidelity reconstruction of 3D shapes with varying topology is a fundamental problem in computer vision and computer graphics. Recent advances in learning 3D shapes using explicit and implicit representations have achieved impressive results in 3D modeling. However, template-based explicit representations are limited by fixed topology, and implicit representations, although flexible with arbitrary topology, require a large number of sampled points to regress the surface, which is computationally expensive. In this work, we propose a novel 3D shape representation named NeuralSlice, which represents a 3D shape as the intersection of a 4D tetrahedral mesh and a 4D hyperplane. A novel network is designed to exploit the proposed representation flexibly, learning a deformable 4D template and a parameter for the slicing hyperplane to reconstruct the 3D object. To learn the local deformation of the 4D template, we further propose a spatially aware network that locates the 4D points within the 3D feature volume of the input shape via positional encoding, leveraging local geometric features to guide the 4D deformation. By addressing the 3D problem in a higher-dimensional 4D space, our method supports flexible topology changes while being highly efficient, and it is guaranteed to produce manifold meshes. NeuralSlice outperforms state-of-the-art explicit approaches in reconstruction quality. Compared with implicit approaches, by avoiding point sampling, our method is 10 times faster and better preserves thin structures. NeuralSlice can represent a variety of shapes and topologies using a single 4D tetrahedral mesh. The code can be found at https://github.com/IGLICT/NEURALSLICE
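
The core slicing operation (intersecting a tetrahedron embedded in 4D with a hyperplane, in the spirit of marching tetrahedra) can be sketched for a single element as follows; the hyperplane convention w = const is an assumption, and the learned template deformation is omitted.

```python
import numpy as np
from itertools import combinations

def slice_tetrahedron(verts4d: np.ndarray, w_cut: float) -> np.ndarray:
    """Intersect one tetrahedron embedded in R^4 (4 vertices) with the
    hyperplane w = w_cut, returning the 3D intersection polygon (a triangle
    or a quad). A marching-tetrahedra-style sketch of the slicing idea;
    NeuralSlice additionally learns the 4D template deformation and the
    slicing parameter."""
    f = verts4d[:, 3] - w_cut                      # signed distance of each vertex to the hyperplane
    pts = []
    for i, j in combinations(range(4), 2):         # the 6 edges of the tetrahedron
        if f[i] * f[j] < 0:                        # edge crosses the hyperplane
            t = f[i] / (f[i] - f[j])               # linear interpolation parameter
            p = (1 - t) * verts4d[i] + t * verts4d[j]
            pts.append(p[:3])                      # keep the spatial (x, y, z) part
    return np.array(pts)                           # 3 or 4 points in R^3

tet = np.array([[0, 0, 0, 0.], [1, 0, 0, 1.], [0, 1, 0, 1.], [0, 0, 1, 1.]])
print(slice_tetrahedron(tet, 0.5))                 # a triangle through the edge midpoints
```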

ICML Conference 2022 Conference Paper

Exploring and Exploiting Hubness Priors for High-Quality GAN Latent Sampling

  • Yuanbang Liang
  • Jing Wu 0004
  • Yu-Kun Lai
  • Yipeng Qin

Despite extensive studies on Generative Adversarial Networks (GANs), how to reliably sample high-quality images from their latent spaces remains an under-explored topic. In this paper, we propose a novel GAN latent sampling method that explores and exploits the hubness priors of GAN latent distributions. Our key insight is that the high dimensionality of the GAN latent space inevitably leads to the emergence of hub latents, which usually have much larger sampling densities than other latents. As a result, these hub latents are better trained and thus contribute more to the synthesis of high-quality images. Unlike a posteriori "cherry-picking", our method is highly efficient because it is an a priori method that identifies high-quality latents before any image is synthesized. Furthermore, we show that the well-known but purely empirical truncation trick is a naive approximation to the central clustering effect of hub latents, which not only uncovers the rationale behind the truncation trick but also indicates that our method is both superior and more fundamental. Extensive experimental results demonstrate the effectiveness of the proposed method. Our code is available at: https://github.com/Byronliang8/HubnessGANSampling.
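
The a priori selection of hub latents can be sketched as counting k-NN memberships over a batch of candidate latents and keeping the most frequent ones before any synthesis; the batch size, k, and kept fraction below are illustrative assumptions.

```python
import numpy as np

def select_hub_latents(latents: np.ndarray, k: int = 10, keep: float = 0.1) -> np.ndarray:
    """A priori filtering sketch: keep the latents with the highest hubness,
    i.e. those appearing most often among the k nearest neighbours of the
    other latents, before any image is synthesised. Plain Gaussian candidates
    and brute-force k-NN are used only for illustration."""
    sq = (latents ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * latents @ latents.T   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                                 # exclude self-neighbours
    knn = np.argsort(d2, axis=1)[:, :k]                          # k-NN indices per latent
    hubness = np.bincount(knn.ravel(), minlength=len(latents))   # how often each latent is a neighbour
    order = np.argsort(-hubness)
    return latents[order[: int(keep * len(latents))]]

z = np.random.randn(2000, 64)        # candidate latents z ~ N(0, I)
hubs = select_hub_latents(z)         # pass only these to the generator
print(hubs.shape)                    # (200, 64)
```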

NeurIPS Conference 2022 Conference Paper

FOF: Learning Fourier Occupancy Field for Monocular Real-time Human Reconstruction

  • Qiao Feng
  • Yebin Liu
  • Yu-Kun Lai
  • Jingyu Yang
  • Kun Li

The advent of deep learning has led to significant progress in monocular human reconstruction. However, existing representations, such as parametric models, voxel grids, meshes and implicit neural representations, have difficulty achieving high-quality results and real-time speed at the same time. In this paper, we propose the Fourier Occupancy Field (FOF), a novel, powerful, efficient and flexible 3D geometry representation, for accurate monocular real-time human reconstruction. A FOF represents a 3D object with a 2D field orthogonal to the view direction, where at each 2D position the occupancy field of the object along the view direction is compactly represented by the first few terms of its Fourier series; this retains the topology and neighborhood relations in the 2D domain. A FOF can be stored as a multi-channel image, which is compatible with 2D convolutional neural networks and bridges the gap between 3D geometries and 2D images. A FOF is also flexible and extensible: e.g., parametric models can easily be integrated into a FOF as a prior to generate more robust results, and meshes and FOFs can easily be inter-converted. Based on FOF, we design the first 30+ FPS high-fidelity real-time monocular human reconstruction framework. We demonstrate the potential of FOF on both public datasets and real captured data. The code is available for research purposes at http://cic.tju.edu.cn/faculty/likun/projects/FOF.
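
A minimal per-ray sketch of the Fourier occupancy idea, encoding a 1D occupancy function along the view direction by its first few Fourier coefficients and decoding it back, is given below; the basis convention and term count are assumptions rather than the paper's exact parameterisation.

```python
import numpy as np

def fof_encode(occupancy: np.ndarray, z: np.ndarray, n_terms: int = 8) -> np.ndarray:
    """Approximate a 1D occupancy function along the view direction by its
    first few Fourier coefficients (period 2 on z in [-1, 1]); one such
    coefficient vector per pixel would form the multi-channel FOF image."""
    dz = z[1] - z[0]
    coeffs = [occupancy.sum() * dz / 2.0]                        # constant term a0/2
    for k in range(1, n_terms + 1):
        coeffs.append((occupancy * np.cos(k * np.pi * z)).sum() * dz)
        coeffs.append((occupancy * np.sin(k * np.pi * z)).sum() * dz)
    return np.array(coeffs)

def fof_decode(coeffs: np.ndarray, z: np.ndarray) -> np.ndarray:
    out = np.full_like(z, coeffs[0])
    for k in range(1, (len(coeffs) - 1) // 2 + 1):
        out = out + coeffs[2 * k - 1] * np.cos(k * np.pi * z)
        out = out + coeffs[2 * k] * np.sin(k * np.pi * z)
    return out

z = np.linspace(-1.0, 1.0, 512)
occ = ((z > -0.2) & (z < 0.4)).astype(float)   # occupied interval along one camera ray
rec = fof_decode(fof_encode(occ, z), z)
print(((rec > 0.5) == (occ > 0.5)).mean())     # most of the ray's occupancy is recovered
```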

ICRA Conference 2019 Conference Paper

Probabilistic Projective Association and Semantic Guided Relocalization for Dense Reconstruction

  • Sheng Yang 0007
  • Zheng-Fei Kuang
  • Yan-Pei Cao
  • Yu-Kun Lai
  • Shi-Min Hu 0001

We present a real-time dense mapping system that uses predicted 2D semantic labels to optimize the geometric quality of the reconstruction. By combining Convolutional Neural Networks (CNNs) for 2D labeling with a Simultaneous Localization and Mapping (SLAM) system for camera trajectory estimation, recent approaches have succeeded in incrementally fusing and labeling 3D scenes. However, the geometric quality of the reconstruction can be further improved by incorporating such semantic prediction results, which existing methods do not sufficiently exploit. In this paper, we propose to use semantic information to improve two crucial modules in the reconstruction pipeline, namely tracking and loop detection, to obtain mutual benefits between geometric reconstruction and semantic recognition. Specifically, for tracking, we use a novel probabilistic projective association approach to efficiently pick out candidate correspondences, where the confidence of these correspondences is quantified based on similarities across all available short-term invariant features. For loop detection, we incorporate the semantic labels into the original Randomized Ferns encoding to generate a more comprehensive representation for retrieving candidate loop frames. Evaluations on a publicly available synthetic dataset show the effectiveness of our approach, which treats such semantic hints as a reliable feature for achieving higher geometric quality.
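
Projective data association in general transforms current-frame points into the previous frame and reads candidate correspondences from its depth map; the sketch below shows that step with a toy confidence mixing depth and semantic-label agreement, which is only a stand-in for the paper's probabilistic weighting over short-term invariant features.

```python
import numpy as np

def projective_association(points, K, T_prev_cur, depth_prev, labels_cur, labels_prev):
    """Transform current-frame 3D points into the previous frame, project with
    intrinsics K, and take the pixel hit in the previous depth map as the
    candidate correspondence. The confidence here is a toy mix of depth and
    semantic-label agreement (an assumption, not the paper's weighting)."""
    h, w = depth_prev.shape
    pts_h = np.hstack([points, np.ones((len(points), 1))])       # homogeneous coordinates
    pts_prev = (T_prev_cur @ pts_h.T).T[:, :3]                   # points in the previous frame
    uv = (K @ pts_prev.T).T
    uv = uv[:, :2] / uv[:, 2:3]                                  # pixel coordinates
    matches = []
    for n, (u, v) in enumerate(np.round(uv).astype(int)):
        if 0 <= u < w and 0 <= v < h:
            depth_sim = np.exp(-abs(depth_prev[v, u] - pts_prev[n, 2]))
            label_sim = 1.0 if labels_prev[v, u] == labels_cur[n] else 0.3
            matches.append((n, (v, u), depth_sim * label_sim))   # (point index, pixel, confidence)
    return matches

K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
pts = np.array([[0.0, 0.0, 2.0], [0.1, 0.0, 2.0]])
depth = np.full((64, 64), 2.0)
print(projective_association(pts, K, np.eye(4), depth, np.array([1, 1]), np.ones((64, 64), int)))
```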

AAAI Conference 2018 Conference Paper

Mesh-Based Autoencoders for Localized Deformation Component Analysis

  • Qingyang Tan
  • Lin Gao
  • Yu-Kun Lai
  • Jie Yang
  • Shihong Xia

Spatially localized deformation components are very useful for shape analysis and synthesis in 3D geometry processing. Several methods have recently been developed, with an aim to extract intuitive and interpretable deformation components. However, these techniques suffer from fundamental limitations especially for meshes with noise or large-scale deformations, and may not always be able to identify important deformation components. In this paper we propose a novel mesh-based autoencoder architecture that is able to cope with meshes with irregular topology. We introduce sparse regularization in this framework, which along with convolutional operations, helps localize deformations. Our framework is capable of extracting localized deformation components from mesh data sets with large-scale deformations and is robust to noise. It also provides a nonlinear approach to reconstruction of meshes using the extracted basis, which is more effective than the current linear combination approach. Extensive experiments show that our method outperforms state-of-the-art methods in both qualitative and quantitative evaluations.
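
The sparse regularisation that encourages localized components can be illustrated on a generic autoencoder with an L1 penalty on the latent code; the paper's encoder and decoder are mesh convolutions rather than the MLPs used in this sketch, so treat the names and sizes below as assumptions.

```python
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    """Generic autoencoder with an L1 penalty on the latent code, standing in
    for the sparse regularisation that localises deformation components in
    the paper (whose encoder/decoder are mesh convolutions, not MLPs)."""
    def __init__(self, n_verts: int, dim: int = 32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(3 * n_verts, 256), nn.ReLU(), nn.Linear(256, dim))
        self.dec = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 3 * n_verts))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

def loss_fn(model, verts, sparsity=1e-3):
    recon, z = model(verts)
    # Reconstruction term plus L1 sparsity on the latent deformation code.
    return ((recon - verts) ** 2).mean() + sparsity * z.abs().mean()

model = SparseAE(n_verts=100)
verts = torch.randn(16, 300)           # a batch of flattened per-vertex deformations
print(loss_fn(model, verts).item())
```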

AAAI Conference 2018 Conference Paper

Retrieving and Classifying Affective Images via Deep Metric Learning

  • Jufeng Yang
  • Dongyu She
  • Yu-Kun Lai
  • Ming-Hsuan Yang

Affective image understanding has been extensively studied in the last decade as more and more users express emotions through visual content. While current algorithms based on convolutional neural networks aim to distinguish emotional categories in a discrete label space, the task is inherently ambiguous. This is mainly because emotional labels with the same polarity (i.e., positive or negative) are highly related, unlike concrete object concepts such as cat, dog and bird. To the best of our knowledge, few methods leverage this characteristic of emotions for affective image understanding. In this work, we address the problem of understanding affective images via deep metric learning and propose a multi-task deep framework that optimizes both retrieval and classification objectives. We propose sentiment constraints, adapted from triplet constraints, that are able to explore the hierarchical relation of emotion labels. We further exploit the sentiment vector as an effective representation to distinguish affective images, utilizing the texture representation derived from convolutional layers. Extensive evaluations on four widely used affective datasets, i.e., Flickr and Instagram, IAPSa, Art Photo, and Abstract Paintings, demonstrate that the proposed algorithm performs favorably against state-of-the-art methods on both affective image retrieval and classification tasks.
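
The sentiment constraints adapted from triplet constraints can be illustrated with a polarity-aware triplet loss in which negatives sharing the anchor's polarity use a smaller margin; the margins and exact constraint structure below are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def sentiment_triplet_loss(anchor, positive, negative, same_polarity,
                           m_within=0.2, m_across=0.5):
    """Illustrative polarity-aware triplet loss: negatives sharing the anchor's
    polarity (e.g. two different negative emotions) are pushed away with a
    smaller margin than negatives of the opposite polarity."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    margin = torch.where(same_polarity,
                         torch.full_like(d_pos, m_within),
                         torch.full_like(d_pos, m_across))
    return F.relu(d_pos - d_neg + margin).mean()

a, p, n = (torch.randn(32, 128) for _ in range(3))   # anchor, positive, negative embeddings
same = torch.rand(32) < 0.5                          # whether each negative shares the anchor's polarity
print(sentiment_triplet_loss(a, p, n, same).item())
```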