Arrow Research search

Author name cluster

Hongyu Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

AAAI Conference 2026 Conference Paper

Hybrid-Domain Adaptative Representation Learning for Gaze Estimation

  • Qida Tan
  • Hongyu Yang
  • Wenchao Du

Appearance-based gaze estimation, which aims to predict an accurate 3D gaze direction from a single facial image, has made promising progress in recent years. However, most methods suffer significant performance degradation in cross-domain evaluation due to interference from gaze-irrelevant factors such as expressions, wearables, and image quality. To alleviate this problem, we present a novel Hybrid-domain Adaptative Representation Learning (HARL) framework that exploits multi-source hybrid datasets to learn a robust gaze representation. More specifically, we propose to disentangle the gaze-relevant representation from low-quality facial images by aligning it with features extracted from high-quality near-eye images in an unsupervised domain-adaptation manner, which incurs almost no additional computational or inference cost. Additionally, we analyze the effect of head pose and design a simple yet efficient sparse graph fusion module that exploits the geometric constraint between gaze direction and head pose, leading to a dense and robust gaze representation. Extensive experiments on the EyeDiap, MPIIFaceGaze, and Gaze360 datasets demonstrate that our approach achieves state-of-the-art accuracy of 5.02, 3.36, and 9.26 degrees, respectively, and delivers competitive performance in cross-dataset evaluation.
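As a point of reference for the reported numbers, here is a minimal sketch of the standard angular-error metric used to score gaze estimators in degrees. This is the conventional evaluation for the task, not code from the HARL paper.

```python
# Standard gaze-estimation metric: angle in degrees between predicted and
# ground-truth 3D gaze vectors (conventional evaluation, not the paper's code).
import numpy as np

def angular_error_deg(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """pred, gt: arrays of shape (N, 3) of 3D gaze direction vectors."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)  # guard rounding error
    return np.degrees(np.arccos(cos))

# A perfect prediction gives 0 degrees of error.
print(angular_error_deg(np.array([[0.0, 0.0, -1.0]]),
                        np.array([[0.0, 0.0, -1.0]])))  # [0.]
```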

AAAI Conference 2026 Conference Paper

SceneGenesis: 3D Scene Synthesis via Semantic Structural Priors and Mesh-Guided Video-Geometry Fusion

  • Yueming Zhao
  • Hongyu Yang
  • Di Huang

Generating high-quality, controllable, and structurally consistent 3D scenes in complex multi-object environments remains a fundamental challenge. We present SceneGenesis, a unified framework that synthesizes 3D scenes by combining semantic structural priors with mesh-guided video–geometry fusion. SceneGenesis first employs large language models to convert textual descriptions into category-aware object specifications, which are transformed into structured meshes using procedural approximations and pretrained asset generators, enabling precise layout control and scalable scene construction. To obtain rich and style-controllable appearances, SceneGenesis generates multi-view video representations conditioned on the initialized structure. A mesh-guided video–geometry fusion module then consolidates video evidence with mesh priors through mesh-conditioned fragment initialization, progressive geometric refinement, and structure-aware optimization, substantially improving global geometric fidelity and visual realism. Experiments demonstrate that SceneGenesis supports flexible style variation and object-level editing while achieving strong controllability, scalability, and structural quality.
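A hypothetical sketch of what a category-aware object specification might look like once the language-model stage has parsed a textual description. The `ObjectSpec` fields and the `specs_to_layout` helper are illustrative assumptions, not SceneGenesis's actual schema.

```python
# Hypothetical object-specification schema for text-to-scene layout control;
# field names and the layout helper are assumptions, not SceneGenesis's API.
from dataclasses import dataclass

@dataclass
class ObjectSpec:
    category: str            # semantic class, e.g. "sofa"
    size: tuple              # (w, h, d) bounding-box extents in meters
    position: tuple          # (x, y, z) placement in scene coordinates
    style: str = "default"   # appearance hint for the asset generator

def specs_to_layout(specs):
    """Turn object specs into axis-aligned bounding boxes for layout checks."""
    boxes = []
    for s in specs:
        x, y, z = s.position
        w, h, d = s.size
        boxes.append((s.category,
                      (x - w / 2, x + w / 2),
                      (y - h / 2, y + h / 2),
                      (z - d / 2, z + d / 2)))
    return boxes

scene = [ObjectSpec("sofa", (2.0, 0.9, 1.0), (0.0, 0.45, 0.0)),
         ObjectSpec("lamp", (0.3, 1.5, 0.3), (1.5, 0.75, 0.5))]
print(specs_to_layout(scene))
```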

AAAI Conference 2025 Conference Paper

3D²-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling

  • Zichen Tang
  • Hongyu Yang
  • Hanchen Zhang
  • Jiaxin Chen
  • Di Huang

Advancements in neural implicit representations and differentiable rendering have markedly improved the ability to learn animatable 3D avatars from sparse multi-view RGB videos. However, current methods that map observation space to canonical space often face challenges in capturing pose-dependent details and generalizing to novel poses. While diffusion models have demonstrated remarkable zero-shot capabilities in 2D image generation, their potential for creating animatable 3D avatars from 2D inputs remains underexplored. In this work, we introduce 3D²-Actor, a novel approach featuring a pose-conditioned 3D-aware human modeling pipeline that integrates iterative 2D denoising and 3D rectifying steps. The 2D denoiser, guided by pose cues, generates detailed multi-view images that provide the rich feature set necessary for high-fidelity 3D reconstruction and pose rendering. Complementing this, our Gaussian-based 3D rectifier renders images with enhanced 3D consistency through a two-stage projection strategy and a novel local coordinate representation. Additionally, we propose an innovative sampling strategy to ensure smooth temporal continuity across frames in video synthesis. Our method effectively addresses the limitations of traditional numerical solutions in handling ill-posed mappings, producing realistic and animatable 3D human avatars. Experimental results demonstrate that 3D²-Actor excels in high-fidelity avatar modeling and robustly generalizes to novel poses.
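A structural sketch of the alternating loop the abstract describes, with `denoise_2d` and `rectify_3d` as hypothetical stand-ins for the learned pose-conditioned 2D denoiser and Gaussian-based 3D rectifier.

```python
# Structural sketch (not the authors' code) of the alternating 2D-denoise /
# 3D-rectify pipeline; both component functions are identity stubs here.
def denoise_2d(views, pose):
    """Pose-conditioned 2D denoiser: refine multi-view images (stub)."""
    return list(views)

def rectify_3d(views):
    """Gaussian-based 3D rectifier: re-render views with 3D consistency (stub)."""
    return list(views)

def iterative_denoise(views, pose, num_steps=4):
    for _ in range(num_steps):
        views = denoise_2d(views, pose)   # add pose-guided 2D detail
        views = rectify_3d(views)         # enforce cross-view 3D consistency
    return views

print(iterative_denoise(["front_view", "side_view"], pose="walking", num_steps=2))
```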

AAAI Conference 2025 Conference Paper

Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained Images

  • Yihui Li
  • Chengxin Lv
  • Hongyu Yang
  • Di Huang

3D reconstruction from unconstrained image collections presents substantial challenges due to varying appearances and transient occlusions. In this paper, we introduce Micro-macro Wavelet-based Gaussian Splatting (MW-GS), a novel approach designed to enhance 3D reconstruction by disentangling scene representations into global, refined, and intrinsic components. The proposed method features two key innovations: Micro-macro Projection, which allows Gaussian points to capture details from feature maps across multiple scales with enhanced diversity; and Wavelet-based Sampling, which leverages frequency domain information to refine feature representations and significantly improve the modeling of scene appearances. Additionally, we incorporate a Hierarchical Residual Fusion Network to seamlessly integrate these features. Extensive experiments demonstrate that MW-GS delivers state-of-the-art rendering performance, surpassing existing methods.
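For intuition about the frequency-domain decomposition that wavelet-based sampling builds on, here is a minimal single-level 2D Haar transform. It illustrates the general idea of splitting a feature map into low- and high-frequency sub-bands; it is not the MW-GS module itself.

```python
# Single-level 2D Haar transform: split a map into approximation (LL) and
# horizontal/vertical/diagonal detail (LH/HL/HH) sub-bands. Illustrative only.
import numpy as np

def haar2d(x: np.ndarray):
    """x: (H, W) map with H and W even. Returns LL, LH, HL, HH sub-bands."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

ll, lh, hl, hh = haar2d(np.arange(16.0).reshape(4, 4))
print(ll.shape)  # (2, 2): each sub-band is half the resolution
```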

AAAI Conference 2023 Conference Paper

Learning Polysemantic Spoof Trace: A Multi-Modal Disentanglement Network for Face Anti-spoofing

  • Kaicheng Li
  • Hongyu Yang
  • Binghui Chen
  • Pengyu Li
  • Biao Wang
  • Di Huang

Along with the widespread use of face recognition systems, their vulnerability has come under increasing scrutiny. While existing face anti-spoofing methods can generalize across attack types, generic solutions remain challenging due to the diversity of spoof characteristics. Recently, the spoof trace disentanglement framework has shown great potential for coping with both seen and unseen spoof scenarios, but its performance is largely restricted by single-modal input. This paper focuses on this issue and presents a multi-modal disentanglement model that specifically learns polysemantic spoof traces for more accurate and robust generic attack detection. In particular, based on an adversarial learning mechanism, a two-stream disentangling network is designed to estimate spoof patterns from the RGB and depth inputs, respectively, capturing the complementary spoofing clues inherent in different attacks. Furthermore, a fusion module is exploited that recalibrates both representations at multiple stages to promote the disentanglement in each individual modality; it then performs cross-modality aggregation to deliver a more comprehensive spoof trace representation for prediction. Extensive evaluations on multiple benchmarks demonstrate that learning polysemantic spoof traces contributes favorably to anti-spoofing, with more perceptible and interpretable results.
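A hypothetical sketch of cross-modality recalibration in the spirit of the fusion module described above: each stream's features are rescaled by gates computed from the other stream, then concatenated. The gating scheme and parameters are assumptions, not the paper's design.

```python
# Hypothetical cross-modality recalibration: depth features gate RGB channels
# and vice versa before aggregation. Illustrative, not the paper's module.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recalibrate_and_fuse(rgb_feat, depth_feat, w_rgb, w_depth):
    """rgb_feat, depth_feat: (C,) pooled channel descriptors per modality.
    w_rgb, w_depth: (C, C) gating weights (hypothetical learned parameters)."""
    gate_for_rgb = sigmoid(w_depth @ depth_feat)   # depth guides RGB channels
    gate_for_depth = sigmoid(w_rgb @ rgb_feat)     # RGB guides depth channels
    return np.concatenate([rgb_feat * gate_for_rgb,
                           depth_feat * gate_for_depth])

rng = np.random.default_rng(0)
C = 8
fused = recalibrate_and_fuse(rng.normal(size=C), rng.normal(size=C),
                             rng.normal(size=(C, C)), rng.normal(size=(C, C)))
print(fused.shape)  # (16,): concatenated recalibrated representations
```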

ICRA Conference 2022 Conference Paper

Depth Completion Using Geometry-Aware Embedding

  • Wenchao Du
  • Hu Chen 0002
  • Hongyu Yang
  • Yi Zhang 0018

Exploiting the internal spatial geometric constraints of sparse LiDAR is beneficial to depth completion but has not been well explored. This paper proposes an efficient method to learn a geometry-aware embedding, which encodes local and global geometric structure information from 3D points (e.g., scene layout, objects' sizes and shapes) to guide dense depth estimation. Specifically, we utilize a dynamic graph representation to model generalized geometric relationships from irregular point clouds in a flexible and efficient manner. Further, we combine this embedding with the corresponding RGB appearance information to infer the missing depths of the scene with well-preserved structural details. The key to our method is integrating an implicit 3D geometric representation into a 2D learning architecture, which leads to a better trade-off between performance and efficiency. Extensive experiments demonstrate that the proposed method outperforms previous works and reconstructs fine depth with crisp boundaries in regions that those works over-smooth. The ablation study offers further insight into our method, which achieves significant gains with a simple design while exhibiting better generalization capability and stability. The code is available at https://github.com/Wenchao-Du/GAENet.
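A minimal sketch of the dynamic-graph idea underlying the geometry-aware embedding: connect each 3D point to its k nearest neighbors so that local geometric structure can be aggregated. This illustrates the concept, not the GAENet implementation (see the repository above for the real code).

```python
# k-nearest-neighbor graph over an irregular 3D point cloud; conceptual
# illustration of dynamic graph construction, not the GAENet implementation.
import numpy as np

def knn_graph(points: np.ndarray, k: int = 4) -> np.ndarray:
    """points: (N, 3). Returns (N, k) indices of each point's k nearest neighbors."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self-edges
    return np.argsort(d2, axis=1)[:, :k]  # k smallest distances per point

pts = np.random.default_rng(1).uniform(size=(10, 3))
print(knn_graph(pts, k=3))  # (10, 3) neighbor indices
```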

ICML Conference 2022 Conference Paper

DSTAGNN: Dynamic Spatial-Temporal Aware Graph Neural Network for Traffic Flow Forecasting

  • Shiyong Lan
  • Yitong Ma
  • Weikang Huang
  • Wenwu Wang 0001
  • Hongyu Yang
  • Pyang Li

As a typical problem in time series analysis, traffic flow prediction is one of the most important application fields of machine learning. However, achieving highly accurate traffic flow prediction is challenging due to the complex dynamic spatial-temporal dependencies within a road network. This paper proposes a novel Dynamic Spatial-Temporal Aware Graph Neural Network (DSTAGNN) to model the complex spatial-temporal interactions in a road network. First, considering that historical data carries intrinsic dynamic information about the spatial structure of road networks, we propose a new dynamic spatial-temporal aware graph based on a data-driven strategy to replace the pre-defined static graph usually used in traditional graph convolution. Second, we design a novel graph neural network architecture that can not only represent dynamic spatial relevance among nodes with an improved multi-head attention mechanism, but also acquire a wide range of dynamic temporal dependencies from multi-receptive-field features via multi-scale gated convolution. Extensive experiments on real-world datasets demonstrate that our proposed method significantly outperforms state-of-the-art methods.
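A simplified sketch of deriving a data-driven adjacency matrix from historical series, in the spirit of replacing a pre-defined static road graph. Plain correlation is used here for illustration; DSTAGNN's actual spatial-temporal aware distance differs.

```python
# Data-driven graph construction: derive node adjacency from the similarity
# of historical traffic series. Simplified stand-in for DSTAGNN's measure.
import numpy as np

def data_driven_adjacency(series: np.ndarray, keep: int = 3) -> np.ndarray:
    """series: (N, T) historical flow per node. Keeps each node's `keep`
    strongest correlations as directed edges."""
    corr = np.corrcoef(series)                 # (N, N) pairwise similarity
    np.fill_diagonal(corr, -np.inf)            # no self-loops
    adj = np.zeros_like(corr)
    top = np.argsort(corr, axis=1)[:, -keep:]  # strongest neighbors per node
    rows = np.repeat(np.arange(corr.shape[0]), keep)
    adj[rows, top.ravel()] = 1.0
    return adj

flows = np.random.default_rng(2).normal(size=(6, 48))
print(data_driven_adjacency(flows, keep=2))
```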

ICML Conference 2017 Conference Paper

Scalable Bayesian Rule Lists

  • Hongyu Yang
  • Cynthia Rudin
  • Margo I. Seltzer

We present an algorithm for building probabilistic rule lists that is two orders of magnitude faster than previous work. Rule list algorithms are competitors for decision tree algorithms. They are associative classifiers, in that they are built from pre-mined association rules. They have a logical structure that is a sequence of IF-THEN rules, identical to a decision list or one-sided decision tree. Instead of using greedy splitting and pruning like decision tree algorithms, we aim to fully optimize over rule lists, striking a practical balance between accuracy, interpretability, and computational speed. The algorithm presented here uses a mixture of theoretical bounds (tight enough to have practical implications as a screening or bounding procedure), computational reuse, and highly tuned language libraries to achieve computational efficiency. Currently, for many practical problems, this method achieves better accuracy and sparsity than decision trees. In many cases, the computational time is practical and often less than that of decision trees.
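A minimal sketch of how a learned rule list classifies an example: walk the ordered IF-THEN rules and return the label of the first one that fires, falling back to a default. The toy rules and features below are made up; learning the list itself is the optimization problem the paper addresses.

```python
# Rule-list prediction: the first matching IF-THEN rule determines the label.
# The rules below are toy examples over made-up features f1 and f2.
def predict(rule_list, default, x):
    """rule_list: ordered (condition, label) pairs; default: fallback label."""
    for condition, label in rule_list:
        if condition(x):           # first matching rule fires
            return label
    return default                 # no rule matched

rules = [
    (lambda x: x["f1"] > 5 and x["f2"] == "a", 1),
    (lambda x: x["f1"] <= 2, 0),
]
print(predict(rules, 0, {"f1": 7, "f2": "a"}))  # 1 (first rule fires)
print(predict(rules, 0, {"f1": 4, "f2": "b"}))  # 0 (default)
```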