Hongyang Yu Papers

AAAI Conference 2025 Conference Paper

Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation

Wenzhao Xiang
Chang Liu
Hongyang Yu
Xilin Chen

Masked Image Modeling (MIM) has garnered significant attention in self-supervised learning, thanks to its impressive capacity to learn scalable visual representations tailored for downstream tasks. However, images inherently contain abundant redundant information, leading the pixel-based MIM reconstruction process to focus excessively on finer details such as textures, thus prolonging training times unnecessarily. Addressing this challenge requires a shift towards a compact representation of features during MIM reconstruction. Frequency domain analysis provides a promising avenue for achieving compact image feature representation. In contrast to the commonly used Fourier transform, wavelet transform not only offers frequency information but also preserves spatial characteristics and multi-level features of the image. Additionally, the multi-level decomposition process of wavelet transformation aligns well with the hierarchical architecture of modern neural networks. In this study, we leverage wavelet transform as a tool for efficient representation learning to expedite the training process of MIM. Specifically, we conduct multi-level decomposition of images using wavelet transform, utilizing wavelet coefficients from different levels to construct distinct reconstruction targets representing various frequencies and scales. These reconstruction targets are then integrated into the MIM process, with adjustable weights assigned to prioritize the most crucial information. Extensive experiments demonstrate that our method achieves comparable or superior performance across various downstream tasks while exhibiting higher training efficiency.

PDF Details DOI

TMLR Journal 2024 Journal Article

TensorVAE: a simple and efficient generative model for conditional molecular conformation generation

Hongyang Yu
Hongjiang Yu

Efficient generation of 3D conformations of a molecule from its 2D graph is a key challenge in in-silico drug discovery. Deep learning (DL) based generative modelling has recently become a potent tool to tackling this challenge. However, many existing DL-based methods are either indirect–leveraging inter-atomic distances or direct–but requiring numerous sampling steps to generate conformations. In this work, we propose a simple model abbreviated TensorVAE capable of generating conformations directly from a 2D molecular graph in a single step. The main novelty of the proposed method is focused on feature engineering. We develop a novel encoding and feature extraction mechanism relying solely on standard convolution operation to generate token-like feature vector for each atom. These feature vectors are then transformed through standard transformer encoders under a conditional Variational Autoencoder framework for generating conformations directly. We show through experiments on two benchmark datasets that with intuitive feature engineering, a relatively simple and standard model can provide promising generative capability outperforming more than a dozen state-of-the-art models employing more sophisticated and specialized generative architecture.

PDF Details

Possible papers

Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation

TensorVAE: a simple and efficient generative model for conditional molecular conformation generation