Arrow Research search

Author name cluster

Han Huang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

14

TIST Journal 2026 Journal Article

Microscale-Searching Optimization for Transfer Learning-Based Filter Fine-Tuning

  • Le Feng
  • Fujian Feng
  • Li Xiao
  • Mian Tan
  • Han Huang
  • Yinong Wang

Fine-tuning has emerged as a popular technique in the field of transfer learning, demonstrating remarkable achievements in various data-scarce tasks. The performance of fine-tuning in deep convolutional neural networks depends on the selection of which parameters to fine-tune and which to freeze. However, it is difficult to determine which parameters in a pre-trained model need to be fine-tuned for a new task. This article proposes a filter-level discrete optimization model to identify the filter subset for fine-tuning, with filter-selection coding as the core optimization step. Because of the huge search space of the filter fine-tuning problem, we propose a filter interactivity decomposition strategy that finds a valid search subspace (a smaller search subspace containing the optimal solution) by dividing the entire filter fine-tuning problem into multiple sub-optimization problems. Based on the decomposition strategy, we design a microscale-searching transfer optimization algorithm, which solves each subproblem by searching the valid search subspace instead of the original search space of the filter fine-tuning problem. To verify the validity of the proposed algorithm, extensive experiments are conducted on seven publicly available image classification datasets: Stanford Dogs, MIT Indoors, Caltech 256-30, Caltech 256-60, Aircraft, UCF-101, and Omniglot. Experimental results show that the proposed method significantly improves fine-tuning accuracy while effectively reducing the scale of the filter fine-tuning problem. Moreover, the proposed algorithm outperforms state-of-the-art fine-tuning methods on the fine-tuning problem for transfer learning.
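As a toy illustration of the decomposition idea (not the paper's algorithm), the sketch below splits filters into assumed interactivity groups and solves each small subproblem exhaustively instead of searching all 2^N subsets. The additive gain scores and the group assignment are fabricated stand-ins for fine-tuning accuracy:

```python
# Toy sketch of decomposition-based filter-subset search. The additive
# per-filter "gains" and the group structure are illustrative
# assumptions; the paper optimizes actual fine-tuning accuracy.
from itertools import combinations

def group_score(subset, gains):
    """Assumed additive surrogate for the benefit of fine-tuning a subset."""
    return sum(gains[i] for i in subset)

def solve_group(filters, gains, budget):
    """Brute-force the best subset within one small group
    (the 'valid search subspace' of that subproblem)."""
    best = ()
    for r in range(len(filters) + 1):
        for sub in combinations(filters, r):
            if len(sub) <= budget and group_score(sub, gains) > group_score(best, gains):
                best = sub
    return best

gains = {0: 0.3, 1: -0.1, 2: 0.5, 3: 0.2, 4: -0.4, 5: 0.1}
groups = [[0, 1, 2], [3, 4, 5]]  # assumed interactivity decomposition
selected = [f for g in groups for f in solve_group(g, gains, budget=2)]
```

Each group of three filters is searched over at most 2^3 subsets, instead of 2^6 for the whole problem, which is the scale reduction the decomposition strategy targets.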

NeurIPS Conference 2025 Conference Paper

EnzyControl: Adding Functional and Substrate-Specific Control for Enzyme Backbone Generation

  • Chao Song
  • Zhiyuan Liu
  • Han Huang
  • Liang Wang
  • Qiong Wang
  • Jian-Yu Shi
  • Hui Yu
  • Yihang Zhou

Designing enzyme backbones with substrate-specific functionality is a critical challenge in computational protein engineering. Current generative models excel in protein design but face limitations in binding data, substrate-specific control, and flexibility for de novo enzyme backbone generation. To address this, we introduce EnzyBind, a dataset of 11,100 experimentally validated enzyme-substrate pairs specifically curated from PDBbind. Building on this, we propose EnzyControl, a method that enables functional and substrate-specific control in enzyme backbone generation. Our approach generates enzyme backbones conditioned on MSA-annotated catalytic sites and their corresponding substrates, which are automatically extracted from curated enzyme-substrate data. At the core of EnzyControl is EnzyAdapter, a lightweight, modular component integrated into a pretrained motif-scaffolding model, allowing it to become substrate-aware. A two-stage training paradigm further refines the model's ability to generate accurate and functional enzyme structures. Experiments show that EnzyControl achieves the best performance across structural and functional metrics on the EnzyBind and EnzyBench benchmarks, with particularly notable improvements of 13% in designability and 13% in catalytic efficiency compared to the baseline models. The code is released at https://github.com/Vecteur-libre/EnzyControl.

ICLR Conference 2025 Conference Paper

Exploring the Design Space of Visual Context Representation in Video MLLMs

  • Yifan Du 0002
  • Yuqi Huo
  • Kun Zhou 0002
  • Zijia Zhao
  • Haoyu Lu
  • Han Huang
  • Xin Zhao 0018
  • Bingning Wang

Video Multimodal Large Language Models (MLLMs) have shown remarkable capability in understanding video semantics across various downstream tasks. Despite these advancements, there is still a lack of systematic research on visual context representation, which refers to the scheme used to select frames from a video and further select tokens from a frame. In this paper, we explore the design space of visual context representation and aim to improve the performance of video MLLMs by finding more effective representation schemes. First, we formulate the task of visual context representation as a constrained optimization problem and model the language modeling loss as a function of the number of frames and the number of embeddings (or tokens) per frame, given the maximum visual context window size. Then, we explore the scaling effects in frame selection and token selection respectively, and fit the corresponding function curves by conducting extensive empirical experiments. We examine the effectiveness of typical selection strategies and present empirical findings for determining the two factors. Furthermore, we study the joint effect of frame selection and token selection and derive the optimal formula for determining the two factors. We demonstrate that the derived optimal settings align with the best-performing results of the empirical experiments. The data and code are available at: https://github.com/RUCAIBox/Opt-Visor.
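The constrained-optimization framing can be sketched in a few lines. The power-law loss surrogate and its coefficients below are illustrative assumptions, not the paper's fitted curves; the point is the search over (frames, tokens-per-frame) pairs under a fixed visual-context budget:

```python
# Hypothetical sketch: choose the number of frames T and tokens per
# frame M under a visual-context budget T * M <= C. The power-law
# surrogate loss and all coefficients are assumptions for illustration.

def loss(T, M, a=1.0, b=2.0, alpha=0.5, beta=0.7, c=0.1):
    """Assumed scaling-law surrogate for the language-modeling loss."""
    return a * T ** (-alpha) + b * M ** (-beta) + c

def best_allocation(budget):
    """Enumerate feasible (T, M) pairs with T * M <= budget and minimize."""
    candidates = [(T, budget // T) for T in range(1, budget + 1)]
    return min(candidates, key=lambda tm: loss(*tm))

T_opt, M_opt = best_allocation(1024)
```

Under this surrogate, putting the entire budget into frames or entirely into tokens is suboptimal; a balanced allocation minimizes the loss, which mirrors the joint-effect analysis described in the abstract.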

AAAI Conference 2025 Conference Paper

FatesGS: Fast and Accurate Sparse-View Surface Reconstruction Using Gaussian Splatting with Depth-Feature Consistency

  • Han Huang
  • Yulun Wu
  • Chao Deng
  • Ge Gao
  • Ming Gu
  • Yu-Shen Liu

Recently, Gaussian Splatting has sparked a new trend in the field of computer vision. Apart from novel view synthesis, it has also been extended to the area of multi-view reconstruction. The latest methods facilitate complete, detailed surface reconstruction while ensuring fast training speed. However, these methods still require dense input views, and their output quality degrades significantly with sparse views. We observe that the Gaussian primitives tend to overfit the few training views, leading to noisy floaters and incomplete reconstructed surfaces. In this paper, we present an innovative sparse-view reconstruction framework that leverages intra-view depth and multi-view feature consistency to achieve remarkably accurate surface reconstruction. Specifically, we utilize monocular depth ranking information to supervise the consistency of the depth distribution within patches and employ a smoothness loss to enhance the continuity of the distribution. To achieve finer surface reconstruction, we optimize the absolute positions of depth through multi-view projection features. Extensive experiments on DTU and BlendedMVS demonstrate that our method outperforms state-of-the-art methods with a speedup of 60x to 200x, achieving swift and fine-grained mesh reconstruction without the need for costly pre-training.
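A minimal sketch of the depth-ranking supervision described above, assuming a simple pairwise hinge formulation (the margin and the toy patch data are illustrative, not the paper's): the monocular prior supplies only an ordering of depths within a patch, and rendered-depth pairs that violate that ordering are penalized.

```python
def ranking_loss(rendered, prior, margin=1e-3):
    """Pairwise hinge: penalize pairs whose rendered depth order
    contradicts the monocular prior's ordering within a patch.
    (Assumed loss form for illustration.)"""
    loss, count = 0.0, 0
    n = len(rendered)
    for i in range(n):
        for j in range(n):
            if prior[i] < prior[j]:  # prior says pixel i is nearer than j
                loss += max(0.0, rendered[i] - rendered[j] + margin)
                count += 1
    return loss / max(count, 1)

prior = [0.2, 0.5, 0.9]   # monocular depth ranking for a 3-pixel patch
good = [1.0, 2.0, 3.0]    # rendered depths with the same ordering
bad = [3.0, 2.0, 1.0]     # reversed ordering, heavily penalized
```

Because only the ordering is supervised, the loss is invariant to the scale ambiguity of monocular depth, which is why the absolute depth positions are refined separately via multi-view projection features.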

ICLR Conference 2025 Conference Paper

NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation

  • Zhiyuan Liu 0001
  • Yanchen Luo
  • Han Huang
  • Enzhi Zhang
  • Sihang Li 0002
  • Junfeng Fang
  • Yaorui Shi
  • Xiang Wang 0010

3D molecule generation is crucial for drug discovery and material design. While prior efforts focus on 3D diffusion models for their benefits in modeling continuous 3D conformers, they overlook the advantages of 1D SELFIES-based Language Models (LMs), which can generate 100% valid molecules and leverage billion-scale 1D molecule datasets. To combine these advantages for 3D molecule generation, we propose NExT-Mol, a foundation model in which 3D diffusion meets 1D language modeling. NExT-Mol uses an extensively pretrained molecule LM for 1D molecule generation and subsequently predicts the generated molecule's 3D conformers with a 3D diffusion model. We enhance NExT-Mol's performance by scaling up the LM's model size, refining the diffusion neural architecture, and applying 1D-to-3D transfer learning. Notably, our 1D molecule LM significantly outperforms baselines in distributional similarity while ensuring validity, and our 3D diffusion model achieves leading performance in conformer prediction. Given these improvements in 1D and 3D modeling, NExT-Mol achieves a 26% relative improvement in 3D FCD for de novo 3D generation on GEOM-DRUGS, and a 13% average relative gain for conditional 3D generation on QM9-2014. Our codes and pretrained checkpoints are available at https://github.com/acharkq/NExT-Mol.

AAAI Conference 2025 Conference Paper

Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views

  • Yulun Wu
  • Han Huang
  • Wenyuan Zhang
  • Chao Deng
  • Ge Gao
  • Ming Gu
  • Yu-Shen Liu

In recent years, reconstructing indoor scene geometry from multi-view images has made encouraging progress. Current methods incorporate monocular priors into neural implicit surface models to achieve high-quality reconstructions. However, these methods require hundreds of images for scene reconstruction. When only a limited number of views are available as input, the performance of monocular priors deteriorates due to scale ambiguity, leading to the collapse of the reconstructed scene geometry. In this paper, we propose a new method, named Sparis, for indoor surface reconstruction from sparse views. Specifically, we investigate the impact of monocular priors on sparse scene reconstruction, introducing a novel prior based on inter-image matching information. Our prior offers more accurate depth information while ensuring cross-view matching consistency. Additionally, we employ an angular filter strategy and an epipolar matching weight function, aiming to reduce errors due to view matching inaccuracies, thereby refining the inter-image prior for improved reconstruction accuracy. The experiments conducted on widely used benchmarks demonstrate superior performance in sparse-view scene reconstruction.

NeurIPS Conference 2024 Conference Paper

Low Degree Hardness for Broadcasting on Trees

  • Han Huang
  • Elchanan Mossel

We study the low-degree hardness of broadcasting on trees. Broadcasting on trees has been extensively studied in statistical physics, in computational biology in relation to phylogenetic reconstruction, and in statistics and computer science in the context of block model inference, as well as serving as a simple data model for algorithms that may require depth for inference. Inference of the root can be carried out by the celebrated Belief Propagation (BP) algorithm, which achieves Bayes-optimal performance. Although this algorithm runs in linear time (using real operations), recent works indicate that it in fact requires a high level of complexity. Moitra, Mossel, and Sandon constructed a chain for which estimating the root better than random (for a typical input) is $NC^1$-complete. Kohler and Mossel constructed chains such that, for trees with $N$ leaves, recovering the root better than random requires a polynomial of degree $N^{\Omega(1)}$. Both works asked whether such complexity bounds hold in general below the celebrated {\em Kesten-Stigum} bound. In this work, we prove that this is indeed the case for low-degree polynomials. We show that for the broadcast problem using any Markov chain on trees with $N$ leaves, below the Kesten-Stigum bound, any $O(\log N)$-degree polynomial has vanishing correlation with the root. Our result is one of the first low-degree lower bounds proved in a setting that is not based on, or easily reduced to, a product measure.
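An illustrative simulation (not the paper's proof technique): broadcast a ±1 root down a binary tree with flip probability delta, then correlate the root with the simplest degree-1 statistic of the leaves, their average. For the binary symmetric channel the Kesten-Stigum bound reads 2(1-2·delta)^2 = 1, and with delta = 0.3 we are below it, so the correlation decays with depth:

```python
# Simulation sketch of broadcasting on a binary tree below the
# Kesten-Stigum bound (binary symmetric channel, flip prob. delta).
import random

def broadcast_leaves(root, depth, delta, rng):
    """Propagate a +/-1 root spin down a binary tree, flipping w.p. delta."""
    level = [root]
    for _ in range(depth):
        level = [-s if rng.random() < delta else s
                 for s in level for _ in range(2)]
    return level

def root_leafsum_corr(depth, delta, trials=2000, seed=0):
    """Empirical correlation of the root with the leaf average,
    the simplest degree-1 polynomial of the leaves."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        root = rng.choice([-1, 1])
        leaves = broadcast_leaves(root, depth, delta, rng)
        acc += root * sum(leaves) / len(leaves)
    return acc / trials

shallow = root_leafsum_corr(depth=2, delta=0.3)
deep = root_leafsum_corr(depth=6, delta=0.3)
```

For this channel the leaf-average correlation is (1-2·delta)^depth, so it vanishes exponentially in depth; the paper's result extends such vanishing correlation to every polynomial of degree O(log N), for any Markov chain below the bound.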

AAAI Conference 2024 Conference Paper

NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views

  • Han Huang
  • Yulun Wu
  • Junsheng Zhou
  • Ge Gao
  • Ming Gu
  • Yu-Shen Liu

Recently, neural implicit functions have demonstrated remarkable results in the field of multi-view reconstruction. However, most existing methods are tailored for dense views and exhibit unsatisfactory performance when dealing with sparse views. Several recent methods have been proposed to generalize implicit reconstruction to the sparse view reconstruction task, but they still suffer from high training costs and remain valid only under carefully selected perspectives. In this paper, we propose a novel sparse view reconstruction framework that leverages on-surface priors to achieve highly faithful surface reconstruction. Specifically, we design several constraints on global geometry alignment and local geometry refinement for jointly optimizing coarse shapes and fine details. To achieve this, we train a neural network to learn a global implicit field from the on-surface points obtained from SfM and then leverage it as a coarse geometric constraint. To exploit local geometric consistency, we project on-surface points onto seen and unseen views, treating the consistent loss of projected features as a fine geometric constraint. The experimental results with DTU and BlendedMVS datasets in two prevalent sparse settings demonstrate significant improvements over the state-of-the-art methods.

AAAI Conference 2024 Conference Paper

Privileged Prior Information Distillation for Image Matting

  • Cheng Lyu
  • Jiake Xie
  • Bo Xu
  • Cheng Lu
  • Han Huang
  • Xin Huang
  • Ming Wu
  • Chuang Zhang

Performance of trimap-free image matting methods is limited when trying to decouple the deterministic and undetermined regions, especially in scenes where foregrounds are semantically ambiguous, chromaless, or of high transmittance. In this paper, we propose a novel framework named Privileged Prior Information Distillation for Image Matting (PPID-IM) that can effectively transfer privileged prior environment-aware information to improve the performance of trimap-free students in solving hard foregrounds. The prior information of the trimap regulates only the teacher model during the training stage, while not being fed into the student network during actual inference. To achieve effective privileged cross-modality (i.e., trimap and RGB) information distillation, we introduce a Cross-Level Semantic Distillation (CLSD) module that reinforces the students with more knowledgeable semantic representations and environment-aware information. We also propose an Attention-Guided Local Distillation module that efficiently transfers privileged local attributes from the trimap-based teacher to trimap-free students for the guidance of local-region optimization. Extensive experiments demonstrate the effectiveness and superiority of our PPID-IM on image matting. The code will be released soon.

NeurIPS Conference 2024 Conference Paper

VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark

  • Han Huang
  • Haitian Zhong
  • Tao Yu
  • Qiang Liu
  • Shu Wu
  • Liang Wang
  • Tieniu Tan

Recently, knowledge editing on large language models (LLMs) has received considerable attention. In comparison, editing Large Vision-Language Models (LVLMs) faces extra challenges from diverse data modalities and complicated model components, and data for LVLM editing are limited. The existing LVLM editing benchmark, which comprises three metrics (Reliability, Locality, and Generality), falls short in the quality of synthesized evaluation images and cannot assess whether models apply edited knowledge in relevant content. Therefore, we employ more reliable data collection methods to construct a new Large Vision-Language Model Knowledge Editing Benchmark, VLKEB, and extend the Portability metric for more comprehensive evaluation. Leveraging a multi-modal knowledge graph, our image data are bound with knowledge entities. This can be further used to extract entity-related knowledge, which constitutes the base of the editing data. We conduct experiments with different editing methods on five LVLMs and thoroughly analyze how they impact the models. The results reveal the strengths and deficiencies of these methods and hopefully provide insights for future research. The codes and dataset are available at: https://github.com/VLKEB/VLKEB.
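A hedged sketch of how the four named metrics reduce to accuracies over different query types. The edited-model stub and the query sets below are fabricated for illustration; they are not the benchmark's data or evaluation code.

```python
# Illustrative reduction of the four editing metrics to per-query-type
# accuracies. The model stub and queries are fabricated examples.
def accuracy(model, queries):
    """Fraction of (query, expected answer) pairs the model gets right."""
    return sum(model(q) == a for q, a in queries) / len(queries)

def evaluate_edit(model, edit_q, rephrase_q, locality_q, portability_q):
    return {
        "reliability": accuracy(model, edit_q),        # the edit itself
        "generality": accuracy(model, rephrase_q),     # rephrasings of the edit
        "locality": accuracy(model, locality_q),       # unrelated facts unchanged
        "portability": accuracy(model, portability_q), # edit applied in new context
    }

# stub "edited model": a lookup table standing in for an LVLM after editing
edited = {"capital of X": "B", "rephrase X": "B",
          "unrelated": "C", "hop": "D"}.get
scores = evaluate_edit(edited,
                       [("capital of X", "B")], [("rephrase X", "B")],
                       [("unrelated", "C")], [("hop", "D")])
```

Portability is the metric the abstract extends: it probes whether the edited fact transfers to related, multi-hop content rather than only to the literal edit query.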

AAAI Conference 2023 Conference Paper

Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation

  • Han Huang
  • Leilei Sun
  • Bowen Du
  • Weifeng Lv

Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modeling distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for reverse generative processes. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. We also combine the solvers with gradient guidance from the molecule property predictor for similarity-constrained molecule optimization. Experiments on diverse datasets validate the effectiveness of our framework. Particularly, the proposed method still generates high-quality molecular graphs in a limited number of steps.
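The probability-flow ODE sampling the abstract mentions can be illustrated in one dimension, where Gaussian data make the exact score available in closed form; the real model replaces it with the learned hybrid graph noise-prediction network. The constant noise schedule and plain Euler integrator here are simplifying assumptions:

```python
# One-dimensional sketch of probability-flow ODE sampling for a
# VP-type diffusion. Data distribution N(0, sigma_data^2) is assumed
# so the score has a closed form; constant beta schedule assumed.
import math
import random
import statistics

def sample_pf_ode(steps=600, sigma_data=2.0, beta=5.0, seed=0):
    """Integrate dx/dt = -0.5*beta*(x + score) backwards from t=1 to t=0."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)                    # sample from the N(0,1) prior
    dt = 1.0 / steps
    for i in range(steps, 0, -1):
        t = i * dt
        a2 = math.exp(-beta * t)               # squared signal scale alpha(t)^2
        var = a2 * sigma_data ** 2 + (1.0 - a2)
        score = -x / var                       # exact score of N(0, var)
        x += 0.5 * beta * (x + score) * dt     # reverse Euler step
    return x

# deterministic seeds: the sample spread should approach sigma_data
samples = [sample_pf_ode(seed=s) for s in range(400)]
spread = statistics.pstdev(samples)
```

Because the ODE is deterministic given the prior sample, the same semi-linear structure admits the fast dedicated solvers the abstract refers to; Euler is used here only for brevity.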

IJCAI Conference 2020 Conference Paper

Beyond Network Pruning: a Joint Search-and-Training Approach

  • Xiaotong Lu
  • Han Huang
  • Weisheng Dong
  • Xin Li
  • Guangming Shi

Network pruning has been proposed as a remedy for alleviating the over-parameterization problem of deep neural networks. However, its value has recently been challenged, especially from the perspective of neural architecture search (NAS). We challenge the conventional wisdom of pruning-after-training by proposing a joint search-and-training approach that directly learns a compact network from scratch. By treating pruning as a search strategy, we present two new insights in this paper: 1) it is possible to expand the search space of network pruning by associating each filter with a learnable weight; 2) joint search-and-training can be conducted iteratively to maximize learning efficiency. More specifically, we propose a coarse-to-fine tuning strategy to iteratively sample and update compact sub-networks that approximate the target network. The weights associated with network filters are accordingly updated by joint search-and-training to reflect learned knowledge in the NAS space. Moreover, we introduce strategies of random perturbation (inspired by Monte Carlo methods) and flexible thresholding (inspired by reinforcement learning) to adjust the weight and size of each layer. Extensive experiments on ResNet and VGGNet demonstrate the superior performance of our proposed method on popular datasets including CIFAR10, CIFAR100, and ImageNet.
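The search side of the approach can be caricatured in a few lines: each filter carries a learnable importance weight, random perturbation (the Monte Carlo-inspired strategy) explores the space, and a threshold decides which filters survive each iteration. The fixed threshold and the absence of an actual training signal are simplifications for illustration:

```python
# Caricature of the search loop only: per-filter learnable weights,
# random perturbation, and thresholding. The fixed threshold and the
# initial weights are illustrative assumptions; no training signal here.
import random

def prune_step(weights, threshold, noise, rng):
    """Perturb filter weights (Monte Carlo style) and keep those
    above the threshold as the sampled compact sub-network."""
    perturbed = [w + rng.uniform(-noise, noise) for w in weights]
    keep = [i for i, w in enumerate(perturbed) if w >= threshold]
    return perturbed, keep

rng = random.Random(0)
weights = [0.9, 0.1, 0.7, 0.05, 0.6]  # assumed initial filter importances
for _ in range(5):                     # iterative search-and-training loop
    weights, kept = prune_step(weights, threshold=0.5, noise=0.05, rng=rng)
```

In the full method the surviving sub-network would be trained between steps and the resulting accuracy fed back into the weights; here the loop only demonstrates the sample-threshold-update cycle.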

ICRA Conference 2000 Conference Paper

Optimal Profile Generation in Distorted Surface Finishing

  • Zhiming Gong
  • Xiaoqi Chen
  • Han Huang

We present a template-based optimal profile fitting method that generates optimal profiles for robotic surface finishing of parts under repair. By using the design profile as a template, only a few measurement points are required for the fitting. A series of template-based optimal profile fitting steps is performed to generate the final fitted profile. A fast-converging direct-search minimization algorithm has been developed for the optimal fitting. The fitted profile is used to compute robotic paths for surface finishing. The template-based optimal profile fitting method has been successfully implemented as a core technique in a robotic surface finishing system for turbine vane airfoil repair. It enables the system, which has been operating in a production line, to adapt effectively to part variations.
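A minimal sketch of template-based profile fitting under assumed specifics: a quadratic design profile as the template, a fit over vertical offset and tilt only, and a compass-style direct search standing in for the paper's fast-converging direct-search minimization.

```python
# Illustrative template fit: the design profile (assumed quadratic here)
# is shifted and tilted to match a handful of measurement points via a
# derivative-free compass search. All specifics are assumptions.
def template(x):
    """Assumed quadratic design profile used as the template."""
    return 0.1 * x * x

def fit_error(params, points):
    """Sum of squared residuals between the shifted/tilted template
    and the measured points."""
    dy, slope = params
    return sum((template(x) + dy + slope * x - y) ** 2 for x, y in points)

def direct_search(points, start=(0.0, 0.0), step=1.0, tol=1e-6):
    """Compass-style direct search: probe each coordinate both ways,
    halve the step when no probe improves the fit."""
    best = list(start)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                if fit_error(trial, points) < fit_error(best, points):
                    best, improved = trial, True
        if not improved:
            step *= 0.5
    return tuple(best)

# a few measurement points from a part that is shifted (0.5) and tilted (0.2)
points = [(x, template(x) + 0.5 + 0.2 * x) for x in (-2, -1, 0, 1, 2)]
dy, slope = direct_search(points)
```

Only five measurement points are needed because the template supplies the profile shape; the search recovers just the low-dimensional alignment parameters, after which the fitted profile can be converted into robot paths.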