Arrow Research search

Author name cluster

Tao Hu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
1 author row

Possible papers

14

AAAI Conference 2026 Conference Paper

BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning

  • Lan Li
  • Tao Hu
  • Da-Wei Zhou
  • Jia-Qi Yang
  • Han-Jia Ye
  • De-Chuan Zhan

Class-Incremental Learning (CIL) aims to continually learn new classes without forgetting previously acquired knowledge. Vision-language models such as CLIP offer strong transferable representations via multi-modal supervision, making them a promising choice for CIL. However, applying CLIP to CIL poses two major challenges: (1) adapting to downstream tasks often requires additional learnable modules, increasing model complexity and susceptibility to forgetting; and (2) while multi-modal representations offer complementary strengths, existing methods have not fully exploited the synergy between visual and textual modalities. To address these issues, we propose BOFA (Bridge-layer Orthogonal Fusion for Adaptation), a novel framework for CIL. BOFA restricts adaptation to CLIP’s existing cross-modal bridge layer, keeping the core learning process parameter-free and avoiding any extra adaptation modules. To prevent forgetting within this layer, it leverages Orthogonal Low-Rank Fusion, a mechanism that constrains parameter updates to a low-rank ``safe subspace" that is mathematically constructed to be approximately orthogonal to the feature subspace of past tasks. This encourages stable knowledge accumulation and mitigates interference between new and previously learned classes. Furthermore, BOFA employs a cross-modal hybrid prototype that fuses stable textual prototypes with dynamic visual counterparts derived from our adapted bridge layer, resulting in a more robust and discriminative classifier. Extensive experiments on standard benchmarks demonstrate that BOFA achieves superior accuracy and efficiency compared to existing methods.

IJCAI Conference 2025 Conference Paper

Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition

  • Xiaogang Xu
  • Kun Zhou
  • Tao Hu
  • Jiafei Wu
  • Ruixing Wang
  • Hao Peng
  • Bei Yu

Low-Light Video Enhancement (LLVE) seeks to restore dynamic or static scenes plagued by severe invisibility and noise. In this paper, we present an innovative video decomposition strategy that incorporates view-independent and view-dependent components to enhance the performance of LLVE. We leverage dynamic cross-frame correspondences for the view-independent term (which primarily captures intrinsic appearance) and impose a scene-level continuity constraint on the view-dependent term (which mainly describes the shading condition) to achieve consistent and satisfactory decomposition results. To further ensure consistent decomposition, we introduce a dual-structure enhancement network featuring a cross-frame interaction mechanism. By supervising different frames simultaneously, this network encourages them to exhibit matching decomposition features. This mechanism can seamlessly integrate with encoder-decoder single-frame networks, incurring minimal additional parameter costs. Extensive experiments are conducted on widely recognized LLVE benchmarks, covering diverse scenarios. Our framework consistently outperforms existing methods, establishing a new SOTA performance.

AAAI Conference 2024 Conference Paper

Enhancing RAW-to-sRGB with Decoupled Style Structure in Fourier Domain

  • Xuanhua He
  • Tao Hu
  • Guoli Wang
  • Zejin Wang
  • Run Wang
  • Qian Zhang
  • Keyu Yan
  • Ziyi Chen

RAW to sRGB mapping, which aims to convert RAW images from smartphones into RGB form equivalent to that of Digital Single-Lens Reflex (DSLR) cameras, has become an important area of research. However, current methods often ignore the difference between cell phone RAW images and DSLR camera RGB images, a difference that goes beyond the color matrix and extends to spatial structure due to resolution variations. Recent methods directly rebuild color mapping and spatial structure via shared deep representation, limiting optimal performance. Inspired by Image Signal Processing (ISP) pipeline, which distinguishes image restoration and enhancement, we present a novel Neural ISP framework, named FourierISP. This approach breaks the image down into style and structure within the frequency domain, allowing for independent optimization. FourierISP is comprised of three subnetworks: Phase Enhance Subnet for structural refinement, Amplitude Refine Subnet for color learning, and Color Adaptation Subnet for blending them in a smooth manner. This approach sharpens both color and structure, and extensive evaluations across varied datasets confirm that our approach realizes state-of-the-art results. Code will be available at https://github.com/alexhe101/FourierISP.

NeurIPS Conference 2024 Conference Paper

X-Ray: A Sequential 3D Representation For Generation

  • Tao Hu
  • Wenhang Ge
  • Yuyang Zhao
  • Gim Hee Lee

We introduce X-Ray, a novel 3D sequential representation inspired by the penetrability of x-ray scans. X-Ray transforms a 3D object into a series of surface frames at different layers, making it suitable for generating 3D models from images. Our method utilizes ray casting from the camera center to capture geometric and textured details, including depth, normal, and color, across all intersected surfaces. This process efficiently condenses the whole 3D object into a multi-frame video format, motivating the utilize of a network architecture similar to those in video diffusion models. This design ensures an efficient 3D representation by focusing solely on surface information. Also, we propose a two-stage pipeline to generate 3D objects from X-Ray Diffusion Model and Upsampler. We demonstrate the practicality and adaptability of our X-Ray representation by synthesizing the complete visible and hidden surfaces of a 3D object from a single input image. Experimental results reveal the state-of-the-art superiority of our representation in enhancing the accuracy of 3D generation, paving the way for new 3D representation research and practical applications. Our project page is in \url{https: //tau-yihouxiang. github. io/projects/X-Ray/X-Ray. html}.

IJCAI Conference 2022 Conference Paper

Enhancing Unsupervised Domain Adaptation via Semantic Similarity Constraint for Medical Image Segmentation

  • Tao Hu
  • Shiliang Sun
  • Jing Zhao
  • Dongyu Shi

This work proposes a novel unsupervised cross-modality adaptive segmentation method for medical images to tackle the performance degradation caused by the severe domain shift when neural networks are being deployed to unseen modalities. The proposed method is an end-2-end framework, which conducts appearance transformation via a domain-shared shallow content encoder and two domain-specific decoders. The feature extracted from the encoder is enhanced to be more domain-invariant by a similarity learning task using the proposed Semantic Similarity Mining (SSM) module which has a strong help of domain adaptation. The domain-invariant latent feature is then fused into the target domain segmentation sub-network, trained using the original target domain images and the translated target images from the source domain in the framework of adversarial training. The adversarial training is effective to narrow the remaining gap between domains in semantic space after appearance alignment. Experimental results on two challenging datasets demonstrate that our method outperforms the state-of-the-art approaches.

AAAI Conference 2020 Conference Paper

3D Shape Completion with Multi-View Consistent Inference

  • Tao Hu
  • Zhizhong Han
  • Matthias Zwicker

3D shape completion is important to enable machines to perceive the complete geometry of objects from partial observations. To address this problem, view-based methods have been presented. These methods represent shapes as multiple depth images, which can be back-projected to yield corresponding 3D point clouds, and they perform shape completion by learning to complete each depth image using neural networks. While view-based methods lead to state-of-the-art results, they currently do not enforce geometric consistency among the completed views during the inference stage. To resolve this issue, we propose a multi-view consistent inference technique for 3D shape completion, which we express as an energy minimization problem including a data term and a regularization term. We formulate the regularization term as a consistency loss that encourages geometric consistency among multiple views, while the data term guarantees that the optimized views do not drift away too much from a learned shape descriptor. Experimental results demonstrate that our method completes shapes more accurately than previous techniques.

AAAI Conference 2020 Conference Paper

Hierarchical Modes Exploring in Generative Adversarial Networks

  • Mengxiao Hu
  • Jinlong Li
  • Maolin Hu
  • Tao Hu

In conditional Generative Adversarial Networks (cGANs), when two different initial noises are concatenated with the same conditional information, the distance between their outputs is relatively smaller, which makes minor modes likely to collapse into large modes. To prevent this happen, we proposed a hierarchical mode exploring method to alleviate mode collapse in cGANs by introducing a diversity measurement into the objective function as the regularization term. We also introduced the Expected Ratios of Expansion (ERE) into the regularization term, by minimizing the sum of differences between the real change of distance and ERE, we can control the diversity of generated images w. r. t specific-level features. We validated the proposed algorithm on four conditional image synthesis tasks including categorical generation, paired and un-paired image translation and text-to-image generation. Both qualitative and quantitative results show that the proposed method is effective in alleviating the mode collapse problem in cGANs, and can control the diversity of output images w. r. t specific-level features.

AAAI Conference 2019 Conference Paper

Attention-Based Multi-Context Guiding for Few-Shot Semantic Segmentation

  • Tao Hu
  • Pengwan Yang
  • Chiliang Zhang
  • Gang Yu
  • Yadong Mu
  • Cees G. M. Snoek

Few-shot learning is a nascent research topic, motivated by the fact that traditional deep learning methods require tremendous amounts of data. The scarcity of annotated data becomes even more challenging in semantic segmentation since pixellevel annotation in segmentation task is more labor-intensive to acquire. To tackle this issue, we propose an Attentionbased Multi-Context Guiding (A-MCG) network, which consists of three branches: the support branch, the query branch, the feature fusion branch. A key differentiator of A-MCG is the integration of multi-scale context features between support and query branches, enforcing a better guidance from the support set. In addition, we also adopt a spatial attention along the fusion branch to highlight context information from several scales, enhancing self-supervision in one-shot learning. To address the fusion problem in multi-shot learning, Conv-LSTM is adopted to collaboratively integrate the sequential support features to elevate the final accuracy. Our architecture obtains state-of-the-art on unseen classes in a variant of PASCAL VOC12 dataset and performs favorably against previous work with large gains of 1. 1%, 1. 4% measured in mIoU in the 1-shot and 5-shot setting.

AAAI Conference 2018 Conference Paper

Facial Landmarks Detection by Self-Iterative Regression Based Landmarks-Attention Network

  • Tao Hu
  • Honggang Qi
  • Jizheng Xu
  • Qingming Huang

Cascaded Regression (CR) based methods have been proposed to solve facial landmarks detection problem, which learn a series of descent directions by multiple cascaded regressors separately trained in coarse and fine stages. They outperform the traditional gradient descent based methods in both accuracy and running speed. However, cascaded regression is not robust enough because each regressor’s training data comes from the output of previous regressor. Moreover, training multiple regressors requires lots of computing resources, especially for deep learning based methods. In this paper, we develop a Self-Iterative Regression (SIR) framework to improve the model efficiency. Only one self-iterative regressor is trained to learn the descent directions for samples from coarse stages to fine stages, and parameters are iteratively updated by the same regressor. Specifically, we proposed Landmarks-Attention Network (LAN) as our regressor, which concurrently learns features around each landmark and obtains the holistic location increment. By doing so, not only the rest of regressors are removed to simplify the training process, but the number of model parameters is significantly decreased. The experiments demonstrate that with only 3. 72M model parameters, our proposed method achieves the stateof-the-art performance.

NeurIPS Conference 2012 Conference Paper

A mechanistic model of early sensory processing based on subtracting sparse representations

  • Shaul Druckmann
  • Tao Hu
  • Dmitri Chklovskii

Early stages of sensory systems face the challenge of compressing information from numerous receptors onto a much smaller number of projection neurons, a so called communication bottleneck. To make more efficient use of limited bandwidth, compression may be achieved using predictive coding, whereby predictable, or redundant, components of the stimulus are removed. In the case of the retina, Srinivasan et al. (1982) suggested that feedforward inhibitory connections subtracting a linear prediction generated from nearby receptors implement such compression, resulting in biphasic center-surround receptive fields. However, feedback inhibitory circuits are common in early sensory circuits and furthermore their dynamics may be nonlinear. Can such circuits implement predictive coding as well? Here, solving the transient dynamics of nonlinear reciprocal feedback circuits through analogy to a signal-processing algorithm called linearized Bregman iteration we show that nonlinear predictive coding can be implemented in an inhibitory feedback circuit. In response to a step stimulus, interneuron activity in time constructs progressively less sparse but more accurate representations of the stimulus, a temporally evolving prediction. This analysis provides a powerful theoretical framework to interpret and understand the dynamics of early sensory processing in a variety of physiological experiments and yields novel predictions regarding the relation between activity and stimulus statistics.

NeurIPS Conference 2009 Conference Paper

Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)

  • Tao Hu
  • Anthony Leonardo
  • Dmitri Chklovskii

One of the central problems in neuroscience is reconstructing synaptic connectivity in neural circuits. Synapses onto a neuron can be probed by sequentially stimulating potentially pre-synaptic neurons while monitoring the membrane voltage of the post-synaptic neuron. Reconstructing a large neural circuit using such a “brute force” approach is rather time-consuming and inefficient because the connectivity in neural circuits is sparse. Instead, we propose to measure a post-synaptic neuron’s voltage while stimulating simultaneously multiple randomly chosen potentially pre-synaptic neurons. To extract the weights of individual synaptic connections we apply a decoding algorithm recently developed for compressive sensing. Compared to the brute force approach, our method promises significant time savings that grow with the size of the circuit. We use computer simulations to find optimal stimulation parameters and explore the feasibility of our reconstruction method under realistic experimental conditions including noise and non-linear synaptic integration. Multiple-neuron stimulation allows reconstructing synaptic connectivity just from the spiking activity of post-synaptic neurons, even when sub-threshold voltage is unavailable. By using calcium indicators, voltage-sensitive dyes, or multi-electrode arrays one could monitor activity of multiple post-synaptic neurons simultaneously, thus mapping their synaptic inputs in parallel, potentially reconstructing a complete neural circuit.