Arrow Research · Search

Author name cluster

Lin Gu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
1 author row

Possible papers (13)

AAAI 2025 · Conference Paper

EventPillars: Pillar-based Efficient Representations for Event Data

  • Rui Fan
  • Weidong Hao
  • Juntao Guan
  • Rui Lai
  • Lin Gu
  • Tong Wu
  • Fanhong Zeng
  • Zhangming Zhu

Event cameras offer appealing advantages, including power efficiency and ultra-low latency, driving forward advancements in edge applications. To leverage mature frame-based algorithms, most approaches compute dense, image-like representations from sparse, asynchronous events. However, these representations often fail to capture comprehensive information or are computationally intensive, which hinders the edge deployment of event-based vision. Meanwhile, pillar-based paradigms have proven efficient and well established for densely representing sparse data. Hence, from a novel pillar-based perspective, we present EventPillars, an efficient, comprehensive framework for dense event representations. In summary, it (i) incorporates the Temporal Event Range to describe an intact temporal distribution, (ii) activates the Event Polarities to explicitly record the scene dynamics, (iii) enhances target awareness via a spatial attention prior from Normalized Event Density, and (iv) can be plugged into different downstream tasks. Extensive experiments show that EventPillars sets a new state-of-the-art precision on object recognition and detection datasets with 9.2× and 4.5× lower computation and storage consumption, respectively. This brings new insight into dense event representations and promises to boost the edge deployment of event-based vision.
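
A minimal sketch of what such a pillar-style densification might look like, using illustrative channels (per-pixel temporal range, per-polarity counts, normalized density). The channel definitions are assumptions for intuition, not the paper's exact formulation:

    import numpy as np

    def event_pillars(events, H, W):
        # events: (N, 4) array of [x, y, t, p] rows, polarity p in {-1, +1}.
        # Returns an (H, W, 4) image-like tensor: temporal event range,
        # positive/negative polarity counts, and normalized event density.
        x = events[:, 0].astype(int)
        y = events[:, 1].astype(int)
        t = events[:, 2]
        p = events[:, 3]

        t_min = np.full((H, W), np.inf)
        t_max = np.full((H, W), -np.inf)
        np.minimum.at(t_min, (y, x), t)   # earliest event per pixel
        np.maximum.at(t_max, (y, x), t)   # latest event per pixel
        t_range = np.where(np.isfinite(t_min), t_max - t_min, 0.0)

        pos = np.zeros((H, W))
        neg = np.zeros((H, W))
        np.add.at(pos, (y[p > 0], x[p > 0]), 1.0)
        np.add.at(neg, (y[p < 0], x[p < 0]), 1.0)

        density = (pos + neg) / max((pos + neg).max(), 1.0)
        return np.stack([t_range, pos, neg, density], axis=-1)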

NeurIPS 2025 · Conference Paper

GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction

  • Jiahe Li
  • Jiawei Zhang
  • Youmin Zhang
  • Xiao Bai
  • Jin Zheng
  • Xiaohan Yu
  • Lin Gu

Reconstructing accurate surfaces with radiance fields has achieved remarkable progress in recent years. However, prevailing approaches, primarily based on Gaussian Splatting, are increasingly constrained by representational bottlenecks. In this paper, we introduce GeoSVR, an explicit voxel-based framework that explores and extends the under-investigated potential of sparse voxels for achieving accurate, detailed, and complete surface reconstruction. Sparse voxels bring the strengths of preserving coverage completeness and geometric clarity, while corresponding challenges arise from absent scene constraints and locality in surface refinement. To ensure correct scene convergence, we first propose a Voxel-Uncertainty Depth Constraint that maximizes the effect of monocular depth cues while presenting a voxel-oriented uncertainty to avoid quality degradation, enabling effective and robust scene constraints while preserving highly accurate geometries. Subsequently, Sparse Voxel Surface Regularization is designed to enhance geometric consistency for tiny voxels and facilitate the voxel-based formation of sharp and accurate surfaces. Extensive experiments demonstrate our superior performance compared to existing methods across diverse challenging scenarios, excelling in geometric accuracy, detail preservation, and reconstruction completeness while maintaining high efficiency. Code is available at https://github.com/Fictionarry/GeoSVR.
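
One plausible reading of the Voxel-Uncertainty Depth Constraint is an uncertainty-weighted depth loss, where rays backed by uncertain voxels receive less supervision from monocular depth cues. The weighting function below is an assumption for illustration, not the paper's formulation:

    import numpy as np

    def uncertainty_weighted_depth_loss(d_rendered, d_mono, voxel_uncertainty):
        # Down-weight rays whose supporting voxels are uncertain so that
        # noisy monocular depth cues cannot degrade converged geometry.
        confidence = np.exp(-voxel_uncertainty)   # maps [0, inf) -> (0, 1]
        return np.mean(confidence * np.abs(d_rendered - d_mono))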

NeurIPS 2025 · Conference Paper

I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions

  • Shuhong Liu
  • Lin Gu
  • Ziteng Cui
  • Xuangeng Chu
  • Tatsuya Harada

As part of the effort to endow generative AI with perception of the 3D physical world, we propose I2-NeRF, a novel neural radiance field framework that enhances isometric and isotropic metric perception under media degradation. While existing NeRF models predominantly rely on object-centric sampling, I2-NeRF introduces a reverse-stratified upsampling strategy to achieve near-uniform sampling across 3D space, thereby preserving isometry. We further present a general radiative formulation for media degradation that unifies emission, absorption, and scattering into a particle model governed by the Beer–Lambert attenuation law. By matting direct and media-induced in-scatter radiance, this formulation extends naturally to complex media environments such as underwater, haze, and even low-light scenes. By treating light propagation uniformly in both vertical and horizontal directions, I2-NeRF enables isotropic metric perception and can even estimate medium properties such as water depth. Experiments on real-world datasets demonstrate that our method significantly improves both reconstruction fidelity and physical plausibility compared to existing approaches. The source code is available at https://github.com/ShuhongLL/I2-NeRF.
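
For intuition, the Beer–Lambert law says transmittance decays exponentially with attenuation accumulated along a ray. Below is a minimal quadrature sketch that composites direct and in-scattered radiance under that law; the paper's full radiative formulation is richer than this simplification:

    import numpy as np

    def render_through_media(c_direct, c_inscatter, sigma, deltas):
        # Beer-Lambert: T_i = exp(-sum_{j<i} sigma_j * delta_j) is the
        # fraction of radiance surviving to sample i along the ray.
        tau = sigma * deltas
        T = np.exp(-(np.cumsum(tau) - tau))   # exclusive prefix sum
        alpha = 1.0 - np.exp(-tau)            # per-sample opacity
        w = T * alpha
        # Matte direct radiance against media-induced in-scatter.
        return np.sum(w[:, None] * (c_direct + c_inscatter), axis=0)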

AAAI 2025 · Conference Paper

Semantic-guided Masked Mutual Learning for Multi-modal Brain Tumor Segmentation with Arbitrary Missing Modalities

  • Guoyan Liang
  • Qin Zhou
  • Zhe Wang
  • Jingyuan Chen
  • Lin Gu
  • Chang Yao
  • Sai Wu
  • Bingcang Huang

Malignant brain tumors are an aggressive and dangerous disease causing deaths worldwide. Multi-modal MRI data are crucial for accurate brain tumor segmentation, but missing modalities, common in clinical practice, can severely degrade segmentation performance. While incomplete multi-modal learning methods attempt to address this, learning robust and discriminative features from arbitrary missing modalities remains challenging. To address this challenge, we propose a novel Semantic-guided Masked Mutual Learning (SMML) approach to distill robust and discriminative knowledge across diverse missing-modality scenarios. Specifically, we propose a dual-branch masked mutual learning scheme guided by Hierarchical Consistency Constraints (HCC) to ensure multi-level consistency, thereby enhancing mutual learning in incomplete multi-modal scenarios. The HCC framework comprises a pixel-level constraint that selects and exchanges reliable knowledge to guide the mutual learning process, and a feature-level constraint that uncovers robust inter-sample and inter-class relational knowledge within the latent feature space. To further enhance multi-modal learning from missing-modality data, we integrate a refinement network into each student branch. This network leverages semantic priors from the Segment Anything Model (SAM) to provide supplementary information, effectively complementing the masked mutual learning strategy in capturing auxiliary discriminative knowledge. Extensive experiments on three challenging brain tumor segmentation datasets demonstrate that our method significantly improves performance over state-of-the-art methods in diverse missing-modality settings.
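
As a toy illustration of pixel-level reliable-knowledge exchange between two branches, each branch could supervise the other only at pixels where its own prediction is confident. The confidence thresholding below is an assumed simplification; SMML's actual selection rule is more involved:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def mutual_pixel_loss(logits_a, logits_b, conf_thresh=0.9):
        # logits_*: (H, W, C) per-pixel class scores from two branches.
        p_a, p_b = softmax(logits_a), softmax(logits_b)
        teach_a = p_a.max(axis=-1) > conf_thresh   # pixels where A is reliable
        teach_b = p_b.max(axis=-1) > conf_thresh
        ce_b = -(p_a * np.log(p_b + 1e-8)).sum(axis=-1)   # A teaches B
        ce_a = -(p_b * np.log(p_a + 1e-8)).sum(axis=-1)   # B teaches A
        return (ce_b * teach_a).mean() + (ce_a * teach_b).mean()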

AAAI 2025 · Conference Paper

TdAttenMix: Top-Down Attention Guided Mixup

  • Zhiming Wang
  • Lin Gu
  • Feng Lu

CutMix is a data augmentation strategy that cuts and pastes image patches to mix up training data. Existing methods pick either random or salient areas, which are often inconsistent with the labels and thus misguide the trained model. To the best of our knowledge, we are the first to integrate human gaze to guide CutMix. Since human attention is driven by both high-level recognition and low-level cues, we propose a controllable Top-down Attention Guided Module to obtain a general artificial attention that balances top-down and bottom-up attention. The proposed TdAttenMix then picks patches and adjusts the label mixing ratio to focus on regions relevant to the current label. Experimental results demonstrate that TdAttenMix outperforms existing state-of-the-art mixup methods across eight different benchmarks. Additionally, we introduce a new metric based on human gaze and use it to investigate the issue of image-label inconsistency.
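
A minimal sketch of attention-guided CutMix: center the pasted patch on the attention peak of the source image, and set the label mixing ratio by attention mass rather than patch area. The peak-centering rule is an assumption for illustration; TdAttenMix's actual patch selection is learned:

    import numpy as np

    def attention_guided_cutmix(img_a, img_b, attn_b, size=56):
        # attn_b: (H, W) non-negative attention map for img_b.
        H, W = attn_b.shape
        cy, cx = np.unravel_index(np.argmax(attn_b), attn_b.shape)
        y0 = int(np.clip(cy - size // 2, 0, H - size))
        x0 = int(np.clip(cx - size // 2, 0, W - size))
        mixed = img_a.copy()
        mixed[y0:y0 + size, x0:x0 + size] = img_b[y0:y0 + size, x0:x0 + size]
        # Label ratio = attention mass inside the patch, not its area.
        lam = attn_b[y0:y0 + size, x0:x0 + size].sum() / attn_b.sum()
        return mixed, lam   # target = (1 - lam) * y_a + lam * y_b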

AAAI 2024 · Conference Paper

Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption

  • Ziteng Cui
  • Lin Gu
  • Xiao Sun
  • Xianzheng Ma
  • Yu Qiao
  • Tatsuya Harada

The standard Neural Radiance Fields (NeRF) paradigm employs a viewer-centered methodology, entangling the aspects of illumination and material reflectance into emission solely from 3D points. This simplified rendering approach struggles to accurately model images captured under adverse lighting conditions, such as low light or over-exposure. Motivated by the ancient Greek emission theory, which posits that visual perception results from rays emanating from the eyes, we slightly refine the conventional NeRF framework to train NeRF under challenging light conditions and generate normal-light novel views in an unsupervised manner. We introduce the concept of a "Concealing Field," which assigns transmittance values to the surrounding air to account for illumination effects. In dark scenarios, we assume that object emissions maintain a standard lighting level but are attenuated as they traverse the air during rendering. The Concealing Field thus compels NeRF to learn reasonable density and colour estimations for objects even in dimly lit situations. Similarly, the Concealing Field can mitigate over-exposed emissions during the rendering stage. Furthermore, we present a comprehensive multi-view dataset captured under challenging illumination conditions for evaluation. Our code and proposed dataset are available at https://github.com/cuiziteng/Aleth-NeRF.
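
A simplified sketch of how a per-sample concealing factor could attenuate emissions inside the standard NeRF quadrature: fit the factor on dark training views, then drop it at render time to obtain normally lit views. The multiplicative placement of the factor is an assumption for illustration:

    import numpy as np

    def render(colors, sigma, deltas, conceal=None):
        # Standard NeRF alpha compositing; `conceal` in (0, 1] dims
        # radiance along the ray, mimicking attenuation by dark air.
        tau = sigma * deltas
        T = np.exp(-(np.cumsum(tau) - tau))   # exclusive transmittance
        w = T * (1.0 - np.exp(-tau))
        if conceal is not None:               # training under low light
            w = w * np.cumprod(conceal)
        return np.sum(w[:, None] * colors, axis=0)   # omit conceal at test time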

NeurIPS 2024 · Conference Paper

TinyLUT: Tiny Look-Up Table for Efficient Image Restoration at the Edge

  • Huanan Li
  • Juntao Guan
  • Rui Lai
  • Sijun Ma
  • Lin Gu
  • Zhangming Zhu

Look-up table (LUT)-based methods have recently shown enormous potential in image restoration tasks, being capable of significantly accelerating inference. However, LUT size grows exponentially with the convolution kernel size, creating a storage bottleneck for broader application on edge devices. Here, we address this storage explosion to improve the capacity for mapping complex CNN models to LUTs. We introduce an innovative separable mapping strategy that achieves over $7\times$ storage reduction, transforming the storage from an exponential dependence on kernel size to a linear relationship. Moreover, we design a dynamic discretization mechanism to decompose the activation and compress the quantization scale, further shrinking LUT storage by $4.48\times$. As a result, the storage requirement of our proposed TinyLUT is around 4.1\% of MuLUT-SDY-X2 and amenable to on-chip cache, yielding competitive accuracy with over $5\times$ lower inference latency on a Raspberry Pi 4B than FSRCNN. TinyLUT enables superior inference speed on edge devices with new state-of-the-art accuracy on both image super-resolution and denoising, showcasing the potential of applying this method to various image restoration tasks at the edge. The code is available at https://github.com/Jonas-KD/TinyLUT.
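
To see why separability matters, compare the entry counts for a LUT indexed jointly by all 8-bit pixels in a k×k window against one small LUT per pixel. The arithmetic below is illustrative only; TinyLUT's actual indexing scheme and its dynamic discretization are more refined:

    # Entries for an 8-bit LUT replacing a k x k convolution window:
    #   joint LUT : 256 ** (k*k) entries -- exponential in kernel area
    #   separable : (k*k) * 256 entries  -- linear, one small LUT per pixel
    k = 2
    joint = 256 ** (k * k)      # 4,294,967,296 entries already at k = 2
    separable = (k * k) * 256   # 1,024 entries
    print(joint, separable)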

AAAI 2023 · Conference Paper

People Taking Photos That Faces Never Share: Privacy Protection and Fairness Enhancement from Camera to User

  • Junjie Zhu
  • Lin Gu
  • Xiaoxiao Wu
  • Zheng Li
  • Tatsuya Harada
  • Yingying Zhu

The soaring number of personal mobile devices and public cameras poses a threat to fundamental human rights and ethical principles. For example, the theft of private information such as face images by malicious third parties can lead to catastrophic consequences. Most existing protection algorithms manipulate the appearance of the face in the image and are effective but irreversible. Here, we propose a practical and systematic solution to invertibly protect face information in the full pipeline from camera to final users. Specifically, we design a novel lightweight Flow-based Face Encryption Method (FFEM) on the local embedded system privately connected to the camera, minimizing the risk of eavesdropping during data transmission. FFEM uses a flow-based face encoder to encode each face into a Gaussian distribution and encrypts the encoded face feature by randomly rotating the Gaussian distribution, with the rotation matrix serving as the password. While encrypted latent-variable face images are sent to users through public but less reliable channels, the password is protected through more secure channels using technologies such as asymmetric encryption, blockchain, or other sophisticated security schemes. Users can choose to decode an image with fake faces from the encrypted image on the public channel. Only trusted users are able to recover the original face using the rotation matrix transmitted over the secure channel. More interestingly, by tuning the Gaussian ball in latent space, we can control the fairness of the replaced face on attributes such as gender and race. Extensive experiments demonstrate that our solution protects privacy and enhances fairness with minimal effect on high-level downstream tasks.
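
The rotation step is easy to picture: an orthogonal matrix rotates the latent code, and because rotations are exactly invertible (the inverse is the transpose), holders of the matrix can undo the encryption losslessly. A minimal sketch with a seeded QR-based rotation; the paper's exact sampling and key handling may differ:

    import numpy as np

    def random_rotation(dim, seed):
        # Orthogonal "password" matrix from the QR decomposition of a
        # seeded Gaussian matrix (a standard construction; the paper's
        # sampling scheme is an assumption here).
        rng = np.random.default_rng(seed)
        q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
        return q * np.sign(np.diag(r))   # sign fix for a unique Q

    def encrypt(z, R):
        return z @ R                     # rotate the encoded face feature

    def decrypt(z_enc, R):
        return z_enc @ R.T               # orthogonal: R^-1 == R^T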

AAAI 2022 · Conference Paper

EtinyNet: Extremely Tiny Network for TinyML

  • Kunran Xu
  • Yishi Li
  • Huawei Zhang
  • Rui Lai
  • Lin Gu

Many AI applications are concentrated in high-income countries because their implementation depends on expensive GPU cards (∼$2000) and a reliable power supply (∼200 W). To deploy AI in resource-poor settings on cheaper (∼$20) and low-power devices (<1 W), key modifications are required to adapt neural networks for tiny machine learning (TinyML). In this paper, to fit CNNs into storage-limited devices, we develop efficient tiny models with only hundreds of kilobytes of parameters. Toward this end, we first design a parameter-efficient tiny architecture by introducing a dense linear depthwise block. Then, a novel adaptive scale quantization (ASQ) method is proposed to further quantize tiny models at aggressively low bit-widths while retaining accuracy. With the optimized architecture and 4-bit ASQ, we present a family of ultra-lightweight networks, named EtinyNet, that achieves 57.0% ImageNet top-1 accuracy with an extremely tiny model size of 340 KB. When deployed on an off-the-shelf commercial microcontroller for object detection tasks, EtinyNet achieves a state-of-the-art 56.4% mAP on Pascal VOC. Furthermore, experimental results on a Xilinx compact FPGA indicate that EtinyNet achieves a prominently low power of 620 mW, about 5.6× lower than existing FPGA designs. The code and demo are at https://github.com/aztc/EtinyNet.
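
For a sense of what 4-bit quantization involves, below is a symmetric uniform quantizer with a per-layer scale. ASQ's contribution is adapting that scale during training, which this sketch deliberately omits; the scale is just an input here:

    import numpy as np

    def quantize_4bit(w, scale):
        # Symmetric uniform quantization to signed 4-bit levels.
        # ASQ adaptively learns `scale`; this sketch takes it as given.
        qmax = 2 ** (4 - 1) - 1                           # 7
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        return q * scale                                  # dequantized weights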

JBHI 2022 · Journal Article

Explainable Diabetic Retinopathy Detection and Retinal Image Generation

  • Yuhao Niu
  • Lin Gu
  • Yitian Zhao
  • Feng Lu

Though deep learning has shown successful performance in classifying the label and severity stage of certain diseases, most models give few explanations of how they make predictions. Inspired by Koch's Postulates, the foundation in evidence-based medicine (EBM) for identifying pathogens, we propose to exploit the interpretability of deep learning in medical diagnosis. By isolating neuron activation patterns from a diabetic retinopathy (DR) detector and visualizing them, we can determine the symptoms that the DR detector identifies as evidence for its predictions. Specifically, we first define novel pathological descriptors using activated neurons of the DR detector to encode both spatial and appearance information of lesions. Then, to visualize the symptoms encoded in the descriptor, we propose Patho-GAN, a new network that synthesizes medically plausible retinal images. By manipulating these descriptors, we can arbitrarily control the position, quantity, and categories of generated lesions. We also show that our synthesized images carry the symptoms directly related to diabetic retinopathy diagnosis. Our generated images are both qualitatively and quantitatively superior to those produced by previous methods. Moreover, compared to existing methods that take hours to generate an image, our seconds-level speed gives Patho-GAN the potential to be an effective solution for data augmentation.

AAAI 2022 · Conference Paper

Towards an Effective Orthogonal Dictionary Convolution Strategy

  • Yishi Li
  • Kunran Xu
  • Rui Lai
  • Lin Gu

Orthogonality regularization has proven effective in improving the precision, convergence speed, and training stability of CNNs. Here, we propose a novel Orthogonal Dictionary Convolution Strategy (ODCS) for CNNs that improves the effect of orthogonality by optimizing the network architecture and changing the regularized object. Specifically, we remove the nonlinear layer in the typical convolution block "Conv(BN) + Nonlinear + Pointwise Conv(BN)" and impose orthogonal regularization only on the front Conv. The resulting structure, "Conv(BN) + Pointwise Conv(BN)", is equivalent to a pair of dictionary and encoding as defined in sparse dictionary learning. Thanks to the exact and efficient representation of signals with dictionaries in low-dimensional projections, our strategy reduces the superfluous information in dictionary Conv kernels. Meanwhile, the proposed strategy relaxes the overly strict orthogonality regularization during training, making hyper-parameter tuning more flexible. In addition, ODCS can modify state-of-the-art models easily without any extra cost in the inference phase. We evaluate it on a variety of CNNs on small-scale (CIFAR), large-scale (ImageNet), and fine-grained (CUB-200-2011) image classification tasks, respectively. The experimental results show that our method achieves a stable and superior improvement.
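
The regularizer itself is typically the Frobenius distance between the kernel Gram matrix and the identity. A minimal sketch under the common convention of flattening kernels to rows; per the abstract, ODCS imposes it only on the front Conv of each pair:

    import numpy as np

    def orthogonal_penalty(W):
        # W: conv kernels of shape (out_channels, in_channels, kh, kw).
        # Penalize || W W^T - I ||_F^2 on kernels flattened to rows.
        W = W.reshape(W.shape[0], -1)
        gram = W @ W.T
        return np.sum((gram - np.eye(W.shape[0])) ** 2)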

AAAI 2019 · Conference Paper

Pathological Evidence Exploration in Deep Retinal Image Diagnosis

  • Yuhao Niu
  • Lin Gu
  • Feng Lu
  • Feifan Lv
  • Zongji Wang
  • Imari Sato
  • Zijian Zhang
  • Yangyan Xiao

Though deep learning has shown successful performance in classifying the label and severity stage of certain diseases, most models give little evidence of how they make predictions. Here, we propose to exploit the interpretability of deep learning in medical diagnosis. Inspired by Koch's Postulates, a well-known strategy in medical research for identifying the properties of a pathogen, we define a pathological descriptor that can be extracted from the activated neurons of a diabetic retinopathy detector. To visualize the symptoms and features encoded in this descriptor, we propose a GAN-based method to synthesize a pathological retinal image given the descriptor and a binary vessel segmentation. Moreover, with this descriptor, we can arbitrarily manipulate the position and quantity of lesions. As verified by a panel of five licensed ophthalmologists, our synthesized images carry the symptoms that are directly related to diabetic retinopathy diagnosis. The panel survey also shows that our generated images are both qualitatively and quantitatively superior to those of existing methods.