Arrow Research search

Author name cluster

Simon Lucey

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers

17

AAAI Conference 2026 Conference Paper

SineLoRA∆: Sine-Activated Delta Compression

  • Cameron Gordon
  • Yiping Ji
  • Hemanth Saratchandran
  • Paul Albert
  • Simon Lucey

Resource-constrained weight deployment is a task of immense practical importance. Recently, there has been interest in the specific task of Delta Compression, where parties each hold a common base model and only communicate compressed weight updates. However, popular parameter-efficient updates such as Low Rank Adaptation (LoRA) face inherent representation limitations, which are especially pronounced when combined with aggressive quantization. To overcome this, we build on recent work that improves LoRA representation capacity by using fixed-frequency sinusoidal functions to increase stable rank without adding parameters. We extend this to the quantized setting and present the first theoretical analysis showing how stable rank evolves under quantization. From this, we introduce SineLoRA∆, a principled and effective method for delta compression that improves the expressivity of quantized low-rank adapters by applying a sinusoidal activation. We validate SineLoRA∆ across a diverse variety of domains, including language modeling, vision-language tasks, and text-to-image generation, achieving up to 66% memory reduction with similar performance. We additionally provide a novel application of the canonical Bjøntegaard Delta metric to consistently compare adapter compression changes across the rate-distortion curve.
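
The core mechanism can be sketched in a few lines. The sizes, the frequency `omega`, and the uniform quantizer below are all illustrative assumptions, not the paper's settings; the point is only that an elementwise sine applied to a quantized low-rank product raises its stable rank:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 64, 4                     # weight shape and adapter rank (illustrative)
B = rng.normal(size=(d, r)) / np.sqrt(r)
A = rng.normal(size=(r, k)) / np.sqrt(k)
omega = 30.0                            # fixed sine frequency (hypothetical value)

def quantize(x, bits=4):
    """Uniform symmetric quantization of an adapter factor (toy quantizer)."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def stable_rank(W):
    """||W||_F^2 / ||W||_2^2, the quantity the abstract tracks."""
    s = np.linalg.svd(W, compute_uv=False)
    return (s ** 2).sum() / s[0] ** 2

delta_plain = quantize(B) @ quantize(A)                   # quantized LoRA delta, rank <= r
delta_sine = np.sin(omega * (quantize(B) @ quantize(A)))  # sine-activated delta

print(stable_rank(delta_plain) <= r)                       # True
print(stable_rank(delta_sine) > stable_rank(delta_plain))  # True
```

Here `stable_rank` is the standard ratio of squared Frobenius norm to squared spectral norm, which is bounded by the matrix rank for the plain delta but not for its sine-activated counterpart.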

ICLR Conference 2025 Conference Paper

Efficient Learning with Sine-Activated Low-Rank Matrices

  • Yiping Ji
  • Hemanth Saratchandran
  • Cameron Gordon
  • Zeyu Zhang 0006
  • Simon Lucey

Low-rank decomposition has emerged as a vital tool for enhancing parameter efficiency in neural network architectures, gaining traction across diverse applications in machine learning. These techniques significantly lower the number of parameters, striking a balance between compactness and performance. However, a common challenge has been the compromise between parameter efficiency and the accuracy of the model, where reduced parameters often lead to diminished accuracy compared to their full-rank counterparts. In this work, we propose a novel theoretical framework that integrates a sinusoidal function within the low-rank decomposition process. This approach not only preserves the benefits of the parameter efficiency characteristic of low-rank methods but also increases the decomposition's rank, thereby enhancing model performance. Our method proves to be a plug-in enhancement for existing low-rank models, as evidenced by its successful application in Vision Transformers (ViT), Large Language Models (LLMs), Neural Radiance Fields (NeRF) and 3D shape modelling.
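
As a toy illustration of the rank claim (sizes and the frequency scale are arbitrary choices, not taken from the paper), an elementwise sine lifts the rank of a low-rank product without adding a single parameter:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 32, 32, 2
B = rng.normal(size=(d, r))
A = rng.normal(size=(r, k))
omega = 20.0                               # hypothetical frequency scale

low_rank = B @ A                           # rank <= r, only d*r + r*k parameters
sine_low_rank = np.sin(omega * low_rank)   # same parameter count, higher rank

print(np.linalg.matrix_rank(low_rank))       # 2
print(np.linalg.matrix_rank(sine_low_rank))  # well above 2
```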

NeurIPS Conference 2025 Conference Paper

Spectral Conditioning of Attention Improves Transformer Performance

  • Hemanth Saratchandran
  • Simon Lucey

We present a theoretical analysis of the Jacobian of an attention block within a transformer, showing that it is governed by the query, key, and value projections that define the attention mechanism. Leveraging this insight, we introduce a method that systematically alters the spectral properties of each attention layer to reduce the Jacobian’s condition number, thereby improving the overall conditioning of the attention layers within a transformer network. We empirically show that this improved Jacobian conditioning translates to enhanced performance in practice. Our approach is simple, broadly applicable, and can be easily integrated as a drop-in replacement for a wide range of existing attention mechanisms. We validate its effectiveness across diverse transformer architectures and tasks, demonstrating consistent improvements in performance.
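
The abstract does not spell out the exact spectral intervention, so the sketch below uses a deliberately simple stand-in (not necessarily the paper's procedure): replacing a projection matrix by its nearest orthogonal matrix, which pushes every singular value to 1 and hence drives the condition number to 1:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16
# A poorly conditioned query projection: columns on very different scales.
W_q = rng.normal(size=(d, d)) * rng.uniform(0.1, 3.0, size=d)

def condition_number(W):
    """Ratio of largest to smallest singular value."""
    s = np.linalg.svd(W, compute_uv=False)
    return s[0] / s[-1]

# Illustrative spectral intervention: orthogonalize via the SVD polar factor,
# so every singular value of the conditioned projection equals 1.
U, _, Vt = np.linalg.svd(W_q)
W_q_conditioned = U @ Vt

print(condition_number(W_q) > condition_number(W_q_conditioned))  # True
```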

NeurIPS Conference 2025 Conference Paper

Structured Initialization for Vision Transformers

  • Jianqiao Zheng
  • Xueqian Li
  • Hemanth Saratchandran
  • Simon Lucey

Convolutional Neural Networks (CNNs) inherently encode strong inductive biases, enabling effective generalization on small-scale datasets. In this paper, we propose integrating this inductive bias into ViTs, not through an architectural intervention but solely through initialization. The motivation here is to have a ViT that can enjoy strong CNN-like performance when data assets are small, but can still scale to ViT-like performance as the data expands. Our approach is motivated by our empirical results that random impulse filters can achieve commensurate performance to learned filters within a CNN. We improve upon current ViT initialization strategies, which typically rely on empirical heuristics such as using attention weights from pretrained models or focusing on the distribution of attention weights without enforcing structures. Empirical results demonstrate that our method significantly outperforms standard ViT initialization across numerous small and medium-scale benchmarks, including Food-101, CIFAR-10, CIFAR-100, STL-10, Flowers, and Pets, while maintaining comparable performance on large-scale datasets such as ImageNet-1K. Moreover, our initialization strategy can be easily integrated into various transformer-based architectures such as Swin Transformer and MLP-Mixer with consistent improvements in performance.
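
The "random impulse filters" mentioned above are easy to construct: each kernel is zero everywhere except for a single randomly placed tap. A minimal sketch (kernel size and filter count are arbitrary choices for illustration):

```python
import numpy as np

def random_impulse_filters(n_filters, k=3, seed=0):
    """Each k x k kernel is zero except for a single randomly placed 1."""
    rng = np.random.default_rng(seed)
    filters = np.zeros((n_filters, k, k))
    idx = rng.integers(0, k * k, size=n_filters)
    filters[np.arange(n_filters), idx // k, idx % k] = 1.0
    return filters

F = random_impulse_filters(8)
print(F.sum(axis=(1, 2)))  # every filter has exactly one active tap
```

Convolving with such a filter simply shifts the input by the tap's offset, which is the CNN-like spatial bias the initialization transfers to attention.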

ICML Conference 2024 Conference Paper

A sampling theory perspective on activations for implicit neural representations

  • Hemanth Saratchandran
  • Sameera Ramasinghe
  • Violetta Shevchenko
  • Alexander Long
  • Simon Lucey

Implicit Neural Representations (INRs) have gained popularity for encoding signals as compact, differentiable entities. While INRs commonly use techniques like Fourier positional encodings or non-traditional activation functions (e.g., Gaussian, sinusoid, or wavelets) to capture high-frequency content, the properties of these activations have not been explored within a unified theoretical framework. Addressing this gap, we conduct a comprehensive analysis of these activations from a sampling theory perspective. Our investigation reveals that, especially in shallow INRs, $\mathrm{sinc}$ activations—previously unused in conjunction with INRs—are theoretically optimal for signal encoding. Additionally, we establish a connection between dynamical systems and INRs, leveraging sampling theory to bridge these two paradigms.
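
A shallow sinc-activated INR of the kind analyzed here can be sketched directly; the layer sizes and frequency scale `omega` below are illustrative assumptions (note that `np.sinc` is the normalized sinc, sin(pi x)/(pi x)):

```python
import numpy as np

def sinc_mlp(x, weights, biases, omega=8.0):
    """A shallow coordinate MLP with sinc activations on its hidden layers."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.sinc(omega * (h @ W + b))
    return h @ weights[-1] + biases[-1]    # linear output layer

rng = np.random.default_rng(3)
layers = [1, 64, 64, 1]                    # coordinate in, signal value out (illustrative)
weights = [rng.normal(size=(m, n)) / np.sqrt(m)
           for m, n in zip(layers[:-1], layers[1:])]
biases = [np.zeros(n) for n in layers[1:]]

coords = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
print(sinc_mlp(coords, weights, biases).shape)  # (5, 1)
```

In practice such a network would be fit to a signal by regressing its outputs against sampled signal values at the input coordinates.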

AAAI Conference 2023 Conference Paper

A Learnable Radial Basis Positional Embedding for Coordinate-MLPs

  • Sameera Ramasinghe
  • Simon Lucey

We propose a novel method to enhance the performance of coordinate-MLPs (also referred to as neural fields) by learning instance-specific positional embeddings. End-to-end optimization of positional embedding parameters along with network weights leads to poor generalization performance. Instead, we develop a generic framework to learn the positional embedding based on the classic graph-Laplacian regularization, which can implicitly balance the trade-off between memorization and generalization. This framework is then used to propose a novel positional embedding scheme, where the hyperparameters are learned per coordinate (i.e., per instance) to deliver optimal performance. We show that the proposed embedding achieves better performance with higher stability compared to the well-established random Fourier features (RFF). Further, we demonstrate that the proposed embedding scheme yields stable gradients, enabling seamless integration into deep architectures as intermediate layers.
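
A minimal Gaussian radial-basis positional embedding looks as follows; the paper learns the embedding hyperparameters per instance, whereas this sketch fixes the centers and width for illustration:

```python
import numpy as np

def rbf_embedding(x, centers, sigma):
    """Gaussian radial-basis embedding of scalar coordinates.
    centers and sigma stand in for the learned per-instance parameters."""
    d = x[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2)

centers = np.linspace(0.0, 1.0, 16)   # hypothetical embedding width of 16
emb = rbf_embedding(np.array([0.25, 0.5]), centers, sigma=0.1)
print(emb.shape)  # (2, 16)
```

The embedded coordinates, rather than the raw scalars, would then be fed to the coordinate-MLP, playing the same role as random Fourier features.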

ICML Conference 2023 Conference Paper

How much does Initialization Affect Generalization?

  • Sameera Ramasinghe
  • Lachlan Ewen MacDonald
  • Moshiur R. Farazi
  • Hemanth Saratchandran
  • Simon Lucey

Characterizing the remarkable generalization properties of over-parameterized neural networks remains an open problem. A growing body of recent literature shows that the bias of stochastic gradient descent (SGD) and architecture choice implicitly lead to better generalization. In this paper, we show on the contrary that, independently of architecture, SGD can itself be the cause of poor generalization if one does not ensure good initialization. Specifically, we prove that any differentiably parameterized model, trained under gradient flow, obeys a weak spectral bias law which states that sufficiently high frequencies train arbitrarily slowly. This implies that very high frequencies present at initialization will remain after training, and hamper generalization. Further, we empirically test the developed theoretical insights using practical, deep networks. Finally, we contrast our framework with that supplied by the flat-minima conjecture and show that Fourier analysis grants a more reliable framework for understanding the generalization of neural networks.

NeurIPS Conference 2023 Conference Paper

On skip connections and normalisation layers in deep optimisation

  • Lachlan MacDonald
  • Jack Valmadre
  • Hemanth Saratchandran
  • Simon Lucey

We introduce a general theoretical framework, designed for the study of gradient optimisation of deep neural networks, that encompasses ubiquitous architecture choices including batch normalisation, weight normalisation and skip connections. Our framework determines the curvature and regularity properties of multilayer loss landscapes in terms of their constituent layers, thereby elucidating the roles played by normalisation layers and skip connections in globalising these properties. We then demonstrate the utility of this framework in two respects. First, we give the only proof of which we are aware that a class of deep neural networks can be trained using gradient descent to global optima even when such optima only exist at infinity, as is the case for the cross-entropy cost. Second, we identify a novel causal mechanism by which skip connections accelerate training, which we verify predictively with ResNets on MNIST, CIFAR10, CIFAR100 and ImageNet.

NeurIPS Conference 2022 Conference Paper

MBW: Multi-view Bootstrapping in the Wild

  • Mosam Dabhi
  • Chaoyang Wang
  • Tim Clifford
  • László Jeni
  • Ian Fasel
  • Simon Lucey

Labeling articulated objects in unconstrained settings has a wide variety of applications including entertainment, neuroscience, psychology, ethology, and many fields of medicine. Large offline labeled datasets do not exist for all but the most common articulated object categories (e.g., humans). Hand labeling these landmarks within a video sequence is a laborious task. Learned landmark detectors can help, but can be error-prone when trained from only a few examples. Multi-camera systems that train fine-grained detectors have shown significant promise in detecting such errors, allowing for self-supervised solutions that only need a small percentage of the video sequence to be hand-labeled. The approach, however, is based on calibrated cameras and rigid geometry, making it expensive, difficult to manage, and impractical in real-world scenarios. In this paper, we address these bottlenecks by combining a non-rigid 3D neural prior with deep flow to obtain high-fidelity landmark estimates from videos with only two or three uncalibrated, handheld cameras. With just a few annotations (representing $1-2\%$ of the frames), we are able to produce 2D results comparable to state-of-the-art fully supervised methods, along with 3D reconstructions that are impossible with other existing approaches. Our Multi-view Bootstrapping in the Wild (MBW) approach demonstrates impressive results on standard human datasets, as well as tigers, cheetahs, fish, colobus monkeys, chimpanzees, and flamingos from videos captured casually in a zoo. We release the codebase for MBW as well as this challenging zoo dataset consisting of image frames of tail-end distribution categories with their corresponding 2D and 3D labels generated from minimal human intervention.

NeurIPS Conference 2022 Conference Paper

On the Frequency-bias of Coordinate-MLPs

  • Sameera Ramasinghe
  • Lachlan E. MacDonald
  • Simon Lucey

We show that typical implicit regularization assumptions for deep neural networks (for regression) do not hold for coordinate-MLPs, a family of MLPs that are now ubiquitous in computer vision for representing high-frequency signals. Lack of such implicit bias disrupts smooth interpolations between training samples, and hampers generalizing across signal regions with different spectra. We investigate this behavior through a Fourier lens and uncover that as the bandwidth of a coordinate-MLP is enhanced, lower frequencies tend to get suppressed unless a suitable prior is provided explicitly. Based on these insights, we propose a simple regularization technique that can mitigate the above problem, which can be incorporated into existing networks without any architectural modifications.

ICRA Conference 2021 Conference Paper

HyperMap: Compressed 3D Map for Monocular Camera Registration

  • Ming-Fang Chang
  • Joshua G. Mangelson
  • Michael Kaess
  • Simon Lucey

We address the problem of image registration to a compressed 3D map. While this is most often performed by comparing LiDAR scans to the point cloud based map, it depends on an expensive LiDAR sensor at run time and the large point cloud based map creates overhead in data storage and transmission. Recently, efforts have been underway to replace the expensive LiDAR sensor with cheaper cameras and perform 2D-3D localization. In contrast to the previous work that learns relative pose by comparing projected depth and camera images, we propose HyperMap, a paradigm shift from online depth map feature extraction to offline 3D map feature computation for the 2D-3D camera registration task through end-to-end training. In the proposed pipeline, we first perform offline 3D sparse convolution to extract and compress the voxelwise hypercolumn features for the whole map. Then at run time, we project and decode the compressed map features to the rough initial camera pose to form a virtual feature image. A Convolutional Neural Network (CNN) is then used to predict the relative pose between the camera image and the virtual feature image. In addition, we propose an efficient occlusion handling layer, specifically designed for large point clouds, to remove occluded points in projection. Our experiments on synthetic and real datasets show that, by moving the feature computation load offline and compressing, we reduced map size by 87–94% while maintaining comparable or better accuracy.

IROS Conference 2021 Conference Paper

Map Compressibility Assessment for LiDAR Registration

  • Ming-Fang Chang
  • Wei Dong
  • Joshua G. Mangelson
  • Michael Kaess
  • Simon Lucey

We aim to assess the performance of LiDAR-to-map registration on compressive maps. Modern autonomous vehicles utilize pre-built HD (High-Definition) maps to perform sensor-to-map registration, which recovers pose estimation failures and reduces drift in a large-scale environment. However, sensor-to-map registration is usually realized by registering the sensor to a dense 3D model, which occupies massive storage space in the HD map and requires much data processing overhead. Although smaller 3D models are preferable, the optimal compressive map format for preservation of the best registration performance remains unclear. In this paper, we propose a novel and challenging benchmark to evaluate existing LiDAR-to-map registration methods from three perspectives: map compressibility, robustness, and precision. We compare various map formats, including raw points, hierarchical GMMs, and feature points, and show their performance trade-offs between compressibility and robustness on real-world LiDAR datasets: the KITTI Odometry Dataset and the Argoverse Tracking Dataset. Our benchmark reveals that state-of-the-art deep feature point based methods outperform traditional methods significantly when the map size budget is high. However, when the map size budget is low, deep methods are outperformed by methods using simpler models on the Argoverse Tracking Dataset due to poor spatial coverage. In addition, we observe that the recently published TEASER++ significantly outperforms RANSAC for the feature point methods. Our analysis provides a valuable reference for the community to design budgeted real-world systems and find potential research opportunities. We will release the benchmark for public use.

NeurIPS Conference 2021 Conference Paper

Neural Scene Flow Prior

  • Xueqian Li
  • Jhony Kaesemodel Pontes
  • Simon Lucey

Before the deep learning revolution, many perception algorithms were based on runtime optimization in conjunction with a strong prior/regularization penalty. A prime example of this in computer vision is optical and scene flow. Supervised learning has largely displaced the need for explicit regularization. Instead, these methods rely on large amounts of labeled data to capture prior statistics, which are not always readily available for many problems. Although optimization is employed to learn the neural network, at runtime, the weights of this network are frozen. As a result, these learning solutions are domain-specific and do not generalize well to other statistically different scenarios. This paper revisits the scene flow problem that relies predominantly on runtime optimization and strong regularization. A central innovation here is the inclusion of a neural scene flow prior, which utilizes the architecture of neural networks as a new type of implicit regularizer. Unlike learning-based scene flow methods, optimization occurs at runtime, and our approach needs no offline datasets---making it ideal for deployment in new environments such as autonomous driving. We show that an architecture based exclusively on multilayer perceptrons (MLPs) can be used as a scene flow prior. Our method attains competitive---if not better---results on scene flow benchmarks. Also, our neural prior's implicit and continuous scene flow representation allows us to estimate dense long-term correspondences across a sequence of point clouds. The dense motion information is represented by scene flow fields where points can be propagated through time by integrating motion vectors. We demonstrate such a capability by accumulating a sequence of lidar point clouds.
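
A toy version of runtime optimization with a neural prior can be sketched as below. The network size, point counts, and finite-difference optimizer are all drastic simplifications (the paper uses autodiff and lidar-scale clouds); the sketch only shows the structure: nothing is trained offline, and a small MLP flow field is fit to one scene pair at runtime under a Chamfer objective:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy scene pair: the second cloud is the first shifted by a constant motion.
pts1 = rng.uniform(-1.0, 1.0, size=(50, 3))
pts2 = pts1 + np.array([0.1, -0.05, 0.05])

# Tiny MLP flow prior (far smaller than the paper's network).
params = [rng.normal(size=(3, 16)) * 0.1,  # W1
          np.zeros(16),                    # b1
          np.zeros((16, 3)),               # W2
          np.zeros(3)]                     # b2

def flow(x, params):
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def chamfer(a, b):
    """Symmetric squared-distance Chamfer loss between two point clouds."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def loss(params):
    return chamfer(pts1 + flow(pts1, params), pts2)

# Runtime optimization. Finite-difference gradients keep the sketch
# dependency-free; the paper optimizes the network with autodiff.
init_loss = loss(params)
lr, eps = 0.2, 1e-4
for _ in range(150):
    for P in params:
        g = np.zeros_like(P)
        for i in np.ndindex(P.shape):
            old = P[i]
            P[i] = old + eps
            hi = loss(params)
            P[i] = old - eps
            lo = loss(params)
            P[i] = old
            g[i] = (hi - lo) / (2.0 * eps)
        P -= lr * g
final_loss = loss(params)

print(final_loss < init_loss)  # True: the prior recovers the scene's motion
```

The MLP's smoothness acts as the implicit regularizer: it cannot move points independently, so the recovered flow field is coherent across the cloud.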

NeurIPS Conference 2020 Conference Paper

SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images

  • Chen-Hsuan Lin
  • Chaoyang Wang
  • Simon Lucey

Dense 3D object reconstruction from a single image has recently witnessed remarkable advances, but supervising neural networks with ground-truth 3D shapes is impractical due to the laborious process of creating paired image-shape datasets. Recent efforts have turned to learning 3D reconstruction without 3D supervision from RGB images with annotated 2D silhouettes, dramatically reducing the cost and effort of annotation. These techniques, however, remain impractical as they still require multi-view annotations of the same object instance during training. As a result, most experimental efforts to date have been limited to synthetic datasets. In this paper, we address this issue and propose SDF-SRN, an approach that requires only a single view of objects at training time, offering greater utility for real-world scenarios. SDF-SRN learns implicit 3D shape representations to handle arbitrary shape topologies that may exist in the datasets. To this end, we derive a novel differentiable rendering formulation for learning signed distance functions (SDF) from 2D silhouettes. Our method outperforms the state of the art under challenging single-view supervision settings on both synthetic and real-world datasets.

ICRA Conference 2019 Conference Paper

GPS-Denied UAV Localization using Pre-existing Satellite Imagery

  • Hunter Goforth
  • Simon Lucey

We present a method for localization of Unmanned Aerial Vehicles (UAVs) which is meant to replace an onboard GPS system in the event of a noisy or unreliable GPS signal. Our method requires only a downward-facing monocular RGB camera on the UAV, and pre-existing satellite imagery of the flight location to which the UAV imagery is compared and aligned. To overcome differences in the image capturing conditions between the satellite and UAV, such as seasonal and perspective changes, we propose the use of Convolutional Neural Network (CNN) representations trained on readily available satellite data. To increase localization accuracy, we also develop an optimization which jointly minimizes the error between adjacent UAV frames as well as the satellite map. We demonstrate how our method improves on recent systems from the literature by achieving greater performance in flight environments with very few landmarks. For a GPS-denied flight at 0.2 km altitude, over a flight distance of 0.85 km, we achieve average localization error of less than 8 meters. We make our source code and datasets available to encourage further work on this emerging topic.

ICRA Conference 2018 Conference Paper

Deep-LK for Efficient Adaptive Object Tracking

  • Chaoyang Wang 0001
  • Hamed Kiani Galoogahi
  • Chen-Hsuan Lin 0001
  • Simon Lucey

In this paper, we present a new approach for efficient regression-based object tracking. Our approach is closely related to the Generic Object Tracking Using Regression Networks (GOTURN) framework [1]. We make the following contributions. First, we demonstrate that there is a theoretical relationship between Siamese regression networks like GOTURN and the classical Inverse Compositional Lucas & Kanade (IC-LK) algorithm. Further, we demonstrate that unlike GOTURN, IC-LK adapts its regressor to the appearance of the current tracked frame. We argue that the lack of such a property in GOTURN contributes to its poor performance on unseen objects and/or viewpoints. Second, we propose a novel framework for object tracking inspired by the IC-LK framework, which we refer to as Deep-LK. Finally, we show impressive results demonstrating that Deep-LK substantially outperforms GOTURN and demonstrate comparable tracking performance against current state-of-the-art deep trackers on high frame-rate sequences whilst being an order of magnitude more computationally efficient (100 FPS).
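
The classical IC-LK algorithm referenced above is compact enough to sketch in full for a 1-D, translation-only warp; the key property is that the Jacobian and Hessian are precomputed on the template once, not re-derived per iteration:

```python
import numpy as np

# 1-D inverse compositional Lucas-Kanade with a translation-only warp.
x = np.arange(200, dtype=float)
template = np.exp(-0.5 * ((x - 100.0) / 15.0) ** 2)  # synthetic Gaussian template
p_true = 6.3                                          # ground-truth shift
image = np.interp(x - p_true, x, template)            # image(x) = template(x - p_true)

# The IC trick: steepest-descent images and Hessian come from the template only.
grad_T = np.gradient(template)
H = (grad_T ** 2).sum()

p = 0.0
for _ in range(30):
    warped = np.interp(x + p, x, image)   # image resampled at the current estimate
    error = warped - template
    dp = (grad_T * error).sum() / H       # Gauss-Newton step
    p -= dp                               # inverse compositional update

print(abs(p - p_true) < 0.1)  # True: the shift is recovered
```

Deep-LK's insight, per the abstract, is that a Siamese regressor like GOTURN computes an update analogous to `dp`, but with a fixed regressor, whereas IC-LK's regressor adapts to the current template.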

AAAI Conference 2018 Conference Paper

Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction

  • Chen-Hsuan Lin
  • Chen Kong
  • Simon Lucey

Conventional methods of 3D object generative modeling learn volumetric predictions using deep networks with 3D convolutional operations, which are direct analogies to classical 2D ones. However, these methods are computationally wasteful in attempting to predict 3D shapes, where information is rich only on the surfaces. In this paper, we propose a novel 3D generative modeling framework to efficiently generate object shapes in the form of dense point clouds. We use 2D convolutional operations to predict the 3D structure from multiple viewpoints and jointly apply geometric reasoning with 2D projection optimization. We introduce the pseudo-renderer, a differentiable module to approximate the true rendering operation, to synthesize novel depth maps for optimization. Experimental results for single-image 3D object reconstruction tasks show that our method outperforms state-of-the-art methods in terms of shape similarity and prediction density.
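
The differentiable pseudo-renderer is the paper's contribution; as a hard, non-differentiable stand-in that only shows the underlying projection logic, one can orthographically splat points onto a pixel grid and keep the nearest depth per pixel:

```python
import numpy as np

def pseudo_render_depth(points, H=32, W=32):
    """Orthographic splatting of a point cloud in [-1, 1]^3 to a depth map:
    keep the nearest z per pixel; empty pixels stay at inf."""
    depth = np.full((H, W), np.inf)
    u = np.clip(((points[:, 0] + 1.0) / 2.0 * (W - 1)).astype(int), 0, W - 1)
    v = np.clip(((points[:, 1] + 1.0) / 2.0 * (H - 1)).astype(int), 0, H - 1)
    for ui, vi, z in zip(u, v, points[:, 2]):
        depth[vi, ui] = min(depth[vi, ui], z)
    return depth

rng = np.random.default_rng(5)
pts = rng.uniform(-1.0, 1.0, size=(500, 3))
depth = pseudo_render_depth(pts)
print(depth.shape)  # (32, 32)
```

The hard per-pixel `min` here is exactly the operation a differentiable pseudo-renderer must smooth so that gradients can flow back to the predicted point positions.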