Author name cluster

Violetta Shevchenko

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Mixtures of Subspaces for Bandwidth Efficient Context Parallel Training

Sameera Ramasinghe
Thalaiyasingam Ajanthan
Hadi Mohaghegh Dolatabadi
Gil Avraham
Violetta Shevchenko
Yan Zuo
Chamin P Hewa Koneputugodage
Alexander Long

Pretraining language models with extended context windows enhances their ability to leverage rich information during generation. Existing methods split input sequences into chunks, broadcast them across multiple devices, and compute attention block by block which incurs significant communication overhead. While feasible in high-speed clusters, these methods are impractical for decentralized training over low-bandwidth connections. We propose a compression method for communication-efficient context parallelism in decentralized settings, achieving a remarkable compression rate of over 95% with negligible overhead and no loss in convergence. Our key insight is to exploit the intrinsic low-rank structure of activation outputs by dynamically constraining them to learned mixtures of subspaces via efficient reparameterizations. We demonstrate scaling billion-parameter decentralized models to context lengths exceeding 100K tokens on networks as slow as 300Mbps, matching the wall-clock convergence speed of centralized models on 100Gbps interconnects.

PDF Details

NeurIPS Conference 2025 Conference Paper

SRSR: Enhancing Semantic Accuracy in Real-World Image Super-Resolution with Spatially Re-Focused Text-Conditioning

Chen Chen
Majid Abdolshah
Violetta Shevchenko
Hongdong Li
Chang Xu
Pulak Purkait

Existing diffusion-based super-resolution approaches often exhibit semantic ambiguities due to inaccuracies and incompleteness in their text conditioning, coupled with the inherent tendency for cross-attention to divert towards irrelevant pixels. These limitations can lead to semantic misalignment and hallucinated details in the generated high-resolution outputs. To address these, we propose a novel, plug-and-play spatially re-focused super-resolution (SRSR) framework that consists of two core components: first, we introduce Spatially Re-focused Cross-Attention (SRCA), which refines text conditioning at inference time by applying visually-grounded segmentation masks to guide cross-attention. Second, we introduce a Spatially Targeted Classifier-Free Guidance (STCFG) mechanism that selectively bypasses text influences on ungrounded pixels to prevent hallucinations. Extensive experiments on both synthetic and real-world datasets demonstrate that SRSR consistently outperforms seven state-of-the-art baselines in standard fidelity metrics (PSNR and SSIM) across all datasets, and in perceptual quality measures (LPIPS and DISTS) on two real-world benchmarks, underscoring its effectiveness in achieving both high semantic fidelity and perceptual quality in super-resolution.

PDF Details

NeurIPS Conference 2025 Conference Paper

Unextractable Protocol Models: Collaborative Training and Inference without Weight Materialization

Alexander Long
Chamin P Hewa Koneputugodage
Thalaiyasingam Ajanthan
Yan Zuo
Gil Avraham
Violetta Shevchenko
Hadi Mohaghegh Dolatabadi
Sameera Ramasinghe

We consider a decentralized setup in which the participants collaboratively train and serve a large neural network, and where each participant only processes a subset of the model. In this setup, we explore the possibility of unmaterializable weights, where a full weight set is never available to any one participant. We introduce Unextractable Protocol Models (UPMs): a training and inference framework that leverages the sharded model setup to ensure model shards (i. e. ,, subsets) held by participants are incompatible at different time steps. UPMs periodically inject time-varying, random, invertible transforms at participant boundaries; preserving the overall network function yet rendering cross-time assemblies incoherent. On Qwen-2. 5-0. 5B and Llama-3. 2-1B, 10 000 transforms leave FP32 perplexity unchanged ($\Delta$PPL$< 0. 01$; Jensen–Shannon drift $<4 \times 10^{-5}$), and we show how to control growth for lower precision datatypes. Applying a transform every 30s adds 3% latency, 0. 1% bandwidth, and 10% GPU-memory overhead at inference, while training overhead falls to 1. 6% time and < 1% memory. We consider several attacks, showing that the requirements of direct attacks are impractical and easy to defend against, and that gradient-based fine-tuning of stitched partitions consumes $\geq 60\%$ of the tokens required to train from scratch. By enabling models to be collaboratively trained yet not extracted, UPMs make it practical to embed programmatic incentive mechanisms in community-driven decentralized training.

PDF Details

ICML Conference 2024 Conference Paper

A sampling theory perspective on activations for implicit neural representations

Hemanth Saratchandran
Sameera Ramasinghe
Violetta Shevchenko
Alexander Long
Simon Lucey

Implicit Neural Representations (INRs) have gained popularity for encoding signals as compact, differentiable entities. While commonly using techniques like Fourier positional encodings or non-traditional activation functions (e. g. , Gaussian, sinusoid, or wavelets) to capture high-frequency content, their properties lack exploration within a unified theoretical framework. Addressing this gap, we conduct a comprehensive analysis of these activations from a sampling theory perspective. Our investigation reveals that, especially in shallow INRs, $\mathrm{sinc}$ activations—previously unused in conjunction with INRs—are theoretically optimal for signal encoding. Additionally, we establish a connection between dynamical systems and INRs, leveraging sampling theory to bridge these two paradigms.

Details

AAAI Conference 2024 Conference Paper

BLiRF: Bandlimited Radiance Fields for Dynamic Scene Modeling

Sameera Ramasinghe
Violetta Shevchenko
Gil Avraham
Anton van den Hengel

Inferring the 3D structure of a non-rigid dynamic scene from a single moving camera is an under-constrained problem. Inspired by the remarkable progress of neural radiance fields (NeRFs) in photo-realistic novel view synthesis of static scenes, it has also been extended to dynamic settings. Such methods heavily rely on implicit neural priors to regularize the problem. In this work, we take a step back and investigate how current implementations may entail deleterious effects including limited expressiveness, entanglement of light and density fields, and sub-optimal motion localization. Further, we devise a factorisation-based framework that represents the scene as a composition of bandlimited, high-dimensional signals. We demonstrate compelling results across complex dynamic scenes that involve changes in lighting, texture and long-range dynamics.

PDF Details DOI

ICLR Conference 2024 Conference Paper

Improving the Convergence of Dynamic NeRFs via Optimal Transport

Sameera Ramasinghe
Violetta Shevchenko
Gil Avraham
Hisham Husain
Anton van den Hengel

Synthesizing novel views for dynamic scenes from a collection of RGB inputs poses significant challenges due to the inherent under-constrained nature of the problem. To mitigate this ill-posedness, practitioners in the field of neural radiance fields (NeRF) often resort to the adoption of intricate geometric regularization techniques, including scene flow, depth estimation, or learned perceptual similarity. While these geometric cues have demonstrated their effectiveness, their incorporation leads to evaluation of computationally expensive off-the-shelf models, introducing substantial computational overhead into the pipeline. Moreover, seamlessly integrating such modules into diverse dynamic NeRF models can be a non-trivial task, hindering their utilization in an architecture-agnostic manner. In this paper, we propose a theoretically grounded, lightweight regularizer by treating the dynamics of a time-varying scene as a low-frequency change of a probability distribution of the light intensity. We constrain the dynamics of this distribution using optimal transport (OT) and provide error bounds under reasonable assumptions. Our regularization is learning-free, architecture agnostic, and can be implemented with just a few lines of code. Finally, we demonstrate the practical efficacy of our regularizer across state-of-the-art architectures.

Details