Arrow Research

Author name cluster

Ronald Clark

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers (13)

NeurIPS Conference 2025 Conference Paper

Cloud4D: Estimating Cloud Properties at a High Spatial and Temporal Resolution

  • Jacob Lin
  • Edward Gryspeerdt
  • Ronald Clark

There has been great progress in improving numerical weather prediction and climate models using machine learning. However, most global models operate at a kilometer scale, making it challenging to model individual clouds and factors such as extreme precipitation, wind gusts, turbulence, and surface irradiance. Therefore, there is a need to move towards higher-resolution models, which in turn require high-resolution real-world observations that current instruments struggle to obtain. We present Cloud4D, the first learning-based framework that reconstructs a physically consistent, four-dimensional cloud state using only synchronized ground-based cameras. Leveraging a homography-guided 2D-to-3D transformer, Cloud4D infers the full 3D distribution of liquid water content at 25 m spatial and 5 s temporal resolution. By tracking the 3D liquid water content retrievals over time, Cloud4D additionally estimates horizontal wind vectors. Across a two-month deployment comprising six skyward cameras, our system delivers an order-of-magnitude improvement in space-time resolution relative to state-of-the-art satellite measurements, while retaining single-digit relative error (<10%) against collocated radar measurements.
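
The homography-guided 2D-to-3D lifting at the heart of Cloud4D rests on a standard geometric primitive: intersecting a pixel's viewing ray with a horizontal plane at a candidate altitude. Below is a minimal sketch of that primitive only; the intrinsics, camera pose, and altitudes are made-up values, and the paper's actual transformer architecture is not reproduced here.

```python
# Minimal sketch (not the authors' code): back-projecting sky-camera pixels
# onto horizontal planes at candidate altitudes, the geometric primitive a
# homography-guided 2D-to-3D lifting could build on. K, R, cam_pos are
# hypothetical calibration values.
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])          # intrinsics (assumed)
R = np.eye(3)                            # camera looks straight up (assumed)
cam_pos = np.array([0.0, 0.0, 0.0])      # camera at ground level

def pixel_to_plane(u, v, altitude):
    """Intersect the viewing ray of pixel (u, v) with the plane z = altitude."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray in camera frame
    ray = R @ ray                                   # rotate into world frame
    s = (altitude - cam_pos[2]) / ray[2]            # scale to reach the plane
    return cam_pos + s * ray                        # 3D point on the plane

# Sample one pixel at several candidate cloud heights (metres).
for h in (500.0, 1000.0, 2000.0):
    print(h, pixel_to_plane(400, 200, h))
```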

NeurIPS Conference 2025 Conference Paper

IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation

  • Yuanze Lin
  • Yi-Wen Chen
  • Yi-Hsuan Tsai
  • Ronald Clark
  • Ming-Hsuan Yang

Although diffusion-based models can generate high-quality and high-resolution video sequences from textual or image inputs, they lack explicit integration of geometric cues when controlling scene lighting and visual appearance across frames. To address this limitation, we propose IllumiCraft, an end-to-end diffusion framework accepting three complementary inputs: (1) high-dynamic-range (HDR) video maps for detailed lighting control; (2) synthetically relit frames with randomized illumination changes (optionally paired with a static background reference image) to provide appearance cues; and (3) 3D point tracks that capture precise 3D geometry information. By integrating the lighting, appearance, and geometry cues within a unified diffusion architecture, IllumiCraft generates temporally coherent videos aligned with user-defined prompts. It supports background-conditioned and text-conditioned video relighting and provides better fidelity than existing controllable video generation methods.
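
As a rough sketch of how three conditioning streams can enter a single diffusion denoiser, the snippet below channel-concatenates per-frame latents before a stand-in network. All shapes, the toy Conv3d denoiser, and the variable names are assumptions for illustration, not the released IllumiCraft model.

```python
# Minimal sketch (assumptions, not the released model): one common way to feed
# several conditioning signals to a video diffusion denoiser is channel-wise
# concatenation of per-frame latents. Shapes and the tiny denoiser are made up.
import torch
import torch.nn as nn

B, T, C, H, W = 1, 8, 4, 32, 32             # batch, frames, latent dims (assumed)
noisy_latent = torch.randn(B, T, C, H, W)
hdr_latent   = torch.randn(B, T, C, H, W)   # encoded HDR lighting maps
relit_latent = torch.randn(B, T, C, H, W)   # encoded relit reference frames
track_embed  = torch.randn(B, T, C, H, W)   # rasterised 3D point-track features

cond_input = torch.cat([noisy_latent, hdr_latent, relit_latent, track_embed], dim=2)

denoiser = nn.Conv3d(4 * C, C, kernel_size=3, padding=1)  # stand-in for a UNet/DiT
noise_pred = denoiser(cond_input.transpose(1, 2)).transpose(1, 2)
print(noise_pred.shape)  # torch.Size([1, 8, 4, 32, 32])
```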

NeurIPS Conference 2025 Conference Paper

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

  • Andrew M. Bean
  • Ryan Othniel Kearns
  • Angelika Romanou
  • Franziska Sofia Hafner
  • Harry Mayne
  • Jan Batzner
  • Negar Foroutan Eghlidi
  • Chris Schmitz

Evaluating large language models (LLMs) is crucial for both assessing their capabilities and identifying safety or robustness issues prior to deployment. Reliably measuring abstract and complex phenomena such as 'safety' and 'robustness' requires strong construct validity, that is, having measures that represent what matters to the phenomenon. With a team of 29 expert reviewers, we conduct a systematic review of 445 LLM benchmarks from leading conferences in natural language processing and machine learning. Across the reviewed articles, we find patterns related to the measured phenomena, tasks, and scoring metrics which undermine the validity of the resulting claims. To address these shortcomings, we provide eight key recommendations and detailed actionable guidance to researchers and practitioners in developing LLM benchmarks.

TMLR Journal 2025 Journal Article

Metamorphic Forward Adaptation Network: Dynamically Adaptive and Modular Multi-layer Learning

  • Yu Sun
  • Vijja Wichitwechkarn
  • Ronald Clark
  • Mirko Kovac
  • Basaran Bahadir Kocer

Back-propagation is a widely used algorithm for training neural networks by adjusting weights based on error gradients. However, back-propagation is biologically implausible, relying on global derivative computation, and lacks robustness in long-term dynamic learning. A previously proposed alternative to back-propagation is the Forward-Forward algorithm, which bypasses global gradient dependency and localises computations, making it a more biologically plausible approach. However, Forward-Forward has been evaluated in limited environments, does not yet match back-propagation's performance, and only supports classification, not regression. This research introduces the Metamorphic Forward Adaptation Network (MFAN), which uses contrastive learning as its core and retains the layer-wise architecture of the Forward-Forward algorithm. Unlike the Forward-Forward model, which is limited to discrete classification, MFAN can process both discrete and continuous data, showing stability, adaptability, and the ability to handle evolving data. MFAN performs well in continuous data stream scenarios, demonstrating superior adaptability and robustness compared to back-propagation, particularly in tasks requiring dynamic, long-term learning.
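
For context, the layer-local "goodness" objective of the Forward-Forward algorithm that MFAN builds on can be sketched in a few lines. This is a toy illustration of that underlying idea, with assumed dimensions and threshold, not the MFAN implementation itself.

```python
# Minimal sketch of layer-local, Forward-Forward-style training (a toy
# illustration, not the paper's code). Each layer is trained on its own
# "goodness" objective: high activation energy for positive samples, low for
# negative ones, with no gradient flowing between layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(784, 256)
opt = torch.optim.SGD(layer.parameters(), lr=0.01)
theta = 2.0  # goodness threshold (assumed)

def goodness(h):
    return h.pow(2).sum(dim=1)  # sum of squared activations

x_pos = torch.randn(32, 784)    # stand-in positive samples
x_neg = torch.randn(32, 784)    # stand-in negative samples

for _ in range(10):
    h_pos = torch.relu(layer(x_pos))
    h_neg = torch.relu(layer(x_neg))
    # Push positive goodness above theta and negative goodness below it.
    loss = F.softplus(theta - goodness(h_pos)).mean() \
         + F.softplus(goodness(h_neg) - theta).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```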

ICRA Conference 2023 Conference Paper

Learning Tethered Perching for Aerial Robots

  • Fabian Hauf
  • Basaran Bahadir Kocer
  • Alan Slatter
  • Hai-Nguyen Nguyen
  • Oscar Pang
  • Ronald Clark
  • Edward Johns
  • Mirko Kovac

Aerial robots have a wide range of applications, such as collecting data in hard-to-reach areas. This requires the longest possible operation time. However, because currently available commercial batteries have a limited specific energy of roughly 300 Wh/kg, a drone's flight time is a bottleneck for sustainable long-term data collection. Inspired by birds in nature, a possible approach to tackle this challenge is to perch drones on trees and other natural or man-made structures to save energy whilst in operation. In this paper, we propose an algorithm to automatically generate trajectories for a drone to perch on a tree branch, using the proposed tethered perching mechanism with a pendulum-like structure. This enables a drone to perform an energy-optimised, controlled 180° flip to safely disarm upside down. To fine-tune a set of reachable trajectories, a soft actor-critic-based reinforcement learning algorithm is used. Our experimental results show the feasibility of the set of trajectories with successful perching. Our findings demonstrate that the proposed approach enables energy-efficient landing for long-term data collection tasks.

ICLR Conference 2021 Conference Paper

End-to-End Egospheric Spatial Memory

  • Daniel Lenton
  • Stephen James
  • Ronald Clark
  • Andrew J. Davison

Spatial memory, or the ability to remember and recall specific locations and objects, is central to autonomous agents' ability to carry out tasks in real environments. However, most existing artificial memory modules are not very adept at storing spatial information. We propose a parameter-free module, Egospheric Spatial Memory (ESM), which encodes the memory in an ego-sphere around the agent, enabling expressive 3D representations. ESM can be trained end-to-end via either imitation or reinforcement learning, and improves both training efficiency and final performance against other memory baselines on both drone and manipulator visuomotor control tasks. The explicit egocentric geometry also enables us to seamlessly combine the learned controller with other non-learned modalities, such as local obstacle avoidance. We further show applications to semantic segmentation on the ScanNet dataset, where ESM naturally combines image-level and map-level inference modalities. Through our broad set of experiments, we show that ESM provides a general computation graph for embodied spatial reasoning, and the module forms a bridge between real-time mapping systems and differentiable memory architectures. Implementation at: https://github.com/ivy-dl/memory.
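
The egospheric write step can be illustrated by binning features observed at 3D points into an azimuth-elevation grid centred on the agent. The sketch below is a simplified stand-in with assumed grid and feature sizes; the released ESM module uses a more principled update than the last-write-wins shown here.

```python
# Minimal sketch of an egospheric write (an illustration under assumed shapes,
# not the released ESM module): features observed at 3D points are binned into
# an azimuth-elevation grid centred on the agent.
import torch

H_bins, W_bins, F = 60, 120, 16            # elevation x azimuth x feature dims
memory = torch.zeros(H_bins, W_bins, F)

points = torch.randn(500, 3)               # 3D points relative to the agent
feats  = torch.randn(500, F)               # per-point features

x, y, z = points[:, 0], points[:, 1], points[:, 2]
az = torch.atan2(y, x)                                  # [-pi, pi]
el = torch.atan2(z, torch.sqrt(x**2 + y**2))            # [-pi/2, pi/2]

ai = ((az + torch.pi) / (2 * torch.pi) * W_bins).long().clamp(0, W_bins - 1)
ei = ((el + torch.pi / 2) / torch.pi * H_bins).long().clamp(0, H_bins - 1)

memory[ei, ai] = feats   # last-write-wins here; ESM itself updates more carefully
print(memory.abs().sum() > 0)
```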

IROS Conference 2021 Conference Paper

Unsupervised Path Regression Networks

  • Michal Pándy
  • Daniel Lenton
  • Ronald Clark

We demonstrate that challenging shortest path problems can be solved via direct spline regression from a neural network, trained in an unsupervised manner (i.e. without requiring ground-truth optimal paths for training). To achieve this, we derive a geometry-dependent optimal cost function whose minima guarantee collision-free solutions. Our method beats state-of-the-art supervised learning baselines for shortest path planning, with a much more scalable training pipeline and a significant speedup in inference time.
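
A minimal version of such an unsupervised objective combines path length with a differentiable collision penalty, so no ground-truth paths are needed. The sketch below regresses plain waypoints against a single circular obstacle; the network size, obstacle, and loss weighting are invented for illustration, and the paper's actual spline parameterisation and geometry-dependent cost are not reproduced.

```python
# Minimal sketch of an unsupervised path-regression loss (a simplification
# under assumed shapes, not the paper's exact cost): a network regresses
# intermediate waypoints, and the loss combines path length with a penalty
# for entering a circular obstacle.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2 * 8))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
obstacle_c = torch.tensor([0.5, 0.5])   # obstacle centre (assumed)
obstacle_r = 0.2                        # obstacle radius (assumed)

start = torch.tensor([[0.0, 0.0]])
goal  = torch.tensor([[1.0, 1.0]])

for _ in range(100):
    mid = net(torch.cat([start, goal], dim=1)).view(1, 8, 2)
    path = torch.cat([start.unsqueeze(1), mid, goal.unsqueeze(1)], dim=1)
    seg = path[:, 1:] - path[:, :-1]
    length = seg.norm(dim=2).sum()                 # total path length
    dist = (path - obstacle_c).norm(dim=2)
    collision = torch.relu(obstacle_r - dist).sum()  # penetration depth
    loss = length + 10.0 * collision
    opt.zero_grad()
    loss.backward()
    opt.step()
```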

AAAI Conference 2020 Short Paper

Towards Consistent Variational Auto-Encoding (Student Abstract)

  • Yijing Liu
  • Shuyu Lin
  • Ronald Clark

Variational autoencoders (VAEs) have been a successful approach to learning meaningful representations of data in an unsupervised manner. However, suboptimal representations are often learned because the approximate inference model fails to match the true posterior of the generative model, i.e. an inconsistency exists between the learnt inference and generative models. In this paper, we introduce a novel consistency loss that directly requires the encoding of the reconstructed data point to match the encoding of the original data, leading to better representations. Through experiments on MNIST and Fashion-MNIST, we demonstrate the existence of the inconsistency in VAE learning and that our method can effectively reduce such inconsistency.
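
The described consistency loss is straightforward to write down: re-encode the reconstruction and penalise its distance to the original encoding. Below is a minimal sketch with a toy Gaussian encoder and assumed sizes, not the authors' code.

```python
# Minimal sketch of the described consistency loss (assumed shapes and a toy
# Gaussian encoder/decoder; not the authors' code): the encoding of the
# reconstruction is pushed towards the encoding of the original input.
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Linear(784, 2 * 16)   # outputs mean and log-variance
dec = nn.Linear(16, 784)

def encode(x):
    mu, logvar = enc(x).chunk(2, dim=1)
    return mu, logvar

x = torch.rand(32, 784)
mu, logvar = encode(x)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterisation
x_rec = torch.sigmoid(dec(z))

recon = F.binary_cross_entropy(x_rec, x)
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()

# Consistency term: re-encode the reconstruction, match the original code.
mu_rec, _ = encode(x_rec)
consistency = F.mse_loss(mu_rec, mu.detach())

loss = recon + kl + consistency
loss.backward()
```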

ICRA Conference 2020 Conference Paper

Towards the Probabilistic Fusion of Learned Priors into Standard Pipelines for 3D Reconstruction

  • Tristan Laidlow
  • Jan Czarnowski
  • Andrea Nicastro
  • Ronald Clark
  • Stefan Leutenegger

The best way to combine the results of deep learning with standard 3D reconstruction pipelines remains an open problem. While systems that pass the output of traditional multi-view stereo approaches to a network for regularisation or refinement currently seem to get the best results, it may be preferable to treat deep neural networks as separate components whose results can be probabilistically fused into geometry-based systems. Unfortunately, the error models required to do this type of fusion are not well understood, with many different approaches being put forward. Recently, a few systems have achieved good results by having their networks predict probability distributions rather than single values. We propose using this approach to fuse a learned single-view depth prior into a standard 3D reconstruction system. Our system is capable of incrementally producing dense depth maps for a set of keyframes. We train a deep neural network to predict discrete, nonparametric probability distributions for the depth of each pixel from a single image. We then fuse this "probability volume" with another probability volume based on the photometric consistency between subsequent frames and the keyframe image. We argue that combining the probability volumes from these two sources will result in a volume that is better conditioned. To extract depth maps from the volume, we minimise a cost function that includes a regularisation term based on network-predicted surface normals and occlusion boundaries. Through a series of experiments, we demonstrate that each of these components improves the overall performance of the system.
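
The fusion step itself reduces to multiplying two per-pixel discrete depth distributions and renormalising. The sketch below illustrates this with random toy volumes and a naive argmax readout; the paper instead extracts depth by minimising a regularised cost over the fused volume.

```python
# Minimal sketch of probability-volume fusion (toy numbers, not the paper's
# system): two per-pixel discrete depth distributions, one from the
# single-view network and one from photometric matching, are fused by
# element-wise multiplication and renormalisation.
import numpy as np

H, W, D = 4, 4, 32                    # image size and number of depth bins (toy)
rng = np.random.default_rng(0)

p_net   = rng.random((H, W, D)); p_net   /= p_net.sum(-1, keepdims=True)
p_photo = rng.random((H, W, D)); p_photo /= p_photo.sum(-1, keepdims=True)

fused = p_net * p_photo
fused /= fused.sum(-1, keepdims=True)            # renormalise per pixel

depth_bins = np.linspace(0.5, 10.0, D)           # candidate depths in metres
depth_map = depth_bins[fused.argmax(-1)]         # a simple per-pixel readout
print(depth_map.shape)                           # (4, 4)
```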

NeurIPS Conference 2019 Conference Paper

Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds

  • Bo Yang
  • Jianan Wang
  • Ronald Clark
  • Qingyong Hu
  • Sen Wang
  • Andrew Markham
  • Niki Trigoni

We propose a novel, conceptually simple and general framework for instance segmentation on 3D point clouds. Our method, called 3D-BoNet, follows the simple design philosophy of per-point multilayer perceptrons (MLPs). The framework directly regresses 3D bounding boxes for all instances in a point cloud, while simultaneously predicting a point-level mask for each instance. It consists of a backbone network followed by two parallel network branches for 1) bounding box regression and 2) point mask prediction. 3D-BoNet is single-stage, anchor-free and end-to-end trainable. Moreover, it is remarkably computationally efficient as, unlike existing approaches, it does not require any post-processing steps such as non-maximum suppression, feature sampling, clustering or voting. Extensive experiments show that our approach surpasses existing work on both ScanNet and S3DIS datasets while being approximately 10x more computationally efficient. Comprehensive ablation studies demonstrate the effectiveness of our design.
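
The two-branch design can be sketched as a shared per-point backbone feeding a global box-regression head and a per-point mask head. Layer sizes, the max-pooled global feature, and the instance budget below are assumptions for illustration, not the released 3D-BoNet.

```python
# Minimal sketch of the two-branch design (made-up layer sizes; not the
# released 3D-BoNet): a shared backbone feeds one branch that regresses a
# fixed set of instance bounding boxes and another that predicts per-point
# masks.
import torch
import torch.nn as nn

N, K = 1024, 20                       # points per cloud, max instances (assumed)
backbone = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128))

box_branch  = nn.Linear(128, K * 6)   # (xmin, ymin, zmin, xmax, ymax, zmax)
mask_branch = nn.Linear(128, K)       # per-point score for each instance

pts = torch.randn(1, N, 3)
feat = backbone(pts)                          # (1, N, 128) per-point features
global_feat = feat.max(dim=1).values          # (1, 128) global feature

boxes = box_branch(global_feat).view(1, K, 6) # one box per candidate instance
masks = torch.sigmoid(mask_branch(feat))      # (1, N, K) point-level masks
print(boxes.shape, masks.shape)
```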

ICRA Conference 2017 Conference Paper

DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks

  • Sen Wang 0002
  • Ronald Clark
  • Hongkai Wen 0001
  • Niki Trigoni

This paper studies the monocular visual odometry (VO) problem. Most existing VO algorithms are developed under a standard pipeline including feature extraction, feature matching, motion estimation, local optimisation, etc. Although some of them have demonstrated superior performance, they usually need to be carefully designed and specifically fine-tuned to work well in different environments. Some prior knowledge is also required to recover an absolute scale for monocular VO. This paper presents a novel end-to-end framework for monocular VO by using deep Recurrent Convolutional Neural Networks (RCNNs). Since it is trained and deployed in an end-to-end manner, it infers poses directly from a sequence of raw RGB images (videos) without adopting any module from the conventional VO pipeline. Based on the RCNNs, it not only automatically learns effective feature representations for the VO problem through Convolutional Neural Networks, but also implicitly models sequential dynamics and relations using deep Recurrent Neural Networks. Extensive experiments on the KITTI VO dataset show competitive performance to state-of-the-art methods, verifying that the end-to-end Deep Learning technique can be a viable complement to traditional VO systems.
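
The recurrent-convolutional pipeline can be sketched as a CNN over stacked consecutive frames followed by an LSTM that regresses 6-DoF relative poses. The dimensions and tiny CNN below are invented for illustration and are far smaller than the trained DeepVO network.

```python
# Minimal sketch of a recurrent-convolutional VO pipeline (toy dimensions;
# not the trained DeepVO network): a CNN encodes each pair of consecutive
# frames, and an LSTM over the sequence regresses 6-DoF poses.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(6, 16, 7, stride=4, padding=3), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=4, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
rnn = nn.LSTM(input_size=32, hidden_size=64, num_layers=2, batch_first=True)
pose_head = nn.Linear(64, 6)   # translation + rotation per step

video = torch.randn(1, 5, 3, 128, 416)                   # 5 RGB frames (toy)
pairs = torch.cat([video[:, :-1], video[:, 1:]], dim=2)  # stack frame pairs
feats = cnn(pairs.flatten(0, 1)).view(1, 4, 32)          # per-pair features
hidden, _ = rnn(feats)
poses = pose_head(hidden)                                # (1, 4, 6) poses
print(poses.shape)
```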

AAAI Conference 2017 Conference Paper

VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem

  • Ronald Clark
  • Sen Wang
  • Hongkai Wen
  • Andrew Markham
  • Niki Trigoni

In this paper we present an on-manifold sequence-to-sequence learning approach to motion estimation using visual and inertial sensors. It is, to the best of our knowledge, the first end-to-end trainable method for visual-inertial odometry which performs fusion of the data at an intermediate feature-representation level. Our method has numerous advantages over traditional approaches. Specifically, it eliminates the need for tedious manual synchronization of the camera and IMU, as well as the need for manual calibration between the IMU and camera. A further advantage is that our model naturally and elegantly incorporates domain-specific information, which significantly mitigates drift. We show that our approach is competitive with state-of-the-art traditional methods when accurate calibration data is available, and can be trained to outperform them in the presence of calibration and synchronization errors.
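
Fusion at an intermediate feature level can be sketched as concatenating per-interval visual features with an LSTM summary of the higher-rate IMU stream before a core LSTM. All shapes and layer sizes below are assumptions for illustration, not the trained VINet.

```python
# Minimal sketch of intermediate-feature visual-inertial fusion (toy
# dimensions; not the trained VINet): visual features from frame pairs and an
# LSTM summary of the higher-rate IMU stream are concatenated before a core
# LSTM regresses poses.
import torch
import torch.nn as nn

visual_feat = torch.randn(1, 4, 32)        # stand-in CNN features per frame pair
imu = torch.randn(1, 4, 10, 6)             # 10 IMU samples per frame interval

imu_lstm = nn.LSTM(6, 16, batch_first=True)
imu_feat = torch.stack(
    [imu_lstm(imu[:, t])[0][:, -1] for t in range(4)], dim=1
)                                          # (1, 4, 16) per-interval IMU summary

fused = torch.cat([visual_feat, imu_feat], dim=2)   # fuse at feature level
core_lstm = nn.LSTM(48, 64, batch_first=True)
pose_head = nn.Linear(64, 6)
poses = pose_head(core_lstm(fused)[0])     # (1, 4, 6) relative poses
print(poses.shape)
```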

IROS Conference 2016 Conference Paper

Keyframe based large-scale indoor localisation using geomagnetic field and motion pattern

  • Sen Wang 0002
  • Hongkai Wen 0001
  • Ronald Clark
  • Niki Trigoni

This paper studies the indoor localisation problem using low-cost and pervasive sensors. Most existing indoor localisation algorithms rely on cameras, laser scanners, floor plans or other pre-installed infrastructure to achieve sub-meter or sub-centimetre localisation accuracy. However, in some circumstances these required devices or information may be unavailable or too expensive in terms of cost or deployment. This paper presents a novel keyframe-based Pose Graph Simultaneous Localisation and Mapping (SLAM) method, which correlates the ambient geomagnetic field with motion patterns and employs low-cost sensors commonly equipped in mobile devices, to provide positioning in both unknown and known environments. Extensive experiments are conducted in large-scale indoor environments to verify that the proposed method can achieve localisation accuracy comparable to the state of the art, such as the vision-based Google Project Tango.
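
One ingredient of such a system, scoring candidate loop closures by comparing geomagnetic sequences between keyframes, can be sketched with a simple normalised cross-correlation. The data below is synthetic and this matching rule is a simplification, not the paper's full SLAM pipeline.

```python
# Minimal sketch of geomagnetic keyframe matching (toy data; not the paper's
# SLAM system): candidate loop closures between keyframes are scored by
# correlating their geomagnetic field sequences.
import numpy as np

rng = np.random.default_rng(1)
seq_a = rng.random(50)                           # field magnitude, keyframe A
seq_b = seq_a + 0.05 * rng.standard_normal(50)   # revisit of the same corridor
seq_c = rng.random(50)                           # an unrelated keyframe

def ncc(a, b):
    """Normalised cross-correlation between two magnetic sequences."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float((a * b).mean())

print(ncc(seq_a, seq_b))   # close to 1: likely loop closure
print(ncc(seq_a, seq_c))   # near 0: different place
```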