Author name cluster

Esther Rolf

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers

2 author rows

JMLR Journal 2026 Journal Article

Contrasting Local and Global Modeling with Machine Learning and Satellite Data: A Case Study Estimating Tree Canopy Height in African Savannas

Esther Rolf
Lucia Gordon
Milind Tambe
Andrew Davies

While advances in machine learning with satellite imagery (SatML) are facilitating environmental monitoring at a global scale, developing SatML models that are accurate and useful for local regions remains critical to understanding and acting on an ever-changing planet. As increasing attention and resources are being devoted to training SatML models with global data, it is important to understand when improvements in global models will make it easier to train or fine-tune models that are accurate in specific regions. To explore this question, we design the first study that explicitly contrasts local and global training paradigms for SatML, through a case study of tree canopy height (TCH) mapping in the Karingani Game Reserve, Mozambique. We find that recent advances in global TCH mapping do not necessarily translate to better local modeling abilities in our study region. Specifically, small models trained only with locally-collected data outperform published global TCH maps, and even outperform globally pretrained models that we fine-tune using local data. Analyzing these results further, we identify specific points of conflict and synergy between local and global modeling paradigms that can inform future research toward aligning local and global performance objectives in geospatial machine learning. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2026. ( edit, beta )

PDF Details

AAAI Conference 2026 Conference Paper

Mapping on a Budget: Optimizing Spatial Data Collection for ML

Livia Betti
Farooq Sanni
Gnouyaro Z. Sogoyou
Togbe Agbagla
Cullen Molitor
Tamma Carleton
Esther Rolf

In applications across agriculture, ecology, and human development, machine learning with satellite imagery (SatML) is limited by the sparsity of labeled training data. While satellite data cover the globe, labeled training datasets for SatML are often small, spatially clustered, and collected for other purposes (e.g., administrative surveys or field measurements). Despite the pervasiveness of this issue in practice, past SatML research has largely focused on new model architectures and training algorithms to handle scarce training data, rather than modeling data conditions directly. This leaves scientists and policymakers who wish to use SatML for large-scale monitoring uncertain about whether and how to collect additional data to maximize performance. Here, we present the first problem formulation for the optimization of spatial training data in the presence of heterogeneous data collection costs and realistic budget constraints, as well as novel methods for addressing this problem. In experiments simulating different problem settings across three continents and four tasks, our strategies reveal substantial gains from sample optimization. Further experiments delineate settings for which optimized sampling is particularly effective. The problem formulation and methods we introduce are designed to generalize across application domains for SatML; we put special emphasis on a specific problem setting where our coauthors can immediately use our findings to augment clustered agricultural surveys for SatML monitoring in Togo.

PDF Details DOI

AAAI Conference 2025 Conference Paper

SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

Konstantin Klemmer
Esther Rolf
Caleb Robinson
Lester Mackey
Marc Rußwurm

Geographic information is essential for modeling tasks in fields ranging from ecology to epidemiology. However, extracting relevant location characteristics for a given task can be challenging, often requiring expensive data fusion or distillation from massive global imagery datasets. To address this challenge, we introduce Satellite Contrastive Location-Image Pretraining (SatCLIP). This global, general-purpose geographic location encoder learns an implicit representation of locations by matching CNN and ViT inferred visual patterns of openly available satellite imagery with their geographic coordinates. The resulting SatCLIP location encoder efficiently summarizes the characteristics of any given location for convenient use in downstream tasks. In our experiments, we use SatCLIP embeddings to improve performance on nine diverse geospatial prediction tasks including temperature prediction, animal recognition, and population density estimation. Across tasks, SatCLIP consistently outperforms alternative location encoders and shows promise for improving geographic domain adaptation. These results demonstrate the potential of vision-location models to learn meaningful representations of our planet from the vast, varied, and largely untapped modalities of geospatial data.

PDF Details DOI

ECAI Conference 2024 Conference Paper

Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents

Lucia Gordon
Esther Rolf
Milind Tambe

Stochastic multi-agent multi-armed bandits typically assume that the rewards from each arm follow a fixed distribution, regardless of which agent pulls the arm. However, in many real-world settings, rewards can depend on the sensitivity of each agent to their environment. In medical screening, disease detection rates can vary by test type; in preference matching, rewards can depend on user preferences; and in environmental sensing, observation quality can vary across sensors. Since past work does not specify how to allocate agents of heterogeneous but known sensitivity of these types in a stochastic bandit setting, we introduce a UCB-style algorithm, Min-Width, which aggregates information from diverse agents. In doing so, we address the joint challenges of (i) aggregating the rewards, which follow different distributions for each agent-arm pair, and (ii) coordinating the assignments of agents to arms. Min-Width facilitates efficient collaboration among heterogeneous agents, exploiting the known structure in the agents’ reward functions to weight their rewards accordingly. We analyze the regret of Min-Width and conduct pseudo-synthetic and fully synthetic experiments to study the performance of different levels of information sharing. Our results confirm that the gains to modeling agent heterogeneity tend to be greater when the sensitivities are more varied across agents, while combining more information does not always improve performance.

Details

ICLR Conference 2024 Conference Paper

Geographic Location Encoding with Spherical Harmonics and Sinusoidal Representation Networks

Marc Rußwurm
Konstantin Klemmer
Esther Rolf
Robin Zbinden
Devis Tuia

Learning representations of geographical space is vital for any machine learning model that integrates geolocated data, spanning application domains such as remote sensing, ecology, or epidemiology. Recent work embeds coordinates using sine and cosine projections based on Double Fourier Sphere (DFS) features. These embeddings assume a rectangular data domain even on global data, which can lead to artifacts, especially at the poles. At the same time, little attention has been paid to the exact design of the neural network architectures with which these functional embeddings are combined. This work proposes a novel location encoder for globally distributed geographic data that combines spherical harmonic basis functions, natively defined on spherical surfaces, with sinusoidal representation networks (SirenNets) that can be interpreted as learned Double Fourier Sphere embedding. We systematically evaluate positional embeddings and neural network architectures across various benchmarks and synthetic evaluation datasets. In contrast to previous approaches that require the combination of both positional encoding and neural networks to learn meaningful representations, we show that both spherical harmonics and sinusoidal representation networks are competitive on their own but set state-of-the-art performances across tasks when combined. The model code and experiments are available at https://github.com/marccoru/locationencoder.

Details

ICML Conference 2024 Conference Paper

Position: Application-Driven Innovation in Machine Learning

David Rolnick
Alán Aspuru-Guzik
Sara Beery
Bistra Dilkina
Priya L. Donti
Marzyeh Ghassemi
Hannah Kerner
Claire Monteleoni

In this position paper, we argue that application-driven research has been systemically under-valued in the machine learning community. As applications of machine learning proliferate, innovative algorithms inspired by specific real-world challenges have become increasingly important. Such work offers the potential for significant impact not merely in domains of application but also in machine learning itself. In this paper, we describe the paradigm of application-driven research in machine learning, contrasting it with the more standard paradigm of methods-driven research. We illustrate the benefits of application-driven machine learning and how this approach can productively synergize with methods-driven work. Despite these benefits, we find that reviewing, hiring, and teaching practices in machine learning often hold back application-driven innovation. We outline how these processes may be improved.

Details

ICML Conference 2024 Conference Paper

Position: Mission Critical - Satellite Data is a Distinct Modality in Machine Learning

Esther Rolf
Konstantin Klemmer
Caleb Robinson
Hannah Kerner

Satellite data has the potential to inspire a seismic shift for machine learning—one in which we rethink existing practices designed for traditional data modalities. As machine learning for satellite data (SatML) gains traction for its real-world impact, our field is at a crossroads. We can either continue applying ill-suited approaches, or we can initiate a new research agenda that centers around the unique characteristics and challenges of satellite data. This position paper argues that satellite data constitutes a distinct modality for machine learning research and that we must recognize it as such to advance the quality and impact of SatML research across theory, methods, and deployment. We outline research directions, critical discussion questions and actionable suggestions to transform SatML from merely an intriguing application area to a dedicated research discipline that helps move the needle on big challenges for machine learning and society.

Details

IJCAI Conference 2023 Conference Paper

Fairness and Representation in Satellite-Based Poverty Maps: Evidence of Urban-Rural Disparities and Their Impacts on Downstream Policy

Emily Aiken
Esther Rolf
Joshua Blumenstock

Poverty maps derived from satellite imagery are increasingly used to inform high-stakes policy decisions, such as the allocation of humanitarian aid and the distribution of government resources. Such poverty maps are typically constructed by training machine learning algorithms on a relatively modest amount of ``ground truth" data from surveys, and then predicting poverty levels in areas where imagery exists but surveys do not. Using survey and satellite data from ten countries, this paper investigates disparities in representation, systematic biases in prediction errors, and fairness concerns in satellite-based poverty mapping across urban and rural lines, and shows how these phenomena affect the validity of policies based on predicted maps. Our findings highlight the importance of careful error and bias analysis before using satellite-based poverty maps in real-world policy decisions.

PDF Details DOI

UAI Conference 2022 Conference Paper

Resolving label uncertainty with implicit posterior models

Esther Rolf
Nikolay Malkin
Alexandros Graikos
Ana Jojic
Caleb Robinson
Nebojsa Jojic

We propose a method for jointly inferring labels across a collection of data samples, where each sample consists of an observation and a prior belief about the label. By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs. This formulation unifies various machine learning settings; the weak beliefs can come in the form of noisy or incomplete labels, likelihoods given by a different prediction mechanism on auxiliary input, or common-sense priors reflecting knowledge about the structure of the problem at hand. We demonstrate the proposed algorithms on diverse problems: classification with negative training examples, learning from rankings, weakly and self-supervised aerial imagery segmentation, co-segmentation of video frames, and coarsely supervised text classification.

Details

ICML Conference 2021 Conference Paper

Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data

Esther Rolf
Theodora T. Worledge
Benjamin Recht
Michael I. Jordan

Collecting more diverse and representative training data is often touted as a remedy for the disparate performance of machine learning predictors across subpopulations. However, a precise framework for understanding how dataset properties like diversity affect learning outcomes is largely lacking. By casting data collection as part of the learning process, we demonstrate that diverse representation in training data is key not only to increasing subgroup performances, but also to achieving population-level objectives. Our analysis and experiments describe how dataset compositions influence performance and provide constructive results for using trends in existing data, alongside domain knowledge, to help guide intentional, objective-aware dataset design

Details

ICML Conference 2020 Conference Paper

Balancing Competing Objectives with Noisy Data: Score-Based Classifiers for Welfare-Aware Machine Learning

Esther Rolf
Max Simchowitz
Sarah Dean
Lydia T. Liu
Daniel Björkegren
Moritz Hardt
Joshua Blumenstock

While real-world decisions involve many competing objectives, algorithmic decisions are often evaluated with a single objective function. In this paper, we study algorithmic policies which explicitly trade off between a private objective (such as profit) and a public objective (such as social welfare). We analyze a natural class of policies which trace an empirical Pareto frontier based on learned scores, and focus on how such decisions can be made in noisy or data-limited regimes. Our theoretical results characterize the optimal strategies in this class, bound the Pareto errors due to inaccuracies in the scores, and show an equivalence between optimal strategies and a rich class of fairness-constrained profit-maximizing policies. We then present empirical results in two different contexts — online content recommendation and sustainable abalone fisheries — to underscore the generality of our approach to a wide range of practical decisions. Taken together, these results shed light on inherent trade-offs in using machine learning for decisions that impact social welfare.

Details

IJCAI Conference 2019 Conference Paper

Delayed Impact of Fair Machine Learning

Lydia T. Liu
Sarah Dean
Esther Rolf
Max Simchowitz
Moritz Hardt

Static classification has been the predominant focus of the study of fairness in machine learning. While most models do not consider how decisions change populations over time, it is conventional wisdom that fairness criteria promote the long-term well-being of groups they aim to protect. This work studies the interaction of static fairness criteria with temporal indicators of well-being. We show a simple one-step feedback model in which common criteria do not generally promote improvement over time, and may in fact cause harm. Our results highlight the importance of temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs.

PDF Details

ICML Conference 2018 Conference Paper

Delayed Impact of Fair Machine Learning

Lydia T. Liu
Sarah Dean
Esther Rolf
Max Simchowitz
Moritz Hardt

Fairness in machine learning has predominantly been studied in static classification settings without concern for how decisions change the underlying population over time. Conventional wisdom suggests that fairness criteria promote the long-term well-being of those groups they aim to protect. We study how static fairness criteria interact with temporal indicators of well-being, such as long-term improvement, stagnation, and decline in a variable of interest. We demonstrate that even in a one-step feedback model, common fairness criteria in general do not promote improvement over time, and may in fact cause harm in cases where an unconstrained objective would not. We completely characterize the delayed impact of three standard criteria, contrasting the regimes in which these exhibit qualitatively different behavior. In addition, we find that a natural form of measurement error broadens the regime in which fairness criteria perform favorably. Our results highlight the importance of measurement and temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs.

Details