Arrow Research search

Author name cluster

Caleb Robinson

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

7

AAAI Conference 2026 Conference Paper

Machine Learning for Sustainable Rice Production: Region-Scale Monitoring of Water-Saving Practices in Punjab, India

  • Ando Shah
  • Rajveer Singh
  • Akram Zaytar
  • Girmaw Abebe Tadesse
  • Caleb Robinson
  • Negar Tafti
  • Stephen A. Wood
  • Rahul Dodhia

Rice cultivation supplies half the world's population with staple food, while also being a major driver of freshwater depletion--consuming roughly a quarter of global freshwater--and accounting for ~48% of greenhouse gas emissions from croplands. In regions like Punjab, India, where groundwater levels are plummeting at 41.6 cm/year, adopting water-saving rice farming practices is critical. Direct-Seeded Rice (DSR) and Alternate Wetting and Drying (AWD) can cut irrigation water use by 20–40% without hurting yields, yet lack of spatial data on adoption impedes effective adaptation policy and climate action. We present a machine learning framework to bridge this data gap by monitoring sustainable rice farming at scale. In collaboration with agronomy experts and a large-scale farmer training program, we obtained ground-truth data from ~1,400 fields across Punjab. Leveraging this partnership, we developed a novel dimensional classification approach that decouples sowing and irrigation practices, achieving F1 scores of 0.8 and 0.74 respectively, solely employing Sentinel-1 satellite imagery. Explainability analysis reveals that DSR classification is robust while AWD classification depends primarily on planting schedule differences, as Sentinel-1's 12-day revisit frequency cannot capture the higher frequency irrigation cycles characteristic of AWD practices. Applying this model across 3 million fields reveals spatial heterogeneity in adoption at the state level, highlighting gaps and opportunities for policy targeting. Our district-level adoption rates correlate well with government estimates (Spearman's=0.69 and Rank Biased Overlap=0.77). This study provides policymakers and sustainability programs a powerful tool to track practice adoption, inform targeted interventions, and drive data-driven policies for water conservation and climate mitigation at regional scale.

AAAI Conference 2025 Conference Paper

Fields of The World: A Machine Learning Benchmark Dataset for Global Agricultural Field Boundary Segmentation

  • Hannah Kerner
  • Snehal Chaudhari
  • Aninda Ghosh
  • Caleb Robinson
  • Adeel Ahmad
  • Eddie Choi
  • Nathan Jacobs
  • Chris Holmes

Crop field boundaries are foundational datasets for agricultural monitoring and assessments but are expensive to collect manually. Machine learning (ML) methods for automatically extracting field boundaries from remotely sensed images could help realize the demand for these datasets at a global scale. However, current ML methods for field instance segmentation lack sufficient geographic coverage, accuracy, and generalization capabilities. Further, research on improving ML methods is restricted by the lack of labeled datasets representing the diversity of global agricultural fields. We present Fields of The World (FTW)---a novel ML benchmark dataset for agricultural field instance segmentation spanning 24 countries on four continents (Europe, Africa, Asia, and South America). FTW is an order of magnitude larger than previous datasets with 70,462 samples, each containing instance and semantic segmentation masks paired with multi-date, multi-spectral Sentinel-2 satellite images. We provide results from baseline models for the new FTW benchmark, show that models trained on FTW have better zero-shot and fine-tuning performance in held-out countries than models that aren't pre-trained with diverse datasets, and show positive qualitative zero-shot results of FTW models in a real-world scenario -- running on Sentinel-2 scenes over Ethiopia.

AAAI Conference 2025 Conference Paper

SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

  • Konstantin Klemmer
  • Esther Rolf
  • Caleb Robinson
  • Lester Mackey
  • Marc Rußwurm

Geographic information is essential for modeling tasks in fields ranging from ecology to epidemiology. However, extracting relevant location characteristics for a given task can be challenging, often requiring expensive data fusion or distillation from massive global imagery datasets. To address this challenge, we introduce Satellite Contrastive Location-Image Pretraining (SatCLIP). This global, general-purpose geographic location encoder learns an implicit representation of locations by matching CNN and ViT inferred visual patterns of openly available satellite imagery with their geographic coordinates. The resulting SatCLIP location encoder efficiently summarizes the characteristics of any given location for convenient use in downstream tasks. In our experiments, we use SatCLIP embeddings to improve performance on nine diverse geospatial prediction tasks including temperature prediction, animal recognition, and population density estimation. Across tasks, SatCLIP consistently outperforms alternative location encoders and shows promise for improving geographic domain adaptation. These results demonstrate the potential of vision-location models to learn meaningful representations of our planet from the vast, varied, and largely untapped modalities of geospatial data.

ICML Conference 2024 Conference Paper

Position: Mission Critical - Satellite Data is a Distinct Modality in Machine Learning

  • Esther Rolf
  • Konstantin Klemmer
  • Caleb Robinson
  • Hannah Kerner

Satellite data has the potential to inspire a seismic shift for machine learning—one in which we rethink existing practices designed for traditional data modalities. As machine learning for satellite data (SatML) gains traction for its real-world impact, our field is at a crossroads. We can either continue applying ill-suited approaches, or we can initiate a new research agenda that centers around the unique characteristics and challenges of satellite data. This position paper argues that satellite data constitutes a distinct modality for machine learning research and that we must recognize it as such to advance the quality and impact of SatML research across theory, methods, and deployment. We outline research directions, critical discussion questions and actionable suggestions to transform SatML from merely an intriguing application area to a dedicated research discipline that helps move the needle on big challenges for machine learning and society.

NeurIPS Conference 2023 Conference Paper

SSL4EO-L: Datasets and Foundation Models for Landsat Imagery

  • Adam Stewart
  • Nils Lehmann
  • Isaac Corley
  • Yi Wang
  • Yi-Chia Chang
  • Nassim Ait Ait Ali Braham
  • Shradha Sehgal
  • Caleb Robinson

The Landsat program is the longest-running Earth observation program in history, with 50+ years of data acquisition by 8 satellites. The multispectral imagery captured by sensors onboard these satellites is critical for a wide range of scientific fields. Despite the increasing popularity of deep learning and remote sensing, the majority of researchers still use decision trees and random forests for Landsat image analysis due to the prevalence of small labeled datasets and lack of foundation models. In this paper, we introduce SSL4EO-L, the first ever dataset designed for Self-Supervised Learning for Earth Observation for the Landsat family of satellites (including 3 sensors and 2 product levels) and the largest Landsat dataset in history (5M image patches). Additionally, we modernize and re-release the L7 Irish and L8 Biome cloud detection datasets, and introduce the first ML benchmark datasets for Landsats 4–5 TM and Landsat 7 ETM+ SR. Finally, we pre-train the first foundation models for Landsat imagery using SSL4EO-L and evaluate their performance on multiple semantic segmentation tasks. All datasets and model weights are available via the TorchGeo library, making reproducibility and experimentation easy, and enabling scientific advancements in the burgeoning field of remote sensing for a multitude of downstream applications.

UAI Conference 2022 Conference Paper

Resolving label uncertainty with implicit posterior models

  • Esther Rolf
  • Nikolay Malkin
  • Alexandros Graikos
  • Ana Jojic
  • Caleb Robinson
  • Nebojsa Jojic

We propose a method for jointly inferring labels across a collection of data samples, where each sample consists of an observation and a prior belief about the label. By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs. This formulation unifies various machine learning settings; the weak beliefs can come in the form of noisy or incomplete labels, likelihoods given by a different prediction mechanism on auxiliary input, or common-sense priors reflecting knowledge about the structure of the problem at hand. We demonstrate the proposed algorithms on diverse problems: classification with negative training examples, learning from rankings, weakly and self-supervised aerial imagery segmentation, co-segmentation of video frames, and coarsely supervised text classification.

AAAI Conference 2020 Conference Paper

Human-Machine Collaboration for Fast Land Cover Mapping

  • Caleb Robinson
  • Anthony Ortiz
  • Kolya Malkin
  • Blake Elias
  • Andi Peng
  • Dan Morris
  • Bistra Dilkina
  • Nebojsa Jojic

We propose incorporating human labelers in a model finetuning system that provides immediate user feedback. In our framework, human labelers can interactively query model predictions on unlabeled data, choose which data to label, and see the resulting effect on the model’s predictions. This bi-directional feedback loop allows humans to learn how the model responds to new data. We implement this framework for fine-tuning high-resolution land cover segmentation models and compare human-selected points to points selected using standard active learning methods. Specifically, we fine-tune a deep neural network – trained to segment highresolution aerial imagery into different land cover classes in Maryland, USA – to a new spatial area in New York, USA using both our human-in-the-loop method and traditional active learning methods. The tight loop in our proposed system turns the algorithm and the human operator into a hybrid system that can produce land cover maps of large areas more efficiently than the traditional workflows. Our framework has applications in machine learning settings where there is a practically limitless supply of unlabeled data, of which only a small fraction can feasibly be labeled through human efforts, such as geospatial and medical image-based applications.