Arrow Research search

Author name cluster

Daniel Aliaga

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers (2)

AAAI 2026 · Conference Paper

Building Instance Segmentation for Dense Urban Settlements

  • Adnan Firoze
  • Raymond A. Yeh
  • Daniel Aliaga

About 25% of the world's population lives in informal urban settlements containing densely packed buildings (approximately 8,000 houses per square km), which do not lend themselves favorably to state-of-the-art satellite-based building segmentation methods due to, for example, occlusion, vegetation, shadows, and low resolution. To address these challenges, we introduce a novel instance segmentation and counting approach for dense buildings. Our system first extracts a conservative set of tentative building center points using a deep network, jumpstarting a Segment Anything Model 2 (SAM2) module that produces an initial over-segmentation. Second, we use a graph neural network to refine the over-segmented regions into polygons representing accurate building masks. Experiments show that our approach achieves higher accuracy in instance segmentation and counting, especially in challenging densely packed building areas in Brazil, Mexico, India, Pakistan, and Kenya.

NeurIPS 2025 · Conference Paper

MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models

  • Hang Hua
  • Ziyun Zeng
  • Yizhi Song
  • Yunlong Tang
  • Liu He
  • Daniel Aliaga
  • Wei Xiong
  • Jiebo Luo

Recent multimodal image generators such as GPT-4o, Gemini 2.0 Flash, and Gemini 2.5 Pro excel at following complex instructions, editing images, and maintaining concept consistency. However, they are still evaluated by disjoint toolkits: text-to-image (T2I) benchmarks that lack multi-modal conditioning, and customized image generation benchmarks that overlook compositional semantics and common knowledge. We propose MMIG-Bench, a comprehensive Multi-Modal Image Generation Benchmark that unifies these tasks by pairing 4,850 richly annotated text prompts with 1,750 multi-view reference images across 380 subjects, spanning humans, animals, objects, and artistic styles. MMIG-Bench is equipped with a three-level evaluation framework: (1) low-level metrics for visual artifacts and identity preservation of objects; (2) the novel Aspect Matching Score (AMS), a VQA-based mid-level metric that delivers fine-grained prompt-image alignment and shows strong correlation with human judgments; and (3) high-level metrics for aesthetics and human preference. Using MMIG-Bench, we benchmark 17 state-of-the-art models, including Gemini 2.5 Pro, FLUX, DreamBooth, and IP-Adapter, and validate our metrics with 32k human ratings, yielding in-depth insights into architecture and data design.