Author name cluster

Dimitris Samaras

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers

2 author rows

TMLR Journal 2026 Journal Article

GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation

Snehal Singh Tomar
Alexandros Graikos
Arjun Krishna
Dimitris Samaras
Klaus Mueller

Modern deep learning methods typically treat image sequences as large tensors of sequentially stacked frames. However, is this straightforward representation ideal given the current state-of-the-art (SoTA)? In this work, we address this question in the context of generative models and aim to devise a more effective way of modeling image sequence data. Observing the inefficiencies and bottlenecks of current SoTA image sequence generation methods, we showcase that rather than working with large tensors, we can improve the generation process by factorizing it into first generating the coarse sequence at low resolution and then refining the individual frames at high resolution. We train a generative model solely on grid images comprising subsampled frames. Yet, we learn to generate image sequences, using the strong self-attention mechanism of the Diffusion Transformer (DiT) to capture correlations between frames. In effect, our formulation extends a 2D image generator to operate as a 3D image-sequence generator without introducing any architectural modifications. Subsequently, we super-resolve each frame individually to add the sequence-independent high-resolution details. This approach offers several advantages and can overcome key limitations of the SoTA in this domain. Compared to existing image sequence generation models, our method achieves superior synthesis quality and improved coherence across sequences. It also delivers high-fidelity generation of arbitrary-length sequences and increased efficiency in inference time and training data usage. Furthermore, our straightforward formulation enables our method to generalize effectively across diverse data domains, which typically require additional priors and supervision to model in a generative context. Our method consistently delivers superior quality and offers a $>2\times$ speedup in inference rates across various datasets.
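The grid representation at the heart of this abstract comes down to a few array reshapes. The sketch below is an illustration under stated assumptions, not the authors' code. It assumes frames are stored as a NumPy array of shape (T, H, W, C) and tiled row-major into a grid with T = rows × cols. Naive strided subsampling stands in for proper low-resolution resizing, and the function names are hypothetical.

```python
import numpy as np


def frames_to_grid(frames: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Tile a (T, H, W, C) sequence into one (rows*H, cols*W, C) grid image, row-major."""
    t, h, w, c = frames.shape
    if t != rows * cols:
        raise ValueError(f"expected {rows * cols} frames, got {t}")
    # (rows, cols, H, W, C) -> (rows, H, cols, W, C) -> (rows*H, cols*W, C)
    return frames.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4).reshape(rows * h, cols * w, c)


def grid_to_frames(grid: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Invert frames_to_grid: split a grid image back into a (rows*cols, H, W, C) sequence."""
    gh, gw, c = grid.shape
    h, w = gh // rows, gw // cols
    return grid.reshape(rows, h, cols, w, c).transpose(0, 2, 1, 3, 4).reshape(rows * cols, h, w, c)


def subsample(frames: np.ndarray, factor: int) -> np.ndarray:
    """Naive spatial subsampling by striding (a stand-in for real low-res resizing)."""
    return frames[:, ::factor, ::factor, :]


# Usage: 16 frames at 64x64 become one 64x64 grid of 16x16 thumbnails.
seq = np.random.rand(16, 64, 64, 3)
low = subsample(seq, 4)                      # (16, 16, 16, 3)
grid = frames_to_grid(low, 4, 4)             # (64, 64, 3): what a 2-D generator would see
recovered = grid_to_frames(grid, 4, 4)       # back to (16, 16, 16, 3) before per-frame super-resolution
```

Because the grid is an ordinary image, an unmodified 2-D generator can operate on it, and its self-attention spans all thumbnails. That is the intuition behind the abstract's claim that no architectural changes are needed.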

NeurIPS Conference 2025 Conference Paper

Fast constrained sampling in pre-trained diffusion models

Alexandros Graikos
Nebojsa Jojic
Dimitris Samaras

Large denoising diffusion models, such as Stable Diffusion, have been trained on billions of image-caption pairs to perform text-conditioned image generation. As a byproduct of this training, these models have acquired general knowledge about image statistics, which can be useful for other inference tasks. However, when confronted with sampling an image under new constraints, e.g., generating the missing parts of an image, using large pre-trained text-to-image diffusion models is inefficient and often unreliable. Previous approaches either utilized backpropagation through the denoiser network, making them significantly slower and more memory-demanding than simple text-to-image generation, or only enforced the constraint locally, failing to capture critical long-range correlations in the sampled image. In this work, we propose an algorithm that enables fast, high-quality generation under arbitrary constraints. We show that in denoising diffusion models, we can employ an approximation to Newton’s optimization method that allows us to speed up inference and avoid the expensive backpropagation operations. Our approach produces results that rival or surpass the state-of-the-art training-free inference methods while requiring a fraction of the time. We demonstrate the effectiveness of our algorithm under both linear (inpainting, super-resolution) and non-linear (style-guided generation) constraints. An implementation is provided at https://github.com/cvlab-stonybrook/fast-constrained-sampling.

ICLR Conference 2025 Conference Paper

Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment

Minh-Quan Le
Gaurav Mittal
Tianjian Meng
A S. M. Iftekhar
Vishwas Suryanarayanan
Barun Patra
Dimitris Samaras
Mei Chen

While diffusion models are powerful in generating high-quality, diverse synthetic data for object-centric tasks, existing methods struggle with scene-aware tasks such as Visual Question Answering (VQA) and Human-Object Interaction (HOI) Reasoning, where it is critical to preserve scene attributes in generated images consistent with a multimodal context, i.e. a reference image with accompanying text guidance query. To address this, we introduce Hummingbird, the first diffusion-based image generator which, given a multimodal context, generates highly diverse images w.r.t. the reference image while ensuring high fidelity by accurately preserving scene attributes, such as object interactions and spatial relationships from the text guidance. Hummingbird employs a novel Multimodal Context Evaluator that simultaneously optimizes our formulated Global Semantic and Fine-grained Consistency Rewards to ensure generated images preserve the scene attributes of reference images in relation to the text guidance while maintaining diversity. As the first model to address the task of maintaining both diversity and fidelity given a multimodal context, we introduce a new benchmark formulation incorporating MME Perception and Bongard HOI datasets. Benchmark experiments show Hummingbird outperforms all existing methods by achieving superior fidelity while maintaining diversity, validating Hummingbird's potential as a robust multimodal context-aligned image generator in complex visual tasks. Project page: https://roar-ai.github.io/hummingbird

TMLR Journal 2025 Journal Article

LBMamba: Locally Bi-directional Mamba

Jingwei Zhang
Xi Han
Hong Qin
Mahdi S. Hosseini
Dimitris Samaras

Mamba, a State Space Model (SSM) that accelerates training by recasting recurrence as a parallel selective scan, has recently emerged as a linearly-scaling, efficient alternative to self-attention. Because of its unidirectional nature, each state in Mamba only has information about its previous states and is blind to subsequent ones. Current Mamba-based computer-vision methods typically overcome this limitation by augmenting Mamba's global forward scan with a global backward scan, forming a bi-directional scan that restores a full receptive field. However, this operation doubles the computational load, eroding much of Mamba's original efficiency advantage. To eliminate these extra scans, we introduce LBMamba, a locally bi-directional SSM block that embeds a lightweight locally backward scan inside the forward selective scan and executes it entirely in per-thread registers. Building on LBMamba, we present LBVim, a scalable vision backbone that alternates scan directions every two layers to recover a global receptive field without extra backward sweeps. We validate the versatility of our approach on both natural images and whole slide images (WSIs). We show that our LBVim consistently offers a superior performance–throughput trade-off. That is, at the same throughput, LBVim achieves 0.8% to 1.6% higher top-1 accuracy on the ImageNet-1K classification dataset, 0.6% to 2.7% higher mIoU on the ADE20K semantic segmentation dataset, and 0.9% higher AP$^b$ and 1.1% higher AP$^m$ on the COCO detection dataset. Our method serves as a general-purpose enhancement, boosting the accuracy of four SOTA Mamba models, namely VMamba, LocalVim, PlainMamba and Adventurer, by 0.5% to 3.4%. We also integrate LBMamba into the SOTA pathology multiple instance learning (MIL) approach, MambaMIL, which uses a unidirectional scan.
Experiments on 3 public WSI classification datasets show that our method achieves relative improvements of up to 3.06% in AUC, 3.39% in F1, and 1.67% in accuracy. Our code is available at https://github.com/cvlab-stonybrook/LBMamba.
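The "locally bi-directional" idea can be made concrete with a toy, single-channel sketch. It assumes a plain linear recurrence h_t = a_t h_{t-1} + b_t x_t in place of Mamba's full selective scan. The backward pass restarts at every window boundary, so each token only sees later tokens inside its own chunk. Several details are assumptions: the window size, reusing the same coefficients in both directions, and merging the directions by summation (which counts the current token twice). The real LBMamba kernel runs in per-thread GPU registers and may combine directions differently. All names are hypothetical.

```python
import numpy as np


def forward_scan(x, a, b):
    """Causal linear recurrence h_t = a_t * h_{t-1} + b_t * x_t for one channel."""
    h = np.zeros_like(x, dtype=float)
    state = 0.0
    for t in range(len(x)):
        state = a[t] * state + b[t] * x[t]
        h[t] = state
    return h


def local_backward_scan(x, a, b, window):
    """Anti-causal recurrence restarted at each window boundary, so token t only
    receives information from later tokens inside its own length-`window` chunk."""
    g = np.zeros_like(x, dtype=float)
    n = len(x)
    for start in range(0, n, window):
        state = 0.0
        for t in range(min(start + window, n) - 1, start - 1, -1):
            state = a[t] * state + b[t] * x[t]
            g[t] = state
    return g


def locally_bidirectional_scan(x, a, b, window):
    # Assumption: the two directions are merged by simple summation.
    return forward_scan(x, a, b) + local_backward_scan(x, a, b, window)


x = np.array([1.0, 2.0, 3.0, 4.0])
a = np.full(4, 0.5)   # decay coefficients (input-dependent in real Mamba)
b = np.ones(4)        # input coefficients
y = locally_bidirectional_scan(x, a, b, window=2)
```

Here token 1 sees token 0 through the forward scan, but it does not see token 2, because a chunk boundary falls between them. Reversing the sequence before and after the scan (`x[::-1]`) gives the opposite direction. That is one way to picture LBVim alternating scan directions between layers to recover a global receptive field.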

NeurIPS Conference 2025 Conference Paper

Low-Rank Head Avatar Personalization with Registers

Sai Tanmay Reddy Chakkera
Aggelina Chatziagapi
Md Moniruzzaman
Chen-Ping Yu
Yi-Hsuan Tsai
Dimitris Samaras

We introduce a novel method for low-rank personalization of a generic model for head avatar generation. Prior work proposes generic models that achieve high-quality face animation by leveraging large-scale datasets of multiple identities. However, such generic models usually fail to synthesize unique identity-specific details, since they learn a general domain prior. When adapting to specific subjects, we find that it is still challenging to capture high-frequency facial details via popular solutions like low-rank adaptation (LoRA). This motivates us to propose a specific architecture, a Register Module, that enhances the performance of LoRA, while requiring only a small number of parameters to adapt to an unseen identity. Our module is applied to intermediate features of a pre-trained model, storing and re-purposing information in a learnable 3D feature space. To demonstrate the efficacy of our personalization method, we collect a dataset of talking videos of individuals with distinctive facial details, such as wrinkles and tattoos. Our approach faithfully captures unseen faces, outperforming existing methods quantitatively and qualitatively.
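The Register Module is the paper's own contribution and is not reproduced here. For readers unfamiliar with the LoRA baseline it builds on, here is a minimal NumPy sketch of a standard LoRA linear layer. The pre-trained weight W stays frozen, and only a rank-r update B A is trained. B is zero-initialized, so the adapted layer starts out identical to the original. The class name, the initialization scale, and the alpha / r scaling convention are illustrative assumptions.

```python
import numpy as np


class LoRALinear:
    """Linear layer y = x W^T + (alpha / r) * x A^T B^T with W frozen."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        out_dim, in_dim = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                  # frozen pre-trained W, (out, in)
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable down-projection
        self.B = np.zeros((out_dim, rank))                    # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (batch, in) -> (batch, out)
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        # Only A and B are updated when adapting to a new identity.
        return self.A.size + self.B.size


W = np.random.default_rng(1).normal(size=(4, 8))
layer = LoRALinear(W, rank=2)
x = np.ones((3, 8))
same = np.allclose(layer(x), x @ W.T)   # True: B is zero, so output equals the frozen layer
n_train = layer.num_trainable()         # 2*8 + 4*2 = 24 trainable vs. 32 frozen weights
```

The trainable parameter count grows with r·(in + out) rather than in·out. The abstract argues that this cheap adaptation alone struggles with high-frequency facial detail, which is the gap the Register Module targets.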

ICLR Conference 2025 Conference Paper

TopoDiffusionNet: A Topology-aware Diffusion Model

Saumya Gupta
Dimitris Samaras
Chao Chen 0012

Diffusion models excel at creating visually impressive images but often struggle to generate images with a specified topology. The Betti number, which counts the topological structures in an image (such as connected components or holes), is a fundamental measure in topology. Yet, diffusion models fail to satisfy even this basic constraint. This limitation restricts their utility in applications requiring exact control, like robotics and environmental modeling. To address this, we propose TopoDiffusionNet (TDN), a novel approach that constrains diffusion models to maintain the desired topology. We leverage tools from topological data analysis, particularly persistent homology, to extract the topological structures within an image. We then design a topology-based objective function to guide the denoising process, preserving intended structures while suppressing noisy ones. Our experiments across four datasets demonstrate significant improvements in topological accuracy. TDN is the first to integrate topology with diffusion models, opening new avenues of research in this area.

ICLR Conference 2023 Conference Paper

Learning Probabilistic Topological Representations Using Discrete Morse Theory

Xiaoling Hu 0002
Dimitris Samaras
Chao Chen 0012

Accurate delineation of fine-scale structures is a very important yet challenging problem. Existing methods use topological information as an additional training loss, but ultimately make pixel-wise predictions. In this paper, we propose a novel deep learning based method to learn topological/structural representations. We use discrete Morse theory and persistent homology to construct a one-parameter family of structures as the topological/structural representation space. Furthermore, we learn a probabilistic model that can perform inference tasks in such a topological/structural representation space. Our method generates true structures rather than pixel-maps, leading to better topological integrity in automatic segmentation tasks. It also facilitates semi-automatic interactive annotation/proofreading via the sampling of structures and structure-aware uncertainty.

NeurIPS Conference 2022 Conference Paper

Diffusion Models as Plug-and-Play Priors

Alexandros Graikos
Nikolay Malkin
Nebojsa Jojic
Dimitris Samaras

We consider the problem of inferring high-dimensional data $x$ in a model that consists of a prior $p(x)$ and an auxiliary differentiable constraint $c(x, y)$ on $x$ given some additional information $y$. In this paper, the prior is an independently trained denoising diffusion generative model. The auxiliary constraint is expected to have a differentiable form, but can come from diverse sources. The possibility of such inference turns diffusion models into plug-and-play modules, thereby allowing a range of potential applications in adapting models to new domains and tasks, such as conditional generation or image segmentation. The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise at each step. Considering many noised versions of $x$ in evaluation of its fitness is a novel search mechanism that may lead to new algorithms for solving combinatorial optimization problems. The code is available at https://github.com/AlexGraikos/diffusion_priors.

AAAI Conference 2021 Conference Paper

Localization in the Crowd with Topological Constraints

Shahira Abousamra
Minh Hoai
Dimitris Samaras
Chao Chen

We address the problem of crowd localization, i.e., the prediction of dots corresponding to people in a crowded scene. Due to various challenges, a localization method is prone to spatial semantic errors, i.e., predicting multiple dots within the same person or collapsing multiple dots in a cluttered region. We propose a topological approach targeting these semantic errors. We introduce a topological constraint that teaches the model to reason about the spatial arrangement of dots. To enforce this constraint, we define a persistence loss based on the theory of persistent homology. The loss compares the topographic landscape of the likelihood map and the topology of the ground truth. Topological reasoning improves the quality of the localization algorithm, especially near cluttered regions. On multiple public benchmarks, our method outperforms previous localization methods. Additionally, we demonstrate the potential of our method to improve performance on the crowd counting task.
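A persistence-based loss starts from the persistence diagram of the predicted likelihood map. The simplified sketch below assumes a 1-D signal instead of a 2-D map and considers only 0-dimensional features (peaks). It computes birth/death pairs of the superlevel-set filtration with union-find. A spurious extra peak shows up as a short-lived pair, which is the kind of feature such a loss would suppress. The paper's actual loss operates on 2-D maps and is not reproduced here, and the function name is hypothetical.

```python
def persistence_1d(values):
    """0-dimensional persistence pairs (birth, death) of the superlevel-set
    filtration of a 1-D signal.

    Points are added from highest to lowest value. Each local maximum starts a
    component; when two components meet, the one with the lower peak dies at
    the current value (elder rule). The component of the global maximum never
    dies and is reported with death = -inf.
    """
    if not values:
        return []
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    parent = {}  # union-find forest over indices added so far
    peak = {}    # root index -> index of the highest point in its component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if not (0 <= j < n and j in parent):
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # Elder rule: the component whose peak is lower dies at values[i].
            if values[peak[ri]] <= values[peak[rj]]:
                young, old = ri, rj
            else:
                young, old = rj, ri
            if peak[young] != i:  # a non-maximum point just joins its neighbor
                pairs.append((values[peak[young]], values[i]))
            parent[young] = old
    pairs.append((values[peak[find(order[0])]], float("-inf")))
    return pairs
```

For example, `persistence_1d([0, 3, 1, 2, 0])` returns `[(2, 1), (3, -inf)]`: the secondary peak of height 2 merges into the main peak at value 1, so it has persistence 1. With a single true person in that window, a topology-aware loss would push that pair toward the diagonal.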

AAAI Conference 2021 Conference Paper

Modeling Deep Learning Based Privacy Attacks on Physical Mail

Bingyao Huang
Ruyi Lian
Dimitris Samaras
Haibin Ling

Mail privacy protection aims to prevent unauthorized access to hidden content within an envelope, since normal paper envelopes are not as safe as we think. In this paper, for the first time, we show that with a well-designed deep learning model, the hidden content may be largely recovered without opening the envelope. We start by modeling deep learning-based privacy attacks on physical mail content as learning the mapping from the camera-captured envelope front face image to the hidden content. We then explicitly model the mapping as a combination of perspective transformation, image dehazing and denoising using a deep convolutional neural network, named Neural-STE (See-Through-Envelope). We show experimentally that hidden content details, such as texture and image structure, can be clearly recovered. Finally, our formulation and model allow us to design envelopes that can counter deep learning-based privacy attacks on physical mail.

ICLR Conference 2021 Conference Paper

Topology-Aware Segmentation Using Discrete Morse Theory

Xiaoling Hu 0002
Yusu Wang 0001
Fuxin Li
Dimitris Samaras
Chao Chen 0012

In the segmentation of fine-scale structures from natural and biomedical images, per-pixel accuracy is not the only metric of concern. Topological correctness, such as vessel connectivity and membrane closure, is crucial for downstream analysis tasks. In this paper, we propose a new approach to train deep image segmentation networks for better topological accuracy. In particular, leveraging the power of discrete Morse theory (DMT), we identify global structures, including 1D skeletons and 2D patches, which are important for topological accuracy. Trained with a novel loss based on these global structures, the network performance is significantly improved especially near topologically challenging locations (such as weak spots of connections and membranes). On diverse datasets, our method achieves superior performance on both the DICE score and topological metrics.

NeurIPS Conference 2020 Conference Paper

Distribution Matching for Crowd Counting

Boyu Wang
Huidong Liu
Dimitris Samaras
Minh Hoai Nguyen

In crowd counting, each training image contains multiple people, where each person is annotated by a dot. Existing crowd counting methods need to use a Gaussian to smooth each annotated dot or to estimate the likelihood of every pixel given the annotated point. In this paper, we show that imposing Gaussians on annotations hurts generalization performance. Instead, we propose to use Distribution Matching for crowd COUNTing (DM-Count). In DM-Count, we use Optimal Transport (OT) to measure the similarity between the normalized predicted density map and the normalized ground truth density map. To stabilize OT computation, we include a Total Variation loss in our model. We show that the generalization error bound of DM-Count is tighter than that of the Gaussian smoothed methods. In terms of Mean Absolute Error, DM-Count outperforms the previous state-of-the-art methods by a large margin on two large-scale counting datasets, UCF-QNRF and NWPU, and achieves state-of-the-art results on the ShanghaiTech and UCF-CC50 datasets. DM-Count reduced the error of the state-of-the-art published result by approximately 16%. Code is available at https://github.com/cvlab-stonybrook/DM-Count.
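The three loss terms named in this abstract can be sketched in NumPy. This is an illustrative reading, not the released DM-Count code. It assumes entropic OT solved with plain Sinkhorn iterations and a squared-Euclidean cost on pixel coordinates scaled to [0, 1]. Total variation is taken as half the ℓ1 distance between the normalized maps, and the weights `lam_ot` and `lam_tv` are hypothetical. The dense cost matrix grows as (HW)², so this is only for tiny maps, and both maps must have a nonzero total count.

```python
import numpy as np


def pixel_cost(h, w):
    """Squared Euclidean distances between all pixel centers, coordinates scaled to [0, 1]."""
    ys, xs = np.meshgrid(np.arange(h) / max(h - 1, 1), np.arange(w) / max(w - 1, 1), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)   # (h*w, 2)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sum(diff ** 2, axis=-1)                      # (h*w, h*w)


def sinkhorn_cost(p, q, cost, eps=0.1, iters=200):
    """Transport cost <P, C> of the entropic OT plan between histograms p and q."""
    K = np.exp(-cost / eps)
    u = np.ones_like(p)
    for _ in range(iters):
        v = q / (K.T @ u)   # match column marginals to q
        u = p / (K @ v)     # match row marginals to p
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost))


def dm_count_style_loss(pred, gt, lam_ot=0.1, lam_tv=0.01, eps=0.1):
    """Count loss + OT loss + TV loss on (H, W) density maps with nonzero sums."""
    count_loss = abs(float(pred.sum()) - float(gt.sum()))
    p = (pred / pred.sum()).ravel()   # normalized predicted density
    q = (gt / gt.sum()).ravel()       # normalized ground-truth density
    ot_loss = sinkhorn_cost(p, q, pixel_cost(*pred.shape), eps=eps)
    tv_loss = 0.5 * float(np.abs(p - q).sum())
    total = count_loss + lam_ot * ot_loss + lam_tv * tv_loss
    return total, (count_loss, ot_loss, tv_loss)
```

Unlike a pixel-wise loss against a Gaussian-smoothed target, the OT term grows with how far predicted mass sits from the annotated dots. A prediction one pixel off is therefore penalized less than one at the opposite corner, even though both have zero overlap with the ground truth.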

NeurIPS Conference 2019 Conference Paper

Topology-Preserving Deep Image Segmentation

Xiaoling Hu
Fuxin Li
Dimitris Samaras
Chao Chen

Segmentation algorithms are prone to making topological errors on fine-scale structures, e.g., broken connections. We propose a novel method that learns to segment with correct topology. In particular, we design a continuous-valued loss function that enforces a segmentation to have the same topology as the ground truth, i.e., having the same Betti number. The proposed topology-preserving loss function is differentiable and can be incorporated into end-to-end training of a deep neural network. Our method achieves much better performance on the Betti number error, which directly accounts for the topological correctness. It also performs better on other topology-relevant metrics, e.g., the Adjusted Rand Index and the Variation of Information, without sacrificing per-pixel accuracy. We illustrate the effectiveness of the proposed method on a broad spectrum of natural and biomedical datasets.
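The Betti number error can be illustrated for dimension 0, where the Betti number is the number of connected components. The sketch below assumes binary masks given as nested lists, uses 4-connectivity, and compares whole images. The paper's evaluation may also count holes (Betti-1) or work on patches. Function names are hypothetical.

```python
from collections import deque


def betti0(mask):
    """Number of 4-connected foreground components in a binary 2-D mask (nested lists)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1                 # new component found; flood-fill it
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def betti0_error(pred, gt):
    """Absolute difference in the number of connected components."""
    return abs(betti0(pred) - betti0(gt))


gt = [[1, 1, 1, 1, 1]]      # one continuous structure
pred = [[1, 1, 0, 1, 1]]    # a single missed pixel breaks it in two
err = betti0_error(pred, gt)
```

In this example the prediction is 80% correct per pixel, yet its Betti-0 error is 1. A topological error of exactly this kind barely registers in a per-pixel loss, which is what the topology-preserving loss above is designed to penalize.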

ICML Conference 2018 Conference Paper

A Two-Step Computation of the Exact GAN Wasserstein Distance

Huidong Liu
Xianfeng David Gu
Dimitris Samaras

In this paper, we propose a two-step method to compute the Wasserstein distance in Wasserstein Generative Adversarial Networks (WGANs): 1) The convex part of our objective can be solved by linear programming; 2) The non-convex residual can be approximated by a deep neural network. We theoretically prove that the proposed formulation is equivalent to the discrete Monge-Kantorovich dual formulation. Furthermore, we give the approximation error bound of the Wasserstein distance and the error bound of generalizing the Wasserstein distance from discrete to continuous distributions. Our approach optimizes the exact Wasserstein distance, obviating the need for weight clipping previously used in WGANs. Results on synthetic data show that our method computes the Wasserstein distance more accurately. Qualitative and quantitative results on MNIST, LSUN and CIFAR-10 datasets show that the proposed method is more efficient than state-of-the-art WGAN methods, and still produces images of comparable quality.
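The discrete Monge-Kantorovich problem referred to above can be checked by brute force on toy data. When both distributions are uniform over n points, some optimal transport plan is a permutation (Birkhoff-von Neumann theorem). Enumerating all n! assignments therefore gives the exact W1 distance. This is only a reference computation for tiny n, assuming a Euclidean ground cost. It is not the paper's two-step LP-plus-network method, and the function name is hypothetical.

```python
import itertools
import math


def exact_w1_uniform(xs, ys):
    """Exact 1-Wasserstein distance between uniform empirical measures on two
    equally sized point sets, by enumerating all one-to-one assignments.

    With equal weights 1/n, some optimal transport plan is a permutation
    (Birkhoff-von Neumann), so this brute force is exact; it costs O(n!) and is
    only meant for tiny toy inputs.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty point sets of equal size")
    n = len(xs)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(math.dist(xs[i], ys[perm[i]]) for i in range(n))
        best = min(best, cost)
    return best / n


# Each point moves one unit up, so the distance is 1.0.
w = exact_w1_uniform([(0, 0), (1, 0)], [(0, 1), (1, 1)])
```

A ground-truth value like this is the kind of reference against which the accuracy of WGAN critics' Wasserstein estimates can be compared on synthetic data.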

NeurIPS Conference 2018 Conference Paper

Sequence-to-Segment Networks for Segment Detection

Zijun Wei
Boyu Wang
Minh Hoai Nguyen
Jianming Zhang
Zhe Lin
Xiaohui Shen
Radomir Mech
Dimitris Samaras

Detecting segments of interest from an input sequence is a challenging problem which often requires not only good knowledge of individual target segments, but also contextual understanding of the entire input sequence and the relationships between the target segments. To address this problem, we propose the Sequence-to-Segment Network (S$^2$N), a novel end-to-end sequential encoder-decoder architecture. S$^2$N first encodes the input into a sequence of hidden states that progressively capture both local and holistic information. It then employs a novel decoding architecture, called Segment Detection Unit (SDU), that integrates the decoder state and encoder hidden states to detect segments sequentially. During training, we formulate the assignment of predicted segments to ground truth as bipartite matching and use the Earth Mover's Distance to calculate the localization errors. We experiment with S$^2$N on temporal action proposal generation and video summarization and show that S$^2$N achieves state-of-the-art performance on both tasks.

YNIMG Journal 2017 Journal Article

Deriving reproducible biomarkers from multi-site resting-state data: An Autism-based example

Alexandre Abraham
Michael P. Milham
Adriana Di Martino
R. Cameron Craddock
Dimitris Samaras
Bertrand Thirion
Gael Varoquaux

Resting-state functional Magnetic Resonance Imaging (R-fMRI) holds the promise to reveal functional biomarkers of neuropsychiatric disorders. However, extracting such biomarkers is challenging for complex multi-faceted neuropathologies, such as autism spectrum disorders. Large multi-site datasets increase sample sizes to compensate for this complexity, at the cost of uncontrolled heterogeneity. This heterogeneity raises new challenges, akin to those faced in realistic diagnostic applications. Here, we demonstrate the feasibility of inter-site classification of neuropsychiatric status, with an application to the Autism Brain Imaging Data Exchange (ABIDE) database, a large (N=871) multi-site autism dataset. For this purpose, we investigate pipelines that extract the most predictive biomarkers from the data. These R-fMRI pipelines build participant-specific connectomes from functionally-defined brain areas. Connectomes are then compared across participants to learn patterns of connectivity that differentiate typical controls from individuals with autism. We predict this neuropsychiatric status for participants from the same acquisition sites or different, unseen, ones. Good choices of methods for the various steps of the pipeline lead to 67% prediction accuracy on the full ABIDE data, which is significantly better than previously reported results. We perform extensive validation on multiple subsets of the data defined by different inclusion criteria. This enables detailed analysis of the factors contributing to successful connectome-based prediction. First, prediction accuracy improves as we include more subjects, up to the maximum number of subjects available. Second, the definition of functional brain areas is of paramount importance for biomarker discovery: brain areas extracted from large R-fMRI datasets outperform reference atlases in the classification tasks.
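The connectome step of such pipelines can be sketched as follows. The sketch assumes each participant's data has already been reduced to region-averaged time series of shape (timepoints, regions). Plain Pearson correlation is used here only as one simple connectivity measure. The abstract stresses that method choices at each pipeline step matter, so this should not be read as the paper's best-performing pipeline. The function name is hypothetical.

```python
import numpy as np


def connectome_features(timeseries):
    """Correlation connectome -> feature vector.

    timeseries: (n_timepoints, n_regions) array of region-averaged R-fMRI signals.
    Returns the strictly upper-triangular entries of the region-by-region
    Pearson correlation matrix, one feature per region pair.
    """
    corr = np.corrcoef(timeseries, rowvar=False)   # (n_regions, n_regions)
    iu = np.triu_indices_from(corr, k=1)           # skip the diagonal and duplicate pairs
    return corr[iu]


# Synthetic stand-in data: 5 participants, 120 timepoints, 10 regions.
rng = np.random.default_rng(0)
subjects = [rng.normal(size=(120, 10)) for _ in range(5)]
X = np.stack([connectome_features(ts) for ts in subjects])   # shape (5, 45)
```

Each row of `X` is one participant's connectome, ready for a classifier. Holding out whole acquisition sites when splitting the rows gives the inter-site setting described above.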

NeurIPS Conference 2016 Conference Paper

Learned Region Sparsity and Diversity Also Predicts Visual Attention

Zijun Wei
Hossein Adeli
Minh Hoai Nguyen
Greg Zelinsky
Dimitris Samaras

Learned region sparsity has achieved state-of-the-art performance in classification tasks by exploiting and integrating a sparse set of local information into global decisions. The underlying mechanism resembles how people sample information from an image with their eye movements when making similar decisions. In this paper we incorporate the biologically plausible mechanism of Inhibition of Return into the learned region sparsity model, thereby imposing diversity on the selected regions. We investigate how these mechanisms of sparsity and diversity relate to visual attention by testing our model on three different types of visual search tasks. We report state-of-the-art results in predicting the locations of human gaze fixations, even though our model is trained only on image-level labels without object location annotations. Notably, the classification performance of the extended model remains the same as the original. This work suggests a new computational perspective on visual attention mechanisms and shows how the inclusion of attention-based mechanisms can improve computer vision techniques.

NeurIPS Conference 2013 Conference Paper

Modeling Clutter Perception using Parametric Proto-object Partitioning

Chen-Ping Yu
Wen-Yu Hua
Dimitris Samaras
Greg Zelinsky

Visual clutter, the perception of an image as being crowded and disordered, affects aspects of our lives ranging from object detection to aesthetics, yet relatively little effort has been made to model this important and ubiquitous percept. Our approach models clutter as the number of proto-objects segmented from an image, with proto-objects defined as groupings of superpixels that are similar in intensity, color, and gradient orientation features. We introduce a novel parametric method of merging superpixels by modeling a mixture of Weibull distributions on similarity distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. We validated this model using a new 90-image dataset of realistic scenes rank-ordered by human raters for clutter, and showed that our method not only predicted clutter extremely well (Spearman's $\rho = 0.81$, $p < 0.05$), but also outperformed all existing clutter perception models and even a behavioral object segmentation ground truth. We conclude that the number of proto-objects in an image affects clutter perception more than the number of objects or features.

YNIMG Journal 2013 Journal Article

Multi-voxel pattern analysis of selective representation of visual working memory in ventral temporal and occipital regions

Xufeng Han
Alexander C. Berg
Hwamee Oh
Dimitris Samaras
Hoi-Chung Leung

While previous results from univariate analysis showed that the activity level of the parahippocampal gyrus (PHG) but not the fusiform gyrus (FG) reflects selective maintenance of the cued picture category, present results from multi-voxel pattern analysis (MVPA) showed that the spatial response patterns of both regions can be used to differentiate the selected picture category in working memory. The ventral temporal and occipital areas including the PHG and FG have been shown to be specialized in perceiving and processing different kinds of visual information, though their role in the representation of visual working memory remains unclear. To test whether the PHG and FG show spatial response patterns that reflect selective maintenance of task-relevant visual working memory in comparison with other posterior association regions, we reanalyzed data from a previous fMRI study of visual working memory with a cue inserted during the delay period of a delayed recognition task. Classification of FG and PHG activation patterns for the selected category (face or scene) during the cue phase was well above chance using classifiers trained with fMRI data from the cue or probe phase. Classification of activity in other temporal and occipital regions for the cued picture category during the cue phase was relatively less consistent even though classification of their activity during the probe recognition was comparable with the FG and PHG. In sum, these findings suggest that the FG and PHG carry information relevant to the cued visual category, and their spatial activation patterns during selective maintenance seem to match those during visual recognition.

ICML Conference 2010 Conference Paper

Multi-Task Learning of Gaussian Graphical Models

Jean Honorio
Dimitris Samaras

NeurIPS Conference 2009 Conference Paper

Sparse and Locally Constant Gaussian Graphical Models

Jean Honorio
Dimitris Samaras
Nikos Paragios
Rita Goldstein
Luis Ortiz

Locality information is crucial in datasets where each variable corresponds to a measurement in a manifold (silhouettes, motion trajectories, 2D and 3D images). Although these datasets are typically under-sampled and high-dimensional, they often need to be represented with low-complexity statistical models, which comprise only the important probabilistic dependencies in the datasets. Most methods attempt to reduce model complexity by enforcing structure sparseness. However, sparseness cannot describe inherent regularities in the structure. Hence, in this paper we first propose a new class of Gaussian graphical models which, together with sparseness, imposes local constancy through ${\ell}_1$-norm penalization. Second, we propose an efficient algorithm which decomposes the strictly convex maximum likelihood estimation into a sequence of problems with closed form solutions. Through synthetic experiments, we evaluate the closeness of the recovered models to the ground truth. We also test the generalization performance of our method in a wide range of complex real-world datasets and demonstrate that it can capture useful structures such as the rotation and shrinking of a beating heart, motion correlations between body parts during walking and functional interactions of brain regions. Our method outperforms the state-of-the-art structure learning techniques for Gaussian graphical models both for small and large datasets.
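One way to write an objective that combines sparseness and local constancy is sketched below under assumptions; it is not quoted from the paper. Here Ω is the precision (inverse covariance) matrix, $\hat{\Sigma}$ the sample covariance, and $\lVert\cdot\rVert_1$ the entrywise ℓ1 norm. D is a hypothetical finite-difference operator that compares entries of Ω belonging to neighboring variables on the manifold. The weights ρ and τ control sparseness and local constancy respectively.

$$
\max_{\Omega \succ 0}\; \log\det\Omega \;-\; \langle \hat{\Sigma}, \Omega \rangle \;-\; \rho\,\lVert \Omega \rVert_1 \;-\; \tau\,\lVert D\,\Omega \rVert_1
$$

The first two terms are the Gaussian log-likelihood up to constants, since $\langle \hat{\Sigma}, \Omega \rangle = \operatorname{tr}(\hat{\Sigma}\Omega)$ for symmetric matrices. Setting τ = 0 recovers the familiar graphical-lasso objective, so the τ term is what adds local constancy on top of sparseness.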

ICRA Conference 2006 Conference Paper

Integration of Dependent Bayesian Filters for Robust Tracking

Francesc Moreno-Noguer
Alberto Sanfeliu
Dimitris Samaras

Robotics applications based on computer vision algorithms are highly constrained to indoor environments where conditions may be controlled. The development of robust visual algorithms is necessary for improving the capabilities of many autonomous systems in outdoor and dynamic environments. In particular, this paper proposes a tracking algorithm robust to several artifacts which may be found in real world applications, such as lighting changes, cluttered backgrounds and unexpected target movements. In order to deal with these difficulties, the proposed tracking methodology integrates several Bayesian filters. Each filter estimates the state of a particular object feature which is conditionally dependent on another feature estimated by a distinct filter. This dependence provides improved representations of the target, allowing it to be segmented from the image background. We describe the updating procedure of the Bayesian filters by a 'hypotheses generation and correction' scheme. The main difference with respect to previous approaches is that the dependence between filters is considered during the feature observation, i.e., in the 'hypotheses correction' stage, instead of when generating the hypotheses. This proves to be much more effective in terms of accuracy and reliability.

NeurIPS Conference 2005 Conference Paper

A Computational Model of Eye Movements during Object Class Detection

Wei Zhang
Hyejin Yang
Dimitris Samaras
Gregory Zelinsky

We present a computational model of human eye movements in an object class detection task. The model combines state-of-the-art computer vision object class detection methods (SIFT features trained using AdaBoost) with a biologically plausible model of human eye movement to produce a sequence of simulated fixations, culminating with the acquisition of a target. We validated the model by comparing its behavior to the behavior of human observers performing the identical object class detection task (looking for a teddy bear among visually complex non-target objects). We found considerable agreement between the model and human data in multiple eye movement measures, including number of fixations, cumulative probability of fixating the target, and scanpath distance.

NeurIPS Conference 2005 Conference Paper

Modeling Neuronal Interactivity using Dynamic Bayesian Networks

Lei Zhang
Dimitris Samaras
Nelly Alia-Klein
Nora Volkow
Rita Goldstein

Functional Magnetic Resonance Imaging (fMRI) has enabled scientists to look into the active brain. However, interactivity between functional brain regions is still little studied. In this paper, we contribute a novel framework for modeling the interactions between multiple active brain regions, using Dynamic Bayesian Networks (DBNs) as generative models for brain activation patterns. This framework is applied to modeling of neuronal circuits associated with reward. The novelty of our framework from a Machine Learning perspective lies in the use of DBNs to reveal the brain connectivity and interactivity. Such interactivity models, which are derived from fMRI data, are then validated through a group classification task. We employ and compare four different types of DBNs: Parallel Hidden Markov Models, Coupled Hidden Markov Models, Fully-linked Hidden Markov Models and Dynamically Multi-Linked HMMs (DML-HMM). Moreover, we propose and compare two schemes of learning DML-HMMs. Experimental results show that by using DBNs, group classification can be performed even if the DBNs are constructed from as few as 5 brain regions. We also demonstrate that, by using the proposed learning algorithms, different DBN structures characterize drug-addicted subjects vs. control subjects. This finding provides an independent test for the effect of psychopathology on brain function. In general, we demonstrate that incorporation of computer science principles into functional neuroimaging clinical studies provides a novel approach for probing human brain function.

NeurIPS Conference 2005 Conference Paper

The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search

Gregory Zelinsky
Wei Zhang
Bing Yu
Xin Chen
Dimitris Samaras

To investigate how top-down (TD) and bottom-up (BU) information is weighted in the guidance of human search behavior, we manipulated the proportions of BU and TD components in a saliency-based model. The model is biologically plausible and implements an artificial retina and a neuronal population code. The BU component is based on feature contrast. The TD component is defined by a feature-template match to a stored target representation. We compared the model’s behavior at different mixtures of TD and BU components to the eye movement behavior of human observers performing the identical search task. We found that a purely TD model provides a much closer match to human behavior than any mixture model using BU information. Only when biological constraints were removed (e.g., eliminating the retina) did a BU/TD mixture model begin to approximate human behavior.