Author name cluster

Ehsan Adeli

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers

1 author row

NeurIPS Conference 2025 Conference Paper

Discovering Latent Graphs with GFlowNets for Diverse Conditional Image Generation

Bailey Trang Nguyen
Parham Saremi
Alan Wang
Fangrui Huang
Zahra TehraniNasab
Amar Kumar
Tal Arbel
Fei-Fei Li

Capturing diversity is crucial in conditional and prompt-based image generation, particularly when conditions contain uncertainty that can lead to multiple plausible outputs. To generate diverse images reflecting this diversity, traditional methods often modify random seeds, making it difficult to discern meaningful differences between samples, or diversify the input prompt, which is limited in verbally interpretable diversity. We propose a novel conditional image generation framework, applicable to any pretrained conditional generative model, that addresses inherent condition/prompt uncertainty and generates diverse plausible images. The framework is based on a simple yet effective idea: decomposing the input condition into diverse latent representations, each capturing an aspect of the uncertainty and generating a distinct image. First, we integrate a latent graph, parameterized by Generative Flow Networks (GFlowNets), into the prompt representation computation. Second, leveraging GFlowNets' advanced graph sampling capabilities to capture uncertainty and output diverse trajectories over the graph, we produce multiple trajectories that collectively represent the input condition, leading to diverse condition representations and corresponding output images. Evaluations on natural image and medical image datasets demonstrate the framework's improvement in both diversity and fidelity across image synthesis, image generation, and counterfactual generation tasks.


JBHI Journal 2025 Journal Article

Guest Editorial: Applications of Intelligent Environments to Health

Miguel J. Hornos
Ehsan Adeli
Víctor M. Zamudio


AAAI Conference 2024 Conference Paper

Few Shot Part Segmentation Reveals Compositional Logic for Industrial Anomaly Detection

Soopil Kim
Sion An
Philip Chikontwe
Myeongkyun Kang
Ehsan Adeli
Kilian M. Pohl
Sang Hyun Park

Logical anomalies (LA) refer to data violating underlying logical constraints, e.g., the quantity, arrangement, or composition of components within an image. Accurately detecting such anomalies requires models to reason about various component types through segmentation. However, curation of pixel-level annotations for semantic segmentation is both time-consuming and expensive. Although there are some prior few-shot or unsupervised co-part segmentation algorithms, they often fail on images of industrial objects. These images have components with similar textures and shapes, and precise differentiation proves challenging. In this study, we introduce a novel component segmentation model for LA detection that leverages a few labeled samples and unlabeled images sharing logical constraints. To ensure consistent segmentation across unlabeled images, we employ a histogram matching loss in conjunction with an entropy loss. As segmentation predictions play a crucial role, we propose to enhance both local and global sample validity detection by capturing key aspects from visual semantics via three memory banks: class histograms, component composition embeddings, and patch-level representations. For effective LA detection, we propose an adaptive scaling strategy to standardize anomaly scores from different memory banks in inference. Extensive experiments on the public benchmark MVTec LOCO AD reveal our method achieves 98.1% AUROC in LA detection vs. 89.6% from competing methods.


NeurIPS Conference 2024 Conference Paper

OccFusion: Rendering Occluded Humans with Generative Diffusion Priors

Adam Sun
Tiange Xiang
Scott Delp
Li Fei-Fei
Ehsan Adeli

Existing human rendering methods require every part of the human to be fully visible throughout the input video. However, this assumption does not hold in real-life settings where obstructions are common, resulting in only partial visibility of the human. Considering this, we present OccFusion, an approach that utilizes efficient 3D Gaussian splatting supervised by pretrained 2D diffusion models for efficient and high-fidelity human rendering. We propose a pipeline consisting of three stages. In the Initialization stage, complete human masks are generated from partial visibility masks. In the Optimization stage, 3D human Gaussians are optimized with additional supervision from Score-Distillation Sampling (SDS) to create a complete geometry of the human. Finally, in the Refinement stage, in-context inpainting is designed to further improve rendering quality on the less observed human body parts. We evaluate OccFusion on ZJU-MoCap and challenging OcMotion sequences and find that it achieves state-of-the-art performance in the rendering of occluded humans.


AIIM Journal 2024 Journal Article

Vision-based estimation of fatigue and engagement in cognitive training sessions

Yanchen Wang
Adam Turnbull
Yunlong Xu
Kathi Heffner
Feng Vankee Lin
Ehsan Adeli

Computerized cognitive training (CCT) is a scalable, well-tolerated intervention that has promise for slowing cognitive decline. The effectiveness of CCT is often affected by a lack of effective engagement. Mental fatigue is a primary factor compromising effective engagement in CCT, particularly in older adults at risk for dementia. There is a need for scalable, automated measures that can constantly monitor and reliably detect mental fatigue during CCT. Here, we develop and validate a novel Recurrent Video Transformer (RVT) method for monitoring real-time mental fatigue in older adults with mild cognitive impairment using their video-recorded facial gestures during CCT. The RVT model achieved the highest balanced accuracy (79.58%) and precision (0.82) compared to the prior models for binary and multi-class classification of mental fatigue. We also validated our model by significantly relating its output to reaction time across CCT tasks (Wald χ² = 5.16, p = 0.023). By leveraging dynamic temporal information, the RVT model demonstrates the potential to accurately measure real-time mental fatigue, laying the foundation for future CCT research aiming to enhance effective engagement by timely prevention of mental fatigue.


NeurIPS Conference 2022 Conference Paper

MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing

Zelun Luo
Zane Durante
Linden Li
Wanze Xie
Ruochen Liu
Emily Jin
Zhuoyi Huang
Lun Yu Li

Video-language models (VLMs), large models pre-trained on numerous but noisy video-text pairs from the internet, have revolutionized activity recognition through their remarkable generalization and open-vocabulary capabilities. While complex human activities are often hierarchical and compositional, most existing tasks for evaluating VLMs focus only on high-level video understanding, making it difficult to accurately assess and interpret the ability of VLMs to understand complex and fine-grained human activities. Inspired by the recently proposed MOMA framework, we define activity graphs as a single universal representation of human activities that encompasses video understanding at the activity, sub-activity, and atomic action level. We redefine activity parsing as the overarching task of activity graph generation, requiring understanding human activities across all three levels. To facilitate the evaluation of models on activity parsing, we introduce MOMA-LRG (Multi-Object Multi-Actor Language-Refined Graphs), a large dataset of complex human activities with activity graph annotations that can be readily transformed into natural language sentences. Lastly, we present a model-agnostic and lightweight approach to adapting and evaluating VLMs by incorporating structured knowledge from activity graphs into VLMs, addressing the individual limitations of language and graphical models. We demonstrate strong performance on few-shot activity parsing, and our framework is intended to foster future research in the joint modeling of videos, graphs, and language.


JBHI Journal 2021 Journal Article

Longitudinal Pooling & Consistency Regularization to Model Disease Progression From MRIs

Jiahong Ouyang
Qingyu Zhao
Edith V. Sullivan
Adolf Pfefferbaum
Susan F. Tapert
Ehsan Adeli
Kilian M. Pohl

Many neurological diseases are characterized by gradual deterioration of brain structure and function. Large longitudinal MRI datasets have revealed such deterioration, in part, by applying machine and deep learning to predict diagnosis. A popular approach is to apply Convolutional Neural Networks (CNNs) to extract informative features from each visit of the longitudinal MRI and then use those features to classify each visit via Recurrent Neural Networks (RNNs). Such modeling neglects the progressive nature of the disease, which may result in clinically implausible classifications across visits. To avoid this issue, we propose to combine features across visits by coupling feature extraction with a novel longitudinal pooling layer and enforce consistency of the classification across visits in line with disease progression. We evaluate the proposed method on the longitudinal structural MRIs from three neuroimaging datasets: Alzheimer's Disease Neuroimaging Initiative (ADNI, N = 404), a dataset composed of 274 normal controls and 329 patients with Alcohol Use Disorder (AUD), and 255 youths from the National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA). In all three experiments, our method is superior to other widely used approaches for longitudinal classification, thus making a unique contribution towards more accurate tracking of the impact of conditions on the brain. The code is available at https://github.com/ouyangjiahong/longitudinal-pooling.


NeurIPS Conference 2021 Conference Paper

MOMA: Multi-Object Multi-Actor Activity Parsing

Zelun Luo
Wanze Xie
Siddharth Kapoor
Yiyun Liang
Michael Cooper
Juan Carlos Niebles
Ehsan Adeli
Fei-Fei Li

Complex activities often involve multiple humans utilizing different objects to complete actions (e.g., in healthcare settings, physicians, nurses, and patients interact with each other and various medical devices). Recognizing activities poses a challenge that requires a detailed understanding of actors' roles, objects' affordances, and their associated relationships. Furthermore, these purposeful activities are composed of multiple achievable steps, including sub-activities and atomic actions, which jointly define a hierarchy of action parts. This paper introduces Activity Parsing as the overarching task of temporal segmentation and classification of activities, sub-activities, and atomic actions, along with an instance-level understanding of actors, objects, and their relationships in videos. Involving multiple entities (actors and objects), we argue that traditional pair-wise relationships, often used in scene or action graphs, do not appropriately represent the dynamics between them. Hence, we introduce Action Hypergraph, a spatial-temporal graph containing hyperedges (i.e., edges with higher-order relationships), as a new representation. In addition, we introduce Multi-Object Multi-Actor (MOMA), the first benchmark and dataset dedicated to activity parsing. Lastly, to parse a video, we propose the HyperGraph Activity Parsing (HGAP) network, which outperforms several baselines, including those based on regular graphs and raw video data.


AAAI Conference 2020 Conference Paper

Adversarial Cross-Domain Action Recognition with Co-Attention

Boxiao Pan
Zhangjie Cao
Ehsan Adeli
Juan Carlos Niebles

Action recognition has been a widely studied topic with a heavy focus on supervised learning involving sufficient labeled videos. However, the problem of cross-domain action recognition, where training and testing videos are drawn from different underlying distributions, remains largely underexplored. Previous methods directly employ techniques for cross-domain image recognition, which tend to suffer from the severe temporal misalignment problem. This paper proposes a Temporal Co-attention Network (TCoN), which matches the distributions of temporally aligned action features between source and target domains using a novel cross-domain co-attention mechanism. Experimental results on three cross-domain action recognition datasets demonstrate that TCoN improves both previous single-domain and cross-domain methods significantly under the cross-domain setting.


YNIMG Journal 2020 Journal Article

Deep learning identifies morphological determinants of sex differences in the pre-adolescent brain

Ehsan Adeli
Qingyu Zhao
Natalie M. Zahr
Aimee Goldstone
Adolf Pfefferbaum
Edith V. Sullivan
Kilian M. Pohl

The application of data-driven deep learning to identify sex differences in developing brain structures of pre-adolescents has heretofore not been accomplished. Here, the approach identifies sex differences by analyzing the minimally processed MRIs of the first 8144 participants (age 9 and 10 years) recruited by the Adolescent Brain Cognitive Development (ABCD) study. The identified pattern accounted for confounding factors (i.e., head size, age, puberty development, socioeconomic status) and comprised cerebellar (corpus medullare, lobules III, IV/V, and VI) and subcortical (pallidum, amygdala, hippocampus, parahippocampus, insula, putamen) structures. While these have been individually linked to expressing sex differences, a novel discovery was that their grouping accurately predicted the sex in individual pre-adolescents. Another novelty was relating differences specific to the cerebellum to pubertal development. Finally, we found that reducing the pattern to a single score not only accurately predicted sex but also correlated with cognitive behavior linked to working memory. The predictive power of this score and the constellation of identified brain structures provide evidence for sex differences in pre-adolescent neurodevelopment and may augment understanding of sex-specific vulnerability or resilience to psychiatric disorders and presage sex-linked learning disabilities.


AAAI Conference 2019 Conference Paper

Difficulty-Aware Attention Network with Confidence Learning for Medical Image Segmentation

Dong Nie
Li Wang
Lei Xiang
Sihang Zhou
Ehsan Adeli
Dinggang Shen

Medical image segmentation is a key step for various applications, such as image-guided radiation therapy and diagnosis. Recently, deep neural networks have provided promising solutions for automatic image segmentation; however, they often perform well only on regular samples (i.e., easy-to-segment samples), since the datasets are dominated by easy and regular samples. For medical images, due to huge inter-subject variations or disease-specific effects on subjects, there exist several difficult-to-segment cases that are often overlooked by previous works. To address this challenge, we propose a difficulty-aware deep segmentation network with confidence learning for end-to-end segmentation. The proposed framework has two main contributions: 1) Besides the segmentation network, we also propose a fully convolutional adversarial network for confidence learning to provide voxel-wise and region-wise confidence information for the segmentation network. We relax the adversarial learning to confidence learning by decreasing the priority of adversarial learning, so that we can avoid the training imbalance between generator and discriminator. 2) We propose a difficulty-aware attention mechanism to properly handle hard samples or hard regions considering structural information, which may go beyond the shortcomings of focal loss. We further propose a fusion module to selectively fuse the concatenated feature maps in encoder-decoder architectures. Experimental results on clinical and challenge datasets show that our proposed network can achieve state-of-the-art segmentation accuracy. Further analysis also indicates that each individual component of our proposed network contributes to the overall performance improvement.


YNIMG Journal 2019 Journal Article

Multi-task prediction of infant cognitive scores from longitudinal incomplete neuroimaging data

Ehsan Adeli
Yu Meng
Gang Li
Weili Lin
Dinggang Shen

The early postnatal brain undergoes a stunning period of development. Over the past few years, research on dynamic infant brain development has received increased attention, exhibiting how important the early stages of a child's life are in terms of brain development. To precisely chart the early brain developmental trajectories, longitudinal studies with data acquired over a long-enough period of infants' early life are essential. However, in practice, missing data from different time point(s) during the data gathering procedure is often inevitable. This leads to an incomplete set of longitudinal data, which poses a major challenge for such studies. In this paper, prediction of multiple future cognitive scores with incomplete longitudinal imaging data is modeled into a multi-task machine learning framework. To efficiently learn this model, we account for selection of informative features (i.e., neuroimaging morphometric measurements for different time points), while preserving the structural information and the interrelation between these multiple cognitive scores. Several experiments are conducted on a carefully acquired in-house dataset, and the results affirm that we can predict the cognitive scores measured at the age of four years old, using the imaging data of earlier time points, as early as 24 months of age, with a reasonable performance (i.e., root mean square error of 0.18).


YNIMG Journal 2018 Journal Article

Chained regularization for identifying brain patterns specific to HIV infection

Ehsan Adeli
Dongjin Kwon
Qingyu Zhao
Adolf Pfefferbaum
Natalie M. Zahr
Edith V. Sullivan
Kilian M. Pohl

Human Immunodeficiency Virus (HIV) infection continues to have major adverse public health and clinical consequences despite the effectiveness of combination Antiretroviral Therapy (cART) in reducing HIV viral load and improving immune function. As successfully treated individuals with HIV infection age, their cognition declines faster than reported for normal aging. This phenomenon underlines the importance of improving long-term care, which requires a better understanding of the impact of HIV on the brain. In this paper, automated identification of patients and brain regions affected by HIV infection is modeled as a classification problem, whose solution is determined in two steps within our proposed Chained-Regularization framework. The first step focuses on selecting the HIV pattern (i.e., the most informative constellation of brain region measurements for distinguishing HIV infected subjects from healthy controls) by constraining the search for the optimal parameter setting of the classifier via group sparsity (ℓ₂,₁-norm). The second step improves classification accuracy by constraining the parameterization with respect to the selected measurements and the Euclidean regularization (ℓ₂-norm). When applied to the cortical and subcortical structural Magnetic Resonance Imaging (MRI) measurements of 65 controls and 65 HIV infected individuals, this approach is more accurate in distinguishing the two cohorts than more common models. Finally, the brain regions of the identified HIV pattern concur with the HIV literature that uses traditional group analysis models.
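As a reading aid, the two regularizers named in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the helper names `l21_norm` and `l2_norm_squared`, the toy matrix `W`, and the convention that each row of `W` holds the weights of one brain-region measurement are all hypothetical. The group-sparsity (ℓ2,1) penalty sums the Euclidean norms of the rows, which pushes whole rows toward zero; the Euclidean (squared ℓ2) penalty sums all squared entries and shrinks weights without zeroing out rows.

```python
import math


def l21_norm(W):
    """Group-sparsity penalty: sum of the Euclidean norms of the rows of W."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def l2_norm_squared(W):
    """Euclidean (ridge-style) penalty: sum of all squared entries of W."""
    return sum(w * w for row in W for w in row)


# Hypothetical weight matrix: one row per measurement (assumed layout).
W = [
    [3.0, 4.0],  # row norm 5.0
    [0.0, 0.0],  # an all-zero row, i.e. a measurement that was not selected
    [1.0, 0.0],  # row norm 1.0
]

print(l21_norm(W))         # 6.0
print(l2_norm_squared(W))  # 26.0
```

In this toy case the all-zero row contributes nothing to either penalty, which is the sense in which an ℓ2,1 constraint can be read as selecting a subset of measurements.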


AAAI Conference 2018 Conference Paper

Multi-Layer Multi-View Classification for Alzheimer’s Disease Diagnosis

Changqing Zhang
Ehsan Adeli
Tao Zhou
Xiaobo Chen
Dinggang Shen

In this paper, we propose a novel multi-view learning method for Alzheimer’s Disease (AD) diagnosis, using neuroimaging and genetics data. Generally, there are several major challenges associated with traditional classification methods on multi-source imaging and genetics data. First, the correlation between the extracted imaging features and class labels is generally complex, which often makes the traditional linear models ineffective. Second, medical data may be collected from different sources (i.e., multiple modalities of neuroimaging data, clinical scores, or genetics measurements); therefore, how to effectively exploit the complementarity among multiple views is of great importance. In this paper, we propose a Multi-Layer Multi-View Classification (ML-MVC) approach, which regards the multi-view input as the first layer, and constructs a latent representation to explore the complex correlation between the features and class labels. This captures the high-order complementarity among different views, as we exploit the underlying information with a low-rank tensor regularization. Intrinsically, our formulation elegantly explores the nonlinear correlation together with complementarity among different views, and thus improves the accuracy of classification. Finally, the minimization problem is solved by the Alternating Direction Method of Multipliers (ADMM). Experimental results on Alzheimer’s Disease Neuroimaging Initiative (ADNI) data sets validate the effectiveness of our proposed method.


YNIMG Journal 2016 Journal Article

Joint feature-sample selection and robust diagnosis of Parkinson's disease from MRI data

Ehsan Adeli
Feng Shi
Le An
Chong-Yaw Wee
Guorong Wu
Tao Wang
Dinggang Shen

Parkinson's disease (PD) is an overwhelming neurodegenerative disorder caused by deterioration of a neurotransmitter, known as dopamine. Lack of this chemical messenger impairs several brain regions and yields various motor and non-motor symptoms. Incidence of PD is predicted to double in the next two decades, which urges more research to focus on its early diagnosis and treatment. In this paper, we propose an approach to diagnose PD using magnetic resonance imaging (MRI) data. Specifically, we first introduce a joint feature-sample selection (JFSS) method for selecting an optimal subset of samples and features, to learn a reliable diagnosis model. The proposed JFSS model effectively discards poor samples and irrelevant features. As a result, the selected features play an important role in PD characterization, which will help identify the most relevant and critical imaging biomarkers for PD. Then, a robust classification framework is proposed to simultaneously de-noise the selected subset of features and samples, and learn a classification model. Our model can also de-noise testing samples based on the cleaned training data. Unlike many previous works that perform de-noising in an unsupervised manner, we perform supervised de-noising for both training and testing data, thus boosting the diagnostic accuracy. Experimental results on both synthetic and publicly available PD datasets show promising results. To evaluate the proposed method, we use the popular Parkinson's progression markers initiative (PPMI) database. Our results indicate that the proposed method can differentiate between PD and normal control (NC), and outperforms the competing methods by a relatively large margin. It is noteworthy to mention that our proposed framework can also be used for diagnosis of other brain disorders. To show this, we have also conducted experiments on the widely-used ADNI database. The obtained results indicate that our proposed method can identify the imaging biomarkers and diagnose the disease with favorable accuracies compared to the baseline methods.
