Arrow Research search

Author name cluster

Daniel Rubin

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches; it is not a full identity-disambiguation profile.

4 papers
1 author row

Possible papers (4)

NeurIPS 2023 · Conference Paper

RaLEs: a Benchmark for Radiology Language Evaluations

  • Juanma Zambrano Chaves
  • Nandita Bhaskhar
  • Maayane Attias
  • Jean-Benoit Delbrouck
  • Daniel Rubin
  • Andreas Loening
  • Curtis Langlotz
  • Akshay Chaudhari

The radiology report is the main form of communication between radiologists and other clinicians. Prior work in natural language processing on radiology reports has shown the value of developing methods tailored to individual tasks such as identifying reports with critical results or disease detection. Meanwhile, general-English and biomedical natural language understanding benchmarks such as the General Language Understanding Evaluation (GLUE) and the Biomedical Language Understanding and Reasoning Benchmark (BLURB) have motivated the development of models that can be easily adapted to many tasks in those domains. Here, we characterize the radiology report as a distinct domain and introduce RaLEs, the Radiology Language Evaluations, as a benchmark for natural language understanding and generation in radiology. RaLEs comprises seven natural language understanding and generation evaluations, including the extraction of anatomical and disease entities and their relations, procedure selection, and report summarization. We characterize the performance of models designed for the general, biomedical, clinical, and radiology domains across these tasks. We find that advances in the general and biomedical domains do not necessarily translate to radiology, and that improved models from the general domain can perform comparably to smaller clinical-specific models. The limited performance of existing pre-trained models on RaLEs highlights the opportunity to improve domain-specific self-supervised models for natural language processing in radiology. We propose RaLEs as a benchmark to promote and track the development of such domain-specific radiology language models.
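The abstract describes a benchmark of named tasks on which models are scored and compared. As a rough illustration of that setup (the task name, data, and scoring metric below are invented for this sketch and are not the actual RaLEs API or metrics), a minimal harness might look like:

```python
# Hypothetical sketch of a benchmark harness in the spirit of RaLEs:
# a dict of named tasks, each scored here with a simple token-overlap F1.
# Task names, examples, and the metric are illustrative assumptions only.

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference string."""
    pred, ref = prediction.split(), reference.split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if not common:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

def evaluate(model, tasks):
    """Average token F1 per task for a model mapping input text to output text."""
    scores = {}
    for name, examples in tasks.items():
        scores[name] = sum(
            token_f1(model(x), y) for x, y in examples
        ) / len(examples)
    return scores

# Toy usage: an identity "model" on one invented summarization-style example.
tasks = {"report_summarization": [("no acute findings", "no acute findings")]}
print(evaluate(lambda x: x, tasks))  # perfect overlap yields an F1 of 1.0
```

Comparing general-domain, biomedical, and radiology-specific models then reduces to calling `evaluate` with each model and inspecting the per-task scores.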

JBHI 2020 · Journal Article

Natural Language Generation Model for Mammography Reports Simulation

  • Assaf Hoogi
  • Arjun Mishra
  • Francisco Gimenez
  • Jeffrey Dong
  • Daniel Rubin

Extending the size of labeled corpora of medical reports is a major step towards successful training of machine learning algorithms. Simulating new text reports is a key solution for report augmentation, which extends the cohort size. However, text generation in the medical domain is challenging because it must preserve both the content and the style typical of real reports, without risking patients' privacy. In this paper, we present a conditioned LSTM-RNN architecture for simulating realistic mammography reports. We evaluated the performance by analyzing the characteristics of the simulated reports and classifying them into benign and malignant classes. An average classification AUC was calculated over two distinct test sets. A qualitative analysis was also performed, in which a masked radiologist classified 75% of the simulated reports as real reports, showing that both the style and content of the simulated reports were similar to real reports. Finally, we compared our LSTM-RNN generative model with Markov Random Fields. The LSTM-RNN provided significantly better and more stable performance than MRFs (p < 0.01, Wilcoxon).
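The core idea of a *conditioned* generator is that the class label steers generation. A toy way to see this (using a bigram model with the label prepended as a pseudo-token; this mirrors the conditioning idea only, not the paper's actual LSTM-RNN architecture, and the mini-corpus is invented):

```python
# Illustrative sketch of conditioned text generation: the class label
# (benign vs. malignant) is prepended as a pseudo-token, so the model
# learns label-specific continuations. A stand-in for the conditioning
# idea, not the paper's LSTM-RNN implementation.
import random
from collections import defaultdict

def train_bigrams(labeled_reports):
    """Bigram counts where each report starts with its label token."""
    model = defaultdict(lambda: defaultdict(int))
    for label, text in labeled_reports:
        tokens = [label] + text.split() + ["<eos>"]
        for a, b in zip(tokens, tokens[1:]):
            model[a][b] += 1
    return model

def generate(model, label, rng, max_len=20):
    """Sample a report conditioned on the label token."""
    out, tok = [], label
    for _ in range(max_len):
        nxt = model.get(tok)
        if not nxt:
            break
        tok = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        if tok == "<eos>":
            break
        out.append(tok)
    return " ".join(out)

# Invented two-report corpus, one per class.
corpus = [
    ("<benign>", "no suspicious mass seen"),
    ("<malignant>", "irregular mass with spiculated margins"),
]
model = train_bigrams(corpus)
print(generate(model, "<benign>", random.Random(0)))
```

Swapping the label token changes which continuations the sampler sees, which is the same lever the paper's conditioned LSTM-RNN pulls at far greater capacity.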

NeurIPS 2017 · Conference Paper

Inferring Generative Model Structure with Static Analysis

  • Paroma Varma
  • Bryan He
  • Payal Bajaj
  • Nishith Khandwala
  • Imon Banerjee
  • Daniel Rubin
  • Christopher Ré

Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects the quality of the training labels, but is difficult to learn without any ground truth labels. We instead rely on weak supervision sources having some structure by virtue of being encoded programmatically. We present Coral, a paradigm that infers generative model structure by statically analyzing the code for these heuristics, thus significantly reducing the amount of data required to learn structure. We prove that Coral's sample complexity scales quasilinearly with the number of heuristics and number of relations identified, improving over the standard sample complexity, which is exponential in n for learning n-th degree relations. Empirically, Coral matches or outperforms traditional structure learning approaches by up to 3.81 F1 points. Using Coral to model dependencies instead of assuming independence results in better performance than a fully supervised model by 3.07 accuracy points when heuristics are used to label radiology data without ground truth labels.
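The key observation in the abstract is that dependencies between heuristics can be read off their code: heuristics that consume the same input primitives are likely correlated. A minimal sketch of that inference (the heuristics and primitive names below are invented, and declared arguments stand in for the richer AST-level analysis Coral performs):

```python
# Rough sketch of the Coral idea: inspect heuristic functions to find
# which ones read the same input primitives, and treat shared primitives
# as evidence of a dependency between those heuristics. Coral analyzes
# the heuristic source code; here the declared argument names serve as
# a simple stand-in. All heuristic and primitive names are illustrative.
import inspect
from itertools import combinations

def heuristic_a(area, intensity):
    return 1 if area > 100 else -1

def heuristic_b(area, perimeter):
    return 1 if area * perimeter > 500 else -1

def heuristic_c(texture):
    return 1 if texture > 0.5 else -1

def shared_primitives(f, g):
    """Primitives (argument names) consumed by both heuristics."""
    prims = lambda fn: set(inspect.signature(fn).parameters)
    return prims(f) & prims(g)

def infer_dependencies(heuristics):
    """Pairs of heuristics linked by at least one shared primitive."""
    return [
        (f.__name__, g.__name__)
        for f, g in combinations(heuristics, 2)
        if shared_primitives(f, g)
    ]

deps = infer_dependencies([heuristic_a, heuristic_b, heuristic_c])
print(deps)  # only heuristic_a and heuristic_b share the 'area' primitive
```

The inferred pairs would then inform the dependency structure of the generative model that combines the heuristics' votes, rather than assuming the heuristics are independent.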