Arrow Research search

Author name cluster

Mirella Lapata

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers

19

ICLR Conference 2025 Conference Paper

Agents' Room: Narrative Generation through Multi-step Collaboration

  • Fantine Huot
  • Reinald Kim Amplayo
  • Jennimaria Palomaki
  • Alice Shoshana Jakobovits
  • Elizabeth Clark
  • Mirella Lapata

Writing compelling fiction is a multifaceted process combining elements such as crafting a plot, developing interesting characters, and using evocative language. While large language models (LLMs) show promise for story writing, they currently rely heavily on intricate prompting, which limits their use. We propose Agents' Room, a generation framework inspired by narrative theory, that decomposes narrative writing into subtasks tackled by specialized agents. To illustrate our method, we introduce Tell Me A Story, a high-quality dataset of complex writing prompts and human-written stories, and a novel evaluation framework designed specifically for assessing long narratives. We show that Agents' Room generates stories that are preferred by expert evaluators over those produced by baseline systems by leveraging collaboration and specialization to decompose the complex story writing task into tractable components. We provide extensive analysis with automated and human-based metrics of the generated output.

TMLR Journal 2025 Journal Article

Uncertainty Quantification in Retrieval Augmented Question Answering

  • Laura Perez-Beltrachini
  • Mirella Lapata

Retrieval augmented Question Answering (QA) helps QA models overcome knowledge gaps by incorporating retrieved evidence, typically a set of passages, alongside the question at test time. Previous studies show that this approach improves QA performance and reduces hallucinations, without, however, assessing whether the retrieved passages are indeed useful at answering correctly. In this work, we propose to quantify the uncertainty of a QA model via estimating the utility of the passages it is provided with. We train a lightweight neural model to predict passage utility for a target QA model and show that while simple information theoretic metrics can predict answer correctness up to a certain extent, our approach efficiently approximates or outperforms more expensive sampling-based methods.

NeurIPS Conference 2024 Conference Paper

AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries

  • Irina Saparina
  • Mirella Lapata

Practical semantic parsers are expected to understand user utterances and map them to executable programs, even when these are ambiguous. We introduce a new benchmark, AMBROSIA, which we hope will inform and inspire the development of text-to-SQL parsers capable of recognizing and interpreting ambiguous requests. Our dataset contains questions showcasing three different types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness), their interpretations, and corresponding SQL queries. In each case, the ambiguity persists even when the database context is provided. This is achieved through a novel approach that involves controlled generation of databases from scratch. We benchmark various LLMs on AMBROSIA, revealing that even the most advanced models struggle to identify and interpret ambiguity in questions.

ICLR Conference 2023 Conference Paper

Text Summarization with Oracle Expectation

  • Yumo Xu
  • Mirella Lapata

Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document. Since most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy, different labeling algorithms have been proposed to extrapolate oracle extracts for model training. In this work, we identify two flaws with the widely used greedy labeling approach: it delivers suboptimal and deterministic oracles. To alleviate both issues, we propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels. We define a new learning objective for extractive summarization which incorporates learning signals from multiple oracle summaries and prove it is equivalent to estimating the oracle expectation for each document sentence. Without any architectural modifications, the proposed labeling scheme achieves superior performance on a variety of summarization benchmarks across domains and languages, in both supervised and zero-shot settings.

ICML Conference 2022 Conference Paper

Scaling Structured Inference with Randomization

  • Yao Fu
  • John P. Cunningham
  • Mirella Lapata

Deep discrete structured models have seen considerable progress recently, but traditional inference using dynamic programming (DP) typically works with a small number of states (less than hundreds), which severely limits model capacity. At the same time, across machine learning, there is a recent trend of using randomized truncation techniques to accelerate computations involving large sums. Here, we propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states. Our method is widely applicable to classical DP-based inference (partition, marginal, reparameterization, entropy) and different graph structures (chains, trees, and more general hypergraphs). It is also compatible with automatic differentiation: it can be integrated with neural networks seamlessly and learned with gradient-based optimizers. Our core technique approximates the sum-product by restricting and reweighting DP on a small subset of nodes, which reduces computation by orders of magnitude. We further achieve low bias and variance via Rao-Blackwellization and importance sampling. Experiments over different graphs demonstrate the accuracy and efficiency of our approach. Furthermore, when using RDP for training a structured variational autoencoder with a scaled inference network, we achieve better test likelihood than baselines and successfully prevent posterior collapse.

AAAI Conference 2021 Conference Paper

Exploring Explainable Selection to Control Abstractive Summarization

  • Haonan Wang
  • Yang Gao
  • Yu Bai
  • Mirella Lapata
  • Heyan Huang

Like humans, document summarization models can interpret a document’s contents in a number of ways. Unfortunately, the neural models of today are largely black boxes that provide little explanation of how or why they generated a summary in the way they did. Therefore, to begin prying open the black box and to inject a level of control into the substance of the final summary, we developed a novel select-and-generate framework that focuses on explainability. By revealing the latent centrality and interactions between sentences, along with scores for sentence novelty and relevance, users are given a window into the choices a model is making and an opportunity to guide those choices in a more desirable direction. A novel pair-wise matrix captures the sentence interactions, centrality and attribute scores, and a mask with tunable attribute thresholds allows the user to control which sentences are likely to be included in the extraction. A sentence-deployed attention mechanism in the abstractor ensures the final summary emphasizes the desired content. Additionally, the encoder is adaptable, supporting both Transformer- and BERTbased configurations. In a series of experiments assessed with ROUGE metrics and two human evaluations, ESCA outperformed eight state-of-the-art models on the CNN/DailyMail and NYT50 benchmark datasets.

AAAI Conference 2021 Conference Paper

Movie Summarization via Sparse Graph Construction

  • Pinelopi Papalampidi
  • Frank Keller
  • Mirella Lapata

We summarize full-length movies by creating shorter videos containing their most informative scenes. We explore the hypothesis that a summary can be created by assembling scenes which are turning points (TPs), i. e. , key events in a movie that describe its storyline. We propose a model that identifies TP scenes by building a sparse movie graph that represents relations between scenes and is constructed using multimodal information1. According to human judges, the summaries created by our approach are more informative and complete, and receive higher ratings, than the outputs of sequence-based models and general-purpose summarization algorithms. The induced graphs are interpretable, displaying different topology for different movie genres.

JAIR Journal 2021 Journal Article

Multi-Document Summarization with Determinantal Point Process Attention

  • Laura Perez-Beltrachini
  • Mirella Lapata

The ability to convey relevant and diverse information is critical in multi-document summarization and yet remains elusive for neural seq-to-seq models whose outputs are often redundant and fail to correctly cover important details. In this work, we propose an attention mechanism which encourages greater focus on relevance and diversity. Attention weights are computed based on (proportional) probabilities given by Determinantal Point Processes (DPPs) defined on the set of content units to be summarized. DPPs have been successfully used in extractive summarisation, here we use them to select relevant and diverse content for neural abstractive summarisation. We integrate DPP-based attention with various seq-to-seq architectures ranging from CNNs to LSTMs, and Transformers. Experimental evaluation shows that our attention mechanism consistently improves summarization and delivers performance comparable with the state-of-the-art on the MultiNews dataset

NeurIPS Conference 2021 Conference Paper

Structured Reordering for Modeling Latent Alignments in Sequence Transduction

  • Bailin Wang
  • Mirella Lapata
  • Ivan Titov

Despite success in many domains, neural models struggle in settings where train and test examples are drawn from different distributions. In particular, in contrast to humans, conventional sequence-to-sequence (seq2seq) models fail to generalize systematically, i. e. , interpret sentences representing novel combinations of concepts (e. g. , text segments) seen in training. Traditional grammar formalisms excel in such settings by implicitly encoding alignments between input and output segments, but are hard to scale and maintain. Instead of engineering a grammar, we directly model segment-to-segment alignments as discrete structured latent variables within a neural seq2seq model. To efficiently explore the large space of alignments, we introduce a reorder-first align-later framework whose central component is a neural reordering module producing separable permutations. We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations, and, thus, enabling end-to-end differentiable training of our model. The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks (i. e. , semantic parsing and machine translation).

AAAI Conference 2021 Conference Paper

Unsupervised Opinion Summarization with Content Planning

  • Reinald Kim Amplayo
  • Stefanos Angelidis
  • Mirella Lapata

The recent success of deep learning techniques for abstractive summarization is predicated on the availability of largescale datasets. When summarizing reviews (e. g. , for products or movies), such training data is neither available nor can be easily sourced, motivating the development of methods which rely on synthetic datasets for supervised training. We show that explicitly incorporating content planning in a summarization model not only yields output of higher quality, but also allows the creation of synthetic datasets which are more natural, resembling real world document-summary pairs. Our content plans take the form of aspect and sentiment distributions which we induce from data without access to expensive annotations. Synthetic datasets are created by sampling pseudo-reviews from a Dirichlet distribution parametrized by our content planner, while our model generates summaries based on input reviews and induced content plans. Experimental results on three domains show that our approach outperforms competitive models in generating informative, coherent, and fluent summaries that capture opinion consensus.

AAAI Conference 2019 Conference Paper

Data-to-Text Generation with Content Selection and Planning

  • Ratish Puduppully
  • Li Dong
  • Mirella Lapata

Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages. Given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order and then generate the document while taking the content plan into account. Automatic and human-based evaluation experiments show that our model1 outperforms strong baselines improving the state-of-the-art on the recently released ROTOWIRE dataset.

JAIR Journal 2019 Journal Article

What is this Article about? Extreme Summarization with Topic-aware Convolutional Neural Networks

  • Shashi Narayan
  • Shay B. Cohen
  • Mirella Lapata

We introduce "extreme summarization," a new single-document summarization task which aims at creating a short, one-sentence news summary answering the question "What is the article about?". We argue that extreme summarization, by nature, is not amenable to extractive strategies and requires an abstractive modeling approach. In the hope of driving research on this task further: (a) we collect a real-world, large scale dataset by harvesting online articles from the British Broadcasting Corporation (BBC); and (b) propose a novel abstractive model which is conditioned on the article's topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans on the extreme summarization dataset.

IJCAI Conference 2017 Conference Paper

Text Rewriting Improves Semantic Role Labeling (Extended Abstract)

  • Kristian Woodsend
  • Mirella Lapata

Large-scale annotated corpora are a prerequisite to developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, often demanding linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses automatically extracted rewrite rules from comparable corpora and bitexts to generate multiple versions of sentences annotated with gold standard labels. We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.

TIST Journal 2013 Journal Article

An abstractive approach to sentence compression

  • Trevor Cohn
  • Mirella Lapata

In this article we generalize the sentence compression task. Rather than simply shorten a sentence by deleting words or constituents, as in previous work, we rewrite it using additional operations such as substitution, reordering, and insertion. We present an experimental study showing that humans can naturally create abstractive sentences using a variety of rewrite operations, not just deletion. We next create a new corpus that is suited to the abstractive compression task and formulate a discriminative tree-to-tree transduction model that can account for structural and lexical mismatches. The model incorporates a grammar extraction method, uses a language model for coherent output, and can be easily tuned to a wide range of compression-specific loss functions.

IJCAI Conference 2013 Conference Paper

i, Poet: Automatic Chinese Poetry Composition through a Generative Summarization Framework under Constrained Optimization

  • Rui Yan
  • Han Jiang
  • Mirella Lapata
  • Shou-De Lin
  • Xueqiang Lv
  • Xiaoming Li

Part of the long lasting cultural heritage of China is the classical ancient Chinese poems which follow strict formats and complicated linguistic rules. Automatic Chinese poetry composition by programs is considered as a challenging problem in computational linguistics and requires high Artificial Intelligence assistance, and has not been well addressed. In this paper, we formulate the poetry composition task as an optimization problem based on a generative summarization framework under several constraints. Given the user specified writing intents, the system retrieves candidate terms out of a large poem corpus, and then orders these terms to fit into poetry formats, satisfying tonal and rhythm requirements. The optimization process under constraints is conducted via iterative term substitutions till convergence, and outputs the subset with the highest utility as the generated poem. For experiments, we perform generation on large datasets of 61, 960 classic poems from Tang and Song Dynasty of China. A comprehensive evaluation, using both human judgments and ROUGE scores, has demonstrated the effectiveness of our proposed approach.

AAAI Conference 2011 Conference Paper

WikiSimple: Automatic Simplification of Wikipedia Articles

  • Kristian Woodsend
  • Mirella Lapata

Text simplification aims to rewrite text into simpler versions and thus make information accessible to a broader audience (e. g. , non-native speakers, children, and individuals with language impairments). In this paper, we propose a model that simplifies documents automatically while selecting their most important content and rewriting them in a simpler style. We learn content selection rules from same-topic Wikipedia articles written in the main encyclopedia and its Simple English variant. We also use the revision histories of Simple Wikipedia articles to learn a quasi-synchronous grammar of simplification rewrite rules. Based on an integer linear programming formulation, we develop a joint model where preferences based on content and style are optimized simultaneously. Experiments on simplifying main Wikipedia articles show that our method significantly reduces the reading difficulty, while still capturing the important content.

IS Journal 2008 Journal Article

Natural Language Processing and the Web

  • Dragomir Radev
  • Mirella Lapata

This special issue focuses on applications that innovatively use the Web and Web-scale document collections to create useful resources or applications that let end users navigate the Web more easily. This article is part of a special issue on Natural Language Processing and the Web.

IJCAI Conference 2007 Conference Paper

  • Roberto Navigli
  • Mirella Lapata

Word sense disambiguation (WSD) has been a long-standing research objective for natural language processing. In this paper we are concerned with developing graph-based unsupervised algorithms for alleviating the data requirements for large scale WSD. Under this framework, finding the right sense for a given word amounts to identifying the most "important" node among the set of graph nodes representing its senses. We propose a variety of measures that analyze the connectivity of graph structures, thereby identifying the most relevant word senses. We assess their performance on standard datasets, and show that the best measures perform comparably to state-of-the-art.

IJCAI Conference 2005 Conference Paper

Automatic Evaluation of Text Coherence: Models and Representations

  • Mirella Lapata
  • Regina

This paper investigates the automatic evaluation of text coherence for machine-generated texts. We introduce a fully-automatic, linguistically rich model of local coherence that correlates with human judgments. Our modeling approach relies on shallow text properties and is relatively inexpensive. We present experimental results that assess the predictive power of various discourse representations proposed in the linguistic literature. Our results demonstrate that certain models capture complementary aspects of coherence and thus can be combined to improve performance.