Arrow Research search

Author name cluster

Roberto Navigli

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

35

AAAI Conference 2026 Conference Paper

Is Word Sense Disambiguation Dead in the LLM Era?

  • Roberto Navigli

Word Sense Disambiguation (WSD) has been a central challenge since the earliest proposals for Machine Translation (MT), most famously Weaver's 1949 memorandum. Classical systems treated WSD as an explicit task, grounded in lexical resources and annotated data. Recently, however, Large Language Models (LLMs) have blurred the boundary between disambiguation and general language understanding, leading some to suggest that WSD might be obsolete. This paper surveys the role of WSD in the LLM era, drawing on recent studies of encoder-based sense separation and disambiguation, and decoder-based definition selection and generation, as well as multilingual evaluation. Closed-source instruction-tuned LLMs now achieve performance comparable to specialized WSD systems, yet systematic weaknesses remain: non-predominant senses are often misclassified and disambiguation biases in MT persist. We argue that WSD is not "dead" but redefined as a diagnostic lens for assessing lexical-semantic competence, robustness, and interpretability in LLMs.

AAAI Conference 2022 Conference Paper

BabelNet Meaning Representation: A Fully Semantic Formalism to Overcome Language Barriers

  • Roberto Navigli
  • Rexhina Blloshmi
  • Abelardo Carlos Martínez Lorenzo

Conceptual representations of meaning have long been the general focus of Artificial Intelligence (AI) towards the fundamental goal of machine understanding, with innumerable efforts made in Knowledge Representation, Speech and Natural Language Processing, Computer Vision, inter alia. Even today, at the core of Natural Language Understanding lies the task of Semantic Parsing, the objective of which is to convert natural sentences into machine-readable representations. Through this paper, we aim to revamp the historical dream of AI, by putting forward a novel, all-embracing, fully semantic meaning representation, that goes beyond the many existing formalisms. Indeed, we tackle their key limits by fully abstracting text into meaning and introducing language-independent concepts and semantic relations, in order to obtain an interlingual representation. Our proposal aims to overcome the language barrier, and connect not only texts across languages, but also images, videos, speech and sound, and logical formulas, across many fields of AI.

AAAI Conference 2022 Conference Paper

STEPS: Semantic Typing of Event Processes with a Sequence-to-Sequence Approach

  • Sveva Pepe
  • Edoardo Barba
  • Rexhina Blloshmi
  • Roberto Navigli

Enabling computers to comprehend the intent of human actions by processing language is one of the fundamental goals of Natural Language Understanding. An emerging task in this context is that of free-form event process typing, which aims at understanding the overall goal of a protagonist in terms of an action and an object, given a sequence of events. This task was initially treated as a learning-to-rank problem by exploiting the similarity between processes and action/object textual definitions. However, this approach appears to be overly complex, binds the output types to a fixed inventory for possible word definitions and, moreover, leaves space for further enhancements as regards performance. In this paper, we advance the field by reformulating the free-form event process typing task as a sequence generation problem and put forward STEPS, an end-to-end approach for producing user intent in terms of actions and objects only, dispensing with the need for their definitions. In addition to this, we eliminate several dataset constraints set by previous works, while at the same time significantly outperforming them. We release the data and software at https://github.com/SapienzaNLP/steps.

AAAI Conference 2022 Conference Paper

Visual Definition Modeling: Challenging Vision & Language Models to Define Words and Objects

  • Bianca Scarlini
  • Tommaso Pasini
  • Roberto Navigli

Architectures that model language and vision together have received much attention in recent years. Nonetheless, most tasks in this field focus on end-to-end applications without providing insights on whether it is the underlying semantics of visual objects or words that is captured. In this paper we draw on the established Definition Modeling paradigm and enhance it by grounding, for the first time, textual definitions to visual representations. We name this new task Visual Definition Modeling and put forward DEMETER and DIONYSUS, two benchmarks where, given an image as context, models have to generate a textual definition for a target being either i) a word that describes the image, or ii) an object patch therein. To measure the difficulty of our tasks we finetuned six different baselines and analyzed their performances, which show that a text-only encoder-decoder model is more effective than models pretrained for handling inputs of both modalities concurrently. This demonstrates the complexity of our benchmarks and encourages more research on text generation conditioned on multimodal inputs. The datasets for both benchmarks are available at https://github.com/SapienzaNLP/visual-definition-modeling as well as the code to reproduce our models.

IJCAI Conference 2021 Conference Paper

ALaSca: an Automated approach for Large-Scale Lexical Substitution

  • Caterina Lacerra
  • Tommaso Pasini
  • Rocco Tripodi
  • Roberto Navigli

The lexical substitution task aims at finding suitable replacements for words in context. It has proved to be useful in several areas, such as word sense induction and text simplification, as well as in more practical applications such as writing-assistant tools. However, the paucity of annotated data has forced researchers to apply mainly unsupervised approaches, limiting the applicability of large pre-trained models and thus hampering the potential benefits of supervised approaches to the task. In this paper, we mitigate this issue by proposing ALaSca, a novel approach to automatically creating large-scale datasets for English lexical substitution. ALaSca allows examples to be produced for potentially any word in a language vocabulary and to cover most of the meanings it lists. Thanks to this, we can unleash the full potential of neural architectures and finetune them on the lexical substitution task. Indeed, when using our data, a transformer-based model performs substantially better than when using manually annotated data only. We release ALaSca at https://sapienzanlp.github.io/alasca/.

IJCAI Conference 2021 Conference Paper

Exemplification Modeling: Can You Give Me an Example, Please?

  • Edoardo Barba
  • Luigi Procopio
  • Caterina Lacerra
  • Tommaso Pasini
  • Roberto Navigli

Recently, generative approaches have been used effectively to provide definitions of words in their context. However, the opposite, i.e., generating a usage example given one or more words along with their definitions, has not yet been investigated. In this work, we introduce the novel task of Exemplification Modeling (ExMod), along with a sequence-to-sequence architecture and a training procedure for it. Starting from a set of (word, definition) pairs, our approach is capable of automatically generating high-quality sentences which express the requested semantics. As a result, we can drive the creation of sense-tagged data which cover the full range of meanings in any inventory of interest, and their interactions within sentences. Human annotators agree that the sentences generated are as fluent and semantically-coherent with the input definitions as the sentences in manually-annotated corpora. Indeed, when employed as training data for Word Sense Disambiguation, our examples enable the current state of the art to be outperformed, and higher results to be achieved than when using gold-standard datasets only. We release the pretrained model, the dataset and the software at https://github.com/SapienzaNLP/exmod.

IJCAI Conference 2021 Conference Paper

Generating Senses and RoLes: An End-to-End Model for Dependency- and Span-based Semantic Role Labeling

  • Rexhina Blloshmi
  • Simone Conia
  • Rocco Tripodi
  • Roberto Navigli

Despite the recent great success of the sequence-to-sequence paradigm in Natural Language Processing, the majority of current studies in Semantic Role Labeling (SRL) still frame the problem as a sequence labeling task. In this paper we go against the flow and propose GSRL (Generating Senses and RoLes), the first sequence-to-sequence model for end-to-end SRL. Our approach benefits from recently-proposed decoder-side pretraining techniques to generate both sense and role labels for all the predicates in an input sentence at once, in an end-to-end fashion. Evaluated on standard gold benchmarks, GSRL achieves state-of-the-art results in both dependency- and span-based English SRL, proving empirically that our simple generation-based model can learn to produce complex predicate-argument structures. Finally, we propose a framework for evaluating the robustness of an SRL model in a variety of synthetic low-resource scenarios which can aid human annotators in the creation of better, more diverse, and more challenging gold datasets. We release GSRL at github.com/SapienzaNLP/gsrl.

IJCAI Conference 2021 Conference Paper

MultiMirror: Neural Cross-lingual Word Alignment for Multilingual Word Sense Disambiguation

  • Luigi Procopio
  • Edoardo Barba
  • Federico Martelli
  • Roberto Navigli

Word Sense Disambiguation (WSD), i.e., the task of assigning senses to words in context, has seen a surge of interest with the advent of neural models and a considerable increase in performance up to 80% F1 in English. However, when considering other languages, the availability of training data is limited, which hampers scaling WSD to many languages. To address this issue, we put forward MultiMirror, a sense projection approach for multilingual WSD based on a novel neural discriminative model for word alignment: given as input a pair of parallel sentences, our model -- trained with a low number of instances -- is capable of jointly aligning all source and target tokens with each other, surpassing its competitors across several language combinations. We demonstrate that projecting senses from English by leveraging the alignments produced by our model leads a simple mBERT-powered classifier to achieve a new state of the art on established WSD datasets in French, German, Italian, Spanish and Japanese. We release our software and all our datasets at https://github.com/SapienzaNLP/multimirror.
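The projection step described in the abstract can be sketched in a few lines. This is a hedged toy illustration, not the paper's system: the neural aligner is replaced by a hand-written alignment, and the sentences, sense labels, and alignment pairs are all invented for the example.

```python
# Toy sketch of cross-lingual sense projection: sense labels attached to
# English tokens are copied to the target-language tokens they align with.
# The alignment here is hand-written; the paper learns it with a neural model.

def project_senses(source_senses, alignment):
    """Map target-token indices to the sense of their aligned source token."""
    return {tgt: source_senses[src]
            for src, tgt in alignment
            if src in source_senses}

# Invented parallel sentence pair (English -> Italian) with token indices.
source = ["the", "bank", "approved", "the", "loan"]
target = ["la", "banca", "ha", "approvato", "il", "prestito"]

# Sense annotations on the English side only (labels are illustrative).
source_senses = {1: "bank%finance", 2: "approve%authorize", 4: "loan%credit"}

# Word alignment as (source_idx, target_idx) pairs.
alignment = [(0, 0), (1, 1), (2, 3), (3, 4), (4, 5)]

projected = project_senses(source_senses, alignment)
print(projected)  # {1: 'bank%finance', 3: 'approve%authorize', 5: 'loan%credit'}
```

Unaligned or unannotated tokens simply receive no label, which mirrors why alignment quality drives the quality of the projected training data.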

AAAI Conference 2021 Conference Paper

One SPRING to Rule Them Both: Symmetric AMR Semantic Parsing and Generation without a Complex Pipeline

  • Michele Bevilacqua
  • Rexhina Blloshmi
  • Roberto Navigli

In Text-to-AMR parsing, current state-of-the-art semantic parsers use cumbersome pipelines integrating several different modules or components, and exploit graph recategorization, i.e., a set of content-specific heuristics that are developed on the basis of the training set. However, the generalizability of graph recategorization in an out-of-distribution setting is unclear. In contrast, state-of-the-art AMR-to-Text generation, which can be seen as the inverse to parsing, is based on simpler seq2seq. In this paper, we cast Text-to-AMR and AMR-to-Text as a symmetric transduction task and show that by devising a careful graph linearization and extending a pretrained encoder-decoder model, it is possible to obtain state-of-the-art performances in both tasks using the very same seq2seq approach, i.e., SPRING (Symmetric PaRsIng aNd Generation). Our model does not require complex pipelines, nor heuristics built on heavy assumptions. In fact, we drop the need for graph recategorization, showing that this technique is actually harmful outside of the standard benchmark. Finally, we outperform the previous state of the art on the English AMR 2.0 dataset by a large margin: on Text-to-AMR we obtain an improvement of 3.6 Smatch points, while on AMR-to-Text we outperform the state of the art by 11.2 BLEU points. We release the software at github.com/SapienzaNLP/spring.

IJCAI Conference 2021 Conference Paper

Recent Trends in Word Sense Disambiguation: A Survey

  • Michele Bevilacqua
  • Tommaso Pasini
  • Alessandro Raganato
  • Roberto Navigli

Word Sense Disambiguation (WSD) aims at making explicit the semantics of a word in context by identifying the most suitable meaning from a predefined sense inventory. Recent breakthroughs in representation learning have fueled intensive WSD research, resulting in considerable performance improvements, breaching the 80% glass ceiling set by the inter-annotator agreement. In this survey, we provide an extensive overview of current advances in WSD, describing the state of the art in terms of i) resources for the task, i.e., sense inventories and reference datasets for training and testing, as well as ii) automatic disambiguation approaches, detailing their peculiarities, strengths and weaknesses. Finally, we highlight the current limitations of the task itself, but also point out recent trends that could help expand the scope and applicability of WSD, setting up new promising directions for the future.

IJCAI Conference 2021 Conference Paper

Ten Years of BabelNet: A Survey

  • Roberto Navigli
  • Michele Bevilacqua
  • Simone Conia
  • Dario Montagnini
  • Francesco Cecconi

The intelligent manipulation of symbolic knowledge has been a long-sought goal of AI. However, when it comes to Natural Language Processing (NLP), symbols have to be mapped to words and phrases, which are not only ambiguous but also language-specific: multilinguality is indeed a desirable property for NLP systems, and one which enables the generalization of tasks where multiple languages need to be dealt with, without translating text. In this paper we survey BabelNet, a popular wide-coverage lexical-semantic knowledge resource obtained by merging heterogeneous sources into a unified semantic network that helps to scale tasks and applications to hundreds of languages. Over its ten years of existence, thanks to its promise to interconnect languages and resources in structured form, BabelNet has been employed in countless ways and directions. We first introduce the BabelNet model, its components and statistics, and then overview its successful use in a wide range of tasks in NLP as well as in other fields of AI.

AAAI Conference 2021 Conference Paper

XL-WSD: An Extra-Large and Cross-Lingual Evaluation Framework for Word Sense Disambiguation

  • Tommaso Pasini
  • Alessandro Raganato
  • Roberto Navigli

Transformer-based architectures brought a breeze of change to Word Sense Disambiguation (WSD), improving models’ performances by a large margin. The fast development of new approaches has been further encouraged by a well-framed evaluation suite for English, which has allowed their performances to be kept track of and compared fairly. However, other languages have remained largely unexplored, as testing data are available for a few languages only and the evaluation setting is rather matted. In this paper, we untangle this situation by proposing XL-WSD, a cross-lingual evaluation benchmark for the WSD task featuring sense-annotated development and test sets in 18 languages from six different linguistic families, together with language-specific silver training data. We leverage XL-WSD datasets to conduct an extensive evaluation of neural and knowledge-based approaches, including the most recent multilingual language models. Results show that the zero-shot knowledge transfer across languages is a promising research direction within the WSD field, especially when considering low-resourced languages where large pretrained multilingual models still perform poorly. We make the evaluation suite and the code for performing the experiments available at https://sapienzanlp.github.io/xl-wsd/.

AAAI Conference 2020 Conference Paper

CSI: A Coarse Sense Inventory for 85% Word Sense Disambiguation

  • Caterina Lacerra
  • Michele Bevilacqua
  • Tommaso Pasini
  • Roberto Navigli

Word Sense Disambiguation (WSD) is the task of associating a word in context with one of its meanings. While many works in the past have focused on raising the state of the art, none has even come close to achieving an F-score in the 80% ballpark when using WordNet as its sense inventory. We contend that one of the main reasons for this failure is the excessively fine granularity of this inventory, resulting in senses that are hard to differentiate between, even for an experienced human annotator. In this paper we cope with this long-standing problem by introducing Coarse Sense Inventory (CSI), obtained by linking WordNet concepts to a new set of 45 labels. The results show that the coarse granularity of CSI leads a WSD model to achieve 85.9% F1, while maintaining a high expressive power. Our set of labels also exhibits ease of use in tagging and a descriptiveness that other coarse inventories lack, as demonstrated in two annotation tasks which we performed. Moreover, a few-shot evaluation proves that the class-based nature of CSI allows the model to generalise over unseen or under-represented words.

IJCAI Conference 2020 Conference Paper

EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings

  • Agostina Calabrese
  • Michele Bevilacqua
  • Roberto Navigli

The problem of grounding language in vision is increasingly attracting scholarly efforts. As of now, however, most of the approaches have been limited to word embeddings, which are not capable of handling polysemous words. This is mainly due to the limited coverage of the available semantically-annotated datasets, hence forcing research to rely on alternative technologies (i.e., image search engines). To address this issue, we introduce EViLBERT, an approach which is able to perform image classification over an open set of concepts, both concrete and non-concrete. Our approach is based on the recently introduced Vision-Language Pretraining (VLP) model, and builds upon a manually-annotated dataset of concept-image pairs. We use our technique to clean up the image-to-concept mapping that is provided within a multilingual knowledge base, resulting in over 258,000 images associated with 42,500 concepts. We show that our VLP-based model can be used to create multimodal sense embeddings starting from our automatically-created dataset. In turn, we also show that these multimodal embeddings improve the performance of a Word Sense Disambiguation architecture over a strong unimodal baseline. We release code, dataset and embeddings at http://babelpic.org.

IJCAI Conference 2020 Conference Paper

MuLaN: Multilingual Label propagatioN for Word Sense Disambiguation

  • Edoardo Barba
  • Luigi Procopio
  • Niccolò Campolungo
  • Tommaso Pasini
  • Roberto Navigli

The knowledge acquisition bottleneck strongly affects the creation of multilingual sense-annotated data, hence limiting the power of supervised systems when applied to multilingual Word Sense Disambiguation. In this paper, we propose a semi-supervised approach based upon a novel label propagation scheme, which, by jointly leveraging contextualized word embeddings and the multilingual information enclosed in a knowledge base, projects sense labels from a high-resource language, i.e., English, to lower-resourced ones. Backed by several experiments, we provide empirical evidence that our automatically created datasets are of a higher quality than those generated by other competitors and lead a supervised model to achieve state-of-the-art performances in all multilingual Word Sense Disambiguation tasks. We make our datasets available for research purposes at https://github.com/SapienzaNLP/mulan.

AAAI Conference 2020 Conference Paper

SensEmBERT: Context-Enhanced Sense Embeddings for Multilingual Word Sense Disambiguation

  • Bianca Scarlini
  • Tommaso Pasini
  • Roberto Navigli

Contextual representations of words derived by neural language models have proven to effectively encode the subtle distinctions that might occur between different meanings of the same word. However, these representations are not tied to a semantic network, hence they leave the word meanings implicit and thereby neglect the information that can be derived from the knowledge base itself. In this paper, we propose SENSEMBERT, a knowledge-based approach that brings together the expressive power of language modelling and the vast amount of knowledge contained in a semantic network to produce high-quality latent semantic representations of word meanings in multiple languages. Our vectors lie in a space comparable with that of contextualized word embeddings, thus allowing a word occurrence to be easily linked to its meaning by applying a simple nearest neighbour approach. We show that, whilst not relying on manual semantic annotations, SENSEMBERT is able to either achieve or surpass state-of-the-art results attained by most of the supervised neural approaches on the English Word Sense Disambiguation task. When scaling to other languages, our representations prove to be equally effective as their English counterpart and outperform the existing state of the art on all the Word Sense Disambiguation multilingual datasets. The embeddings are released in five different languages at http://sensembert.org.

IJCAI Conference 2018 Conference Paper

Natural Language Understanding: Instructions for (Present and Future) Use

  • Roberto Navigli

In this paper I look at Natural Language Understanding, an area of Natural Language Processing aimed at making sense of text, through the lens of a visionary future: what should we expect a machine to be able to understand, and what are the key dimensions that require the attention of researchers to make this dream come true?

AAAI Conference 2018 Conference Paper

Two Knowledge-based Methods for High-Performance Sense Distribution Learning

  • Tommaso Pasini
  • Roberto Navigli

Knowing the correct distribution of senses within a corpus can potentially boost the performance of Word Sense Disambiguation (WSD) systems by many points. We present two fully automatic and language-independent methods for computing the distribution of senses given a raw corpus of sentences. Intrinsic and extrinsic evaluations show that our methods outperform the current state of the art in sense distribution learning and the strongest baselines for the most frequent sense in multiple languages and on domain-specific test sets. Our sense distributions are available at http://trainomatic.org.

IJCAI Conference 2016 Conference Paper

Automatic Construction and Evaluation of a Large Semantically Enriched Wikipedia

  • Alessandro Raganato
  • Claudio Delli Bovi
  • Roberto Navigli

The hyperlink structure of Wikipedia constitutes a key resource for many Natural Language Processing tasks and applications, as it provides several million semantic annotations of entities in context. Yet only a small fraction of mentions across the entire Wikipedia corpus is linked. In this paper we present the automatic construction and evaluation of a Semantically Enriched Wikipedia in which the overall number of linked mentions has been more than tripled solely by exploiting the structure of Wikipedia itself and the wide-coverage sense inventory of BabelNet. As a result we obtain a sense-annotated corpus with more than 200 million annotations of over 4 million different concepts and named entities. We then show that our corpus leads to competitive results on multiple tasks, such as Entity Linking and Word Similarity.

AAAI Conference 2016 Conference Paper

ExTaSem! Extending, Taxonomizing and Semantifying Domain Terminologies

  • Luis Espinosa-Anke
  • Horacio Saggion
  • Francesco Ronzano
  • Roberto Navigli

We introduce EXTASEM!, a novel approach for the automatic learning of lexical taxonomies from domain terminologies. First, we exploit a very large semantic network to collect thousands of in-domain textual definitions. Second, we extract (hyponym, hypernym) pairs from each definition with a CRF-based algorithm trained on manually-validated data. Finally, we introduce a graph induction procedure which constructs a full-fledged taxonomy where each edge is weighted according to its domain pertinence. EXTASEM! achieves state-of-the-art results in the following taxonomy evaluation experiments: (1) Hypernym discovery, (2) Reconstructing gold standard taxonomies, and (3) Taxonomy quality according to structural measures. We release weighted taxonomies for six domains for the use and scrutiny of the community.

IJCAI Conference 2013 Conference Paper

Integrating Syntactic and Semantic Analysis into the Open Information Extraction Paradigm

  • Andrea Moro
  • Roberto Navigli

In this paper we present an approach aimed at enriching the Open Information Extraction paradigm with semantic relation ontologization by integrating syntactic and semantic features into its workflow. To achieve this goal, we combine deep syntactic analysis and distributional semantics using a shortest path kernel method and soft clustering. The output of our system is a set of automatically discovered and ontologized semantic relations.

IJCAI Conference 2013 Conference Paper

The CQC Algorithm: Cycling in Graphs to Semantically Enrich and Enhance a Bilingual Dictionary (Extended Abstract)

  • Tiziano Flati
  • Roberto Navigli

Bilingual machine-readable dictionaries are knowledge resources useful in many automatic tasks. However, compared to monolingual computational lexicons like WordNet, bilingual dictionaries typically provide a lower amount of structured information such as lexical and semantic relations, and often do not cover the entire range of possible translations for a word of interest. In this paper we present Cycles and Quasi-Cycles (CQC), a novel algorithm for the automated disambiguation of ambiguous translations in the lexical entries of a bilingual machine-readable dictionary.

AAAI Conference 2012 Conference Paper

BabelRelate! A Joint Multilingual Approach to Computing Semantic Relatedness

  • Roberto Navigli
  • Simone Paolo Ponzetto

We present a knowledge-rich approach to computing semantic relatedness which exploits the joint contribution of different languages. Our approach is based on the lexicon and semantic knowledge of a wide-coverage multilingual knowledge base, which is used to compute semantic graphs in a variety of languages. Complementary information from these graphs is then combined to produce a ‘core’ graph where disambiguated translations are connected by means of strong semantic relations. We evaluate our approach on standard monolingual and bilingual datasets, and show that: i) we outperform a graph-based approach which does not use multilinguality in a joint way; ii) we achieve uniformly competitive results for both resource-rich and resource-poor languages.
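The joint multilingual idea above can be sketched in miniature: build one semantic graph per language, then keep only the edges that enough languages agree on, yielding the 'core' graph on which relatedness is computed. All nodes, edges, and the support threshold below are invented for illustration; the real graphs are drawn from a wide-coverage multilingual knowledge base.

```python
# Toy sketch of combining per-language semantic graphs into a 'core' graph.
# Edges supported by only one language (here, the spurious car--bank edge)
# are dropped as likely noise. All data is invented for illustration.

def core_graph(graphs, min_support=2):
    """Keep the edges that appear in at least `min_support` language graphs."""
    support = {}
    for graph in graphs:
        for u, v in graph:
            key = frozenset((u, v))  # undirected edge
            support[key] = support.get(key, 0) + 1
    return {key for key, count in support.items() if count >= min_support}

# One invented semantic graph per language (concepts kept in English for clarity).
english = {("car", "vehicle"), ("car", "bank"), ("vehicle", "transport")}
italian = {("car", "vehicle"), ("vehicle", "transport")}
german  = {("car", "vehicle"), ("vehicle", "transport")}

core = core_graph([english, italian, german])
print(sorted(sorted(edge) for edge in core))
# [['car', 'vehicle'], ['transport', 'vehicle']]
```

The surviving edges are, by construction, the "strong semantic relations" on which the complementary information from different languages agrees.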

IJCAI Conference 2011 Conference Paper

A Graph-Based Algorithm for Inducing Lexical Taxonomies from Scratch

  • Roberto Navigli
  • Paola Velardi
  • Stefano Faralli

In this paper we present a graph-based approach aimed at learning a lexical taxonomy automatically starting from a domain corpus and the Web. Unlike many taxonomy learning approaches in the literature, our novel algorithm learns both concepts and relations entirely from scratch via the automated extraction of terms, definitions and hypernyms. This results in a very dense, cyclic and possibly disconnected hypernym graph. The algorithm then induces a taxonomy from the graph. Our experiments show that we obtain high-quality results, both when building brand-new taxonomies and when reconstructing WordNet sub-hierarchies.

IJCAI Conference 2009 Conference Paper

  • Simone Paolo Ponzetto
  • Roberto Navigli

We present a knowledge-rich methodology for disambiguating Wikipedia categories with WordNet synsets and using this semantic information to restructure a taxonomy automatically generated from the Wikipedia system of categories. We evaluate against a manual gold standard and show that both category disambiguation and taxonomy restructuring perform with high accuracy. In addition, we assess these methods on automatically generated datasets and show that we are able to effectively enrich WordNet with a large number of instances from Wikipedia. Our approach produces an integrated resource, thus bringing together the fine-grained classification of instances in Wikipedia and a well-structured top-level taxonomy from WordNet.

ECAI Conference 2008 Conference Paper

Content-Based Social Network Analysis

  • Paola Velardi
  • Roberto Navigli
  • Alessandro Cucchiarelli
  • Mirco Curzi

Relationships among actors in traditional social network analysis are modelled as a function of the quantity of relations (co-authorships, business relations, friendship, etc.). In contrast, within a business, social or research community, network analysts are interested in the communicative content exchanged by the community members, not merely in the number of relationships. In order to meet this need, this paper presents a novel social network model, in which the actors are not simply represented through the intensity of their mutual relationships, but also through the analysis and evolution of their shared interests. Text mining and clustering techniques are used to capture the content of communication and to identify the most popular topics.

IJCAI Conference 2007 Conference Paper

  • Roberto Navigli
  • Mirella Lapata

Word sense disambiguation (WSD) has been a long-standing research objective for natural language processing. In this paper we are concerned with developing graph-based unsupervised algorithms for alleviating the data requirements for large scale WSD. Under this framework, finding the right sense for a given word amounts to identifying the most "important" node among the set of graph nodes representing its senses. We propose a variety of measures that analyze the connectivity of graph structures, thereby identifying the most relevant word senses. We assess their performance on standard datasets, and show that the best measures perform comparably to the state of the art.
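As a rough illustration of the graph-connectivity idea, the sketch below scores a target word's candidate senses by plain degree centrality over a toy semantic graph and picks the best-connected one. The graph, the sense labels, and the choice of degree centrality are all assumptions made for the example; the paper compares a range of connectivity measures, not just this one.

```python
# Toy sketch of graph-based unsupervised WSD: candidate senses of a target
# word are nodes in a semantic graph, and the chosen sense is the node that
# is most "important" under a connectivity measure (degree centrality here).
# The graph and sense labels are invented for illustration.

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}

def disambiguate(graph, candidate_senses):
    """Return the candidate sense whose node has the highest centrality."""
    scores = degree_centrality(graph)
    return max(candidate_senses, key=lambda sense: scores[sense])

# Undirected toy semantic graph as an adjacency dict: two senses of "bank"
# plus related concepts; every edge is stored in both directions.
graph = {
    "bank#finance": {"money", "loan", "deposit"},
    "bank#river":   {"water"},
    "money":        {"bank#finance", "loan"},
    "loan":         {"bank#finance", "money"},
    "deposit":      {"bank#finance"},
    "water":        {"bank#river"},
}

best = disambiguate(graph, ["bank#finance", "bank#river"])
print(best)  # the finance sense is better connected in this toy graph
```

Because the method needs no sense-annotated training data, only a graph, it matches the unsupervised framing of the abstract: the measure of node importance does all the work.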

IJCAI Conference 2007 Conference Paper

  • Paola Velardi
  • Roberto Navigli
  • Michaël Petit

This paper describes a methodology to semi-automatically acquire a taxonomy of terms and term definitions in a specific research domain. The taxonomy is then used for semantic search and indexing of a knowledge base of scientific competences, called Knowledge Map. The KMap is a system to support research collaborations and sharing of results within and beyond a European Network of Excellence. The methodology is general and can be applied to model any web community - starting from the documents shared and exchanged among the community members - and to use this model for improving accessibility of data and knowledge repositories.