Arrow Research search

Author name cluster

Ted Pedersen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
1 author row

Possible papers


AAAI Conference 2006 System Paper

An End-to-End Supervised Target-Word Sense Disambiguation System

  • Mahesh Joshi
  • Ted Pedersen

We present an extensible supervised Target-Word Sense Disambiguation system that builds on GATE (General Architecture for Text Engineering), NSP (Ngram Statistics Package) and WEKA (Waikato Environment for Knowledge Analysis) to provide an end-to-end solution that integrates feature identification, feature extraction, preprocessing and classification.

AAAI Conference 2005 System Paper

Identifying Similar Words and Contexts in Natural Language with SenseClusters

  • Ted Pedersen

SenseClusters is a freely available intelligent system that clusters together similar contexts in natural language text. Thereafter it assigns identifying labels to these clusters based on their content. It is a purely unsupervised approach that is language independent, and uses no knowledge other than what is available in raw un-annotated corpora. In addition to clustering similar contexts, it can be used to identify synonyms and sets of related words. It has been applied to a diverse range of problems, including proper name disambiguation, word sense discrimination, email organization, and document clustering. SenseClusters is a complete system that supports feature selection from large corpora, several different context representation schemes, various clustering algorithms, the creation of descriptive and discriminating labels for the discovered clusters, and evaluation relative to gold standard data.
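
The core idea of clustering similar contexts can be sketched in miniature. SenseClusters itself is a Perl system with far richer feature selection and context representations; the sketch below only illustrates the general approach (bag-of-words context vectors grouped by cosine similarity with a tiny k-means loop), and the contexts, vocabulary, and deterministic seeding are invented for illustration.

```python
# Illustrative sketch of unsupervised context clustering in the spirit
# of SenseClusters: represent each context as a bag-of-words vector,
# then group contexts by cosine similarity with a small k-means loop.
# All contexts below are made up for illustration.
from collections import Counter
import math

def vectorize(context, vocab):
    counts = Counter(context.split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def kmeans(vectors, k, iters=10):
    # Deterministic initialization: the first k vectors are the seeds.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

contexts = [
    "deposit money in the bank account",   # financial sense
    "fishing on the river bank shore",     # river sense
    "the bank loan and interest rate",     # financial sense
    "the muddy river bank after rain",     # river sense
]
vocab = sorted({w for c in contexts for w in c.split()})
vectors = [vectorize(c, vocab) for c in contexts]
labels = kmeans(vectors, k=2)
```

On this toy data the two financial contexts end up in one cluster and the two river contexts in the other, without any sense-tagged supervision.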

AAAI Conference 2004 Short Paper

Discriminating Among Word Meanings by Identifying Similar Contexts

  • Amruta Purandare
  • Ted Pedersen

Word sense discrimination is an unsupervised clustering problem that seeks to discover which instances of a given word are used with the same meaning. This is done strictly on the basis of information found in raw corpora, without using any sense-tagged text or other existing knowledge sources. Our particular focus is to systematically compare the efficacy of a range of lexical features, context representations, and clustering algorithms when applied to this problem.

AAAI Conference 2004 System Paper

SenseClusters — Finding Clusters that Represent Word Senses

  • Amruta Purandare
  • Ted Pedersen

SenseClusters is a freely available word sense discrimination system that takes a purely unsupervised clustering approach. It uses no knowledge other than what is available in a raw unstructured corpus, and clusters instances of a given target word based only on their mutual contextual similarities. It is a complete system that provides support for feature selection from large corpora, several different context representation schemes, various clustering algorithms, and evaluation of the discovered clusters.

AAAI Conference 2004 System Paper

WordNet::Similarity — Measuring the Relatedness of Concepts

  • Ted Pedersen

WordNet::Similarity is a freely available software package that makes it possible to measure the semantic similarity or relatedness between a pair of concepts (or word senses). It provides six measures of similarity, and three measures of relatedness, all of which are based on the lexical database WordNet. These measures are implemented as Perl modules which take as input two concepts, and return a numeric value that represents the degree to which they are similar or related.
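
The simplest measure in this family, path similarity, can be sketched without WordNet itself: 1 / (1 + length of the shortest is-a path between two concepts). The tiny hand-made taxonomy below stands in for the WordNet hierarchy; the real package reads the WordNet database and, as noted above, offers a range of similarity and relatedness measures.

```python
# A minimal sketch of path-based similarity over a toy is-a hierarchy.
# The hierarchy below is invented for illustration; WordNet::Similarity
# itself operates on the full WordNet database.

# child -> parent edges of a toy is-a hierarchy
hypernym = {
    "cat": "feline", "feline": "carnivore", "carnivore": "mammal",
    "dog": "canine", "canine": "carnivore", "mammal": "animal",
}

def ancestors(concept):
    """Map a concept and each of its hypernyms to their distance up."""
    depth, out = 0, {}
    while concept is not None:
        out[concept] = depth
        concept = hypernym.get(concept)
        depth += 1
    return out

def path_similarity(a, b):
    """1 / (1 + shortest is-a path), which runs through the
    lowest common subsumer of the two concepts."""
    ups_a, ups_b = ancestors(a), ancestors(b)
    common = set(ups_a) & set(ups_b)
    if not common:
        return 0.0
    shortest = min(ups_a[c] + ups_b[c] for c in common)
    return 1.0 / (1.0 + shortest)
```

Here `path_similarity("cat", "dog")` is 0.2 (a four-edge path through their common subsumer "carnivore"), while identical concepts score 1.0.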

IJCAI Conference 2003 Conference Paper

Extended Gloss Overlaps as a Measure of Semantic Relatedness

  • Satanjeev Banerjee
  • Ted Pedersen

This paper presents a new measure of semantic relatedness between concepts that is based on the number of shared words (overlaps) in their definitions (glosses). This measure is unique in that it extends the glosses of the concepts under consideration to include the glosses of other concepts to which they are related according to a given concept hierarchy. We show that this new measure reasonably correlates to human judgments. We introduce a new method of word sense disambiguation based on extended gloss overlaps, and demonstrate that it fares well on the SENSEVAL-2 lexical sample data.
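
The gloss-overlap scoring at the heart of this measure can be sketched as follows. This is a simplified, hedged rendering: it scores each maximal shared n-word phrase as n squared (so longer overlaps count more), and the example glosses are invented. The extension described in the paper, concatenating the glosses of related concepts before scoring, is not shown.

```python
# Sketch of gloss-overlap scoring: each maximal shared n-word phrase
# between two glosses contributes n**2, so multi-word overlaps are
# weighted more heavily than isolated shared words.
def overlap_score(gloss_a, gloss_b):
    a, b = gloss_a.lower().split(), gloss_b.lower().split()
    score = 0
    # Greedily remove the longest shared phrase until none remain.
    while True:
        best = None
        for n in range(min(len(a), len(b)), 0, -1):
            grams_b = {tuple(b[j:j + n]) for j in range(len(b) - n + 1)}
            for i in range(len(a) - n + 1):
                if tuple(a[i:i + n]) in grams_b:
                    best = (n, i)
                    break
            if best:
                break
        if not best:
            return score
        n, i = best
        phrase = a[i:i + n]
        score += n * n
        # Remove the matched phrase from both glosses so it is
        # counted only once.
        del a[i:i + n]
        j = next(k for k in range(len(b) - n + 1) if b[k:k + n] == phrase)
        del b[j:j + n]
```

For instance, two glosses sharing the three-word phrase "large gray animal" score 9 for that overlap, versus 3 if the same words matched only individually.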

AAAI Conference 1998 Conference Paper

Knowledge Lean Word-Sense Disambiguation

  • Ted Pedersen

We present a corpus-based approach to word-sense disambiguation that only requires information that can be automatically extracted from untagged text. We use unsupervised techniques to estimate the parameters of a model describing the conditional distribution of the sense group given the known contextual features. Both the EM algorithm and Gibbs Sampling are evaluated to determine which is most appropriate for our data. We compare their disambiguation accuracy in an experiment with thirteen different words and three feature sets. Gibbs Sampling results in a small but consistent improvement in disambiguation accuracy over the EM algorithm.

AAAI Conference 1997 Conference Paper

A New Supervised Learning Algorithm for Word Sense Disambiguation

  • Ted Pedersen

The Naive Mix is a new supervised learning algorithm that is based on a sequential method for selecting probabilistic models. The usual objective of model selection is to find a single model that adequately characterizes the data in a training sample. However, during model selection a sequence of models is generated that consists of the best-fitting model at each level of model complexity. The Naive Mix utilizes this sequence of models to define a probabilistic model which is then used as a probabilistic classifier to perform word-sense disambiguation. The models in this sequence are restricted to the class of decomposable log-linear models. This class of models offers a number of computational advantages. Experiments disambiguating twelve different words show that a Naive Mix formulated with a forward sequential search and Akaike’s Information Criterion rivals established supervised learning algorithms such as decision trees (C4.5), rule induction (CN2) and nearest-neighbor classification (PEBLS).
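
The selection step described above can be sketched in a few lines: a forward sequential search keeps the best-fitting model at each complexity level, and Akaike's Information Criterion (AIC = 2k - 2 ln L, lower is better) then ranks the resulting sequence. The log-likelihoods and parameter counts below are invented purely for illustration; they are not from the paper.

```python
# Ranking a sequence of models by Akaike's Information Criterion.
# AIC = 2k - 2 ln L, where k is the number of parameters and ln L the
# maximized log-likelihood; lower AIC is better.
def aic(log_likelihood, num_params):
    return 2 * num_params - 2 * log_likelihood

# One (log-likelihood, parameter-count) pair per complexity level,
# i.e. the best-fitting model found at each level of a forward
# sequential search. Values are invented for illustration.
sequence = [(-1200.0, 2), (-1100.0, 5), (-1080.0, 12), (-1075.0, 25)]
best = min(sequence, key=lambda m: aic(*m))
```

Note how AIC penalizes the 25-parameter model: its slightly better fit does not justify the extra complexity, so the 12-parameter model wins.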

AAAI Conference 1996 Conference Paper

Significant Lexical Relationships

  • Ted Pedersen

Statistical NLP inevitably deals with a large number of rare events. As a consequence, NLP data often violates the assumptions implicit in traditional statistical procedures such as significance testing. We describe a significance test, an exact conditional test, that is appropriate for NLP data and can be performed using freely available software. We apply this test to the study of lexical relationships and demonstrate that the results obtained using this test are both theoretically more reliable and different from the results obtained using previously applied tests.
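
The flavor of exact conditional test described above can be sketched for a 2x2 contingency table: with the margins fixed, the joint count follows a hypergeometric distribution, so a right-tailed p-value can be summed exactly rather than approximated. This is a hedged illustration in the Fisher-exact style, not a reconstruction of the paper's experiments; the counts in the example are invented.

```python
# Exact right-tailed test on a 2x2 contingency table: under the null,
# with all margins fixed, the joint count n11 is hypergeometric, so
# P(count >= n11) can be computed as an exact sum.
from math import comb

def fisher_right_tail(n11, n12, n21, n22):
    """P(joint count >= n11) under the hypergeometric null."""
    n1p, np1 = n11 + n12, n11 + n21   # row and column margins
    n = n11 + n12 + n21 + n22
    p = 0.0
    for k in range(n11, min(n1p, np1) + 1):
        p += comb(np1, k) * comb(n - np1, n1p - k) / comb(n, n1p)
    return p
```

Because the sum is exact, the result is reliable even for the tiny counts that are pervasive in NLP data, where large-sample approximations behind tests like chi-square break down.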