Arrow Research search

Author name cluster

Peter Prettenhofer

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers

2

TIST Journal 2011 Journal Article

Cross-Lingual Adaptation Using Structural Correspondence Learning

  • Peter Prettenhofer
  • Benno Stein

Cross-lingual adaptation is a special case of domain adaptation and refers to the transfer of classification knowledge between two languages. In this article we describe an extension of Structural Correspondence Learning (SCL), a recently proposed algorithm for domain adaptation, for cross-lingual adaptation in the context of text classification. The proposed method uses unlabeled documents from both languages, along with a word translation oracle, to induce a cross-lingual representation that enables the transfer of classification knowledge from the source to the target language. The main advantages of this method over existing methods are resource efficiency and task specificity. We conduct experiments in the area of cross-language topic and sentiment classification involving English as source language and German, French, and Japanese as target languages. The results show a significant improvement of the proposed method over a machine translation baseline, reducing the relative error due to cross-lingual adaptation by an average of 30% (topic classification) and 59% (sentiment classification). We further report on empirical analyses that reveal insights into the use of unlabeled data, the sensitivity with respect to important hyperparameters, and the nature of the induced cross-lingual word correspondences.

JMLR Journal 2011 Journal Article

Scikit-learn: Machine Learning in Python

  • Fabian Pedregosa
  • Gaël Varoquaux
  • Alexandre Gramfort
  • Vincent Michel
  • Bertrand Thirion
  • Olivier Grisel
  • Mathieu Blondel
  • Peter Prettenhofer

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2011. ( edit, beta )