Author name cluster

Mosha Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers

2 author rows

AAAI Conference 2021 Conference Paper

Contrastive Triple Extraction with Generative Transformer

Hongbin Ye
Ningyu Zhang
Shumin Deng
Mosha Chen
Chuanqi Tan
Fei Huang
Huajun Chen

Triple extraction is an essential task in information extraction for natural language processing and knowledge graph construction. In this paper, we revisit end-to-end triple extraction as a sequence generation task. Since generative triple extraction may struggle to capture long-term dependencies and may generate unfaithful triples, we introduce a novel model, contrastive triple extraction with a generative transformer. Specifically, we introduce a single shared transformer module for encoder-decoder-based generation. To generate faithful results, we propose a novel triplet contrastive training objective. Moreover, we introduce two mechanisms to further improve model performance (i.e., batch-wise dynamic attention masking and triple-wise calibration). Experimental results on three datasets (i.e., NYT, WebNLG, and MIE) show that our approach outperforms the baselines.


IJCAI Conference 2021 Conference Paper

Document-level Relation Extraction as Semantic Segmentation

Ningyu Zhang
Xiang Chen
Xin Xie
Shumin Deng
Chuanqi Tan
Mosha Chen
Fei Huang
Luo Si

Document-level relation extraction aims to extract relations among multiple entity pairs in a document. Previously proposed graph-based or transformer-based models treat entities independently, ignoring global information among relational triples. This paper approaches the problem by predicting an entity-level relation matrix that captures both local and global information, analogous to the semantic segmentation task in computer vision. Specifically, we propose a Document U-shaped Network for document-level relation extraction: an encoder module captures the context information of entities, and a U-shaped segmentation module over the image-style feature map captures global interdependency among triples. Experimental results show that our approach achieves state-of-the-art performance on three benchmark datasets: DocRED, CDR, and GDA.
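The image analogy in this abstract can be made concrete with a small sketch: given one embedding per entity, build an N x N grid whose cell (s, o) holds a feature vector for the ordered entity pair, the way an image holds a feature vector per pixel. The element-wise product used as the pair feature is an assumption for illustration, not necessarily the feature function the Document U-shaped Network uses, and the U-shaped segmentation module that would consume this grid is not shown.

```python
from typing import List, Tuple

Vector = List[float]


def entity_pair_map(entities: List[Vector]) -> List[List[Vector]]:
    """Build an N x N x D 'image' from N entity embeddings of dimension D.

    Cell [s][o] holds a feature vector for the (subject s, object o) pair.
    Assumption: the pair feature is the element-wise product e_s * e_o.
    """
    return [
        [[a * b for a, b in zip(subj, obj)] for obj in entities]
        for subj in entities
    ]


def shape(grid: List[List[Vector]]) -> Tuple[int, int, int]:
    """Return (height, width, channels) of the feature map."""
    return (len(grid), len(grid[0]), len(grid[0][0]))


if __name__ == "__main__":
    # Three hypothetical entities with 2-dimensional embeddings.
    ents = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
    fmap = entity_pair_map(ents)
    print(shape(fmap))   # (3, 3, 2)
    print(fmap[0][1])    # [0.5, -2.0]
```

The element-wise product is symmetric, so in this sketch cell (s, o) equals cell (o, s); a model that must tell subject from object would need an asymmetric pair function in its place.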


AAAI Conference 2021 Conference Paper

Nested Named Entity Recognition with Partially-Observed TreeCRFs

Yao Fu
Chuanqi Tan
Mosha Chen
Songfang Huang
Fei Huang

Named entity recognition (NER) is a well-studied task in natural language processing. However, the widely used sequence labeling framework has difficulty detecting entities with nested structures. In this work, we view nested NER as constituency parsing with partially-observed trees and model it with partially-observed TreeCRFs. Specifically, we treat all labeled entity spans as observed nodes in a constituency tree and all other spans as latent nodes. The TreeCRF gives a uniform way to jointly model the observed and the latent nodes. To compute the probability of partial trees with partial marginalization, we propose a variant of the Inside algorithm, the Masked Inside algorithm, which supports different inference operations for different nodes (evaluation for observed nodes, marginalization for latent nodes, and rejection for nodes incompatible with the observed ones) with an efficient parallelized implementation, significantly speeding up training and inference. Experiments show that our approach achieves state-of-the-art (SOTA) F1 scores on the ACE2004 and ACE2005 datasets and performance comparable to SOTA models on the GENIA dataset. We release the code at https://github.com/FranxYao/Partially-Observed-TreeCRFs.
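A minimal sketch of the idea behind the Masked Inside algorithm, written as a plain (unbatched, unparallelized) inside pass over binary trees. Each span's log-potential follows the three rules named in the abstract: observed spans are evaluated with their gold label, spans that cross an observed span are rejected with a log-potential of negative infinity, and all other spans marginalize over labels. The span-score layout, the function names, and the assumption that latent spans may take any label are illustrative choices, not the authors' released implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # inclusive word indices [i, j]
NEG_INF = float("-inf")


def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    if m == NEG_INF:  # every term is log(0)
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def crosses(a: int, b: int, c: int, d: int) -> bool:
    """True if spans [a, b] and [c, d] overlap without one nesting in the other."""
    return a < c <= b < d or c < a <= d < b


def masked_potential(scores: Dict[Span, List[float]],
                     observed: Dict[Span, int]) -> Callable[[int, int], float]:
    """Per-span log-potential; scores[(i, j)] holds one log-score per label."""
    def potential(i: int, j: int) -> float:
        if (i, j) in observed:                               # evaluation
            return scores[(i, j)][observed[(i, j)]]
        if any(crosses(i, j, a, b) for (a, b) in observed):  # rejection
            return NEG_INF
        return logsumexp(scores[(i, j)])                     # marginalization
    return potential


def inside(n: int, potential: Callable[[int, int], float]) -> float:
    """Log of the summed score of all binary trees over n words."""
    beta: Dict[Span, float] = {(i, i): potential(i, i) for i in range(n)}
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            splits = [beta[(i, k)] + beta[(k + 1, j)] for k in range(i, j)]
            beta[(i, j)] = potential(i, j) + logsumexp(splits)
    return beta[(0, n - 1)]


def partial_tree_log_prob(n: int, scores: Dict[Span, List[float]],
                          observed: Dict[Span, int]) -> float:
    """log p(observed spans) = log Z(compatible trees) - log Z(all trees)."""
    numerator = inside(n, masked_potential(scores, observed))
    denominator = inside(n, masked_potential(scores, {}))  # nothing observed
    return numerator - denominator


if __name__ == "__main__":
    n, num_labels = 3, 2
    scores = {(i, j): [0.0] * num_labels for i in range(n) for j in range(i, n)}
    # Observe span [0, 1] with label 0; span [1, 2] crosses it and is rejected.
    print(math.exp(partial_tree_log_prob(n, scores, {(0, 1): 0})))  # approximately 0.25
```

For full binary trees, rejecting every span that crosses an observed span is enough to force each observed span into every surviving tree, so the masked pass sums exactly over the compatible trees. A practical implementation would run the same recursion over a batched chart tensor on a GPU; the loop form here only makes the three rules visible.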


ICLR Conference 2021 Conference Paper

Probing BERT in Hyperbolic Spaces

Boli Chen
Yao Fu
Guangwei Xu
Pengjun Xie
Chuanqi Tan
Mosha Chen
Liping Jing

Recently, a variety of probing tasks have been proposed to discover linguistic properties learned in contextualized word embeddings. Many of these works implicitly assume that these embeddings lie in certain metric spaces, typically Euclidean space. This work considers a family of geometrically special spaces, the hyperbolic spaces, which exhibit better inductive biases for hierarchical structures and may better reveal linguistic hierarchies encoded in contextualized representations. We introduce a Poincaré probe, a structural probe that projects these embeddings into a Poincaré subspace with explicitly defined hierarchies. We focus on two probing objectives: (a) dependency trees, where the hierarchy is defined by head-dependent structures; and (b) lexical sentiments, where the hierarchy is defined by the polarity of words (positivity and negativity). We argue that a key desideratum of a probe is its sensitivity to the existence of linguistic structures. We apply our probes to BERT, a typical contextualized embedding model. In a syntactic subspace, our probe recovers tree structures better than Euclidean probes, suggesting that the geometry of BERT syntax may not necessarily be Euclidean. In a sentiment subspace, we reveal two possible meta-embeddings for positive and negative sentiments and show how lexically controlled contextualization would change the geometric localization of embeddings. We demonstrate these findings with our Poincaré probe through extensive experiments and visualization. Our results can be reproduced at https://github.com/FranxYao/PoincareProbe.
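The geometry this probe relies on can be illustrated with the standard formulas for the unit Poincaré ball (curvature -1): a linear map takes a BERT vector into a low-dimensional tangent space, the exponential map at the origin moves it into the ball, and distances are measured with the Poincaré metric. The projection matrix, the norm clipping constant, and the function names below are hypothetical; the sketch omits the training objective and any further Möbius operations the paper's probe may apply.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

MAX_NORM = 1.0 - 1e-5  # assumed numerical guard: keep points strictly inside the ball


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def exp_map_origin(v: Vector) -> Vector:
    """Exponential map at the origin of the unit Poincaré ball: tanh(|v|) * v / |v|."""
    r = norm(v)
    if r == 0.0:
        return [0.0] * len(v)
    scale = min(math.tanh(r), MAX_NORM) / r  # clip so the point never reaches the boundary
    return [scale * x for x in v]


def poincare_distance(u: Vector, v: Vector) -> float:
    """d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - norm(u) ** 2) * (1.0 - norm(v) ** 2)
    return math.acosh(1.0 + 2.0 * diff_sq / denom)


def probe_distance(proj: Matrix, h_i: Vector, h_j: Vector) -> float:
    """Hypothetical probe: project two embeddings, map them into the ball, measure."""
    return poincare_distance(exp_map_origin(matvec(proj, h_i)),
                             exp_map_origin(matvec(proj, h_j)))


if __name__ == "__main__":
    proj = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical 3-d -> 2-d projection
    h_word_a = [0.3, 0.4, 2.0]
    h_word_b = [0.0, 0.0, -1.0]
    print(probe_distance(proj, h_word_a, h_word_b))  # approximately 1.0
```

Distances in the ball grow without bound as points approach its boundary, so a few dimensions can separate the many leaves of a deep hierarchy; this is roughly the inductive bias for hierarchical structure that the abstract refers to.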


AAAI Conference 2020 Conference Paper

Boundary Enhanced Neural Span Classification for Nested Named Entity Recognition

Chuanqi Tan
Wei Qiu
Mosha Chen
Rui Wang
Fei Huang

Named entity recognition (NER) is a well-studied task in natural language processing. However, the widely used sequence labeling framework usually has difficulty detecting entities with nested structures. Span-based methods, which can easily detect nested entities in different subsequences, are naturally suited to the nested NER problem. However, previous span-based methods have two main issues. First, classifying all subsequences is computationally expensive and very inefficient at inference. Second, span-based methods mainly focus on learning span representations but lack explicit boundary supervision. To tackle these two issues, we propose a boundary enhanced neural span classification model. In addition to classifying spans, we incorporate an additional boundary detection task that predicts which words are entity boundaries. The two tasks are jointly trained under a multitask learning framework, which enhances the span representation with additional boundary supervision. In addition, the boundary detection model can generate high-quality candidate spans, which greatly reduces the time complexity of inference. Experiments show that our approach outperforms all existing methods, achieving F1 scores of 85.3, 83.9, and 78.3 on the ACE2004, ACE2005, and GENIA datasets, respectively.
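The inference-time saving described in this abstract can be illustrated by contrasting exhaustive span enumeration with candidates built from predicted boundaries. Pairing every predicted start with every predicted end at or after it is an assumed, simplified decoding rule, and the 0.5 threshold and optional length cap are hypothetical parameters rather than values from the paper.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # inclusive token indices


def all_spans(n: int) -> List[Span]:
    """Every contiguous subsequence of an n-token sentence: n(n+1)/2 spans."""
    return [(i, j) for i in range(n) for j in range(i, n)]


def candidate_spans(start_probs: List[float], end_probs: List[float],
                    threshold: float = 0.5,
                    max_len: Optional[int] = None) -> List[Span]:
    """Pair predicted start boundaries with predicted end boundaries."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [j for j, p in enumerate(end_probs) if p >= threshold]
    return [(i, j) for i in starts for j in ends
            if j >= i and (max_len is None or j - i + 1 <= max_len)]


if __name__ == "__main__":
    # Hypothetical boundary probabilities for a 6-token sentence.
    start_p = [0.9, 0.1, 0.2, 0.8, 0.1, 0.1]
    end_p = [0.1, 0.2, 0.7, 0.1, 0.9, 0.1]
    print(len(all_spans(6)))                # 21
    print(candidate_spans(start_p, end_p))  # [(0, 2), (0, 4), (3, 4)]
```

Only the candidates would then be passed to the span classifier. In the example, 3 spans replace 21, and the overlapping candidates (0, 2), (0, 4), and (3, 4) show that nested spans survive the filtering.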


NeurIPS Conference 2020 Conference Paper

Latent Template Induction with Gumbel-CRFs

Yao Fu
Chuanqi Tan
Bin Bi
Mosha Chen
Yansong Feng
Alexander Rush

Learning to control the structure of sentences is a challenging problem in text generation. Existing work relies either on simple deterministic approaches or on RL-based hard structures. We explore the use of structured variational autoencoders to infer latent templates for sentence generation using a soft, continuous relaxation so that reparameterization can be used for training. Specifically, we propose a Gumbel-CRF, a continuous relaxation of the CRF sampling algorithm using a relaxed Forward-Filtering Backward-Sampling (FFBS) approach. As a reparameterized gradient estimator, the Gumbel-CRF gives more stable gradients than score-function-based estimators. As a structured inference network, we show that it learns interpretable templates during training, which allows us to control the decoder during testing. We demonstrate the effectiveness of our methods with experiments on data-to-text generation and unsupervised paraphrase generation.
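A small sketch of relaxed forward-filtering backward-sampling for a linear-chain CRF. The forward pass computes log-space filtering scores; the backward pass draws each state with a Gumbel-softmax instead of an exact categorical draw, and feeds the soft sample of step t+1 into the transition scores of step t. That soft indexing is one plausible way to relax FFBS and is an assumption of this sketch, not necessarily the paper's exact construction; the emission/transition layout, the temperature, and all names are hypothetical. The code is plain Python rather than an autodiff framework, so it shows the sampling path but not the gradients.

```python
import math
import random

NEG_INF = float("-inf")


def logsumexp(xs):
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for x in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        noisy.append((x - math.log(-math.log(u))) / tau)
    return softmax(noisy)


def forward_filter(emit, trans):
    """alpha[t][k] = emit[t][k] + logsumexp_i(alpha[t-1][i] + trans[i][k])."""
    num_states = len(emit[0])
    alpha = [list(emit[0])]
    for t in range(1, len(emit)):
        alpha.append([emit[t][k] + logsumexp([alpha[t - 1][i] + trans[i][k]
                                              for i in range(num_states)])
                      for k in range(num_states)])
    return alpha


def relaxed_backward_sample(emit, trans, tau=0.5, rng=None):
    """Return one relaxed (soft one-hot) state vector per position."""
    if rng is None:
        rng = random.Random()
    alpha = forward_filter(emit, trans)
    length, num_states = len(emit), len(emit[0])
    samples = [None] * length
    samples[-1] = gumbel_softmax(alpha[-1], tau, rng)
    for t in range(length - 2, -1, -1):
        nxt = samples[t + 1]
        # Exact FFBS would use trans[k][z_{t+1}]; here the one-hot index is
        # replaced by the soft sample: sum_j trans[k][j] * y_{t+1}[j].
        logits = [alpha[t][k] + sum(trans[k][j] * nxt[j] for j in range(num_states))
                  for k in range(num_states)]
        samples[t] = gumbel_softmax(logits, tau, rng)
    return samples


if __name__ == "__main__":
    emit = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    trans = [[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]]
    for y in relaxed_backward_sample(emit, trans, tau=0.5, rng=random.Random(0)):
        print([round(p, 3) for p in y])
```

As tau shrinks, each soft sample approaches a one-hot vector and the procedure approaches exact FFBS; a larger tau gives smoother samples at the cost of more bias.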
