Arrow Research search

Author name cluster

Bin Bi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers (6)

ICML 2023 · Conference Paper

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

  • Haiyang Xu 0001
  • Qinghao Ye
  • Ming Yan 0008
  • Yaya Shi
  • Jiabo Ye
  • Yuanhong Xu
  • Chenliang Li 0003
  • Bin Bi

Recent years have witnessed a big convergence of language, vision, and multi-modal pretraining. In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboration while addressing the problem of modality entanglement. In contrast to predominant paradigms that rely solely on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network that shares common universal modules for modality collaboration and disentangles modality-specific modules to deal with modality entanglement. It is flexible to select different modules for different understanding and generation tasks across all modalities, including text, image, and video. Empirical study shows that mPLUG-2 achieves state-of-the-art or competitive results on a broad range of over 30 downstream tasks, spanning multi-modal tasks of image-text and video-text understanding and generation, and uni-modal tasks of text-only, image-only, and video-only understanding. Notably, mPLUG-2 shows new state-of-the-art results of 48.0 top-1 accuracy and 80.3 CIDEr on the challenging MSRVTT video QA and video captioning tasks with a far smaller model size and data scale. It also demonstrates strong zero-shot transferability on vision-language and video-language tasks. Code and models will be released at https://github.com/X-PLUG/mPLUG-2.
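
As a rough illustration of the modular layout this abstract describes, the toy PyTorch sketch below routes each modality through its own encoder before a shared universal module; all module names, sizes, and layer choices are invented for the example, and this is not the released mPLUG-2 code.

```python
# Toy illustration of "shared universal module + disentangled modality modules".
# Everything here (names, dimensions, module choices) is a simplifying assumption.
import torch
import torch.nn as nn

class ToyModularModel(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # Modality-specific modules stay separate to limit modality entanglement.
        self.encoders = nn.ModuleDict({
            "text":  nn.Linear(dim, dim),
            "image": nn.Linear(dim, dim),
            "video": nn.Linear(dim, dim),
        })
        # One universal module is shared by all modalities for collaboration.
        self.universal = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

    def forward(self, features: torch.Tensor, modality: str) -> torch.Tensor:
        # Route through the modality-specific module, then the shared one.
        return self.universal(self.encoders[modality](features))

model = ToyModularModel()
out = model(torch.randn(2, 10, 256), modality="text")  # (batch, tokens, dim)
print(out.shape)  # torch.Size([2, 10, 256])
```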

AAAI 2021 · Conference Paper

A Unified Pretraining Framework for Passage Ranking and Expansion

  • Ming Yan
  • Chenliang Li
  • Bin Bi
  • Wei Wang
  • Songfang Huang

Pretrained language models have recently advanced a wide range of natural language processing tasks, and their application to IR tasks has also achieved impressive results. Typical methods either directly apply a pretrained model to improve the re-ranking stage, or use it to conduct passage expansion and term weighting for first-stage retrieval. We observe that the passage ranking and passage expansion tasks share certain inherent relations and can benefit from each other. Therefore, in this paper, we propose a general pretraining framework to enhance both tasks with Unified Encoder-Decoder networks (UED). The overall ranking framework consists of two parts in a cascaded manner: (1) passage expansion with a pretraining-based query generation method; (2) re-ranking of passage candidates from a traditional retrieval method with a pretrained transformer encoder. Both parts are based on the same pretrained UED model, where we jointly train the passage ranking and query generation tasks to further improve the full ranking pipeline. An extensive set of experiments has been conducted on two large-scale passage retrieval datasets to demonstrate the state-of-the-art results of the proposed framework in both the first-stage retrieval and the final re-ranking. In addition, we successfully deploy the framework to our online production system, which stably serves industrial applications with a request volume of up to 100 QPS in less than 300 ms.
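
A minimal sketch of the two-stage cascade described in this abstract: passages are expanded with generated queries before indexing, and retrieved candidates are then re-scored. The `generate_queries` and `score_pair` callables below are hypothetical stand-ins for the shared UED model, not the paper's implementation.

```python
# Sketch of the expansion + re-ranking cascade, assuming hypothetical model callables.
from typing import Callable, List, Tuple

def expand_passage(passage: str, generate_queries: Callable[[str], List[str]]) -> str:
    """First stage: append model-generated queries to the passage before indexing,
    so first-stage retrieval (e.g. BM25) can match the expanded text."""
    return passage + " " + " ".join(generate_queries(passage))

def rerank(query: str,
           candidates: List[str],
           score_pair: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Second stage: re-score retrieved candidates with the encoder and sort."""
    scored = [(p, score_pair(query, p)) for p in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Toy stand-ins so the sketch runs end to end.
demo_generate = lambda p: ["what is " + p.split()[0].lower()]
demo_score = lambda q, p: float(len(set(q.lower().split()) & set(p.lower().split())))

expanded = expand_passage("UED unifies ranking and expansion.", demo_generate)
print(rerank("what is UED", [expanded, "an unrelated passage"], demo_score))
```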

AAAI 2020 · Conference Paper

Generating Well-Formed Answers by Machine Reading with Stochastic Selector Networks

  • Bin Bi
  • Chen Wu
  • Ming Yan
  • Wei Wang
  • Jiangnan Xia
  • Chenliang Li

Question answering (QA) based on machine reading comprehension has seen a recent surge in popularity, yet most work has focused on extractive methods. We instead address the more challenging QA problem of generating a well-formed answer by reading and summarizing the paragraph for a given question. For this generative QA task, we introduce a new neural architecture, LatentQA, in which a novel stochastic selector network composes a well-formed answer from words selected from the question, the paragraph, and the global vocabulary, based on a sequence of discrete latent variables. Bayesian inference for the latent variables is performed to train the LatentQA model. Experiments on public datasets of natural answer generation confirm the effectiveness of LatentQA in generating high-quality well-formed answers.
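
The selector idea can be sketched as a per-step mixture: a discrete latent variable picks one of three word sources (question, paragraph, or global vocabulary), and the chosen source's distribution emits the next word. The numpy example below uses random placeholder distributions rather than the trained LatentQA model.

```python
# Minimal sketch of a per-step "selector" over three word sources.
# All distributions here are random placeholders, not learned model outputs.
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 8

def random_dist(size: int) -> np.ndarray:
    p = rng.random(size)
    return p / p.sum()

# Per-step distributions over a shared vocabulary from each source.
p_question  = random_dist(vocab_size)   # copy-from-question distribution
p_paragraph = random_dist(vocab_size)   # copy-from-paragraph distribution
p_vocab     = random_dist(vocab_size)   # generate-from-vocabulary distribution

selector_probs = np.array([0.2, 0.5, 0.3])      # P(z = question / paragraph / vocab)
z = rng.choice(3, p=selector_probs)              # sample the discrete selector
p_word = [p_question, p_paragraph, p_vocab][z]   # distribution chosen for this step
word = rng.choice(vocab_size, p=p_word)          # emit a word id
print(z, word)
```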

NeurIPS 2020 · Conference Paper

Latent Template Induction with Gumbel-CRFs

  • Yao Fu
  • Chuanqi Tan
  • Bin Bi
  • Mosha Chen
  • Yansong Feng
  • Alexander Rush

Learning to control the structure of sentences is a challenging problem in text generation. Existing work either relies on simple deterministic approaches or RL-based hard structures. We explore the use of structured variational autoencoders to infer latent templates for sentence generation using a soft, continuous relaxation in order to utilize reparameterization for training. Specifically, we propose a Gumbel-CRF, a continuous relaxation of the CRF sampling algorithm using a relaxed Forward-Filtering Backward-Sampling (FFBS) approach. As a reparameterized gradient estimator, the Gumbel-CRF gives more stable gradients than score-function based estimators. As a structured inference network, we show that it learns interpretable templates during training, which allows us to control the decoder during testing. We demonstrate the effectiveness of our methods with experiments on data-to-text generation and unsupervised paraphrase generation.
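
The relaxation at the heart of this estimator can be illustrated with plain Gumbel-softmax sampling: a categorical draw is replaced by a differentiable softmax over logits plus Gumbel noise. The sketch below shows only this per-variable relaxation; the full relaxed Forward-Filtering Backward-Sampling chains such steps through the CRF's forward messages.

```python
# Gumbel-softmax relaxation of a single categorical sample (per-variable view only).
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_sample(logits: np.ndarray, tau: float = 0.5) -> np.ndarray:
    """Return a relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    gumbel = -np.log(-np.log(rng.random(logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + gumbel) / tau
    y = y - y.max()                                       # numerical stability
    expy = np.exp(y)
    return expy / expy.sum()

logits = np.array([1.0, 0.2, -0.5])
sample = gumbel_softmax_sample(logits, tau=0.5)
print(sample, sample.argmax())  # near one-hot; low tau makes it sharper
```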

ICLR 2020 · Conference Paper

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

  • Wei Wang 0225
  • Bin Bi
  • Ming Yan 0008
  • Chen Wu 0006
  • Jiangnan Xia
  • Zuyi Bao
  • Liwei Peng
  • Luo Si

Recently, the pre-trained language model BERT (and its robustly optimized version, RoBERTa) has attracted a lot of attention in natural language understanding (NLU) and achieved state-of-the-art accuracy on various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity, and question answering. Inspired by the linearization exploration work of Elman, we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that leverage language structures at the word and sentence levels, respectively, to make the most of the sequential order of words and sentences. As a result, the new model is adapted to the different levels of language understanding required by downstream tasks. StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, including pushing the state-of-the-art on the GLUE benchmark to 89.0 (outperforming all published models at the time of model submission), the F1 score on SQuAD v1.1 question answering to 93.0, and the accuracy on SNLI to 91.7.
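
As a rough sketch of the word-level auxiliary objective, a short span of tokens can be shuffled and the model asked to restore the original order; the span length and sampling details below are simplified assumptions for illustration, not the paper's exact recipe.

```python
# Simplified construction of a word-order auxiliary example: shuffle a short span
# and keep the original order as the reconstruction target. Span length and
# position sampling are assumptions made only for this illustration.
import random

def make_word_order_example(tokens, span_len=3, seed=0):
    rng = random.Random(seed)
    start = rng.randrange(0, len(tokens) - span_len + 1)
    span = tokens[start:start + span_len]
    shuffled = span[:]
    rng.shuffle(shuffled)
    corrupted = tokens[:start] + shuffled + tokens[start + span_len:]
    # Input: corrupted sequence; target: original tokens at the shuffled positions.
    return corrupted, span, start

tokens = ["the", "model", "learns", "word", "order", "structure"]
corrupted, target_span, start = make_word_order_example(tokens)
print(corrupted, "-> restore", target_span, "at position", start)
```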

AAAI 2019 · Conference Paper

A Deep Cascade Model for Multi-Document Reading Comprehension

  • Ming Yan
  • Jiangnan Xia
  • Chen Wu
  • Bin Bi
  • Zhongzhou Zhao
  • Ji Zhang
  • Luo Si
  • Rui Wang

A fundamental trade-off between effectiveness and efficiency needs to be balanced when designing an online question answering system. Effectiveness comes from sophisticated functions such as extractive machine reading comprehension (MRC), while efficiency is obtained from improvements in preliminary retrieval components such as candidate document selection and paragraph ranking. Given the complexity of the real-world multi-document MRC scenario, it is difficult to jointly optimize both in an end-to-end system. To address this problem, we develop a novel deep cascade learning model, which progressively evolves from document-level and paragraph-level ranking of candidate texts to more precise answer extraction with machine reading comprehension. Specifically, irrelevant documents and paragraphs are first filtered out with simple functions for efficiency. Then we jointly train three modules on the remaining texts to better track the answer: document extraction, paragraph extraction, and answer extraction. Experimental results show that the proposed method outperforms the previous state-of-the-art methods on two large-scale multi-document benchmark datasets, i.e., TriviaQA and DuReader. In addition, our online system can stably serve typical scenarios with millions of daily requests in less than 50 ms.
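
The cascade can be sketched as cheap scoring stages that prune documents and paragraphs before an expensive reader runs on what remains; the scorers and reader below are hypothetical placeholders rather than the paper's trained modules.

```python
# Sketch of a cheap-to-expensive cascade. The scorers and reader are hypothetical
# placeholders (word overlap, last-three-words "reader"), not trained modules.
from typing import Callable, Dict, List

def cascade_answer(question: str,
                   documents: Dict[str, List[str]],          # doc id -> paragraphs
                   doc_score: Callable[[str, str], float],   # cheap document scorer
                   para_score: Callable[[str, str], float],  # cheap paragraph scorer
                   reader: Callable[[str, str], str],        # expensive answer extractor
                   top_docs: int = 1, top_paras: int = 1) -> str:
    # Stage 1: keep only the highest-scoring documents.
    kept_docs = sorted(documents,
                       key=lambda d: doc_score(question, " ".join(documents[d])),
                       reverse=True)[:top_docs]
    # Stage 2: pool paragraphs from the kept documents and keep the best few.
    paras = [p for d in kept_docs for p in documents[d]]
    kept_paras = sorted(paras, key=lambda p: para_score(question, p), reverse=True)[:top_paras]
    # Stage 3: run the expensive reader only on the surviving text.
    return reader(question, " ".join(kept_paras))

overlap = lambda q, t: float(len(set(q.lower().split()) & set(t.lower().split())))
docs = {"d1": ["cascade models trade accuracy for speed"], "d2": ["unrelated text here"]}
answer = cascade_answer("what do cascade models trade", docs, overlap, overlap,
                        reader=lambda q, t: " ".join(t.split()[-3:]))
print(answer)  # "accuracy for speed"
```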