Author name cluster

Mengli Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

1 author row

AAAI Conference 2024 Conference Paper

MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling

Jiaqi Xu
Bo Liu
Yunkuo Chen
Mengli Cheng
Xing Shi

Video-and-language understanding has a variety of applications in the industry, such as video question answering, text-video retrieval, and multi-label classification. Existing video-and-language understanding methods generally adopt heavy multi-modal encoders and feature fusion modules, which consume high computational costs. Specially, they have difficulty dealing with dense video frames or long text prevalent in industrial applications. This paper proposes MuLTI, a highly accurate and efficient video-and-language understanding model that achieves efficient and effective feature fusion and rapid adaptation to downstream tasks. Specifically, we design a Text-Guided MultiWay-Sampler based on adapt-pooling residual mapping and self-attention modules to sample long sequences and fuse multi-modal features, which reduces the computational costs and addresses performance degradation caused by previous samplers. Therefore, MuLTI can handle longer sequences with limited computational costs. Then, to further enhance the model's performance and fill in the lack of pretraining tasks in the video question answering, we propose a new pretraining task named Multiple Choice Modeling. This task bridges the gap between pretraining and downstream tasks and improves the model's ability to align video and text features. Benefiting from the efficient feature fusion module and the new pretraining task, MuLTI achieves state-of-the-art performance on multiple datasets. Implementation and pretrained models will be released.

PDF Details DOI

AAAI Conference 2023 System Paper

EasyRec: An Easy-to-Use, Extendable and Efficient Framework for Building Industrial Recommendation Systems

Mengli Cheng
Yue Gao
Guoqiang Liu
HongSheng Jin

We present EasyRec, an easy-to-use, extendable and efficient recommendation framework for building industrial recommendation systems. Our EasyRec framework is superior in the following aspects:first, EasyRec adopts a modular and pluggable design pattern to reduce the efforts to build custom models; second, EasyRec implements hyper-parameter optimization and feature selection algorithms to improve model performance automatically; third, EasyRec applies online learning to adapt to the ever-changing data distribution. The code is released: https://github.com/alibaba/EasyRec.

PDF Details DOI

AAAI Conference 2021 System Paper

EasyASR: A Distributed Machine Learning Platform for End-to-end Automatic Speech Recognition

Chengyu Wang
Mengli Cheng
Xu Hu
Jun Huang

We present EasyASR, a distributed machine learning platform for training and serving large-scale Automatic Speech Recognition (ASR) models, as well as collecting and processing audio data at scale. Our platform is built upon the Machine Learning Platform for AI of Alibaba Cloud. Its main functionality is to support efficient learning and inference for end-to-end ASR models on distributed GPU clusters. It allows users to learn ASR models with either pre-defined or user-customized network architectures via simple user interface. On EasyASR, we have produced state-of-the-art results over several public datasets for Mandarin speech recognition.

PDF Details

IJCAI Conference 2018 Conference Paper

IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

Qiangpeng Yang
Mengli Cheng
Wenmeng Zhou
Yan Chen
Minghui Qiu
Wei Lin

Incidental scene text detection, especially for multi-oriented text regions, is one of the most challenging tasks in many computer vision applications. Different from the common object detection task, scene text often suffers from a large variance of aspect ratio, scale, and orientation. To solve this problem, we propose a novel end-to-end scene text detector IncepText from an instance-aware segmentation perspective. We design a novel Inception-Text module and introduce deformable PSROI pooling to deal with multi-oriented text detection. Extensive experiments on ICDAR2015, RCTW-17, and MSRA-TD500 datasets demonstrate our method's superiority in terms of both effectiveness and efficiency. Our proposed method achieves 1st place result on ICDAR2015 challenge and the state-of-the-art performance on other datasets. Moreover, we have released our implementation as an OCR product which is available for public access.

PDF Details