Arrow Research search

Author name cluster

Xinya Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
1 author row

Possible papers

4

AAAI Conference 2024 Conference Paper

AltDiffusion: A Multilingual Text-to-Image Diffusion Model

  • Fulong Ye
  • Guang Liu
  • Xinya Wu
  • Ledell Wu

Large Text-to-Image (T2I) diffusion models have shown a remarkable capability to produce photorealistic and diverse images based on text inputs. However, existing works support only limited language input, e.g., English, Chinese, and Japanese, leaving users beyond these languages underserved and blocking the global expansion of T2I models. Therefore, this paper presents AltDiffusion, a novel multilingual T2I diffusion model that supports eighteen different languages. Specifically, we first train a multilingual text encoder based on knowledge distillation. Then we plug it into a pretrained English-only diffusion model and train the model with a two-stage schema to enhance the multilingual capability, comprising a concept alignment stage and a quality improvement stage on a large-scale multilingual dataset. Furthermore, we introduce a new benchmark, which includes the Multilingual-General-18 (MG-18) and Multilingual-Cultural-18 (MC-18) datasets, to evaluate the capabilities of T2I diffusion models for generating high-quality images and capturing culture-specific concepts in different languages. Experimental results on both MG-18 and MC-18 demonstrate that AltDiffusion outperforms current state-of-the-art T2I models, e.g., Stable Diffusion, in multilingual understanding, especially with respect to culture-specific concepts, while retaining comparable capability for generating high-quality images. All source code and checkpoints can be found at https://github.com/superhero-7/AltDiffuson.

IJCAI Conference 2024 Conference Paper

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

  • Zheqi He
  • Xinya Wu
  • Pengfei Zhou
  • Richeng Xuan
  • Guang Liu
  • Xi Yang
  • Qiannan Zhu
  • Hua Huang

Multi-modal large language models (MLLMs) have achieved remarkable progress and demonstrated powerful knowledge comprehension and reasoning abilities. However, the mastery of domain-specific knowledge, which is essential for evaluating the intelligence of MLLMs, continues to be a challenge. Current multi-modal benchmarks for domain-specific knowledge concentrate on multiple-choice questions and are predominantly available in English, which limits the comprehensiveness of the evaluation. To this end, we introduce CMMU, a novel benchmark for multi-modal and multi-type question understanding and reasoning in Chinese. CMMU consists of 3,603 questions across 7 subjects, covering knowledge from primary to high school. The questions can be categorized into 3 types: multiple-choice, multiple-response, and fill-in-the-blank, bringing greater challenges to MLLMs. In addition, we propose an evaluation strategy called Positional Error Variance for assessing multiple-choice questions, which aims to perform a quantitative analysis of position bias. We evaluate seven open-source MLLMs along with GPT-4V, Gemini-Pro, and Qwen-VL-Plus. The results demonstrate that CMMU poses a significant challenge to recent MLLMs. The data and code are available at https://github.com/FlagOpen/CMMU.

TIST Journal 2022 Journal Article

CAFE and SOUP: Toward Adaptive VDI Workload Prediction

  • Yao Zhang
  • Wenping Fan
  • Qichen Hao
  • Xinya Wu
  • Min-Ling Zhang

For Virtual Desktop Infrastructure (VDI) systems, effective resource management is critically important: turning off spare virtual machines helps save running costs, while maintaining sufficient virtual machines is essential to ensure a satisfactory user experience. Current VDI resource management strategies work in a passive manner, either reactively adjusting available capacity based on user demands or following manually configured schedules, which may lead to unnecessary running costs or an unsatisfactory user experience. In this article, we make a first attempt toward proactive VDI resource management, proposing two adaptive learning approaches for VDI workload prediction that learn from multi-grained historical features. For non-persistent desktop pools, based on the aggregated session count of pool-sharing users, the CAFE approach induces a pool-level workload predictive model by utilizing coarse-to-fine historical features extracted from aggregated workload data. For persistent desktop pools, based on the session connection status of individual users within the same pool, the SOUP approach induces a user-level workload predictive model by incorporating encoded multi-grained features, extracted from the logon behavior of individual users, into an aggregated pool-level model. Extensive experiments on datasets from real VDI customers and on electricity-load data clearly verify the effectiveness of the proposed adaptive approaches for VDI workload prediction as well as for other workload prediction tasks.

IJCAI Conference 2021 Conference Paper

BAMBOO: A Multi-instance Multi-label Approach Towards VDI User Logon Behavior Modeling

  • Wenping Fan
  • Yao Zhang
  • Qichen Hao
  • Xinya Wu
  • Min-Ling Zhang

Different from traditional on-premise VDI, the virtual desktops in DaaS (Desktop as a Service) are hosted in a public cloud where virtual machines are charged based on usage. Accordingly, an adaptive power management system that can turn off spare virtual machines without sacrificing end-user experience is of significant customer value, as it can greatly reduce running costs. Generally, logon behavior modeling for VDI users serves as the key enabling technique for intelligent power management. Prior attempts model logon behavior in a user-dependent manner with tailored single-instance feature representations, ignoring the strong relationships among pool-sharing VDI users. In this paper, a novel formulation of VDI user logon behavior modeling is proposed by employing multi-instance multi-label (MIML) techniques. Specifically, each user is grouped with supporting users whose behaviors are jointly modeled in the feature space with multi-instance representation as well as in the output space with multi-label prediction. The resulting MIML formulation is optimized by adapting the popular MIML boosting procedure via balanced error-rate minimization. Experimental studies on real VDI customers' data clearly validate the effectiveness of the proposed MIML-based approach against state-of-the-art VDI user logon behavior modeling techniques.