Arrow Research search

Author name cluster

Ming Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

52 papers
2 author rows

Possible papers

ICML Conference 2025 Conference Paper

Efficient Skill Discovery via Regret-Aware Optimization

  • He Zhang 0030
  • Ming Zhou
  • Shaopeng Zhai
  • Ying Sun 0006
  • Hui Xiong 0001

Unsupervised skill discovery aims to learn diverse and distinguishable behaviors in open-ended reinforcement learning. Existing methods focus on improving diversity through pure exploration, mutual-information optimization, and temporal representation learning. Although they perform well on exploration, they remain limited in efficiency, especially in high-dimensional settings. In this work, we frame skill discovery as a min-max game between skill generation and policy learning, and propose a regret-aware method on top of temporal representation learning that expands the discovered skill space along the direction of upgradable policy strength. The key insight behind the proposed method is that skill discovery is adversarial to policy learning, i.e., skills with weak strength should be explored further, while skills whose strength has converged need less exploration. As an implementation, we score the degree of strength convergence with regret and guide skill discovery with a learnable skill generator. To avoid degeneration, skills are generated by an upgradable population of skill generators. We conduct experiments on environments of varying complexity and dimensionality. Empirical results show that our method outperforms baselines on both efficiency and diversity. Moreover, our method achieves a 15% zero-shot improvement on high-dimensional environments compared to existing methods.

AAAI Conference 2025 Conference Paper

Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval

  • Guangyuan Ma
  • Yongliang Ma
  • Xing Wu
  • Zhenpeng Su
  • Ming Zhou
  • Songlin Hu

Large Language Model-based Dense Retrieval (LLM-DR) optimizes over numerous heterogeneous fine-tuning collections from different domains. However, the distribution of its training data has received little attention. Previous studies rely on empirically assigned dataset choices or sampling ratios, which inevitably lead to sub-optimal retrieval performance. In this paper, we propose a new task-level Distributionally Robust Optimization (tDRO) algorithm for LLM-DR fine-tuning, targeted at improving the universal domain generalization ability by end-to-end reweighting the data distribution of each task. tDRO parameterizes the domain weights and updates them with scaled domain gradients. The optimized weights are then transferred to LLM-DR fine-tuning to train more robust retrievers. Experiments with a series of different-sized LLM-DR models show improvements on large-scale retrieval benchmarks and up to a 30% reduction in dataset usage after applying our optimization algorithm.
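The task-level reweighting can be sketched as an exponentiated-gradient step on the simplex of task weights. This is a minimal sketch under our own simplifying assumptions (per-task scalar losses and a toy learning rate); the paper's actual update uses scaled domain gradients inside LLM-DR fine-tuning:

```python
import math

def update_task_weights(weights, task_losses, lr=0.1):
    """One exponentiated-gradient step: up-weight tasks whose current
    loss is high, then re-normalize back onto the probability simplex.
    (Hypothetical simplification of the tDRO update.)"""
    logits = [math.log(w) + lr * l for w, l in zip(weights, task_losses)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

w = [1 / 3] * 3                            # start uniform over 3 tasks
w = update_task_weights(w, [0.2, 0.9, 0.5])
# the hardest task (loss 0.9) now carries the largest weight
```

The updated weights would then drive the sampling ratio of each fine-tuning collection.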

AAAI Conference 2024 Conference Paper

HORIZON: High-Resolution Semantically Controlled Panorama Synthesis

  • Kun Yan
  • Lei Ji
  • Chenfei Wu
  • Jian Liang
  • Ming Zhou
  • Nan Duan
  • Shuai Ma

Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds. Nevertheless, contemporary panoramic synthesis techniques grapple with the challenge of semantically guiding the content generation process. Although recent breakthroughs in visual synthesis have unlocked the potential for semantic control in 2D flat images, a direct application of these methods to panorama synthesis yields distorted content. In this study, we unveil an innovative framework for generating high-resolution panoramas, adeptly addressing the issues of spherical distortion and edge discontinuity through sophisticated spherical modeling. Our pioneering approach empowers users with semantic control, harnessing both image and text inputs, while concurrently streamlining the generation of high-resolution panoramas using parallel decoding. We rigorously evaluate our methodology on a diverse array of indoor and outdoor datasets, establishing its superiority over recent related work, in terms of both quantitative and qualitative performance metrics. Our research elevates the controllability, efficiency, and fidelity of panorama synthesis to new levels.

JMLR Journal 2023 Journal Article

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

  • Ming Zhou
  • Ziyu Wan
  • Hanjing Wang
  • Muning Wen
  • Runzhe Wu
  • Ying Wen
  • Yaodong Yang
  • Yong Yu

Population-based multi-agent reinforcement learning (PB-MARL) encompasses a range of methods that merge dynamic population selection with multi-agent reinforcement learning (MARL) algorithms. While PB-MARL has demonstrated notable achievements in complex multi-agent tasks, its sequential execution is plagued by low computational efficiency due to the diversity in computing patterns and policy combinations. We propose a solution involving a stateless central task dispatcher and stateful workers to handle PB-MARL's subroutines, thereby capitalizing on parallelism across various components for efficient problem-solving. In line with this approach, we introduce MALib, a parallel framework that incorporates a task control model, independent data servers, and an abstraction of MARL training paradigms. The framework has undergone extensive testing and is available under the MIT license (https://github.com/sjtu-marl/malib).

IJCAI Conference 2022 Conference Paper

Reasoning over Hybrid Chain for Table-and-Text Open Domain Question Answering

  • Wanjun Zhong
  • Junjie Huang
  • Qian Liu
  • Ming Zhou
  • Jiahai Wang
  • Jian Yin
  • Nan Duan

Tabular and textual question answering requires systems to reason over heterogeneous information, considering table structure and the connections between table and text. In this paper, we propose a ChAin-centric Reasoning and Pre-training framework (CARP). CARP utilizes a hybrid chain to model the explicit intermediate reasoning process across table and text for question answering. We also propose a novel chain-centric pre-training method to enhance the pre-trained model in identifying the cross-modality reasoning process and alleviating the data-sparsity problem. This method constructs a large-scale reasoning corpus by synthesizing pseudo heterogeneous reasoning paths from Wikipedia and generating corresponding questions. We evaluate our system on OTT-QA, a large-scale table-and-text open-domain question answering benchmark, where it achieves state-of-the-art performance. Further analyses illustrate that the explicit hybrid chain offers substantial performance improvement and interpretability of the intermediate reasoning process, and that the chain-centric pre-training boosts performance on chain extraction.

NeurIPS Conference 2021 Conference Paper

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

  • Shuai Lu
  • Daya Guo
  • Shuo Ren
  • Junjie Huang
  • Alexey Svyatkovskiy
  • Ambrosio Blanco
  • Colin Clement
  • Dawn Drain

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.

IJCAI Conference 2021 Conference Paper

Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

  • Weinan Zhang
  • Xihuai Wang
  • Jian Shen
  • Ming Zhou

This paper investigates model-based methods in multi-agent reinforcement learning (MARL). We specify the dynamics sample complexity and the opponent sample complexity in MARL, and conduct a theoretical analysis of the return discrepancy upper bound. To reduce this upper bound, with the intention of low sample complexity throughout the learning process, we propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO). In AORPO, each agent builds its own multi-agent environment model, consisting of a dynamics model and multiple opponent models, and trains its policy with adaptive opponent-wise rollouts. We further prove the theoretical convergence of AORPO under reasonable assumptions. Empirical experiments on competitive and cooperative tasks demonstrate that AORPO achieves improved sample efficiency with asymptotic performance comparable to the compared MARL methods.

ICRA Conference 2021 Conference Paper

Vanishing Point Aided LiDAR-Visual-Inertial Estimator

  • Peng Wang
  • Zheng Fang 0001
  • Shibo Zhao
  • Yongnan Chen
  • Ming Zhou
  • Shan An

In this paper, we propose a vanishing-point-aided LiDAR-Visual-Inertial estimator to achieve real-time, low-drift, and robust pose estimation. The proposed method is composed of three sequential modules: an IMU-aided vanishing point (VP) detection module, a voxel-map-based feature depth association module, and a visual-inertial fixed-lag smoother module. The IMU-aided VP detection module detects feature points, line segments, and vanishing points to establish robust correspondences in successive frames. In particular, we use the 1-line RANSAC method to provide stable VP hypotheses and a polar grid to accelerate VP hypothesis validation. We then propose a novel voxel-map-based feature depth association method to retrieve depth and assign it to visual features efficiently. Finally, the visual-inertial fixed-lag smoother jointly minimizes the error terms. Experiments show that our method outperforms state-of-the-art visual-inertial odometry and LiDAR-visual estimators in both indoor and outdoor environments.

AAAI Conference 2020 Conference Paper

Alternating Language Modeling for Cross-Lingual Pre-Training

  • Jian Yang
  • Shuming Ma
  • Dongdong Zhang
  • ShuangZhi Wu
  • Zhoujun Li
  • Ming Zhou

Language model pre-training has achieved success in many natural language processing tasks. Existing methods for cross-lingual pre-training adopt the Translation Language Model to predict masked words from the concatenation of a source sentence and its target equivalent. In this work, we introduce a novel cross-lingual pre-training method, called Alternating Language Modeling (ALM). It code-switches sentences of different languages rather than simply concatenating them, aiming to capture the rich cross-lingual context of words and phrases. More specifically, we randomly substitute source phrases with target translations to create code-switched sentences. Then, we use these code-switched data to train the ALM model to predict words of different languages. We evaluate ALM pre-training on the downstream tasks of machine translation and cross-lingual classification. Experiments show that ALM outperforms previous pre-training methods on three benchmarks.
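The code-switching substitution can be sketched as a token-level pass. This is a toy sketch: `phrase_table` is a hypothetical stand-in for phrase pairs mined from parallel data, and real ALM substitutes multi-word phrases rather than single tokens:

```python
import random

def code_switch(src_tokens, phrase_table, p=0.3, seed=0):
    """Randomly replace source tokens that have a known translation,
    producing an ALM-style code-switched training sentence."""
    rng = random.Random(seed)
    out = []
    for tok in src_tokens:
        if tok in phrase_table and rng.random() < p:
            out.append(phrase_table[tok])   # switch to the target language
        else:
            out.append(tok)                 # keep the source token
    return out

table = {"cat": "chat", "sat": "assis"}     # hypothetical en->fr pairs
print(code_switch("the cat sat down".split(), table, p=1.0))
# with p=1.0 every known token is switched: ['the', 'chat', 'assis', 'down']
```

The resulting mixed-language sentences would then feed the usual masked-word prediction objective.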

AAAI Conference 2020 Conference Paper

Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation

  • Chengyi Wang
  • Yu Wu
  • Shujie Liu
  • Zhenglu Yang
  • Ming Zhou

End-to-end speech translation, a hot topic in recent years, aims to translate a segment of audio into a specific language with an end-to-end model. Conventional approaches employ multi-task learning and pre-training for this task, but they suffer from the large gap between pre-training and fine-tuning. To address this issue, we propose a Tandem Connectionist Encoding Network (TCEN), which bridges the gap by reusing all subnets in fine-tuning, keeping the roles of the subnets consistent, and pre-training the attention module. Furthermore, we propose two simple but effective methods to guarantee that the speech encoder outputs and the MT encoder inputs are consistent in terms of semantic representation and sequence length. Experimental results show that our model leads to significant improvements in En-De and En-Fr translation irrespective of the backbones.

NeurIPS Conference 2020 Conference Paper

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

  • Wenhui Wang
  • Furu Wei
  • Li Dong
  • Hangbo Bao
  • Nan Yang
  • Ming Zhou

Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its variants) have achieved remarkable success in a variety of NLP tasks. However, these models usually consist of hundreds of millions of parameters, which brings challenges for fine-tuning and online serving in real-life applications due to latency and capacity constraints. In this work, we present a simple and effective approach to compress large Transformer-based (Vaswani et al., 2017) pre-trained models, termed deep self-attention distillation. The small model (student) is trained by deeply mimicking the self-attention module, which plays a vital role in Transformer networks, of the large model (teacher). Specifically, we propose distilling the self-attention module of the last Transformer layer of the teacher, which is effective and flexible for the student. Furthermore, we introduce the scaled dot-product between values in the self-attention module as new deep self-attention knowledge, in addition to the attention distributions (i.e., the scaled dot-product of queries and keys) used in existing work. Moreover, we show that introducing a teacher assistant (Mirzadeh et al., 2019) also helps the distillation of large pre-trained Transformer models. Experimental results demonstrate that our monolingual model outperforms state-of-the-art baselines across different student model sizes. In particular, it retains more than 99% accuracy on SQuAD 2.0 and several GLUE benchmark tasks using 50% of the Transformer parameters and computations of the teacher model. We also obtain competitive results in applying deep self-attention distillation to multilingual pre-trained models.
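The two distillation targets named in the abstract, the attention distributions and the value relation, can be sketched for a single head as follows (a sketch with random toy tensors; the distillation loss is a KL divergence from teacher to student for both targets):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attn_relations(q, k, v):
    """MiniLM's two targets for one head: the attention distribution
    softmax(QK^T/sqrt(d)) and the value relation softmax(VV^T/sqrt(d))."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)), softmax(v @ v.T / np.sqrt(d))

def mean_row_kl(p, q, eps=1e-12):
    """Mean row-wise KL(p || q), the per-target distillation loss."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 4, 8))       # toy Q, K, V: 4 tokens, dim 8
attn, val_rel = attn_relations(q, k, v)    # each row sums to 1
# a student head would be trained to minimize
# mean_row_kl(attn, student_attn) + mean_row_kl(val_rel, student_val_rel)
```

Because only the last layer's relations are matched, the student is free to choose its own depth and hidden size.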

NeurIPS Conference 2019 Conference Paper

A Tensorized Transformer for Language Modeling

  • Xindian Ma
  • Peng Zhang
  • Shuai Zhang
  • Nan Duan
  • Yuexian Hou
  • Ming Zhou
  • Dawei Song

The latest development of neural models has connected the encoder and decoder through a self-attention mechanism. In particular, the Transformer, which is based solely on self-attention, has led to breakthroughs in Natural Language Processing (NLP) tasks. However, the multi-head attention mechanism, a key component of the Transformer, limits the effective deployment of the model in resource-limited settings. In this paper, based on the ideas of tensor decomposition and parameter sharing, we propose a novel self-attention model (namely Multi-linear attention) with Block-Term Tensor Decomposition (BTD). We test and verify the proposed attention method on three language modeling tasks (i.e., PTB, WikiText-103, and One-Billion Word) and a neural machine translation task (i.e., WMT-2016 English-German). Multi-linear attention not only largely compresses the model parameters but also obtains performance improvements, compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor-train decomposition.

AAAI Conference 2019 Conference Paper

Regularizing Neural Machine Translation by Target-Bidirectional Agreement

  • Zhirui Zhang
  • ShuangZhi Wu
  • Shujie Liu
  • Mu Li
  • Ming Zhou
  • Tong Xu

Although Neural Machine Translation (NMT) has achieved remarkable progress in the past several years, most NMT systems still suffer from a fundamental shortcoming shared with other sequence generation tasks: errors made early in the generation process are fed as inputs to the model and can be quickly amplified, harming subsequent sequence generation. To address this issue, we propose a novel model regularization method for NMT training, which aims to improve the agreement between translations generated by left-to-right (L2R) and right-to-left (R2L) NMT decoders. This goal is achieved by introducing two Kullback-Leibler divergence regularization terms into the NMT training objective to reduce the mismatch between the output probabilities of the L2R and R2L models. In addition, we employ a joint training strategy that allows the L2R and R2L models to improve each other in an interactive update process. Experimental results show that our proposed method significantly outperforms state-of-the-art baselines on Chinese-English and English-German translation tasks.
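The pair of KL regularization terms can be sketched as follows, assuming both decoders' distributions have been aligned to the same target position (a simplification; the paper applies the terms inside the full NMT training objective):

```python
import math

def agreement_penalty(p, q, eps=1e-12):
    """KL(p||q) + KL(q||p) over two aligned output distributions:
    larger when the L2R and R2L decoders disagree."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) +
               qi * math.log((qi + eps) / (pi + eps))
               for pi, qi in zip(p, q))

l2r = [0.7, 0.2, 0.1]   # L2R decoder's distribution over 3 candidate words
r2l = [0.6, 0.3, 0.1]   # R2L decoder's distribution for the same word
penalty = agreement_penalty(l2r, r2l)   # > 0; added (scaled) to the loss
```

Minimizing the penalty pulls the two decoders toward the same predictions, which discourages direction-specific error amplification.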

AAAI Conference 2019 Conference Paper

Response Generation by Context-Aware Prototype Editing

  • Yu Wu
  • Furu Wei
  • Shaohan Huang
  • Yunli Wang
  • Zhoujun Li
  • Ming Zhou

Open-domain response generation has achieved remarkable progress in recent years, but sometimes yields short and uninformative responses. We propose a new paradigm, prototype-then-edit, for response generation, which first retrieves a prototype response from a pre-defined index and then edits the prototype according to the differences between the prototype context and the current context. Our motivation is that the retrieved prototype provides a good starting point for generation because it is grammatical and informative, and the post-editing process further improves the relevance and coherence of the prototype. In practice, we design a context-aware editing model built upon an encoder-decoder framework augmented with an editing vector. We first generate an edit vector by considering lexical differences between the prototype context and the current context. After that, the edit vector and the prototype response representation are fed to a decoder to generate a new response. Experimental results on a large-scale dataset demonstrate that our new paradigm significantly increases the relevance, diversity, and originality of generation results compared to traditional generative models. Furthermore, our model outperforms retrieval-based methods in terms of relevance and originality.

NeurIPS Conference 2019 Conference Paper

Unified Language Model Pre-training for Natural Language Understanding and Generation

  • Li Dong
  • Nan Yang
  • Wenhui Wang
  • Furu Wei
  • Xiaodong Liu
  • Yu Wang
  • Jianfeng Gao
  • Ming Zhou

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UniLM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, UniLM achieves new state-of-the-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement), the Gigaword abstractive summarization ROUGE-L to 35.75 (0.86 absolute improvement), the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), the SQuAD question generation BLEU-4 to 22.12 (3.75 absolute improvement), and the DSTC7 document-grounded dialog response generation NIST-4 to 2.67 (human performance is 2.65). The code and pre-trained models are available at https://github.com/microsoft/unilm.
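The self-attention masks that unify the three objectives can be sketched directly (a sketch of the mask construction; 1 means the row position may attend to the column position):

```python
import numpy as np

def unilm_mask(kind, n_src, n_tgt=0):
    """Self-attention mask for one UniLM pre-training objective."""
    n = n_src + n_tgt
    if kind == "bidirectional":
        return np.ones((n, n), dtype=int)                 # all see all
    if kind == "unidirectional":
        return np.tril(np.ones((n, n), dtype=int))        # left-to-right LM
    if kind == "seq2seq":
        m = np.zeros((n, n), dtype=int)
        m[:, :n_src] = 1                                  # everyone sees the source
        m[n_src:, n_src:] = np.tril(np.ones((n_tgt, n_tgt), dtype=int))  # target is causal
        return m
    raise ValueError(kind)

print(unilm_mask("seq2seq", n_src=2, n_tgt=2))
# [[1 1 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```

The same shared Transformer is trained under all three masks, which is what makes one model usable for both understanding and generation.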

AAAI Conference 2019 Conference Paper

Unsupervised Neural Machine Translation with SMT as Posterior Regularization

  • Shuo Ren
  • Zhirui Zhang
  • Shujie Liu
  • Ming Zhou
  • Shuai Ma

Without a real bilingual corpus available, unsupervised Neural Machine Translation (NMT) typically requires pseudo-parallel data generated with the back-translation method for model training. However, due to weak supervision, the pseudo data inevitably contain noise and errors that accumulate and are reinforced in the subsequent training process, leading to poor translation performance. To address this issue, we introduce phrase-based Statistical Machine Translation (SMT) models, which are robust to noisy data, as posterior regularizations to guide the training of unsupervised NMT models in the iterative back-translation process. Our method starts from SMT models built with pre-trained language models and word-level translation tables inferred from cross-lingual embeddings. SMT and NMT models are then optimized jointly and boost each other incrementally in a unified EM framework. In this way, (1) the negative effect of errors in the iterative back-translation process is promptly alleviated by SMT filtering noise from its phrase tables, while (2) NMT compensates for the lack of fluency inherent in SMT. Experiments conducted on en-fr and en-de translation tasks show that our method outperforms strong baselines and achieves new state-of-the-art unsupervised machine translation performance.

AAAI Conference 2018 Conference Paper

Assertion-Based QA With Question-Aware Open Information Extraction

  • Zhao Yan
  • Duyu Tang
  • Nan Duan
  • Shujie Liu
  • Wendi Wang
  • Daxin Jiang
  • Ming Zhou
  • Zhoujun Li

We present assertion-based question answering (ABQA), an open-domain question answering task that takes a question and a passage as inputs and outputs a semi-structured assertion consisting of a subject, a predicate, and a list of arguments. An assertion conveys more evidence than a short answer span in reading comprehension, and it is more concise than a tedious passage in passage-based QA. These advantages make ABQA more suitable for human-computer interaction scenarios such as voice-controlled speakers. Further progress on ABQA requires a richer supervised dataset and powerful models of text understanding. To remedy this, we introduce a new dataset called WebAssertions, which includes hand-annotated QA labels for 358,427 assertions in 55,960 web passages. To address ABQA, we develop both generative and extractive approaches. The backbone of our generative approach is sequence-to-sequence learning. In order to capture the structure of the output assertion, we introduce a hierarchical decoder that first generates the structure of the assertion and then generates the words of each field. The extractive approach is based on learning to rank. Features at different levels of granularity are designed to measure the semantic relevance between a question and an assertion. Experimental results show that our approaches can infer question-aware assertions from a passage. We further evaluate our approaches by incorporating the ABQA results as additional features in passage-based QA. Results on two datasets show that ABQA features significantly improve the accuracy of passage-based QA.

NeurIPS Conference 2018 Conference Paper

Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base

  • Daya Guo
  • Duyu Tang
  • Nan Duan
  • Ming Zhou
  • Jian Yin

We present an approach to map utterances in conversation to logical forms, which will be executed on a large-scale knowledge base. To handle enormous ellipsis phenomena in conversation, we introduce dialog memory management to manipulate historical entities, predicates, and logical forms when inferring the logical form of current utterances. Dialog memory management is embodied in a generative model, in which a logical form is interpreted in a top-down manner following a small and flexible grammar. We learn the model from denotations without explicit annotation of logical forms, and evaluate it on a large-scale dataset consisting of 200K dialogs over 12.8M entities. Results verify the benefits of modeling dialog memory, and show that our semantic parsing-based approach outperforms a memory network based encoder-decoder model by a huge margin.

AAAI Conference 2018 Conference Paper

Hierarchical Recurrent Attention Network for Response Generation

  • Chen Xing
  • Yu Wu
  • Wei Wu
  • Yalou Huang
  • Ming Zhou

We study multi-turn response generation in chatbots, where a response is generated according to a conversation context. Existing work has modeled the hierarchy of the context, but does not pay enough attention to the fact that words and utterances in the context are differentially important. As a result, it may lose important information in the context and generate irrelevant responses. We propose a hierarchical recurrent attention network (HRAN) to model both the hierarchy and the importance variance in a unified framework. In HRAN, a hierarchical attention mechanism attends to important parts within and among utterances with word-level attention and utterance-level attention respectively. Empirical studies on both automatic evaluation and human judgment show that HRAN can significantly outperform state-of-the-art models for context-based response generation.

AAAI Conference 2018 Conference Paper

Joint Training for Neural Machine Translation Models with Monolingual Data

  • Zhirui Zhang
  • Shujie Liu
  • Mu Li
  • Ming Zhou
  • Enhong Chen

Monolingual data have been demonstrated to be helpful in improving the translation quality of both statistical machine translation (SMT) and neural machine translation (NMT) systems, especially in resource-poor or domain adaptation tasks where parallel data are not rich enough. In this paper, we propose a novel approach to better leverage monolingual data for neural machine translation by jointly learning source-to-target and target-to-source NMT models for a language pair with a joint EM optimization method. The training process starts with two initial NMT models pre-trained on parallel data, one for each direction, and the two models are iteratively updated by incrementally decreasing translation losses on the training data. In each iteration, both NMT models first translate monolingual data from one language to the other, forming pseudo-training data for the other NMT model. Then two new NMT models are learned from the parallel data together with the pseudo-training data. Both NMT models are expected to improve, and better pseudo-training data can be generated in the next step. Experimental results on Chinese-English and English-German translation tasks show that our approach simultaneously improves the translation quality of source-to-target and target-to-source models, significantly outperforming strong baseline systems enhanced with monolingual data for model training, including back-translation.
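The iterative loop can be sketched with stand-in models. `ToyModel` is hypothetical: it "translates" by reversing tokens and merely records its fine-tuning pairs, standing in for real NMT models:

```python
class ToyModel:
    """Hypothetical stand-in for an NMT model."""
    def __init__(self):
        self.pairs = []
    def translate(self, tokens):
        return tokens[::-1]          # placeholder "translation"
    def finetune(self, pairs):
        self.pairs.extend(pairs)     # record (pseudo-source, target) pairs

def joint_back_translation(src_mono, tgt_mono, s2t, t2s, iterations=2):
    """Sketch of the joint training loop: each direction translates
    monolingual data to build pseudo-parallel data for the other,
    then both models are updated."""
    for _ in range(iterations):
        t2s.finetune([(s2t.translate(x), x) for x in src_mono])  # pseudo tgt -> real src
        s2t.finetune([(t2s.translate(y), y) for y in tgt_mono])  # pseudo src -> real tgt
    return s2t, t2s

s2t, t2s = ToyModel(), ToyModel()
joint_back_translation([["a", "b"]], [["x", "y"]], s2t, t2s)
```

In the real method, each fine-tuning step also mixes in the original parallel data so the models do not drift.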

AAAI Conference 2018 System Paper

MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence

  • Lianmin Zheng
  • Jiacheng Yang
  • Han Cai
  • Ming Zhou
  • Weinan Zhang
  • Jun Wang
  • Yong Yu

We introduce MAgent, a platform to support research and development of many-agent reinforcement learning. Unlike previous research platforms for single- or multi-agent reinforcement learning, MAgent focuses on supporting tasks and applications that require hundreds to millions of agents. Within the interactions among a population of agents, it enables not only the study of learning algorithms for agents' optimal policies, but, more importantly, the observation and understanding of individual agents' behaviors and the social phenomena emerging from the AI society, including communication languages, leadership, and altruism. MAgent is highly scalable and can host up to one million agents on a single GPU server. MAgent also provides flexible configurations for AI researchers to design their customized environments and agents. In this demo, we present three environments designed on MAgent and show collective intelligence emerging from learning from scratch.

IJCAI Conference 2018 Conference Paper

Multiway Attention Networks for Modeling Sentence Pairs

  • Chuanqi Tan
  • Furu Wei
  • Wenhui Wang
  • Weifeng Lv
  • Ming Zhou

Modeling sentence pairs plays a vital role in judging the relationship between two sentences, as in paraphrase identification, natural language inference, and answer sentence selection. Previous work achieves very promising results using neural networks with attention mechanisms. In this paper, we propose multiway attention networks, which employ multiple attention functions to match sentence pairs under the matching-aggregation framework. Specifically, we design four attention functions to match words in the corresponding sentences. Then, we aggregate the matching information from each function and combine the information from all functions to obtain the final representation. Experimental results demonstrate that the proposed multiway attention networks improve results on the Quora Question Pairs, SNLI, and MultiNLI datasets, and on the answer sentence selection task on the SQuAD dataset.
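Two parameter-free matching functions can be sketched under the matching-aggregation framework (dot-product and a subtraction-based score; the concat and bilinear variants add trainable weights and are omitted, and the exact forms here are illustrative rather than the paper's):

```python
import numpy as np

def match_scores(h, q, kind):
    """Unnormalized matching scores s[i, j] between word h[i] of one
    sentence and word q[j] of the other."""
    if kind == "dot":
        return h @ q.T
    if kind == "minus":
        # hypothetical subtraction-based score: negative L1 distance
        return -np.abs(h[:, None, :] - q[None, :, :]).sum(axis=-1)
    raise ValueError(kind)

def aggregate(h, q, kind):
    """Softmax each row of scores, then take a weighted sum of q:
    one matched representation per word of h (matching-aggregation)."""
    s = match_scores(h, q, kind)
    a = np.exp(s - s.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)
    return a @ q

rng = np.random.default_rng(1)
h, q = rng.normal(size=(3, 4)), rng.normal(size=(5, 4))
rep = aggregate(h, q, "dot")               # shape (3, 4): one vector per word of h
```

The per-function representations would then be combined (e.g., concatenated) before the final classifier.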

IJCAI Conference 2018 Conference Paper

Reinforced Mnemonic Reader for Machine Reading Comprehension

  • Minghao Hu
  • Yuxing Peng
  • Zhen Huang
  • Xipeng Qiu
  • Furu Wei
  • Ming Zhou

In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a re-attention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It encourages the model to predict a more acceptable answer so as to address the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.

AAAI Conference 2018 Conference Paper

S-Net: From Answer Extraction to Answer Synthesis for Machine Reading Comprehension

  • Chuanqi Tan
  • Furu Wei
  • Nan Yang
  • Bowen Du
  • Weifeng Lv
  • Ming Zhou

In this paper, we present a novel approach to machine reading comprehension for the MS-MARCO dataset. Unlike the SQuAD dataset, which aims to answer a question with an exact text span in a passage, the MS-MARCO dataset defines the task as answering a question from multiple passages, and the words in the answer are not necessarily in the passages. We therefore develop an extraction-then-synthesis framework to synthesize answers from extraction results. Specifically, the answer extraction model is first employed to predict the most important sub-spans from the passage as evidence, and the answer synthesis model takes the evidence as additional features, along with the question and passage, to further elaborate the final answers. We build the answer extraction model with state-of-the-art neural networks for single-passage reading comprehension, and propose an additional passage ranking task to help answer extraction across multiple passages. The answer synthesis model is based on sequence-to-sequence neural networks with the extracted evidence as features. Experiments show that our extraction-then-synthesis method outperforms state-of-the-art methods.

AAAI Conference 2018 Conference Paper

Sequential Copying Networks

  • Qingyu Zhou
  • Nan Yang
  • Furu Wei
  • Ming Zhou

The copying mechanism has shown effectiveness in sequence-to-sequence neural network models for text generation tasks, such as abstractive sentence summarization and question generation. However, existing works on modeling copying or pointing mechanisms only consider single-word copying from the source sentences. In this paper, we propose a novel copying framework, named Sequential Copying Networks (SeqCopyNet), which not only learns to copy single words, but also copies sequences from the input sentence. It leverages pointer networks to explicitly select a sub-span from the source side and copy it to the target side, and integrates this sequential copying mechanism into the generation process of the encoder-decoder paradigm. Experiments on abstractive sentence summarization and question generation tasks show that the proposed SeqCopyNet can copy meaningful spans and outperforms the baseline models.

AAAI Conference 2018 Conference Paper

Table-to-Text: Describing Table Region With Natural Language

  • Junwei Bao
  • Duyu Tang
  • Nan Duan
  • Zhao Yan
  • Yuanhua Lv
  • Ming Zhou
  • Tiejun Zhao

In this paper, we present a generative model to generate a natural language sentence describing a table region, e.g., a row. The model maps a row from a table to a continuous vector and then generates a natural language sentence by leveraging the semantics of the table. To deal with rare words appearing in a table, we develop a flexible copying mechanism that selectively replicates contents from the table in the output sequence. Extensive experiments demonstrate the accuracy of the model and the power of the copying mechanism. On two synthetic datasets, WIKIBIO and SIMPLEQUESTIONS, our model improves the current state-of-the-art BLEU-4 score from 34.70 to 40.26 and from 33.32 to 39.12, respectively. Furthermore, we introduce an open-domain dataset WIKITABLETEXT including 13,318 explanatory sentences for 4,962 tables. Our model achieves a BLEU-4 score of 38.23, which outperforms template-based and language-model-based approaches.
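The copying idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's model: the copy gate value, the vocabulary, and the table cells below are all hypothetical placeholders. The point is only how a copy gate mixes a generation distribution over the vocabulary with an attention distribution over table cells into one well-formed output distribution, accumulating probability for words reachable by both routes:

```python
import numpy as np

def softmax(z):
    """Stable softmax over a score vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(4)
vocab = ["born", "in", "paris", "1867"]   # generator vocabulary (illustrative)
cells = ["marie", "1867"]                 # table cells available for copying

gen = softmax(rng.normal(size=len(vocab)))   # generation distribution over vocab
copy = softmax(rng.normal(size=len(cells)))  # attention distribution over cells
g = 0.3                                      # copy gate (illustrative value)

# mix the two routes; a word appearing in both vocab and table ("1867")
# accumulates probability from generation and copying
final = {w: (1 - g) * p for w, p in zip(vocab, gen)}
for c, p in zip(cells, copy):
    final[c] = final.get(c, 0.0) + g * p
```

Because the two component distributions each sum to one and the gate splits the mass `(1 - g)` / `g` between them, `final` is itself a proper distribution over the union of vocabulary and cells.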

AAAI Conference 2017 Conference Paper

Building Task-Oriented Dialogue Systems for Online Shopping

  • Zhao Yan
  • Nan Duan
  • Peng Chen
  • Ming Zhou
  • Jianshe Zhou
  • Zhoujun Li

We present a general solution towards building task-oriented dialogue systems for online shopping, aiming to assist online customers in completing various purchase-related tasks, such as searching for products and answering questions, in a natural language conversation manner. As a pioneering work, we show what and how existing natural language processing techniques, data resources, and crowdsourcing can be leveraged to build such task-oriented dialogue systems for E-commerce usage. To demonstrate its effectiveness, we integrate our system into a mobile online shopping application. To the best of our knowledge, this is the first time that a Chinese dialogue system has been practically used in an online shopping scenario with millions of real consumers. Interesting and insightful observations are presented in the experimental part, based on the analysis of human-bot conversation logs. Several current challenges are also pointed out as our future directions.

IJCAI Conference 2017 Conference Paper

Improved Neural Machine Translation with Source Syntax

  • ShuangZhi Wu
  • Ming Zhou
  • Dongdong Zhang

Neural Machine Translation (NMT) based on the encoder-decoder architecture has recently achieved state-of-the-art performance. Researchers have proven that extending word-level attention to phrase-level attention by incorporating source-side phrase structure can enhance the attention model and achieve promising improvements. However, word dependencies that can be crucial to correctly understanding a source sentence are not always consecutive (i.e., captured by phrase structure); sometimes they span long distances. Phrase structures are not the best way to explicitly model long-distance dependencies. In this paper we propose a simple but effective method to incorporate source-side long-distance dependencies into NMT. Our method, based on dependency trees, enriches each source state with global dependency structures, which can better capture the inherent syntactic structure of source sentences. Experiments on Chinese-English and English-Japanese translation tasks show that our proposed method outperforms state-of-the-art SMT and NMT baselines.

AAAI Conference 2017 Conference Paper

Topic Aware Neural Response Generation

  • Chen Xing
  • Wei Wu
  • Yu Wu
  • Jie Liu
  • Yalou Huang
  • Ming Zhou
  • Wei-Ying Ma

We consider incorporating topic information into a sequence-to-sequence framework to generate informative and interesting responses for chatbots. To this end, we propose a topic aware sequence-to-sequence (TA-Seq2Seq) model. The model utilizes topics to simulate prior human knowledge that guides them to form informative and interesting responses in conversation, and leverages topic information in generation by a joint attention mechanism and a biased generation probability. The joint attention mechanism summarizes the hidden vectors of an input message as context vectors by message attention and synthesizes topic vectors by topic attention from the topic words of the message obtained from a pre-trained LDA model, with these vectors jointly affecting the generation of words in decoding. To increase the possibility of topic words appearing in responses, the model modifies the generation probability of topic words by adding an extra probability item to bias the overall distribution. Empirical studies on both automatic evaluation metrics and human annotations show that TA-Seq2Seq can generate more informative and interesting responses, significantly outperforming state-of-the-art response generation models.

AAAI Conference 2016 Conference Paper

Improving Recommendation of Tail Tags for Questions in Community Question Answering

  • Yu Wu
  • Wei Wu
  • Zhoujun Li
  • Ming Zhou

We study tag recommendation for questions in community question answering (CQA). Tags, which represent the semantic summarization of questions, are useful for navigation and expert finding in CQA and can facilitate content consumption such as searching and mining in these web sites. The task is challenging, as both questions and tags are short and a large fraction of tags are tail tags that occur very infrequently. To solve these problems, we propose matching questions and tags not only by themselves, but also by similar questions and similar tags. The idea is formalized as a model in which we calculate question-tag similarity using a linear combination of similarity with similar questions and tags, weighted by tag importance. Question similarity, tag similarity, and tag importance are learned in a supervised random walk framework by fusing multiple features. Our model thus can not only accurately identify question-tag similarity for head tags, but also improve the accuracy of recommendation of tail tags. Experimental results show that the proposed method significantly outperforms state-of-the-art methods on tag recommendation for questions. In particular, it improves tail tag recommendation accuracy by a large margin.

AAAI Conference 2016 Conference Paper

Jointly Modeling Topics and Intents with Global Order Structure

  • Bei Chen
  • Jun Zhu
  • Nan Yang
  • Tian Tian
  • Ming Zhou
  • Bo Zhang

Modeling document structure is of great importance for discourse analysis and related applications. The goal of this research is to capture the document intent structure by modeling documents as a mixture of topic words and rhetorical words. While the topics are relatively unchanged throughout one document, the rhetorical functions of sentences usually change following certain orders in discourse. We propose GMM-LDA, a topic-modeling-based Bayesian unsupervised model, to analyze the document intent structure together with order information. Our model is flexible: it can incorporate annotations and perform supervised learning. Additionally, entropic regularization can be introduced to model the significant divergence between topics and intents. We perform experiments in both unsupervised and supervised settings; the results show the superiority of our model over several state-of-the-art baselines.

AAAI Conference 2016 Conference Paper

TGSum: Build Tweet Guided Multi-Document Summarization Dataset

  • Ziqiang Cao
  • Chengyao Chen
  • Wenjie Li
  • Sujian Li
  • Furu Wei
  • Ming Zhou

The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large-scale news-related multi-document summaries with reference to social media's reactions. We utilize two types of social labels in tweets, i.e., hashtags and hyperlinks. Hashtags are used to cluster documents into different topic sets. Also, a tweet with a hyperlink often highlights certain key points of the corresponding document. We synthesize a linked document cluster to form a reference summary which can cover most key points. To this end, we adopt the ROUGE metrics to measure the coverage ratio, and develop an Integer Linear Programming solution to discover the sentence set reaching the upper bound of ROUGE. Since we allow summary sentences to be selected from both documents and high-quality tweets, the generated reference summaries can be abstractive. Both informativeness and readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on DUC generic multi-document summarization benchmarks. With the collected data as an extra training resource, the performance of the summarizer improves substantially on all the test sets. We release this dataset for further research.

IJCAI Conference 2016 Conference Paper

Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction

  • Yichun Yin
  • Furu Wei
  • Li Dong
  • Kaimeng Xu
  • Ming Zhang
  • Ming Zhou

In this paper, we develop a novel approach to aspect term extraction based on unsupervised learning of distributed representations of words and dependency paths. The basic idea is to connect two words (w1 and w2) with the dependency path (r) between them in the embedding space. Specifically, our method optimizes the objective w1 + r ≈ w2 in the low-dimensional space, where the multi-hop dependency paths are treated as a sequence of grammatical relations and modeled by a recurrent neural network. Then, we design the embedding features that consider linear context and dependency context information, for the conditional random field (CRF) based aspect term extraction. Experimental results on the SemEval datasets show that, (1) with only embedding features, we can achieve state-of-the-art results; (2) our embedding method which incorporates the syntactic information among words yields better performance than other representative ones in aspect term extraction.
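The objective w1 + r ≈ w2 described above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the words, dependency relations, embedding dimension, and the single-layer recurrent composition of the path are all illustrative, and the max-margin comparison against a corrupted target word merely mirrors the general style of translation-embedding training:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # embedding size (illustrative)

# hypothetical word and dependency-relation embedding tables
word_emb = {w: rng.normal(size=DIM) for w in ["service", "staff", "was", "friendly"]}
rel_emb = {r: rng.normal(size=DIM) for r in ["nsubj", "amod"]}
W_h = rng.normal(size=(DIM, DIM)) * 0.1  # recurrent weight of the path RNN

def path_vector(relations):
    """Compose a multi-hop dependency path with a simple recurrent step."""
    h = np.zeros(DIM)
    for r in relations:
        h = np.tanh(W_h @ h + rel_emb[r])
    return h

def score(w1, relations, w2):
    """Negative distance for the objective w1 + r ≈ w2 (higher is better)."""
    r = path_vector(relations)
    return -np.linalg.norm(word_emb[w1] + r - word_emb[w2])

# max-margin flavor: the observed target word should outscore a corrupted one
s_true = score("service", ["nsubj"], "friendly")
s_corrupt = score("service", ["nsubj"], "was")
margin_loss = max(0.0, 1.0 - s_true + s_corrupt)
```

In training, gradients of such a margin loss would update the word, relation, and recurrent parameters jointly; here everything is frozen random data purely to show the shape of the objective.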

IJCAI Conference 2015 Conference Paper

A Hybrid Neural Model for Type Classification of Entity Mentions

  • Li Dong
  • Furu Wei
  • Hong Sun
  • Ming Zhou
  • Ke Xu

The semantic class (i.e., type) of an entity plays a vital role in many natural language processing tasks, such as question answering. However, most existing type classification systems extensively rely on hand-crafted features. This paper introduces a hybrid neural model which classifies entity mentions into a wide-coverage set of 22 types derived from DBpedia. It consists of two parts. The mention model uses recurrent neural networks to recursively obtain the vector representation of an entity mention from the words it contains. The context model, on the other hand, employs multilayer perceptrons to obtain the hidden representation for contextual information of a mention. Representations obtained by the two parts are used together to predict the type distribution. Using automatically generated data, these two parts are jointly learned. Experimental studies illustrate that the proposed approach outperforms baseline methods. Moreover, when type information provided by our method is used in a question answering system, we observe a 14.7% relative improvement in the top-1 accuracy of answers.

AAAI Conference 2015 Conference Paper

Mining Query Subtopics from Questions in Community Question Answering

  • Yu Wu
  • Wei Wu
  • Zhoujun Li
  • Ming Zhou

This paper proposes mining query subtopics from questions in community question answering (CQA). The subtopics are represented as a number of clusters of questions with keywords summarizing the clusters. The task is unique in that the subtopics from questions can not only facilitate user browsing in CQA search, but also describe aspects of queries from a question-answering perspective. The challenges of the task include how to group semantically similar questions and how to find keywords capable of summarizing the clusters. We formulate the subtopic mining task as a non-negative matrix factorization (NMF) problem and further extend the model of NMF to incorporate question similarity estimated from metadata of CQA into learning. Compared with existing methods, our method can jointly optimize question clustering and keyword extraction and encourage the former task to enhance the latter. Experimental results on large scale real world CQA datasets show that the proposed method significantly outperforms the existing methods in terms of keyword extraction, while achieving a comparable performance to the state-of-the-art methods for question clustering.
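The NMF formulation above can be sketched on a toy question-term matrix. This is an illustrative reconstruction, not the paper's full model (which additionally incorporates question similarity from CQA metadata): plain multiplicative updates factorize the matrix, questions are clustered by their dominant factor, and a keyword per cluster is read off the term factor:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy question-term count matrix: rows are questions, columns are terms
# (two obvious subtopics: rows 0-2 use terms 0-1, rows 3-5 use terms 2-3)
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [3, 3, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
    [0, 0, 3, 3],
], dtype=float)

k = 2  # number of subtopic clusters
W = rng.random((V.shape[0], k)) + 0.1  # question-subtopic weights
H = rng.random((k, V.shape[1])) + 0.1  # subtopic-term weights

# standard multiplicative updates for min ||V - WH||_F^2 with W, H >= 0
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

clusters = W.argmax(axis=1)   # question -> subtopic assignment
keywords = H.argmax(axis=1)   # most representative term index per subtopic
```

On this clean block-structured matrix the two factors recover the two subtopics; joint clustering and keyword extraction fall out of the same factorization, which is the property the abstract emphasizes.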

AAAI Conference 2015 Conference Paper

Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization

  • Ziqiang Cao
  • Furu Wei
  • Li Dong
  • Sujian Li
  • Ming Zhou

We develop a Ranking framework upon Recursive Neural Networks (R2N2) to rank sentences for multi-document summarization. It formulates the sentence ranking task as a hierarchical regression process, which simultaneously measures the salience of a sentence and its constituents (e.g., phrases) in the parsing tree. This enables us to draw on word-level to sentence-level supervisions derived from reference summaries. In addition, recursive neural networks are used to automatically learn ranking features over the tree, with hand-crafted feature vectors of words as inputs. Hierarchical regressions are then conducted with learned features concatenated with raw features. Ranking scores of sentences and words are utilized to effectively select informative and non-redundant sentences to generate summaries. Experiments on the DUC 2001, 2002 and 2004 multi-document summarization datasets show that R2N2 outperforms state-of-the-art extractive summarization approaches.

AAAI Conference 2014 Conference Paper

Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis

  • Li Dong
  • Furu Wei
  • Ming Zhou
  • Ke Xu

Recursive neural models have achieved promising results in many natural language processing tasks. The main difference among these models lies in the composition function, i.e., how to obtain the vector representation of a phrase or sentence from the representations of the words it contains. This paper introduces a novel Adaptive Multi-Compositionality (AdaMC) layer for recursive neural models. The basic idea is to use more than one composition function and adaptively select among them depending on the input vectors. We present a general framework that models each semantic composition as a distribution over these composition functions. The composition functions and the parameters used for adaptive selection are learned jointly from data. We integrate AdaMC into existing recursive neural models and conduct extensive experiments on the Stanford Sentiment Treebank. The results illustrate that AdaMC significantly outperforms state-of-the-art sentiment classification methods. It helps push the best accuracy of sentence-level negative/positive classification from 85.4% up to 88.5%.
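The adaptive-selection idea above can be sketched as a softmax-gated mixture of composition functions. All dimensions and parameters below are illustrative placeholders, and the gate here is a simple linear scorer standing in for the learned selection mechanism: each candidate composition maps the concatenated child vectors to a parent vector, and the gate weights their outputs by an input-dependent distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 4  # vector size at each tree node (illustrative)
C = 3    # number of candidate composition functions (illustrative)

# each composition function is a linear map over the concatenated children
comps = [rng.normal(size=(DIM, 2 * DIM)) * 0.2 for _ in range(C)]
# gating parameters: score each composition function from the input vectors
gate = rng.normal(size=(C, 2 * DIM)) * 0.2

def softmax(z):
    """Stable softmax over a score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def adaptive_compose(left, right):
    """Blend the composition functions, weighted by an input-dependent softmax."""
    x = np.concatenate([left, right])
    weights = softmax(gate @ x)                     # distribution over compositions
    outs = np.stack([np.tanh(M @ x) for M in comps])
    return weights @ outs                           # expected composition result

parent = adaptive_compose(rng.normal(size=DIM), rng.normal(size=DIM))
```

Because the weights sum to one and each tanh output is bounded, the blended parent vector stays in the same range as any single composition, so the layer can drop into a recursive model unchanged.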

AAAI Conference 2014 Conference Paper

Machine Translation with Real-Time Web Search

  • Lei Cui
  • Ming Zhou
  • Qiming Chen
  • Dongdong Zhang
  • Mu Li

Contemporary machine translation systems usually rely on offline data retrieved from the web for individual model training, such as translation models and language models. In contrast to existing methods, we propose a novel approach that treats machine translation as a web search task and utilizes the web on the fly to acquire translation knowledge. This end-to-end approach takes advantage of fresh web search results that are capable of leveraging tremendous web knowledge to obtain phrase-level candidates on demand and then compose sentence-level translations. Experimental results show that our web-based machine translation method demonstrates very promising performance in leveraging fresh translation knowledge and making translation decisions. Furthermore, when combined with offline models, it significantly outperforms a state-of-the-art phrase-based statistical machine translation system.

AAAI Conference 2014 Conference Paper

Mind the Gap: Machine Translation by Minimizing the Semantic Gap in Embedding Space

  • Jiajun Zhang
  • Shujie Liu
  • Mu Li
  • Ming Zhou
  • Chengqing Zong

The conventional statistical machine translation (SMT) methods perform the decoding process by composing a set of translation rules associated with high probabilities. However, the probabilities of the translation rules are calculated only according to co-occurrence statistics in the bilingual corpus rather than semantic similarity. In this paper, we propose a Recursive Neural Network (RNN) based model that converts each translation rule into a compact real-valued vector in the semantic embedding space and performs the decoding process by minimizing the semantic gap between the source language string and its translation candidates at each state in a bottom-up structure. The RNN-based translation model is trained using a max-margin objective function. Extensive experiments on Chinese-to-English translation show that our RNN-based model can significantly improve the translation quality by up to 1.68 BLEU score.

IJCAI Conference 2013 Conference Paper

Answer Extraction from Passage Graph for Question Answering

  • Hong Sun
  • Nan Duan
  • Yajuan Duan
  • Ming Zhou

In question answering, answer extraction aims to pin-point the exact answer from passages. However, most previous methods perform such extraction on each passage separately, without considering clues provided in other passages. This paper presents a novel approach to extract answers by fully leveraging connections among different passages. Specifically, extraction is performed on a Passage Graph which is built by adding links upon multiple passages. Different passages are connected by linking words with the same stem. We use the factor graph as our model for answer extraction. Experimental results on multiple QA data sets demonstrate that our method significantly improves the performance of answer extraction.

TIST Journal 2013 Journal Article

Named entity recognition for tweets

  • Xiaohua Liu
  • Furu Wei
  • Shaodian Zhang
  • Ming Zhou

Two main challenges of Named Entity Recognition (NER) for tweets are the insufficient information in a tweet and the lack of training data. We propose a novel method consisting of three core elements: (1) normalization of tweets; (2) combination of a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model; and (3) a semi-supervised learning framework. The tweet normalization preprocessing corrects common ill-formed words using a global linear model. The KNN-based classifier conducts pre-labeling to collect global coarse evidence across tweets, while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of normalization, KNN, and semi-supervised learning.

AAAI Conference 2013 Conference Paper

The Automated Acquisition of Suggestions from Tweets

  • Li Dong
  • Furu Wei
  • Yajuan Duan
  • Xiaohua Liu
  • Ming Zhou
  • Ke Xu

This paper targets automatically detecting and classifying users' suggestions from tweets. The short and informal nature of tweets, along with the imbalanced characteristics of suggestion tweets, makes the task extremely challenging. To this end, we develop a classification framework based on Factorization Machines, which are effective and efficient especially in classification tasks with sparse features. Moreover, we tackle the imbalance problem by introducing cost-sensitive learning techniques into Factorization Machines. Extensive experimental studies on a manually annotated real-life data set show that the proposed approach significantly improves on the baseline approach, yielding a precision of 71.06% and recall of 67.86%. We also investigate why Factorization Machines perform better. Finally, we introduce the first manually annotated dataset for suggestion classification.
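A second-order Factorization Machine prediction, the core of the framework above, can be sketched as follows. The weights here are random placeholders rather than trained parameters; the point is the standard identity that computes the sum of pairwise interactions in O(kn) instead of O(n²), which is what makes FMs efficient on sparse feature vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
n_features, k = 6, 3  # feature count and latent factor size (illustrative)

w0 = 0.1                                     # global bias
w = rng.normal(size=n_features) * 0.1        # linear weights
V = rng.normal(size=(n_features, k)) * 0.1   # latent factors for pairwise terms

def fm_predict(x):
    """Second-order FM: w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j,
    with the interaction term computed via the O(kn) identity
    0.5 * sum_f ((V^T x)_f^2 - ((V^2)^T x^2)_f)."""
    linear = w0 + w @ x
    interactions = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return linear + interactions

x = np.array([1, 0, 1, 0, 0, 1], dtype=float)  # sparse binary feature vector
score = fm_predict(x)
```

Cost-sensitive learning, as the abstract describes, would enter at training time by weighting the loss on the rare suggestion class more heavily; the prediction formula itself is unchanged.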

AAAI Conference 2012 Conference Paper

Collective Nominal Semantic Role Labeling for Tweets

  • Xiaohua Liu
  • Zhongyang Fu
  • Furu Wei
  • Ming Zhou

Tweets have become an increasingly popular source of fresh information. We investigate the task of Nominal Semantic Role Labeling (NSRL) for tweets, which aims to identify predicate-argument structures defined by nominals in tweets. Studies of this task can help fine-grained information extraction and retrieval from tweets. There are two main challenges in this task: 1) The lack of information in a single tweet, rooted in the short and noisy nature of tweets; and 2) recovery of implicit arguments. We propose jointly conducting NSRL on multiple similar tweets using a graphical model, leveraging the redundancy in tweets to tackle these challenges. Extensive evaluations on a human annotated data set demonstrate that our method outperforms two baselines with an absolute gain of 2.7% in F1.

AAAI Conference 2012 Conference Paper

Exacting Social Events for Tweets Using a Factor Graph

  • Xiaohua Liu
  • Xiangyang Zhou
  • Zhongyang Fu
  • Furu Wei
  • Ming Zhou

Social events are events that occur between people where at least one person is aware of the other and of the event taking place. Extracting social events can play an important role in a wide range of applications, such as the construction of social networks. In this paper, we introduce the task of social event extraction for tweets, an important source of fresh events. One main challenge is the lack of information in a single tweet, which is rooted in the short and noise-prone nature of tweets. We propose to collectively extract social events from multiple similar tweets using a novel factor graph, to harvest the redundancy in tweets, i.e., the repeated occurrences of a social event in several tweets. We evaluate our method on a human annotated data set, and show that it outperforms all baselines, with an absolute gain of 21% in F1.

AAAI Conference 2012 Conference Paper

Generating Chinese Classical Poems with Statistical Machine Translation Models

  • Jing He
  • Ming Zhou
  • Long Jiang

This paper describes a statistical approach to the generation of Chinese classical poetry and proposes a novel method to automatically evaluate poems. The system accepts a set of keywords representing the writing intents from a writer and generates sentences one by one to form a completed poem. A statistical machine translation (SMT) system is applied to generate new sentences, given the sentences generated previously. For each line, a model specifically trained for that line is used, as opposed to using a single model for all sentences. To enhance the coherence of sentences on every line, a coherence model using mutual information is applied to select candidates with better consistency with previous sentences. In addition, we demonstrate the effectiveness of the BLEU metric for evaluation with a novel method of generating diverse references.

IJCAI Conference 2011 Conference Paper

Collective Semantic Role Labeling for Tweets with Clustering

  • Xiaohua Liu
  • Kuan Li
  • Ming Zhou
  • Zhongyang Xiong

As tweets have become a comprehensive repository of fresh information, Semantic Role Labeling (SRL) for tweets has aroused great research interest because of its central role in a wide range of tweet-related studies such as fine-grained information extraction, sentiment analysis and summarization. However, the fact that a tweet is often too short and informal to provide sufficient information poses a main challenge. To tackle this challenge, we propose a new method to collectively label similar tweets. The underlying idea is to exploit similar tweets to make up for the lack of information in a single tweet. Specifically, similar tweets are first grouped together by clustering. Then, for each cluster, a two-stage labeling is conducted: one labeler conducts SRL to collect statistical information, such as the predicate/argument/role triples that occur frequently, from its highly confidently labeled results; in the second stage, another labeler performs SRL with this statistical information to refine the results. Experimental results on a human annotated dataset show that our approach remarkably improves SRL by 3.1% F1.

AAAI Conference 2011 Conference Paper

Enhancing Semantic Role Labeling for Tweets Using Self-Training

  • Xiaohua Liu
  • Li Kuan
  • Ming Zhou
  • Zhongyang Xiong

Semantic Role Labeling (SRL) for tweets is a meaningful task that can benefit a wide range of applications such as fine-grained information extraction and retrieval from tweets. One main challenge of the task is the lack of annotated tweets, which are required to train a statistical model. We introduce self-training to SRL, leveraging abundant unlabeled tweets to alleviate its dependence on annotated tweets. A novel strategy of tweet selection is presented, ensuring the chosen tweets are both correct and informative. More specifically, correctness is estimated according to the labeling confidences and agreement of two Conditional Random Fields based labelers, which are trained on randomly and evenly split labeled data, while informativeness is proportional to the maximum distance between the tweet and the already selected tweets. We evaluate our method on a human annotated data set and show that bootstrapping improves a baseline by 3.4% F1.

IJCAI Conference 2007 Conference Paper

  • Shiqi Zhao
  • Ming Zhou
  • Ting Liu

Question paraphrasing is critical in many Natural Language Processing (NLP) applications, especially for question reformulation in question answering (QA). However, choosing an appropriate data source and developing effective methods are challenging tasks. In this paper, we propose a method that exploits Encarta logs to automatically identify question paraphrases and extract templates. Questions from Encarta logs are partitioned into small clusters, within which a perceptron classifier is used for identifying question paraphrases. Experiments are conducted and the results show: (1) Encarta log data is an eligible data source for question paraphrasing, and the user clicks in the data are indicative clues for recognizing paraphrases; (2) the supervised method we present is effective and clearly outperforms the unsupervised method; besides, the features introduced to identify paraphrases are sound; (3) the obtained question paraphrase templates are quite effective in question reformulation, enhancing the MRR from 0.2761 to 0.4939 on the questions of TREC QA 2003.

IJCAI Conference 2007 Conference Paper

  • Long Jiang
  • Ming Zhou
  • Lee-Feng Chien
  • Cheng Niu

This paper presents a novel approach to improving named entity translation by combining a transliteration approach with web mining, using web information as a source to complement transliteration, and using transliteration information to guide and enhance web mining. A Maximum Entropy model is employed to rank translation candidates by combining pronunciation similarity and bilingual contextual co-occurrence. Experimental results show that our approach effectively improves the precision and recall of named entity translation by a large margin.

IJCAI Conference 2007 Conference Paper

  • Jizhou Huang
  • Ming Zhou
  • Dan Yang

This paper presents a novel approach for extracting high-quality <thread-title, reply> pairs as chat knowledge from online discussion forums so as to efficiently support the construction of a chatbot for a certain domain. Given a forum, the high-quality <thread-title, reply> pairs are extracted using a cascaded framework. First, the replies logically relevant to the thread title of the root message are extracted with an SVM classifier from all the replies, based on correlations such as structure and content. Then, the extracted <thread-title, reply> pairs are ranked with a ranking SVM based on their content qualities. Finally, the Top-N <thread-title, reply> pairs are selected as chatbot knowledge. Results from experiments conducted within a movie forum show the proposed approach is effective.