Author name cluster

Haoxin Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

2 author rows

ICLR Conference 2025 Conference Paper

OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees

Kaiyan Zhang
Jiayuan Zhang 0001
Haoxin Li
Xuekai Zhu
Ermo Hua
Xingtai Lv
Ning Ding 0002
Biqing Qi

Scaling inference-time computation is increasingly seen as the next frontier in scaling laws for large language models. Previous work in mathematics and coding has demonstrated the remarkable potential for inference-time scaling. During such scaling, fine-grained supervision through process-based reward models (PRMs) is essential for enhancement. However, exploration of inference-time scaling and PRMs in open-domain problems remains limited, where lacking exact answers and obtaining process supervision prove challenging. In this paper, we explore the construction of PRMs for open-domain tasks, specifically for instruction-following tasks. Utilizing existing outcome-based reward models (ORMs), we develop sentence-level preference trees based on the prefix similarity of parallel sampled candidates from datasets like UltraFeedback. This setup allows us to derive weak supervision for processes via back-propagation from outcome-level rewards. Subsequently, we integrate ORMs and PRMs under the same pairwise ranking objectives, resulting in our newly developed reward models, named OpenPRM. This approach significantly enhances the scalability of process-level supervision in open domains at minimal cost. We assess the performance of OpenPRM across various reward benchmarks, demonstrating its competitive edge over traditional ORMs in open domains and PRMs in specialized domains. Additionally, we investigate the scalability of inference-time computation for open-domain instructions. Our results highlight the limitations of ORMs’ scalability, while OpenPRM shows superior performance in scaled settings. Despite these advances, achieving automatic fine-grained supervision for open-domain inference-time scaling remains a substantial challenge. We hope these findings will spur further development of process supervision reward models in open-domain scenarios.

Details

NeurIPS Conference 2024 Conference Paper

UltraMedical: Building Specialized Generalists in Biomedicine

Kaiyan Zhang
Sihang Zeng
Ermo Hua
Ning Ding
Zhang-Ren Chen
Zhiyuan Ma
Haoxin Li
Ganqu Cui

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains and are moving towards more specialized areas. Recent advanced proprietary models such as GPT-4 and Gemini have achieved significant advancements in biomedicine, which have also raised privacy and security challenges. The construction of specialized generalists hinges largely on high-quality datasets, enhanced by techniques like supervised fine-tuning and reinforcement learning from human or AI feedback, and direct preference optimization. However, these leading technologies (e. g. , preference learning) are still significantly limited in the open source community due to the scarcity of specialized data. In this paper, we present the UltraMedical collections, which consist of high-quality manual and synthetic datasets in the biomedicine domain, featuring preference annotations across multiple advanced LLMs. By utilizing these datasets, we fine-tune a suite of specialized medical models based on Llama-3 series, demonstrating breathtaking capabilities across various medical benchmarks. Moreover, we develop powerful reward models skilled in biomedical and general reward benchmark, enhancing further online preference learning within the biomedical LLM community.

PDF Details DOI

AAAI Conference 2019 Conference Paper

Action Knowledge Transfer for Action Prediction with Partial Videos

Yijun Cai
Haoxin Li
Jian-Fang Hu
Wei-Shi Zheng

Predicting action class from partially observed videos, which is known as action prediction, is an important task in computer vision field with many applications. The challenge for action prediction mainly lies in the lack of discriminative action information for the partially observed videos. To tackle this challenge, in this work, we propose to transfer action knowledge learned from fully observed videos for improving the prediction of partially observed videos. Specifically, we develop a two-stage learning framework for action knowledge transfer. At the first stage, we learn feature embeddings and discriminative action classifier from full videos. The knowledge in the learned embeddings and classifier is then transferred to the partial videos at the second stage. Our experiments on the UCF-101 and HMDB-51 datasets show that the proposed action knowledge transfer method can significantly improve the performance of action prediction, especially for the actions with small observation ratios (e.g., 10%). We also experimentally illustrate that our method outperforms all the state-of-the-art action prediction systems.

PDF Details