Author name cluster

Kaile Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

1 author row

NeurIPS Conference 2025 Conference Paper

InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback

Boyuan Chen
Donghai Hong
Jiaming Ji
Jiacheng Zheng
Bowen Dong
Jiayi Zhou
Kaile Wang
Juntao Dai

As multimodal large models (MLLMs) continue to advance across challenging tasks, a key question emerges: what essential capabilities are still missing? A critical aspect of human learning is continuous interaction with the environment, not limited to language but also involving multimodal understanding and generation. To move closer to human-level intelligence, models must similarly support multi-turn, multimodal interaction. In particular, they should comprehend interleaved multimodal contexts and respond coherently in ongoing exchanges. In this work, we present an initial exploration through InterMT, the first preference dataset for multi-turn multimodal interaction, grounded in real human feedback. In this exploration, we particularly emphasize the importance of human oversight, introducing expert annotations to guide the process, motivated by the fact that current MLLMs lack such complex interactive capabilities. InterMT captures human preferences at both global and local levels across nine sub-dimensions, and consists of 15.6k prompts, 52.6k multi-turn dialogue instances, and 32.4k human-labeled preference pairs. To compensate for the lack of capability for multimodal understanding and generation, we introduce an agentic workflow that leverages tool-augmented MLLMs to construct multi-turn QA instances. To further this goal, we introduce InterMT-Bench to assess the ability of MLLMs in assisting judges with multi-turn, multimodal tasks. We demonstrate the utility of InterMT through applications such as judge moderation and further reveal the multi-turn scaling law of judge models. We hope that open-sourcing our data can help facilitate further research on aligning current MLLMs to the next step.


NeurIPS Conference 2025 Conference Paper

Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback

Jiaming Ji
Xinyu Chen
Rui Pan
Han Zhu
Jiahao Li
Donghai Hong
Boyuan Chen
Jiayi Zhou

Multimodal large language models (MLLMs) are essential for building general-purpose AI assistants; however, they pose increasing safety risks. How can we ensure safety alignment of MLLMs to prevent undesired behaviors? Going further, it is critical to explore how to fine-tune MLLMs to preserve capabilities while meeting safety constraints. Fundamentally, this challenge can be formulated as a min-max optimization problem. However, existing datasets have not yet disentangled single preference signals into explicit safety constraints, hindering systematic investigation in this direction. Moreover, it remains an open question whether such constraints can be effectively incorporated into the optimization process for multimodal models. In this work, we present the first exploration of Safe RLHF-V, the first multimodal safety alignment framework. The framework consists of: (I) BeaverTails-V, the first open-source dataset featuring dual preference annotations for helpfulness and safety, supplemented with multi-level safety labels (minor, moderate, severe); (II) Beaver-Guard-V, a multi-level guardrail system to proactively defend against unsafe queries and adversarial attacks. Applying the guard model over five rounds of filtering and regeneration significantly enhances the precursor model’s overall safety by an average of 40.9%. (III) Based on dual preference, we initiate the first exploration of multimodal safety alignment within a constrained optimization. Experimental results demonstrate that Safe RLHF effectively improves both model helpfulness and safety. Specifically, Safe RLHF-V enhances model safety by 34.2% and helpfulness by 34.3%.
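
For orientation, the min-max formulation mentioned in this abstract can be written in the generic form used for constrained policy optimization. This is a sketch under assumptions, not the paper's exact objective: the symbols $J_R$ (expected helpfulness reward), $J_C$ (expected safety cost), $\theta$ (model parameters), and $\lambda$ (Lagrange multiplier) are introduced here for illustration and do not appear in the abstract.

```latex
% Constrained form (assumed notation): maximize helpfulness subject to a safety constraint
\max_{\theta} \; J_R(\theta) \quad \text{s.t.} \quad J_C(\theta) \le 0

% Equivalent Lagrangian min-max form, with multiplier \lambda \ge 0
\min_{\theta} \; \max_{\lambda \ge 0} \; \bigl[ -J_R(\theta) + \lambda \, J_C(\theta) \bigr]
```

In this form the multiplier $\lambda$ grows while the safety constraint is violated and shrinks toward zero once it is satisfied, which is why separate helpfulness and safety preference signals are needed.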


AAAI Conference 2025 Conference Paper

Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction

Hantao Lou
Jiaming Ji
Kaile Wang
Yaodong Yang

The rapid advancement of large language models (LLMs) has led to significant improvements in their capabilities, but also to increased concerns about their alignment with human values and intentions. Current alignment strategies, including adaptive training and inference-time methods, have demonstrated potential in this area. However, these approaches still struggle to balance deployment complexity and capability across various tasks and difficulties. In this work, we introduce the Streaming Distribution Induce Aligner (Stream Aligner), a novel alignment paradigm that combines efficiency with enhanced performance in various tasks throughout the generation process. Stream Aligner achieves dynamic sentence-level correction by using a small model to learn the preferences of the suffix sentence, iteratively correcting the suffix sentence output by the upstream model, and then using the corrected sentence to replace the suffix sentence in subsequent generations. Compared to Aligner, our experiments demonstrate that Stream Aligner reduces reliance on the capabilities of additional models, enhances the reasoning abilities of LLMs, and decreases latency during user interaction. Specifically, the Stream Aligner-2B model achieves an improvement of 76.1% in helpfulness and 36.0% in harmlessness on the tested Llama2-70B-chat model, and Stream Aligner-8B achieves an improvement of 3.5% in the math ability of the tested Llama3-70B-Instruct model.
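
The sentence-level correction loop described in this abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: `stream_align`, `generate_sentence`, `correct_sentence`, and the toy stand-ins are assumptions made for illustration and are not the authors' implementation or API. The sketch only shows the control flow: the upstream model drafts one sentence, a small corrector rewrites it, and the corrected sentence replaces the draft in the context for later generations.

```python
from typing import Callable, List


def stream_align(
    prompt: str,
    generate_sentence: Callable[[str], str],      # hypothetical upstream model: context -> next sentence
    correct_sentence: Callable[[str, str], str],  # hypothetical small aligner: (context, draft) -> corrected
    max_sentences: int = 8,
) -> str:
    """Illustrative loop: draft a sentence, correct it, continue from the corrected text."""
    corrected: List[str] = []
    for _ in range(max_sentences):
        # The context always contains the corrected sentences, never the raw drafts.
        context = " ".join([prompt] + corrected)
        draft = generate_sentence(context)
        if not draft:  # an empty string signals that the upstream model has finished
            break
        corrected.append(correct_sentence(context, draft))
    return " ".join(corrected)


def make_toy_upstream(sentences: List[str]) -> Callable[[str], str]:
    """Toy stand-in for an upstream LLM: emits fixed sentences one at a time."""
    it = iter(sentences)

    def generate(context: str) -> str:
        return next(it, "")

    return generate


def toy_corrector(context: str, draft: str) -> str:
    """Toy stand-in for the small aligner: rewrites a hedging word."""
    return draft.replace("maybe", "certainly")


result = stream_align(
    "Q: What is 2 + 2?",
    make_toy_upstream(["It is maybe four.", "Done."]),
    toy_corrector,
)
print(result)
```

Because each corrected sentence is fed back into the context, a correction early in the response influences every later sentence, which is the property the abstract attributes to sentence-level alignment.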
