Arrow Research search

Author name cluster

Shaobo Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
1 author row

Possible papers

6

AAAI Conference 2026 Conference Paper

ImageBindDC: Compressing Multi-modal Data with ImageBind-based Condensation

  • Yue Min
  • Shaobo Wang
  • Jiaze Li
  • Tianle Niu
  • Junxin Fan
  • Yongliang Miao
  • Lijin Yang
  • Linfeng Zhang

Data condensation techniques aim to synthesize a compact dataset from a larger one to enable efficient model training. While successful in unimodal settings, they often fail in multimodal scenarios where preserving intricate inter-modal dependencies is crucial. To address this, we introduce ImageBindDC, a novel data condensation framework operating within the unified feature space of ImageBind. Our approach moves beyond conventional distribution-matching by employing a powerful Characteristic Function (CF) loss, which operates in the Fourier domain to facilitate a more precise statistical alignment via exact infinite moment matching. We design our objective to enforce three critical levels of distributional consistency: (i) uni-modal alignment, which matches the statistical properties of synthetic and real data within each modality; (ii) cross-modal alignment, which preserves pairwise semantics by matching the distributions of hybrid real-synthetic data pairs; and (iii) joint-modal alignment, which captures the complete multivariate data structure by aligning the joint distribution of real data pairs with their synthetic counterparts. Extensive experiments highlight the effectiveness of ImageBindDC: on the NYU-v2 dataset, a model trained on just 5 condensed datapoints per class achieves lossless performance comparable to one trained on the full dataset, achieving a new state-of-the-art with an 8.2% absolute improvement over the previous best method and more than 4× less condensation time.
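The characteristic-function idea in this abstract can be sketched in a few lines: the empirical CF φ(t) = E[exp(i⟨t, x⟩)] is estimated for real and synthetic features at sampled frequencies, and the squared gap is penalized. This is a minimal NumPy sketch under assumed conventions (Gaussian frequency sampling, function names, and the feature shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def empirical_cf(feats, freqs):
    # feats: (n, d) feature matrix; freqs: (m, d) sampled frequency vectors.
    # Estimates phi(t) = E[exp(i * <t, x>)] over the n samples.
    proj = feats @ freqs.T                  # (n, m) inner products <t, x>
    return np.exp(1j * proj).mean(axis=0)   # (m,) complex empirical CF

def cf_loss(real_feats, syn_feats, num_freqs=64, scale=1.0, seed=0):
    # Squared distance between empirical CFs, averaged over random frequencies.
    rng = np.random.default_rng(seed)
    d = real_feats.shape[1]
    freqs = rng.normal(0.0, scale, size=(num_freqs, d))
    diff = empirical_cf(real_feats, freqs) - empirical_cf(syn_feats, freqs)
    return float(np.mean(np.abs(diff) ** 2))
```

Because the CF encodes all moments of a distribution, driving this gap to zero aligns the two sample sets beyond mean and covariance, which is the intuition behind "exact infinite moment matching" in the Fourier domain.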

TIST Journal 2026 Journal Article

TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs

  • Shuyi Xie
  • Wenlin Yao
  • Yong Dai
  • Shaobo Wang
  • Zishan Xu
  • Fan Lin
  • Donglin Zhou
  • Lifeng Jin

Large language models (LLMs) have shown impressive capabilities across various natural language tasks. However, evaluating their alignment with human preferences remains a challenge. To this end, we propose a comprehensive human evaluation framework to assess LLMs’ proficiency in following instructions on diverse real-world tasks. We construct a hierarchical task tree encompassing seven major areas, over 200 categories, and over 800 tasks, covering diverse capabilities such as question answering, reasoning, multi-turn dialogue, and text generation, to evaluate LLMs in a comprehensive and in-depth manner. We also design detailed evaluation standards and processes to facilitate consistent, unbiased judgments from human evaluators. A test set of over 3,000 instances is released, spanning different difficulty levels and knowledge domains. Our work provides a standardized methodology to evaluate human alignment in LLMs for both English and Chinese. We also analyze the feasibility of automating parts of the evaluation with a strong LLM (GPT-4). Our framework supports a thorough assessment of LLMs as they are integrated into real-world applications. We have publicly released the task tree, the TencentLLMEval dataset, and the evaluation methodology, which have been demonstrated to be effective in assessing the performance of Tencent Hunyuan LLMs. By doing so, we aim to facilitate the benchmarking of advances in the development of safe and human-aligned LLMs.
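The hierarchical task tree described above (major area → category → task) can be pictured as a small nested mapping. The areas, categories, and tasks below are hypothetical placeholders, not entries from the released TencentLLMEval tree:

```python
# Hypothetical miniature of a hierarchical task tree: area -> category -> tasks.
task_tree = {
    "Reasoning": {
        "Logical reasoning": ["syllogism solving", "constraint puzzles"],
        "Math reasoning": ["arithmetic word problems"],
    },
    "Text generation": {
        "Creative writing": ["story continuation", "poetry"],
    },
}

def count_tasks(tree):
    # Sum leaf tasks across all areas and categories.
    return sum(len(tasks) for cats in tree.values() for tasks in cats.values())
```

Organizing the benchmark this way lets coverage be reported at any level of the hierarchy, e.g. per area or per category, rather than only as a flat task count.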

AAAI Conference 2026 Conference Paper

UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective

  • Furui Xu
  • Shaobo Wang
  • Jiajun Zhang
  • Chenghao Sun
  • Haixiang Tang
  • Linfeng Zhang

The growing scale of datasets in deep learning has introduced significant computational challenges. Dataset pruning addresses this challenge by constructing a compact yet informative coreset from the full dataset that achieves comparable performance. Previous approaches typically establish scoring metrics based on specific criteria to identify representative samples. However, these methods predominantly rely on sample scores obtained from the model's performance during the training (i.e., fitting) phase. As scoring models achieve near-optimal performance on training data, such fitting-centric approaches induce a dense distribution of sample scores within a narrow numerical range. This concentration reduces the distinction between samples and hinders effective selection. To address this challenge, we conduct dataset pruning from the perspective of generalization, i.e., scoring samples based on models not exposed to them during training. We propose a plug-and-play framework, UNSEEN, which can be integrated into existing dataset pruning methods. Additionally, conventional score-based methods are single-step and rely on models trained solely on the complete dataset, providing a limited perspective on the importance of samples. To address this limitation, we scale UNSEEN to multi-step scenarios and propose an incremental selection technique that scores samples with models trained on varying coresets, dynamically optimizing the quality of the coreset. Extensive experiments demonstrate that our method significantly outperforms existing state-of-the-art (SOTA) methods on CIFAR-10, CIFAR-100, and ImageNet-1K. Notably, on ImageNet-1K, UNSEEN achieves lossless performance while reducing training data by 30%.
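The core idea of scoring samples with models never exposed to them can be sketched as a generic hold-out scheme: partition the data into folds, and score each fold with a model fitted on the remaining folds. This is a sketch of that generalization-based scoring idea, not the paper's UNSEEN algorithm; the `train_and_eval` callback and its signature are hypothetical:

```python
import numpy as np

def unseen_scores(X, y, train_and_eval, k=5, seed=0):
    # Score every sample with a model that never saw it during training.
    # train_and_eval(X_train, y_train, X_eval, y_eval) -> per-sample scores
    # for X_eval, using a model fitted only on (X_train, y_train).
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = np.empty(len(X))
    for i, held_out in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores[held_out] = train_and_eval(X[train_idx], y[train_idx],
                                          X[held_out], y[held_out])
    return scores
```

Because the scoring model has not fit the held-out samples, their scores are not compressed into the narrow near-optimal range that the abstract identifies as the weakness of fitting-centric scoring.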

NeurIPS Conference 2025 Conference Paper

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

  • Zichen Wen
  • Shaobo Wang
  • Yufa Zhou
  • Junyuan Zhang
  • Qintong Zhang
  • Yifeng Gao
  • Zhaorun Chen
  • Bin Wang

Visual tokens consume substantial computational resources in multi-modal large language models (MLLMs), significantly compromising their efficiency. Recent works have attempted to improve efficiency by compressing visual tokens during training, either through modifications to model components or by introducing additional parameters. However, they often overlook the increased learning difficulty caused by such compression, as the model’s parameter space struggles to quickly adapt to the substantial perturbations in the feature space induced by token compression. In this work, we propose to develop Efficient MLLMs via Progressive Consistency Distillation (EPIC), a progressive learning framework. Specifically, by decomposing the feature space perturbations introduced by token compression along the token-wise and layer-wise dimensions, we introduce token consistency distillation and layer consistency distillation, respectively, aiming to reduce the training difficulty by leveraging guidance from a teacher model and following a progressive learning trajectory. Extensive experiments demonstrate the superior effectiveness, robustness, and generalization capabilities of our proposed framework.
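One way to read the "progressive" part of such a framework is as a schedule that tightens token compression gradually over training, so the model adapts to small feature-space perturbations at each step rather than one large jump. The cosine schedule and ratio values below are illustrative assumptions, not taken from the EPIC paper:

```python
import math

def keep_ratio(step, total_steps, start=1.0, end=0.25):
    # Fraction of visual tokens kept at a given training step.
    # Decays smoothly (cosine) from `start` (no compression) to `end`,
    # so compression strength increases gradually over training.
    t = step / total_steps
    return end + (start - end) * 0.5 * (1 + math.cos(math.pi * t))
```

A training loop would compress visual tokens to `keep_ratio(step, total_steps)` of their original count at each step, with a distillation loss pulling the compressed student's features toward a less-compressed teacher's.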

JBHI Journal 2023 Journal Article

An Interpretable Data-Driven Medical Knowledge Discovery Pipeline Based on Artificial Intelligence

  • Shaobo Wang
  • Xinhui Du
  • Guangliang Liu
  • Hang Xing
  • Zengtao Jiao
  • Jun Yan
  • Youjun Liu
  • Haichen Lv

Difficulty in knowledge validation is a significant hindrance to knowledge discovery via data mining, especially when validation must be performed automatically, without human involvement. In the field of medical research, medical knowledge discovery from electronic medical records is a common medical data mining method, but it is difficult to validate the discovered medical knowledge without the participation of medical experts. In this article, we propose a data-driven medical knowledge discovery closed-loop pipeline based on interpretable machine learning and deep learning; the components of the pipeline include Data Generator, Medical Knowledge Mining, Medical Knowledge Evaluation, and Medical Knowledge Application. In addition to completing the discovery of medical knowledge, the pipeline can also automatically validate the knowledge. We apply our pipeline's discovered medical knowledge to a traditional prognostic predictive model of heart failure in a real-world study, demonstrating that the incorporation of medical knowledge can effectively improve the performance of the traditional model. We also construct a scale model based on the discovered medical knowledge and demonstrate that it achieves good performance. To guarantee its medical effectiveness, every process of our pipeline involves the participation of medical experts.

NeurIPS Conference 2021 Conference Paper

Visualizing the Emergence of Intermediate Visual Patterns in DNNs

  • Mingjie Li
  • Shaobo Wang
  • Quanshi Zhang

This paper proposes a method to visualize the discrimination power of intermediate-layer visual patterns encoded by a DNN. Specifically, we visualize (1) how the DNN gradually learns regional visual patterns in each intermediate layer during the training process, and (2) the effects of the DNN using non-discriminative patterns in low layers to construct discriminative patterns in middle/high layers through forward propagation. Based on our visualization method, we can quantify knowledge points (i.e., the number of discriminative visual patterns) learned by the DNN to evaluate the representation capacity of the DNN. Furthermore, this method also provides new insights into signal-processing behaviors of existing deep-learning techniques, such as adversarial attacks and knowledge distillation.