Author name cluster

Baosheng Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers

1 author row

AAAI Conference 2026 Conference Paper

Cross-Sample Augmented Test-Time Adaptation for Personalized Intraoperative Hypotension Prediction

Kanxue Li
Yibing Zhan
Hua Jin
Chongchong Qi
Xu Lin
Baosheng Yu

Intraoperative hypotension (IOH) poses significant surgical risks, but accurate prediction remains challenging due to patient-specific variability. While test-time adaptation (TTA) offers a promising approach for personalized prediction, the rarity of IOH events often leads to unreliable test-time training. To address this, we propose CSA-TTA, a novel cross-sample augmented test-time adaptation framework that enhances training by incorporating hypotension events from other individuals. Specifically, we first construct a cross-sample bank by segmenting historical data into hypotensive and non-hypotensive samples. Then, we introduce a coarse-to-fine retrieval strategy for building test-time training data: we initially apply K-Shape clustering to identify representative cluster centers and subsequently retrieve the top-K semantically similar samples based on the current patient signal. Additionally, we integrate both self-supervised masked reconstruction and retrospective sequence forecasting signals during training to enhance model adaptability to rapid and subtle intraoperative dynamics. We evaluate the proposed CSA-TTA on both the VitalDB dataset and a real-world in-hospital dataset by integrating it with state-of-the-art time series forecasting models, including TimesFM and UniTS. CSA-TTA consistently enhances performance across settings—for instance, on VitalDB, it improves Recall and F1 scores by +1.33% and +1.13%, respectively, under fine-tuning, and by +7.46% and +5.07% in zero-shot scenarios—demonstrating strong robustness and generalization.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Remodeling Semantic Relationships in Vision-Language Fine-Tuning

Xiangyang Wu
Liu Liu
Baosheng Yu
Jiayan Qiu
Zhenwei Shi

Vision-language fine-tuning has emerged as an efficient paradigm for constructing multimodal foundation models. While textual context often highlights semantic relationships within an image, existing fine-tuning methods typically overlook this information when aligning vision and language, thus leading to suboptimal performance. Toward solving this problem, we propose a method that can improve multimodal alignment and fusion based on both semantics and relationships.Specifically, we first extract multilevel semantic features from different vision encoder to capture more visual cues of the relationships. Then, we learn to project the vision features to group related semantics, among which are more likely to have relationships. Finally, we fuse the visual features with the textual by using inheritable cross-attention, where we globally remove the redundant visual relationships by discarding visual-language feature pairs with low correlation. We evaluate our proposed method on eight foundation models and two downstream tasks, visual question answering and image captioning, and show that it outperforms all existing methods.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation

Yanwei Ren
Haotian Zhang
Fuxiang Wu
Jiayan Qiu
Jiaxing Huang
Baosheng Yu
Liu Liu

Enhancing large language models by simply scaling up datasets has begun to yield diminishing returns, shifting the spotlight to data quality. Monte Carlo Tree Search (MCTS) has emerged as a powerful technique for generating high-quality chain-of-thought data, yet conventional approaches typically retain only the top-scoring trajectory from the search tree, discarding sibling nodes that often contain valuable partial insights, recurrent error patterns, and alternative reasoning strategies. This unconditional rejection of non-optimal reasoning branches may waste vast amounts of informative data in the whole search tree. We propose SIGMA (Sibling Guided Monte Carlo Augmentation), a novel framework that reintegrates these discarded sibling nodes to refine LLM reasoning. SIGMA forges semantic links among sibling nodes along each search path and applies a two-stage refinement: a critique model identifies overlooked strengths and weaknesses across the sibling set, and a revision model conducts text-based backpropagation to refine the top-scoring trajectory in light of this comparative feedback. By recovering and amplifying the underutilized but valuable signals from non-optimal reasoning branches, SIGMA substantially improves reasoning trajectories. On the challenging MATH benchmark, our SIGMA-tuned 7B model achieves 54. 92\% accuracy using only 30K samples, outperforming state-of-the-art models trained on 590K samples. This result highlights that our sibling-guided optimization not only significantly reduces data usage but also significantly boosts LLM reasoning.

PDF Details

IJCAI Conference 2024 Conference Paper

MuEP: A Multimodal Benchmark for Embodied Planning with Foundation Models

Kanxue Li
Baosheng Yu
Qi Zheng
Yibing Zhan
Yuhui Zhang
Tianle Zhang
Yijun Yang
Yue Chen

Foundation models have demonstrated significant emergent abilities, holding great promise for enhancing embodied agents' reasoning and planning capacities. However, the absence of a comprehensive benchmark for evaluating embodied agents with multimodal observations in complex environments remains a notable gap. In this paper, we present MuEP, a comprehensive Multimodal benchmark for Embodied Planning. MuEP facilitates the evaluation of multimodal and multi-turn interactions of embodied agents in complex scenes, incorporating fine-grained evaluation metrics that provide insights into the performance of embodied agents throughout each task. Furthermore, we evaluate embodied agents with recent state-of-the-art foundation models, including large language models (LLMs) and large multimodal models (LMMs), on the proposed benchmark. Experimental results show that foundation models based on textual representations of environments usually outperform their visual counterparts, suggesting a gap in embodied planning abilities with multimodal observations. We also find that control language generation is an indispensable ability beyond common-sense knowledge for accurate embodied task completion. We hope the proposed MuEP benchmark can contribute to the advancement of embodied AI with foundation models.

PDF Details DOI

TMLR Journal 2022 Journal Article

GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation

Haibo Qiu
Baosheng Yu
Dacheng Tao

Point cloud semantic segmentation from projected views, such as range-view (RV) and bird's-eye-view (BEV), has been intensively investigated. Different views capture different information of point clouds and thus are complementary to each other. However, recent projection-based methods for point cloud semantic segmentation usually utilize a vanilla late fusion strategy for the predictions of different views, failing to explore the complementary information from a geometric perspective during the representation learning. In this paper, we introduce a geometric flow network (GFNet) to explore the geometric correspondence between different views in an align-before-fuse manner. Specifically, we devise a novel geometric flow module (GFM) to bidirectionally align and propagate the complementary information across different views according to geometric relationships under the end-to-end learning scheme. We perform extensive experiments on two widely used benchmark datasets, SemanticKITTI and nuScenes, to demonstrate the effectiveness of our GFNet for project-based point cloud semantic segmentation. Concretely, GFNet not only significantly boosts the performance of each individual view but also achieves state-of-the-art results over all existing projection-based models. Code is available at \url{https://github.com/haibo-qiu/GFNet}.

PDF Details

AAAI Conference 2022 Conference Paper

Resistance Training Using Prior Bias: Toward Unbiased Scene Graph Generation

Chao Chen
Yibing Zhan
Baosheng Yu
Liu Liu
Yong Luo
Bo Du

Scene Graph Generation (SGG) aims to build a structured representation of a scene using objects and pairwise relationships, which benefits downstream tasks. However, current SGG methods usually suffer from sub-optimal scene graph generation because of the long-tailed distribution of training data. To address this problem, we propose Resistance Training using Prior Bias (RTPB) for the scene graph generation. Specifically, RTPB uses a distributed-based prior bias to improve models’ detecting ability on less frequent relationships during training, thus improving the model generalizability on tail categories. In addition, to further explore the contextual information of objects and relationships, we design a contextual encoding backbone network, termed as Dual Transformer (DTrans). We perform extensive experiments on a very popular benchmark, VG150, to demonstrate the effectiveness of our method for the unbiased scene graph generation. In specific, our RTPB achieves an improvement of over 10% under the mean recall when applied to current SGG methods. Furthermore, DTrans with RTPB outperforms nearly all stateof-the-art methods with a large margin. Code is available at https: //github. com/ChCh1999/RTPB

PDF Details

NeurIPS Conference 2021 Conference Paper

Contrastive Graph Poisson Networks: Semi-Supervised Learning with Extremely Limited Labels

Sheng Wan
Yibing Zhan
Liu Liu
Baosheng Yu
Shirui Pan
Chen Gong

Graph Neural Networks (GNNs) have achieved remarkable performance in the task of semi-supervised node classification. However, most existing GNN models require sufficient labeled data for effective network training. Their performance can be seriously degraded when labels are extremely limited. To address this issue, we propose a new framework termed Contrastive Graph Poisson Networks (CGPN) for node classification under extremely limited labeled data. Specifically, our CGPN derives from variational inference; integrates a newly designed Graph Poisson Network (GPN) to effectively propagate the limited labels to the entire graph and a normal GNN, such as Graph Attention Network, that flexibly guides the propagation of GPN; applies a contrastive objective to further exploit the supervision information from the learning process of GPN and GNN models. Essentially, our CGPN can enhance the learning performance of GNNs under extremely limited labels by contrastively propagating the limited labels to the entire graph. We conducted extensive experiments on different types of datasets to demonstrate the superiority of CGPN.

PDF Details

AAAI Conference 2020 Conference Paper

Unsupervised Domain Adaptation on Reading Comprehension

Yu Cao
Meng Fang
Baosheng Yu
Joey Tianyi Zhou

Reading comprehension (RC) has been studied in a variety of datasets with the boosted performance brought by deep neural networks. However, the generalization capability of these models across different domains remains unclear. To alleviate the problem, we investigate unsupervised domain adaptation on RC, wherein a model is trained on the labeled source domain and to be applied to the target domain with only unlabeled samples. We ﬁrst show that even with the powerful BERT contextual representation, a model can not generalize well from one domain to another. To solve this, we provide a novel conditional adversarial self-training method (CASe). Speciﬁcally, our approach leverages a BERT model ﬁne-tuned on the source dataset along with the conﬁdence ﬁltering to generate reliable pseudo-labeled samples in the target domain for self-training. On the other hand, it further reduces domain distribution discrepancy through conditional adversarial learning across domains. Extensive experiments show our approach achieves comparable performance to supervised models on multiple large-scale benchmark datasets.

PDF Details

AAAI Conference 2016 Conference Paper

Linear Submodular Bandits with a Knapsack Constraint

Baosheng Yu
Meng Fang
Dacheng Tao

Linear submodular bandits has been proven to be effective in solving the diversiﬁcation and feature-based exploration problems in retrieval systems. Concurrently, many web-based applications, such as news article recommendation and online ad placement, can be modeled as budget-limited problems. However, the diversiﬁcation problem under a budget constraint has not been considered. In this paper, we ﬁrst introduce the budget constraint to linear submodular bandits as a new problem called the linear submodular bandits with a knapsack constraint. We then deﬁne an α-approximation unit-cost regret considering that submodular function maximization is NP-hard. To solve this problem, we propose two greedy algorithms based on a modiﬁed UCB rule. We then prove these two algorithms with different regret bounds and computational costs. We also conduct a number of experiments and the experimental results conﬁrm our theoretical analyses.

PDF Details

AAAI Conference 2016 Conference Paper

Submodular Asymmetric Feature Selection in Cascade Object Detection

Baosheng Yu
Meng Fang
Dacheng Tao
Jie Yin

A cascade classiﬁer has turned out to be effective in slidingwindow based real-time object detection. In a cascade classiﬁer, node learning is the key process, which includes feature selection and classiﬁer design. Previous algorithms fail to effectively tackle the asymmetry and intersection problems existing in cascade classiﬁcation, thereby limiting the performance of object detection. In this paper, we improve current feature selection algorithm by addressing both asymmetry and intersection problems. We formulate asymmetric feature selection as a submodular function maximization problem. We then propose a new algorithm SAFS with formal performance guarantee to solve this problem. We use face detection as a case study and perform experiments on two real-world face detection datasets. The experimental results demonstrate that our algorithm SAFS outperforms the state-of-art feature selection algorithms in cascade object detection, such as FFS and LACBoost.

PDF Details