Arrow Research search

Author name cluster

Pei Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers (19)

NeurIPS Conference 2025 Conference Paper

Bit-swapping Oriented Twin-memory Multi-view Clustering in Lifelong Incomplete Scenarios

  • Shengju Yu
  • Pei Zhang
  • Siwei Wang
  • Suyuan Liu
  • Xinhang Wan
  • Zhibin Dong
  • Tiejun Li
  • Xinwang Liu

Although achieving notable improvements, current multi-view clustering (MVC) techniques generally rely on feature library mechanisms to propagate accumulated knowledge from historical views to newly arrived data, which overlooks the information pertaining to basis embedding within each view. Moreover, the mapping paradigm inevitably alters the values of learned landmarks and built affinities due to its uninterrupted nature, accordingly disarraying the hierarchical cluster structures. To mitigate these two issues, this paper proposes an algorithm named BSTM. Concretely, we first reconcile the distinct dimensions by introducing a group of specialized projectors, and then establish unified anchors for all views collected so far to capture intrinsic patterns. Afterwards, departing from per-view architectures, we devise a shared bipartite graph construction via indicators to quantify similarity, which not only avoids redundant data recalculations but also alleviates the representation distortion caused by fusion. Crucially, these two components are optimized within an integrated framework and collectively facilitate knowledge transfer upon encountering incoming views. Subsequently, to flexibly transform anchors while maintaining numerical consistency, we develop a bit-swapping scheme operating exclusively on 0 and 1. It harmonizes the anchors of the current view with those of previous views through one-hot encoded row and column attributes, and the graph structures are correspondingly reordered to reach a matched configuration. Furthermore, a computationally efficient four-step updating strategy with linear complexity is designed to minimize the associated loss. Extensive experiments on publicly available benchmark datasets with varying missing percentages confirm the superior effectiveness of our BSTM.
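
The bit-swapping idea above, reordering anchors with a 0/1 permutation so that current-view anchors line up with those of previous views, can be illustrated with a toy sketch. The greedy matcher and 1-D anchors below are invented for illustration only; the paper's actual optimizer differs:

```python
# Toy illustration (not the paper's optimizer): align current-view anchors to
# previous-view anchors with a 0/1 permutation chosen greedily by distance.
def greedy_permutation(prev, curr):
    """Return a 0/1 matrix P where row i marks which current anchor is
    matched to previous anchor i (each current anchor used at most once)."""
    n = len(prev)
    used, perm = set(), [[0] * n for _ in range(n)]
    for i, p in enumerate(prev):
        j = min((k for k in range(n) if k not in used),
                key=lambda k: abs(p - curr[k]))
        used.add(j)
        perm[i][j] = 1
    return perm

prev, curr = [0.0, 5.0, 9.0], [8.8, 0.2, 5.1]
P = greedy_permutation(prev, curr)
# Applying P reorders the current anchors to match the previous ordering;
# the graph rows would be reordered by the same permutation.
reordered = [curr[row.index(1)] for row in P]
assert reordered == [0.2, 5.1, 8.8]
```

Note how the permutation matrix contains only 0s and 1s, so applying it swaps entries without altering any learned values, which is the numerical-consistency point the abstract makes.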

AAAI Conference 2025 Conference Paper

Max-Mahalanobis Anchors Guidance for Multi-View Clustering

  • Pei Zhang
  • Yuangang Pan
  • Siwei Wang
  • Shengju Yu
  • Huiying Xu
  • En Zhu
  • Xinwang Liu
  • Ivor Tsang

Anchor selection or learning has become a critical component in large-scale multi-view clustering. Existing anchor-based methods, which either select-then-fix or initialize-then-optimize with orthogonality, yield promising performance. However, these methods still suffer from instability of initialization or insufficient depiction of data distribution. Moreover, the desired properties of anchors in multi-view clustering remain unspecified. To address these issues, this paper first formalizes the desired characteristics of anchors, namely Diversity, Balance and Compactness. We then devise and mathematically validate anchors that satisfy these properties by maximizing the Mahalanobis distance between anchors. Furthermore, we introduce a novel method called Max-Mahalanobis Anchors Guidance for multi-view Clustering (MAGIC), which guides the cross-view representations to progressively align with our well-defined anchors. This process yields highly discriminative and compact representations, significantly enhancing the performance of multi-view clustering. Experimental results show that our meticulously designed strategy significantly outperforms existing anchor-based methods in enhancing anchor efficacy, leading to substantial improvement in multi-view clustering performance.
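
The Mahalanobis distance that the anchor criterion maximizes can be sketched in a minimal form. The diagonal inverse covariance here is a simplifying assumption for illustration, not the paper's formulation:

```python
import math

def mahalanobis(x, y, inv_cov_diag):
    """Mahalanobis distance between points x and y under a diagonal
    inverse covariance (a simplifying assumption for illustration)."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for a, b, w in zip(x, y, inv_cov_diag)))

# With identity covariance it reduces to plain Euclidean distance:
assert mahalanobis([0, 0], [3, 4], [1, 1]) == 5.0
# Down-weighting a high-variance dimension shrinks the distance,
# which is why the metric reflects the data distribution:
assert mahalanobis([0, 0], [3, 4], [1, 0.25]) < 5.0
```

Maximizing this quantity between anchors pushes them apart relative to the data's spread, which is one way to read the Diversity and Balance properties the abstract formalizes.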

NeurIPS Conference 2025 Conference Paper

PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts

  • Yiming Wang
  • Pei Zhang
  • Jialong Tang
  • Hao-Ran Wei
  • Baosong Yang
  • Rui Wang
  • Chenshu Sun
  • Feitong Sun

In this paper, we introduce PolyMath, a multilingual mathematical reasoning benchmark covering 18 languages and 4 easy-to-hard difficulty levels. Our benchmark ensures difficulty comprehensiveness, language diversity, and high-quality translation, making it a highly discriminative multilingual mathematical benchmark in the era of reasoning LLMs. We conduct a comprehensive evaluation of advanced LLMs and find that even Qwen-3-235B-A22B-Thinking and Gemini-2.5-pro achieve benchmark scores of only 54.6 and 52.2, with about 40% accuracy at the highest level. From a language perspective, our benchmark reveals several key challenges of LLMs in multilingual reasoning: (1) reasoning performance varies widely across languages for current LLMs; (2) input-output language consistency is low in reasoning LLMs and may be correlated with performance; (3) thinking length differs significantly by language for current LLMs. Additionally, we demonstrate that controlling the output language in the instructions can affect reasoning performance, especially for some low-resource languages, suggesting a promising direction for improving multilingual capabilities in LLMs.

NeurIPS Conference 2025 Conference Paper

Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding

  • Yiming Wang
  • Pei Zhang
  • Siyuan Huang
  • Baosong Yang
  • Zhuosheng Zhang
  • Fei Huang
  • Rui Wang

Test-time scaling enhances large language model performance by allocating additional compute resources during decoding. Best-of-$N$ (BoN) sampling serves as a common sampling-based scaling technique, broadening the search space in parallel to find better solutions from the model distribution. However, its cost–performance trade-off is still underexplored. Two main challenges limit the efficiency of BoN sampling: (1) Generating $N$ full samples consumes substantial GPU memory, reducing inference capacity under limited resources. (2) Reward models add extra memory and latency overhead, and training strong reward models introduces potential training data costs. Although some studies have explored efficiency improvements, none have addressed both challenges at once. To address this gap, we propose **Self-Truncation Best-of-$N$ (ST-BoN)**, a decoding method that avoids fully generating all $N$ samples and eliminates the need for reward models. It leverages early sampling consistency in the model’s internal states to identify the most promising path and truncate suboptimal ones. In terms of cost, ST-BoN reduces dynamic GPU memory usage by over 80% and inference latency by 50%. In terms of cost–performance trade-off, ST-BoN achieves the same performance as Full-BoN while saving computational cost by 70%–80%, and under the same cost, it can improve accuracy by 3–4 points.
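
For reference, vanilla Best-of-$N$ sampling, the baseline that ST-BoN accelerates, can be sketched as follows. The toy sampler and scorer below are stand-ins for an LLM and a reward model, invented purely for illustration:

```python
import random

# Toy stand-ins: in practice these would be an LLM sampler and a reward model.
def sample_answer(rng):
    return rng.gauss(0.0, 1.0)          # one "candidate solution"

def score(answer):
    return -abs(answer - 0.7)           # higher is better (closer to a target)

def best_of_n(n, seed=0):
    """Vanilla Best-of-N: draw n full candidates, keep the top-scoring one.
    This is exactly the cost problem ST-BoN targets: all n candidates are
    generated in full before any is discarded."""
    rng = random.Random(seed)
    candidates = [sample_answer(rng) for _ in range(n)]
    return max(candidates, key=score)

# More samples can only help, never hurt, under the same seed:
assert score(best_of_n(64)) >= score(best_of_n(1))
```

ST-BoN's contribution is to avoid generating every candidate to completion and to drop the external `score` model, using early internal-state consistency instead.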

JBHI Journal 2024 Journal Article

Collaborative Transfer Network for Multi-Classification of Breast Cancer Histopathological Images

  • Liangliang Liu
  • Ying Wang
  • Pei Zhang
  • Hongbo Qiao
  • Tong Sun
  • Hui Zhang
  • Xue Xu
  • Hongcai Shang

The incidence of breast cancer is increasing rapidly around the world. Accurate classification of the breast cancer subtype from hematoxylin and eosin images is key to improving the precision of treatment. However, the high similarity among disease subtypes and the uneven distribution of cancer cells seriously affect the performance of multi-classification methods. Furthermore, it is difficult to apply existing classification methods to multiple datasets. In this article, we propose a collaborative transfer network (CTransNet) for multi-classification of breast cancer histopathological images. CTransNet consists of a transfer learning backbone branch, a residual collaborative branch, and a feature fusion module. The transfer learning branch adopts the pre-trained DenseNet structure to extract image features from ImageNet. The residual branch extracts target features from pathological images in a collaborative manner. The feature fusion strategy of optimizing these two branches is used to train and fine-tune CTransNet. Experiments show that CTransNet achieves 98.29% classification accuracy on the public BreaKHis breast cancer dataset, exceeding the performance of state-of-the-art methods. Visual analysis is carried out under the guidance of oncologists. Based on the training parameters from the BreaKHis dataset, CTransNet achieves superior performance on two other public breast cancer datasets (breast-cancer-grade-ICT and ICIAR2018_BACH_Challenge), indicating that CTransNet has good generalization performance.

AAAI Conference 2024 Conference Paper

DVSAI: Diverse View-Shared Anchors Based Incomplete Multi-View Clustering

  • Shengju Yu
  • Siwei Wang
  • Pei Zhang
  • Miao Wang
  • Ziming Wang
  • Zhe Liu
  • Liming Fang
  • En Zhu

In numerous real-world applications, it is quite common that sample information is partially available for some views due to machine breakdown or sensor failure, causing the problem of incomplete multi-view clustering (IMVC). While several IMVC approaches using view-shared anchors have achieved pleasing performance improvements, (1) they generally construct anchors with only one dimension, which could deteriorate the multi-view diversity, bringing about serious information loss; and (2) the constructed anchors typically have a single size, which cannot sufficiently characterize the distribution of the whole sample set, leading to limited clustering performance. To generate view-shared anchors of multiple dimensions and sizes for IMVC, we design a novel framework called Diverse View-Shared Anchors based Incomplete multi-view clustering (DVSAI). Concretely, we associate each partial view with several potential spaces. In each space, we enable anchors to communicate among views and generate view-shared anchors with a space-specific dimension and size. Consequently, spaces at various scales make the generated view-shared anchors enjoy diverse dimensions and sizes. Subsequently, we devise an integration scheme with linear computational and memory expenditure to integrate the output multi-scale unified anchor graphs such that running a spectral algorithm generates the spectral embedding. Afterwards, we theoretically demonstrate that DVSAI has linear time and space costs and is thus well-suited for tackling large-size datasets. Finally, comprehensive experiments confirm the effectiveness and advantages of DVSAI.

NeurIPS Conference 2024 Conference Paper

Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning

  • Yiming Wang
  • Pei Zhang
  • Baosong Yang
  • Derek F. Wong
  • Zhuosheng Zhang
  • Rui Wang

Real-world data deviating from the independent and identically distributed (i.i.d.) assumption of in-distribution training data poses security threats to deep networks, thus advancing out-of-distribution (OOD) detection algorithms. Detection methods in generative language models (GLMs) mainly focus on uncertainty estimation and embedding distance measurement, with the latter proven to be most effective in traditional linguistic tasks like summarization and translation. However, another complex generative scenario, mathematical reasoning, poses significant challenges to embedding-based methods due to the high density of its output space; this same feature, though, causes larger discrepancies in the embedding shift trajectory between different samples in latent space. Hence, we propose a trajectory-based method, TV score, which uses trajectory volatility for OOD detection in mathematical reasoning. Experiments show that our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios and can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
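
A simplified reading of trajectory volatility, the mean magnitude of successive embedding shifts, can be sketched as below. The paper's TV score is defined differently in detail; this 1-D toy only conveys the intuition that OOD samples shift more erratically through latent space:

```python
def trajectory_volatility(trajectory):
    """Mean magnitude of successive shifts along a 1-D embedding trajectory.
    (A simplified reading of 'volatility'; the paper's TV score differs.)"""
    diffs = [abs(b - a) for a, b in zip(trajectory, trajectory[1:])]
    return sum(diffs) / len(diffs)

smooth = [0.0, 0.1, 0.2, 0.3, 0.4]       # in-distribution-like: steady drift
jumpy  = [0.0, 0.9, 0.1, 1.0, 0.2]       # OOD-like: erratic shifts
assert trajectory_volatility(jumpy) > trajectory_volatility(smooth)
```

Thresholding such a volatility statistic then gives a detector that needs no distance-to-training-set computation, which is why it can cope with a high-density output space.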

EAAI Journal 2024 Journal Article

Fusion flow-enhanced graph pooling residual networks for Unmanned Aerial Vehicles surveillance in day and night dual visions

  • Alam Noor
  • Kai Li
  • Eduardo Tovar
  • Pei Zhang
  • Bo Wei

Recognizing unauthorized Unmanned Aerial Vehicles (UAVs) within designated no-fly zones throughout the day and night is of paramount importance, where the unauthorized UAVs pose a substantial threat to both civil and military aviation safety. However, recognizing UAVs day and night with dual-vision cameras is nontrivial, since red–green–blue (RGB) images suffer from a low detection rate under an insufficient light condition, such as on cloudy or stormy days, while black-and-white infrared (IR) images struggle to capture UAVs that overlap with the background at night. In this paper, we propose a new optical flow-assisted graph-pooling residual network (OF-GPRN), which significantly enhances the UAV detection rate in day and night dual visions. The proposed OF-GPRN develops a new optical fusion to remove superfluous backgrounds, which improves RGB/IR imaging clarity. Furthermore, OF-GPRN extends optical fusion by incorporating a graph residual split attention network and a feature pyramid, which refines the perception of UAVs, leading to a higher success rate in UAV detection. A comprehensive performance evaluation is conducted using a benchmark UAV catch dataset. The results indicate that the proposed OF-GPRN elevates the UAV mean average precision (mAP) detection rate to 87.8%, marking a 17.9% advancement compared to the residual graph neural network (ResGCN)-based approach.

EAAI Journal 2024 Journal Article

Short-term high-speed rail passenger flow prediction by integrating ensemble empirical mode decomposition with multivariate grey support vector machine

  • Yujie Yuan
  • Xiushan Jiang
  • Pei Zhang
  • Chun Sing Lai

Short-term prediction of high-speed rail (HSR) passenger flow provides a daily ridership estimation for the near future, which is critical to HSR planning and operational decision making. This paper proposes a new methodology that integrates ensemble empirical mode decomposition with multivariate support vector machines (EEMD-MSVM). There are four steps in this hybrid forecasting approach: (i) explore the correlation of multivariate HSR passenger flows at various stations based on archived data; (ii) decompose empirical modes of historical passenger flows for each HSR station, using EEMD to generate a number of intrinsic mode functions (IMFs) and a trend term; (iii) predict the IMF for each correlated station pair using MSVM; and (iv) reconstruct the refined IMF components to predict daily multivariate HSR passenger flows. The proposed EEMD-MSVM approach is demonstrated on multiple OD pairs along the Wuhan-Guangzhou HSR in China. Results from various origin-destination pairs show that the EEMD-MSVM approach outperforms the existing ensemble empirical mode decomposition with grey support vector machine approach (EEMD-GSVM). With the multivariate approach, the mean absolute percentage error in demand prediction is reduced by 13.9%, 1.2%, 1.0%, 2.0%, and 2.7%, and the mean absolute deviation is reduced by 78.8, 38.0, 4.4, 4.6, and 3.9 across these OD pairs, respectively. Such an increase in short-term demand prediction accuracy can significantly improve HSR service planning, operations, and revenue management in the real world.

AAAI Conference 2023 Conference Paper

Graph Anomaly Detection via Multi-Scale Contrastive Learning Networks with Augmented View

  • Jingcan Duan
  • Siwei Wang
  • Pei Zhang
  • En Zhu
  • Jingtao Hu
  • Hu Jin
  • Yue Liu
  • Zhibin Dong

Graph anomaly detection (GAD) is a vital task in graph-based machine learning and has been widely applied in many real-world applications. The primary goal of GAD is to capture anomalous nodes from graph datasets, which evidently deviate from the majority of nodes. Recent methods have paid attention to various scales of contrastive strategies for GAD, i.e., node-subgraph and node-node contrasts. However, they neglect subgraph-subgraph comparison information, in which normal and abnormal subgraph pairs behave differently in terms of embeddings and structures, resulting in sub-optimal task performance. In this paper, we realize the above idea, for the first time, in the proposed multi-view multi-scale contrastive learning framework with subgraph-subgraph contrast. To be specific, we regard the original input graph as the first view and generate the second view by graph augmentation with edge modifications. With the guidance of maximizing the similarity of the subgraph pairs, the proposed subgraph-subgraph contrast contributes to more robust subgraph embeddings despite the structure variation. Moreover, the introduced subgraph-subgraph contrast cooperates well with the widely adopted node-subgraph and node-node contrastive counterparts for mutual GAD performance promotion. Besides, we conduct sufficient experiments to investigate the impact of different graph augmentation approaches on detection performance. The comprehensive experimental results demonstrate the superiority of our method over state-of-the-art approaches and the effectiveness of the multi-view subgraph pair contrastive strategy for the GAD task. The source code is released at https://github.com/FelixDJC/GRADATE.

JBHI Journal 2023 Journal Article

Hybrid Contextual Semantic Network for Accurate Segmentation and Detection of Small-Size Stroke Lesions From MRI

  • Liangliang Liu
  • Jing Chang
  • Zhihong Liu
  • Pei Zhang
  • Xue Xu
  • Hongcai Shang

Stroke is a cerebrovascular disease with high mortality and disability rates. The occurrence of a stroke typically produces lesions of different sizes, and the accurate segmentation and detection of small-size stroke lesions is closely related to the prognosis of patients. However, while large lesions are usually correctly identified, small-size lesions are often ignored. This article presents a hybrid contextual semantic network (HCSNet) that can accurately and simultaneously segment and detect small-size stroke lesions from magnetic resonance images. HCSNet inherits the advantages of the encoder–decoder architecture and applies a novel hybrid contextual semantic module that generates high-quality contextual semantic features from the spatial and channel contextual semantic features through the skip connection layer. Moreover, a mixing-loss function is proposed to optimize HCSNet for unbalanced small-size lesions. HCSNet is trained and evaluated on 2D magnetic resonance images produced from the Anatomical Tracings of Lesions After Stroke challenge (ATLAS R2.0). Extensive experiments demonstrate that HCSNet outperforms several other state-of-the-art methods in its ability to segment and detect small-size stroke lesions. Visualization and ablation experiments reveal that the hybrid contextual semantic module improves the segmentation and detection performance of HCSNet.

AAAI Conference 2023 Conference Paper

Let the Data Choose: Flexible and Diverse Anchor Graph Fusion for Scalable Multi-View Clustering

  • Pei Zhang
  • Siwei Wang
  • Liang Li
  • Changwang Zhang
  • Xinwang Liu
  • En Zhu
  • Zhe Liu
  • Lu Zhou

In the past few years, numerous multi-view graph clustering algorithms have been proposed to enhance clustering performance by exploring information from multiple views. Despite the superior performance, the high time and space expenditures limit their scalability. Accordingly, anchor graph learning has been introduced to alleviate the computational complexity. However, existing approaches can be further improved by the following considerations: (i) existing anchor-based methods share the same number of anchors across views, a strategy that violates the diversity and flexibility of multi-view data distributions; (ii) searching for the optimal anchor number within hyper-parameters takes much extra tuning time, which makes existing methods impractical; (iii) how to flexibly fuse multi-view anchor graphs of diverse sizes has not been well explored in the existing literature. To address the above issues, we propose a novel anchor-based method termed Flexible and Diverse Anchor Graph Fusion for Scalable Multi-view Clustering (FDAGF) in this paper. Instead of manually tuning the optimal anchor number over massive hyper-parameters, we propose to optimize the contribution weights of a group of pre-defined anchor numbers to avoid extra time expenditure across views. Most importantly, we propose a novel hybrid fusion strategy for multi-size anchor graphs with theoretical proof, which allows flexible and diverse anchor graph fusion. Then, an efficient linear optimization algorithm is proposed to solve the resultant problem. Comprehensive experimental results demonstrate the effectiveness and efficiency of our proposed framework. The source code is available at https://github.com/Jeaninezpp/FDAGF.
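
The anchor-graph construction underlying such scalable methods can be sketched in miniature: each of the n samples connects only to m anchors (m much smaller than n), which is what yields the linear cost. The Gaussian similarity and 1-D data below are illustrative assumptions, not FDAGF's actual construction:

```python
import math

def anchor_graph(samples, anchors, sigma=1.0):
    """Bipartite n-by-m similarity graph: each sample connects only to the
    m anchors, so storage and construction cost scale with n*m, not n*n."""
    graph = []
    for x in samples:
        sims = [math.exp(-((x - a) ** 2) / (2 * sigma ** 2)) for a in anchors]
        total = sum(sims)
        graph.append([s / total for s in sims])   # row-normalized weights
    return graph

G = anchor_graph([0.1, 0.9, 5.0, 5.2], anchors=[0.5, 5.1])
assert len(G) == 4 and len(G[0]) == 2   # n rows, m columns
assert G[0][0] > G[0][1]                # sample 0.1 is closer to anchor 0.5
assert G[2][1] > G[2][0]                # sample 5.0 is closer to anchor 5.1
```

FDAGF's question is then how to fuse several such graphs whose anchor counts m differ per view, rather than forcing one m on every view.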

JBHI Journal 2023 Journal Article

SASG-GCN: Self-Attention Similarity Guided Graph Convolutional Network for Multi-Type Lower-Grade Glioma Classification

  • Liangliang Liu
  • Jing Chang
  • Pei Zhang
  • Hongbo Qiao
  • Shufeng Xiong

Identifying the subtypes of low-grade glioma (LGG) can help prevent brain tumor progression and patient death. However, the complicated non-linear relationships and high dimensionality of 3D brain MRI limit the performance of machine learning methods. Therefore, it is important to develop a classification method that can overcome these limitations. This study proposes a self-attention similarity-guided graph convolutional network (SASG-GCN) that uses constructed graphs to complete multi-classification (tumor-free (TF), WG, and TMG). In the pipeline of SASG-GCN, we use a convolutional deep belief network and a self-attention similarity-based method to construct the vertices and edges of the graphs at the 3D MRI level, respectively. The multi-classification experiment is performed with a two-layer GCN model. SASG-GCN is trained and evaluated on 402 3D MRI images produced from the TCGA-LGG dataset. Empirical tests demonstrate that SASG-GCN accurately classifies the subtypes of LGG, achieving 93.62% accuracy and outperforming several other state-of-the-art classification methods. In-depth discussion and analysis reveal that the self-attention similarity-guided strategy improves the performance of SASG-GCN. The visualization reveals differences between the glioma types.

AAAI Conference 2022 Conference Paper

Efficient One-Pass Multi-View Subspace Clustering with Consensus Anchors

  • Suyuan Liu
  • Siwei Wang
  • Pei Zhang
  • Kai Xu
  • Xinwang Liu
  • Changwang Zhang
  • Feng Gao

Multi-view subspace clustering (MVSC) optimally integrates multiple graph structure information to improve clustering performance. Recently, many anchor-based variants have been proposed to reduce the computational complexity of MVSC. Though achieving considerable acceleration, we observe that most of them adopt fixed anchor points separated from the subsequent anchor graph construction, which may adversely affect the clustering performance. In addition, postprocessing is required to generate discrete clustering labels, with additional time consumption. To address these issues, we propose a scalable and parameter-free MVSC method that directly outputs the clustering labels with an optimal anchor graph, termed Efficient One-pass Multi-view Subspace Clustering with Consensus Anchors (EOMSC-CA). Specifically, we combine anchor learning and graph construction into a unified framework to boost clustering performance. Meanwhile, by imposing a graph connectivity constraint, our algorithm directly outputs the clustering labels without any post-processing procedures as previous methods require. Our proposed EOMSC-CA is proven to have linear complexity with respect to the data size. The superiority of our EOMSC-CA in both effectiveness and efficiency is demonstrated by extensive experiments. Our code is publicly available at https://github.com/Tracesource/EOMSC-CA.

ICLR Conference 2022 Conference Paper

PI3NN: Out-of-distribution-aware Prediction Intervals from Three Neural Networks

  • Siyan Liu
  • Pei Zhang
  • Dan Lu 0001
  • Guannan Zhang

We propose a novel prediction interval (PI) method for uncertainty quantification, which addresses three major issues with the state-of-the-art PI methods. First, existing PI methods require retraining of neural networks (NNs) for every given confidence level and suffer from the crossing issue when calculating multiple PIs. Second, they usually rely on customized loss functions with extra sensitive hyperparameters for which fine-tuning is required to achieve a well-calibrated PI. Third, they usually underestimate uncertainties of out-of-distribution (OOD) samples, leading to over-confident PIs. Our PI3NN method calculates PIs from linear combinations of three NNs, each of which is independently trained using the standard mean squared error loss. The coefficients of the linear combinations are computed using root-finding algorithms to ensure tight PIs for a given confidence level. We theoretically prove that PI3NN can calculate PIs for a series of confidence levels without retraining NNs and that it completely avoids the crossing issue. Additionally, PI3NN does not introduce any unusual hyperparameters, resulting in stable performance. Furthermore, we address the OOD identification challenge by introducing an initialization scheme that provides reasonably larger PIs for OOD samples than for in-distribution samples. Benchmark and real-world experiments show that our method outperforms several state-of-the-art approaches with respect to predictive uncertainty quality, robustness, and OOD sample identification.
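
The root-finding step, choosing a coefficient so that the empirical coverage of the interval matches the target confidence level, can be sketched with a simple bisection. This toy uses symmetric intervals over scalar residuals and is not the paper's exact three-network construction:

```python
import random

def coverage(residuals, width):
    """Fraction of residuals inside a symmetric interval of half-width `width`."""
    return sum(abs(r) <= width for r in residuals) / len(residuals)

def calibrate_width(residuals, target, lo=0.0, hi=100.0, iters=60):
    """Bisection root-finding: the smallest half-width whose empirical
    coverage reaches the target confidence level. Because coverage is
    monotone in width, no retraining is needed per confidence level."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if coverage(residuals, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

rng = random.Random(0)
res = [rng.gauss(0, 1) for _ in range(10_000)]
w90 = calibrate_width(res, 0.90)
w95 = calibrate_width(res, 0.95)
assert w90 < w95                              # wider interval for higher confidence
assert abs(coverage(res, w95) - 0.95) < 0.01  # empirical coverage hits the target
```

Monotonicity of coverage in the width is also what rules out the crossing issue in this toy setting: a 95% interval always contains the 90% one.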

ECAI Conference 2020 Conference Paper

Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation

  • Pei Zhang
  • Xu Zhang
  • Wei Chen 0071
  • Jian Yu
  • Yanfeng Wang
  • Deyi Xiong

Document-level machine translation incorporates intersentential dependencies into the translation of a source sentence. In this paper, we propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence. By enforcing the NMT model to predict source context, we want the model to learn “contextualized” source sentence representations that capture document-level dependencies on the source side. We further propose two different methods to learn and integrate such contextualized sentence embeddings into NMT: a joint training method that jointly trains an NMT model with the source context prediction model and a pre-training & fine-tuning method that pretrains the source context prediction model on a large-scale monolingual document corpus and then fine-tunes it with the NMT model. Experiments on Chinese-English and English-German translation show that both methods can substantially improve the translation quality over a strong document-level Transformer baseline.

AAAI Conference 2020 Conference Paper

Visual Agreement Regularized Training for Multi-Modal Machine Translation

  • Pengcheng Yang
  • Boxing Chen
  • Pei Zhang
  • Xu Sun

Multi-modal machine translation aims at translating the source sentence into a different language in the presence of a paired image. Previous work suggests that additional visual information provides only dispensable help to translation, needed in a few very special cases such as translating ambiguous words. To make better use of visual information, this work presents visual agreement regularized training. The proposed approach jointly trains the source-to-target and target-to-source translation models and encourages them to share the same focus on the visual information when generating semantically equivalent visual words (e.g., “ball” in English and “ballon” in French). Besides, a simple yet effective multi-head co-attention model is also introduced to capture interactions between visual and textual features. The results show that our approaches can outperform competitive baselines by a large margin on the Multi30k dataset. Further analysis demonstrates that the proposed regularized training can effectively improve the agreement of attention on the image, leading to better use of visual information.