Arrow Research search

Author name cluster

Wei Dai

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers (25)

AAAI Conference 2026 Conference Paper

Compositional Attribute Imbalance in Vision Datasets

  • Yanbiao Ma
  • Jiayi Chen
  • Wei Dai
  • Dong Zhao
  • Zeyu Zhang
  • Yuting Yang
  • Bowei Liu
  • Jiaxuan Zhao

Visual attribute imbalance is a common yet underexplored issue in image classification, significantly impacting model performance and generalization. In this work, we first define the first-level and second-level attributes of images and then introduce a CLIP-based framework to construct a visual attribute dictionary, enabling automatic evaluation of image attributes. By systematically analyzing both single-attribute imbalance and compositional attribute imbalance, we reveal how the rarity of attributes affects model performance. To tackle these challenges, we propose adjusting the sampling probability of samples based on the rarity of their compositional attributes. This strategy is further integrated with various data augmentation techniques (such as CutMix, Fmix, and SaliencyMix) to enhance the model's ability to represent rare attributes. Extensive experiments on benchmark datasets demonstrate that our method effectively mitigates attribute imbalance, thereby improving the robustness and fairness of deep neural networks. Our research highlights the importance of modeling visual attribute distributions and provides a scalable solution for long-tail image classification tasks.
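
As a rough illustration of the rarity-based sampling described above, the sketch below (PyTorch, with hypothetical per-sample attribute labels) up-weights samples whose compositional attribute combination is rare. It is a minimal reading of the abstract, not the paper's implementation.

```python
import torch
from torch.utils.data import WeightedRandomSampler

def rarity_weights(attribute_ids):
    """Weight each sample by the inverse frequency of its attribute combination."""
    attribute_ids = torch.as_tensor(attribute_ids)
    _, inverse, counts = torch.unique(
        attribute_ids, dim=0, return_inverse=True, return_counts=True
    )
    freq = counts[inverse].float() / len(attribute_ids)
    return 1.0 / freq  # rarer combinations get proportionally higher sampling probability

# Example: 6 samples described by (first-level, second-level) attribute indices.
attrs = [[0, 1], [0, 1], [0, 1], [2, 3], [2, 3], [4, 5]]
weights = rarity_weights(attrs)
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
# The sampler can then be passed to a DataLoader and combined with CutMix-style augmentation.
```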

AAAI Conference 2026 Conference Paper

Stability-Aware Reinforcement Learning for Robust Class Integration Test Order Generation

  • Yanru Ding
  • Yanmei Zhang
  • Guan Yuan
  • Shujuan Jiang
  • Wei Dai
  • Luciano Baresi

Generating a class integration test order (CITO) is essential to reduce the overhead of test stub construction (the primary cost in integration testing) and to ensure system reliability in complex software systems. Although reinforcement learning (RL) has shown promise in automating CITO generation, existing methods suffer from unstable policy learning and limited robustness against structural perturbations and defect injection. These challenges stem from insufficient reward shaping and the lack of reliable oracles for validation. To address these limitations, we propose LM-CITO, a stability-aware RL framework that integrates Lyapunov-guided reward shaping with semantic validation through metamorphic testing (MT). Specifically, we design a Lyapunov energy function over class dependency graphs to promote monotonic structural convergence during training, and define metamorphic relations (MRs) to verify behavioral consistency under controlled perturbations. Extensive experiments on six real-world systems demonstrate that LM-CITO consistently produces more effective policies, yielding CITOs with significantly reduced stubbing costs compared to baseline models. Furthermore, MT verifies the capability of our MRs to detect defects in 19 injected bug variants, confirming the robustness of LM-CITO under various fault-induced perturbations. These results highlight the synergy of stability guidance and MR-based validation, offering an effective, principled solution for oracle-free RL in software testing.
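
The Lyapunov-guided reward shaping can be read as potential-based shaping over a structural energy. The sketch below assumes a simple illustrative energy (dependency edges whose target class is not yet integrated, i.e., a rough proxy for stubs still required); the paper's actual energy function and reward are not reproduced here.

```python
def energy(integrated, dependencies):
    """Count dependency edges (a depends on b) whose target b is not yet integrated."""
    done = set(integrated)
    return sum(1 for a, b in dependencies if a in done and b not in done)

def shaped_reward(base_reward, state, next_state, dependencies, lam=0.1):
    """Reward the agent for monotonically decreasing the structural energy."""
    return base_reward + lam * (energy(state, dependencies) - energy(next_state, dependencies))

deps = [("A", "B"), ("A", "C"), ("B", "C")]
print(shaped_reward(0.0, ["A"], ["A", "C"], deps))  # integrating C lowers the energy
```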

ICLR Conference 2025 Conference Paper

AutoUAD: Hyper-parameter Optimization for Unsupervised Anomaly Detection

  • Wei Dai
  • Jicong Fan 0001

Unsupervised anomaly detection (UAD) has important applications in diverse fields such as manufacturing industry and medical diagnosis. In the past decades, although numerous insightful and effective UAD methods have been proposed, it remains a huge challenge to tune the hyper-parameters of each method and select the most appropriate method among many candidates for a specific dataset, due to the absence of labeled anomalies in the training phase of UAD methods and the high diversity of real datasets. In this work, we aim to address this challenge, so as to make UAD more practical and reliable. We propose two internal evaluation metrics, relative-top-median and expected-anomaly-gap, and one semi-internal evaluation metric, normalized pseudo discrepancy (NPD), as surrogate functions of the expected model performance on unseen test data. For instance, NPD measures the discrepancy between the anomaly scores of a validation set drawn from the training data and a validation set drawn from an isotropic Gaussian. NPD is simple and hyper-parameter-free and is able to compare different UAD methods, and its effectiveness is theoretically analyzed. We integrate the three metrics with Bayesian optimization to effectively optimize the hyper-parameters of UAD models. Extensive experiments on 38 datasets show the effectiveness of our methods.
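
A hedged sketch of the NPD idea: score a held-out validation split of the training data and an isotropic Gaussian reference sample with a fitted detector, and measure how well the two score sets separate. The detector (IsolationForest) and the AUC-style discrepancy are stand-in assumptions; the paper's exact formula is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8)) * [1, 2, 1, 1, 3, 1, 1, 1]  # "normal" data

def npd_like_score(model_factory, X, n_gauss=200):
    """Compare anomaly scores of a validation split of X against an isotropic Gaussian sample."""
    X_fit, X_val = X[: len(X) // 2], X[len(X) // 2 :]
    model = model_factory().fit(X_fit)
    gauss = rng.standard_normal((n_gauss, X.shape[1]))
    scores = -model.score_samples(np.vstack([X_val, gauss]))  # higher = more anomalous
    labels = np.r_[np.zeros(len(X_val)), np.ones(n_gauss)]
    return roc_auc_score(labels, scores)

# Hyper-parameter candidates could be ranked by this surrogate inside Bayesian optimization.
print(npd_like_score(lambda: IsolationForest(n_estimators=100, random_state=0), X_train))
```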

ICML Conference 2025 Conference Paper

Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy

  • Haoqi Wu
  • Wei Dai
  • Li Wang
  • Qiang Yan

Large Language Models (LLMs) have gained significant popularity due to their remarkable capabilities in text understanding and generation. However, despite their widespread deployment in inference services such as ChatGPT, concerns about the potential leakage of sensitive user data have arisen. Existing solutions primarily rely on privacy-enhancing technologies to mitigate such risks, facing a trade-off among efficiency, privacy, and utility. To narrow this gap, we propose Cape, a context-aware prompt perturbation mechanism based on differential privacy, to enable efficient inference with an improved privacy-utility trade-off. Concretely, we introduce a hybrid utility function that better captures token similarity. Additionally, we propose a bucketized sampling mechanism to handle the large sampling space, which might otherwise lead to long-tail phenomena. Extensive experiments across multiple datasets, along with ablation studies, demonstrate that Cape achieves a better privacy-utility trade-off compared to prior state-of-the-art works.
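
Bucketized exponential-mechanism sampling of a replacement token can be sketched roughly as below. The embedding-distance utility and quantile bucketing are illustrative stand-ins for Cape's hybrid utility and bucket design, not the actual mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_perturb_token(token_vec, vocab_vecs, epsilon=2.0, n_buckets=32):
    """Score candidates by (negative) distance to the original embedding, coarsen scores
    into buckets to shrink the effective sampling space, then sample with the
    exponential mechanism (sensitivity assumed 1)."""
    utility = -np.linalg.norm(vocab_vecs - token_vec, axis=1)
    edges = np.quantile(utility, np.linspace(0, 1, n_buckets + 1))
    bucket_utility = edges[np.clip(np.searchsorted(edges, utility), 1, n_buckets)]
    probs = np.exp(0.5 * epsilon * (bucket_utility - bucket_utility.max()))
    probs /= probs.sum()
    return rng.choice(len(vocab_vecs), p=probs)

vocab = rng.standard_normal((1000, 16))     # toy embedding table
print(dp_perturb_token(vocab[42], vocab))   # index of the sampled replacement token
```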

ICML Conference 2025 Conference Paper

CogMath: Assessing LLMs' Authentic Mathematical Ability from a Human Cognitive Perspective

  • Jiayu Liu 0001
  • Zhenya Huang
  • Wei Dai
  • Cheng Cheng
  • Jinze Wu
  • Jing Sha
  • Song Li
  • Qi Liu 0003

Although large language models (LLMs) show promise in solving complex mathematical tasks, existing evaluation paradigms rely solely on a coarse measure of overall answer accuracy, which is insufficient for assessing their authentic capabilities. In this paper, we propose CogMath, which comprehensively assesses LLMs’ mathematical abilities through the lens of human cognition. Specifically, inspired by psychological theories, CogMath formalizes the human reasoning process into three stages: problem comprehension, problem solving, and solution summarization. Within these stages, we investigate perspectives such as numerical calculation, knowledge, and counterfactuals, and design a total of 9 fine-grained evaluation dimensions. In each dimension, we develop an “Inquiry-Judge-Reference” multi-agent system to generate inquiries that assess LLMs’ mastery of that dimension. An LLM is considered to truly master a problem only when excelling in all inquiries from the 9 dimensions. By applying CogMath on three benchmarks, we reveal that the mathematical capabilities of 7 mainstream LLMs are overestimated by 30%-40%. Moreover, we locate their strengths and weaknesses across specific stages/dimensions, offering in-depth insights to further enhance their reasoning abilities.

JBHI Journal 2025 Journal Article

Hierarchical Graph Representation Learning With Multi-Granularity Features for Anti-Cancer Drug Response Prediction

  • Wei Peng
  • Jiangzhen Lin
  • Wei Dai
  • Ning Yu
  • Jianxin Wang

Patients with the same type of cancer often respond differently to identical drug treatments due to unique genomic traits. Accurately predicting a patient's response to a drug is crucial in guiding treatment decisions, alleviating patient suffering, and improving cancer prognosis. Current computational methods utilize deep learning models trained on extensive drug screening data to predict anti-cancer drug responses based on features of cell lines and drugs. However, the interaction between cell lines and drugs is a complex biological process involving interactions across various levels, from internal cellular and drug structures to the external interactions among different molecules. To address this complexity, we propose a novel Hierarchical graph representation Learning with Multi-Granularity features (HLMG) algorithm for predicting anti-cancer drug responses. The HLMG algorithm combines features at two granularities: the overall gene expression and pathway substructures of cell lines, and the overall molecular fingerprints and substructures of drugs. Subsequently, it constructs a heterogeneous graph including cell lines, drugs, known cell line-drug responses, and the associations between similar cell lines and similar drugs. Through a graph convolutional network model, the HLMG learns the final cell line and drug representations by aggregating features of their multi-level neighbors in the heterogeneous graph. The multi-level neighbors consist of the node itself, directly related drugs/cell lines, and indirectly related similar drugs/cell lines. Finally, a linear correlation coefficient decoder is employed to reconstruct the cell line-drug correlation matrix to predict anti-cancer drug responses. Our model was tested on the Genomics of Drug Sensitivity in Cancer (GDSC) and the Cancer Cell Line Encyclopedia (CCLE) databases. Results indicate that HLMG outperforms other state-of-the-art methods in accurately predicting anti-cancer drug responses.

NeurIPS Conference 2025 Conference Paper

ObCLIP: Oblivious CLoud-Device Hybrid Image Generation with Privacy Preservation

  • Haoqi Wu
  • Wei Dai
  • Ming Xu
  • Wang Li
  • Qiang Yan

Diffusion Models have gained significant popularity due to their remarkable capabilities in image generation, albeit at the cost of intensive computation requirements. Meanwhile, despite their widespread deployment in inference services such as Midjourney, concerns about the potential leakage of sensitive information in uploaded user prompts have arisen. Existing solutions either fail to strike an effective balance between utility and efficiency, or lack rigorous privacy guarantees. To bridge this gap, we propose ObCLIP, a plug-and-play safeguard that enables an oblivious cloud-device hybrid generation scheme. By oblivious, we mean that each input prompt is transformed into a set of semantically similar candidate prompts that differ only in sensitive attributes (e.g., gender, ethnicity). The cloud server processes all candidate prompts without knowing which one is the real one, thus preventing any prompt leakage. To mitigate server cost, only a small portion of denoising steps is performed upon the large cloud model. The resulting intermediate latents are then transmitted back to the device, which selects the targeted latent and completes the remaining denoising using a small local model to obtain the final image. Additionally, we analyze and incorporate several cache-based accelerations that leverage temporal and batch redundancy, effectively reducing computation cost with minimal utility degradation. Extensive experiments across multiple datasets demonstrate that ObCLIP provides rigorous privacy and comparable utility to large cloud models with slightly increased server computation.

JBHI Journal 2025 Journal Article

Predicting Clinical Anticancer Drug Response of Patients by Using Domain Alignment and Prototypical Learning

  • Wei Peng
  • Chuyue Chen
  • Wei Dai
  • Ning Yu
  • Jianxin Wang

Anticancer drug response prediction is crucial in developing personalized treatment plans for cancer patients. However, high-quality patient anticancer drug response data are scarce, and because cell line data and patient data have different distributions, models trained solely on cell line data perform poorly. Some existing methods predict anticancer drug response by transferring knowledge from the cell line domain to the patient domain using transfer learning. However, the robustness of these classifiers is affected by anomalies in the cell line data, and they do not utilize the knowledge in the unlabeled target domain data. To this end, we propose a model called DAPL to predict patient responses to anticancer drugs. The model extracts domain-invariant features from cell lines and patients by constructing multiple VAEs and extracts drug features using GNNs. These features are then combined for prototypical learning to train a classifier, resulting in better predictions of patient anticancer drug response. We used the cell line datasets CCLE and GDSC as source domains and the patient datasets TCGA and PDTC as target domains and conducted experiments. The results indicate that DAPL shows excellent performance in predicting patient anticancer drug response compared to other state-of-the-art methods.
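
The prototypical-learning step can be illustrated with a minimal nearest-prototype classifier over already domain-aligned embeddings. The data, dimensions, and class names below are toy assumptions, not the paper's setup.

```python
import numpy as np

def prototypes(features, labels):
    """Mean embedding per class (the prototypes)."""
    classes = np.unique(labels)
    return classes, np.stack([features[labels == c].mean(axis=0) for c in classes])

def predict_by_prototype(query, protos, classes):
    """Assign each query embedding to the nearest class prototype (Euclidean distance)."""
    d = np.linalg.norm(query[:, None, :] - protos[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

rng = np.random.default_rng(0)
src_feat = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(3, 1, (50, 8))])  # e.g. cell-line embeddings
src_lab = np.r_[np.zeros(50, dtype=int), np.ones(50, dtype=int)]              # 0 = sensitive, 1 = resistant
classes, protos = prototypes(src_feat, src_lab)
tgt_feat = rng.normal(3, 1, (5, 8))                                           # e.g. patient embeddings
print(predict_by_prototype(tgt_feat, protos, classes))
```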

NeurIPS Conference 2025 Conference Paper

The Dual Nature of Plasticity Loss in Deep Continual Learning: Dissection and Mitigation

  • Haoyu Wang
  • Wei Dai
  • Jiawei Zhang
  • Jialun Ma
  • Mingyi Huang
  • Yuguo Yu

Loss of plasticity (LoP) is the primary cause of cognitive decline in normal aging brains next to cell loss. Recent works show that similar LoP also plagues neural networks during deep continual learning (DCL). While it has been shown that random perturbations of learned weights can alleviate LoP, its underlying mechanisms remain insufficiently understood. Here we offer a unique view of LoP and dissect its mechanisms through the lenses of an innovative framework combining the theory of neural collapse and finite-time Lyapunov exponent (FTLE) analysis. We show that LoP actually consists of two contrasting types: (i) type-1 LoP is characterized by highly negative FTLEs, where the network is prevented from learning due to the collapse of representations; (ii) type-2 LoP is characterized by excessively positive FTLEs, where the network can train well but the increasingly chaotic behaviors reduce its test accuracy. Based on this understanding, we introduce Generalized Mixup, designed to relax the representation space for prolonged DCL, and demonstrate its superior efficacy vs. existing methods.

AAAI Conference 2025 Conference Paper

Unsupervised Anomaly Detection for Tabular Data Using Deep Noise Evaluation

  • Wei Dai
  • Kai Hwang
  • Jicong Fan

Unsupervised anomaly detection (UAD) plays an important role in modern data analytics and it is crucial to provide simple yet effective and guaranteed UAD algorithms for real applications. In this paper, we present a novel UAD method for tabular data by evaluating how much noise is in the data. Specifically, we propose to learn a deep neural network from the clean (normal) training dataset and a noisy dataset, where the latter is generated by adding highly diverse noises to the clean data. The neural network can learn a reliable decision boundary between normal data and anomalous data when the diversity of the generated noisy data is sufficiently high so that the hard abnormal samples lie in the noisy region. Importantly, we provide theoretical guarantees, proving that the proposed method can detect anomalous data successfully, although the method does not utilize any real anomalous data in the training stage. Extensive experiments on more than 60 benchmark datasets demonstrate the effectiveness of the proposed method in comparison to 12 UAD baselines. Our method obtains a 92.27% AUC score and a 1.68 ranking score on average. Moreover, compared to the state-of-the-art UAD methods, our method is easier to implement.
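
A minimal sketch of the noise-evaluation idea: corrupt the clean training data with diverse noise, train a classifier to separate clean from noisy samples, and use the "noisy" probability as the anomaly score. The network size and noise model here are simplifications, not the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_clean = rng.normal(size=(1000, 10))                       # normal training data

# Corrupt clean samples with diverse noise (a different scale per sample).
scales = rng.uniform(0.5, 5.0, size=(1000, 1))
X_noisy = X_clean + rng.normal(size=X_clean.shape) * scales

X = np.vstack([X_clean, X_noisy])
y = np.r_[np.zeros(len(X_clean)), np.ones(len(X_noisy))]    # 0 = clean, 1 = noisy

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0).fit(X, y)

# Anomaly score of a test point = predicted probability of belonging to the noisy class.
x_test = np.array([[8.0] * 10, [0.1] * 10])
print(clf.predict_proba(x_test)[:, 1])   # the far-off point should score higher than the inlier
```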

ICRA Conference 2024 Conference Paper

Automated Non-invasive Analysis of Motile Sperms Using Cross-scale Guidance Network

  • Wei Dai
  • Zixuan Wu
  • Jiaqi Wang
  • Rui Liu 0033
  • Min Wang 0032
  • Tianyi Wu
  • Junxian Zhou
  • Zhuoran Zhang 0001

Unbiased measurement of sperm morphometric and motility parameters is essential for assessing fertility potential and guiding visual feedback for microrobotic manipulation. Automated analysis of multiple sperms and selection of an optimal sperm is crucial for in vitro fertilisation treatments such as robotic intracytoplasmic sperm injection. However, conventional image processing methods have limitations in analysing small sperm objects under microscopic imaging. The emergence of convolutional neural networks (CNNs) has offered promising advancements in microscopic image analysis. However, previous CNN methods have struggled to accurately segment tiny objects, requiring staining or fluorescence techniques to enhance visual contrast between sperm and culture medium, leading to clinical impracticality. To address these limitations, we introduce a novel segmentation network named the cross-scale guidance (CSG) network for accurate and efficient segmentation of minute sperm objects. The CSG network employs innovative modules, including collateral multi-scale convolution, cross-scale feature map guide, and multi-scale feature fusion, to preserve essential sperm details despite their small size. Experimental results indicate that the CSG network surpassed the state-of-the-art models designed for small object segmentation, achieving up to 18.62% higher mean intersection over union (mIoU). Additionally, the CSG network excelled in sperm morphometric analysis, achieving errors below 20%. Moreover, sperm motility parameters were further derived from the segmentation results for comprehensive sperm fertility analysis.

JBHI Journal 2024 Journal Article

Deep Learning-Based Microscopic Cell Detection Using Inverse Distance Transform and Auxiliary Counting

  • Rui Liu
  • Wei Dai
  • Cong Wu
  • Tianyi Wu
  • Min Wang
  • Junxian Zhou
  • Xiaozhen Zhang
  • Wen Jung Li

Microscopic cell detection is a challenging task due to significant inter-cell occlusions in dense clusters and diverse cell morphologies. This paper introduces a novel framework designed to enhance automated cell detection. The proposed approach integrates a deep learning model that produces an inverse distance transform-based detection map from the given image, accompanied by a secondary network designed to regress a cell density map from the same input. The inverse distance transform-based map effectively highlights each cell instance in the densely populated areas, while the density map accurately estimates the total cell count in the image. Then, a custom counting-aided cell center extraction strategy leverages the cell count obtained by integrating over the density map to refine the detection process, significantly reducing false responses and thereby boosting overall accuracy. The proposed framework demonstrated superior performance with F-scores of 96.93%, 91.21%, and 92.00% on the VGG, MBM, and ADI datasets, respectively, surpassing existing state-of-the-art methods. It also achieved the lowest distance error, further validating the effectiveness of the proposed approach. These results demonstrate significant potential for automated cell analysis in biomedical applications.
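
The counting-aided center extraction can be sketched as follows: integrate the density map to obtain an estimated count N, then keep only the N strongest local maxima of the detection map. The toy maps and window size below are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter, gaussian_filter

def extract_centers(detection_map, density_map, window=5):
    """Keep only the N strongest local maxima of the detection map,
    where N is the cell count estimated by integrating the density map."""
    n_cells = int(round(density_map.sum()))
    peaks = (detection_map == maximum_filter(detection_map, size=window)) & (detection_map > 0)
    ys, xs = np.nonzero(peaks)
    order = np.argsort(detection_map[ys, xs])[::-1][:n_cells]   # top-N responses
    return list(zip(ys[order], xs[order]))

# Toy maps: two blobs in the detection map, density map integrating to ~2 cells.
det = np.zeros((64, 64))
det[20, 20] = 1.0
det[40, 45] = 1.0
det = gaussian_filter(det, 2)
dens = det / det.sum() * 2.0
print(extract_centers(det, dens))
```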

JBHI Journal 2024 Journal Article

Deeply Supervised Skin Lesions Diagnosis With Stage and Branch Attention

  • Wei Dai
  • Rui Liu
  • Tianyi Wu
  • Min Wang
  • Jianqin Yin
  • Jun Liu

Accurate and unbiased examinations of skin lesions are critical for the early diagnosis and treatment of skin diseases. Visual features of skin lesions vary significantly because the images are collected from patients with different lesion colours and morphologies by using dissimilar imaging equipment. Recent studies have reported that ensembled convolutional neural networks (CNNs) are practical to classify the images for early diagnosis of skin disorders. However, the practical use of these ensembled CNNs is limited as these networks are heavyweight and inadequate for processing contextual information. Although lightweight networks (e.g., MobileNetV3 and EfficientNet) were developed to achieve parameter reduction for implementing deep neural networks on mobile devices, insufficient depth of feature representation restricts the performance. To address the existing limitations, we develop a new lite and effective neural network, namely HierAttn. The HierAttn applies a novel deep supervision strategy to learn the local and global features by using multi-stage and multi-branch attention mechanisms with only one training loss. The efficacy of HierAttn was evaluated by using the dermoscopy images dataset ISIC2019 and smartphone photos dataset PAD-UFES-20 (PAD2020). The experimental results show that HierAttn achieves the best accuracy and area under the curve (AUC) among the state-of-the-art lightweight networks.

NeurIPS Conference 2022 Conference Paper

Brain Network Transformer

  • Xuan Kan
  • Wei Dai
  • Hejie Cui
  • Zilong Zhang
  • Ying Guo
  • Carl Yang

Human brains are commonly modeled as networks of Regions of Interest (ROIs) and their connections for the understanding of brain functions and mental disorders. Recently, Transformer-based models have been studied on different types of data, including graphs, and have been shown to bring broad performance gains. In this work, we study Transformer-based models for brain network analysis. Driven by the unique properties of the data, we model brain networks as graphs with nodes of fixed size and order, which allows us to (1) use connection profiles as node features to provide natural and low-cost positional information and (2) learn pair-wise connection strengths among ROIs with efficient attention weights across individuals that are predictive towards downstream analysis tasks. Moreover, we propose an Orthonormal Clustering Readout operation based on self-supervised soft clustering and orthonormal projection. This design accounts for the underlying functional modules that determine similar behaviors among groups of ROIs, leading to distinguishable cluster-aware node embeddings and informative graph embeddings. Finally, we re-standardize the evaluation pipeline on the only publicly available large-scale brain network dataset, ABIDE, to enable meaningful comparison of different models. Experiment results show clear improvements of our proposed Brain Network Transformer on both the public ABIDE and our restricted ABCD datasets. The implementation is available at https://github.com/Wayfear/BrainNetworkTransformer.
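
A hedged sketch of an orthonormal clustering readout in the spirit described above: orthonormalize a set of cluster centers, softly assign ROI embeddings to them, and pool the node embeddings by the soft assignment. The shapes and the exact assignment rule are common-practice assumptions, not necessarily the paper's implementation (see the linked repository for that).

```python
import numpy as np

def orthonormal_cluster_readout(node_emb, centers):
    """Soft-cluster node embeddings against an orthonormalized basis of cluster centers
    and pool them into a single cluster-aware graph embedding."""
    q, _ = np.linalg.qr(centers.T)                      # (d, k) orthonormal basis
    logits = node_emb @ q                               # (n_nodes, k) similarity to the basis
    assign = np.exp(logits - logits.max(axis=1, keepdims=True))
    assign /= assign.sum(axis=1, keepdims=True)         # soft cluster assignment per node
    return (assign.T @ node_emb).reshape(-1)            # (k * d,) graph-level embedding

rng = np.random.default_rng(0)
nodes = rng.standard_normal((200, 16))      # 200 ROIs with 16-d connection-profile features
centers = rng.standard_normal((4, 16))      # 4 cluster centers (learnable in practice)
print(orthonormal_cluster_readout(nodes, centers).shape)
```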

JBHI Journal 2022 Journal Article

Predicting Drug Response Based on Multi-Omics Fusion and Graph Convolution

  • Wei Peng
  • Tielin Chen
  • Wei Dai

Different cancer patients may respond differently to cancer treatment due to the heterogeneity of cancer. It is an urgent task to develop an efficient computational method to identify drug responses in different cell lines, which guides us to design personalized therapy for an individual patient. Hence, we propose an end-to-end algorithm, namely MOFGCN, to predict drug response in cell lines based on Multi-Omics Fusion and Graph Convolution Network. MOFGCN first fuses multiple omics data to calculate the cell line similarity and then constructs a heterogeneous network by combining the cell line similarity, drug similarity, and the known cell line-drug associations. Secondly, it learns the latent features for cancer cell lines and drugs by performing graph convolution operations on the heterogeneous network. Finally, MOFGCN applies the linear correlation coefficient to reconstruct the cancer cell line-drug correlation matrix to predict drug sensitivity. To our knowledge, this is the first attempt to combine graph convolutional neural network and linear correlation coefficient for this significant task. We performed extensive evaluation experiments on the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) databases to validate MOFGCN’s performance. The experimental results show that MOFGCN is superior to the state-of-the-art algorithms in predicting missing drug responses. It also leads to higher performance in predicting drug responses for new cell lines, new drugs, and targeted drugs.
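
The linear correlation coefficient decoder can be sketched as a pairwise Pearson-style correlation between the learned cell-line and drug embeddings. This is a minimal reading of the abstract with toy embeddings, not the paper's exact decoder.

```python
import numpy as np

def correlation_decoder(cell_emb, drug_emb):
    """Reconstruct the cell line-drug association matrix as the pairwise linear
    (Pearson-style) correlation between cell-line and drug embeddings."""
    c = cell_emb - cell_emb.mean(axis=1, keepdims=True)
    d = drug_emb - drug_emb.mean(axis=1, keepdims=True)
    c /= np.linalg.norm(c, axis=1, keepdims=True)
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    return c @ d.T          # entry (i, j): predicted sensitivity score of cell line i to drug j

rng = np.random.default_rng(0)
cells = rng.standard_normal((5, 32))             # embeddings from the graph convolution step
drugs = rng.standard_normal((3, 32))
print(correlation_decoder(cells, drugs).shape)   # (5, 3) score matrix
```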

ICML Conference 2020 Conference Paper

Learning Optimal Tree Models under Beam Search

  • Jingwei Zhuo
  • Ziru Xu
  • Wei Dai
  • Han Zhu 0001
  • Han Li 0005
  • Jian Xu 0015
  • Kun Gai

Retrieving relevant targets from an extremely large target set under computational limits is a common challenge for information retrieval and recommendation systems. Tree models, which formulate targets as leaves of a tree with trainable node-wise scorers, have attracted a lot of interest in tackling this challenge due to their logarithmic computational complexity in both training and testing. Tree-based deep models (TDMs) and probabilistic label trees (PLTs) are two representative kinds of them. Though achieving many practical successes, existing tree models suffer from the training-testing discrepancy, where the retrieval performance deterioration caused by beam search in testing is not considered in training. This leads to an intrinsic gap between the most relevant targets and those retrieved by beam search with even the optimally trained node-wise scorers. We take a first step towards understanding and analyzing this problem theoretically, and develop the concepts of Bayes optimality under beam search and calibration under beam search as general analysis tools for this purpose. Moreover, to eliminate the discrepancy, we propose a novel algorithm for learning optimal tree models under beam search. Experiments on both synthetic and real data verify the rationality of our theoretical analysis and demonstrate the superiority of our algorithm compared to state-of-the-art methods.
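
For intuition about the training-testing discrepancy, here is a minimal beam search over a complete binary tree with node-wise scores: a high-scoring leaf can be missed when its ancestors score poorly. The scores below are hypothetical, standing in for a trained node-wise model.

```python
import heapq

def beam_search(scorer, depth, beam_size):
    """Retrieve leaves by expanding, at each level, only the beam_size highest-scoring nodes."""
    beam = [(scorer(1), 1)]                       # root has index 1 in a heap-style layout
    for _ in range(depth):
        children = [(scorer(c), c) for _, node in beam for c in (2 * node, 2 * node + 1)]
        beam = heapq.nlargest(beam_size, children)
    return [node for _, node in beam]             # indices of the retrieved leaves

# Hypothetical node scores for a depth-2 tree (leaves are nodes 4-7).
scores = {1: 0.9, 2: 0.3, 3: 0.8, 4: 0.95, 5: 0.1, 6: 0.6, 7: 0.2}
print(beam_search(lambda n: scores.get(n, 0.0), depth=2, beam_size=1))
# Returns [6]: leaf 4 has the highest leaf score but is pruned because its parent (node 2) scores low.
```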

JMLR Journal 2018 Journal Article

Distributed Proximal Gradient Algorithm for Partially Asynchronous Computer Clusters

  • Yi Zhou
  • Yingbin Liang
  • Yaoliang Yu
  • Wei Dai
  • Eric P. Xing

With ever-growing data volume and model size, an error-tolerant, communication-efficient, yet versatile distributed algorithm has become vital for the success of many large-scale machine learning applications. In this work we propose m-PAPG, an implementation of the flexible proximal gradient algorithm in model parallel systems equipped with the partially asynchronous communication protocol. The worker machines communicate asynchronously with a controlled staleness bound $s$ and operate at different frequencies. We characterize various convergence properties of m-PAPG: 1) Under a general non-smooth and non-convex setting, we prove that every limit point of the sequence generated by m-PAPG is a critical point of the objective function; 2) Under an error bound condition of convex objective functions, we prove that the optimality gap decays linearly every $s$ steps; 3) Under the Kurdyka-Łojasiewicz inequality and a sufficient decrease assumption, we prove that the sequences generated by m-PAPG converge to the same critical point, provided that a proximal Lipschitz condition is satisfied.
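
The underlying proximal gradient update is standard; below is a minimal synchronous sketch with an L1 regularizer (soft-thresholding proximal operator) on a toy least-squares objective. In m-PAPG each worker would apply such updates to its own model partition using parameters that may be up to $s$ iterations stale; that asynchrony is not modeled here.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_gradient_step(x, grad, lr, l1):
    """One proximal gradient update: gradient step followed by the L1 prox."""
    return soft_threshold(x - lr * grad, lr * l1)

# Toy objective: 0.5 * ||A x - b||^2 + l1 * ||x||_1
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x = np.zeros(5)
for _ in range(200):
    x = proximal_gradient_step(x, A.T @ (A @ x - b), lr=0.01, l1=0.1)
print(x)   # sparse solution after 200 iterations
```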

AAAI Conference 2015 Conference Paper

High-Performance Distributed ML at Scale through Parameter Server Consistency Models

  • Wei Dai
  • Abhimanu Kumar
  • Jinliang Wei
  • Qirong Ho
  • Garth Gibson
  • Eric Xing

As Machine Learning (ML) applications embrace greater data size and model complexity, practitioners turn to distributed clusters to satisfy the increased computational and memory demands. Effective use of clusters for ML programs requires considerable expertise in writing distributed code, but existing highly abstracted frameworks like Hadoop that pose low barriers to distributed programming have not, in practice, matched the performance seen in highly specialized and advanced ML implementations. The recent Parameter Server (PS) paradigm is a middle ground between these extremes, allowing easy conversion of single-machine parallel ML programs into distributed ones, while maintaining high throughput through relaxed “consistency models” that allow asynchronous (and, hence, inconsistent) parameter reads. However, due to insufficient theoretical study, it is not clear which of these consistency models can really ensure correct ML algorithm output; at the same time, there remain many theoretically motivated but undiscovered opportunities to maximize computational throughput. Inspired by this challenge, we study both the theoretical guarantees and empirical behavior of iterative-convergent ML algorithms in existing PS consistency models. We then use the gleaned insights to improve a consistency model using an “eager” PS communication mechanism, and implement it as a new PS system that enables ML programs to reach their solution more quickly.
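
One widely used relaxed consistency model in this setting is bounded staleness, which can be sketched as a clock that lets any worker run ahead of the slowest worker by at most s iterations. This is a toy illustration of that rule, not the paper's parameter server implementation or its "eager" communication mechanism.

```python
import threading

class BoundedStalenessClock:
    """Toy stale-synchronous clock: a worker may advance its iteration counter only while
    it is no more than `staleness` iterations ahead of the slowest worker."""

    def __init__(self, n_workers, staleness):
        self.clocks = [0] * n_workers
        self.staleness = staleness
        self.cond = threading.Condition()

    def tick(self, worker_id):
        with self.cond:
            self.clocks[worker_id] += 1
            self.cond.notify_all()
            # Block while this worker is too far ahead of the slowest one.
            while self.clocks[worker_id] - min(self.clocks) > self.staleness:
                self.cond.wait()
```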

IROS Conference 2013 Conference Paper

Functional analysis of grasping motion

  • Wei Dai
  • Yu Sun 0004
  • Xiaoning Qian

This paper presents a novel grasping motion analysis technique based on functional principal component analysis (fPCA). The functional analysis of grasping motion provides an effective representation of grasping motion and emphasizes motion dynamic features that are omitted by classic PCA-based approaches. The proposed approach represents, processes, and compares grasping motion trajectories in a low-dimensional space. An experiment was conducted to record grasping motion trajectories of 15 different grasp types in Cutkosky grasp taxonomy. We implemented our method for the analysis of collected grasping motion in the PCA+fPCA space, which generated a new data-driven taxonomy of the grasp types, and naturally clustered grasping motion into 5 consistent groups across 5 different subjects. The robustness of the grouping was evaluated and confirmed using a tenfold cross validation approach.
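
A hedged sketch of the fPCA step on resampled, equal-length trajectories: treat each trajectory as a discretized function and extract principal "motion modes" from the mean-centered curves. The toy one-dimensional trajectories below are illustrative; the paper works with multi-joint grasp recordings.

```python
import numpy as np

def functional_pca(trajectories, n_components=3):
    """PCA over discretized trajectories: the leading right singular vectors act as
    eigenfunctions, and each trial is summarized by its scores on them."""
    X = np.asarray(trajectories)                 # (n_trials, n_timesteps)
    mean_curve = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean_curve, full_matrices=False)
    components = vt[:n_components]               # principal motion modes
    scores = (X - mean_curve) @ components.T     # low-dimensional representation per trial
    return mean_curve, components, scores

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
trials = np.stack([np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(100) for _ in range(15)])
mean_curve, comps, scores = functional_pca(trials)
print(scores.shape)   # (15, 3): each grasp trajectory summarized by 3 fPCA scores
```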