Mining High Average Utility Nonoverlapping Patterns from Sequential Database

Meng Geng; Youxi Wu; Yan Li; Jing Liu; Lei Guo; Xingquan Zhu; Xindong Wu

doi:10.1145/3773899

TIST 2026

Journal Article · Artificial Intelligence · Intelligent Systems

Abstract

As a crucial aspect of data mining, high average utility sequential pattern mining (SPM) aims to discover low-frequency, high average utility patterns (subsequences) in sequence data. Most existing high average utility SPM methods overlook the repetitive occurrences of a pattern within each sequence, so some important patterns are missed. To address this issue, we focus on the problem of mining high average utility nonoverlapping patterns (HUPs) from sequential databases, and propose the HUP-Miner algorithm. To reduce the need for repeated scanning of the original database, we use a position dictionary to record the occurrence information of each item. To reduce the number of candidate patterns generated, we adopt a pattern join strategy and explore four pruning strategies. To efficiently calculate the average utility of a pattern, we propose an SPC algorithm that utilizes the occurrence positions of sub-patterns. Experimental results on 14 databases show that HUP-Miner outperforms 12 competitive algorithms. Furthermore, we use information gain as the utility of each item, and a clustering analysis shows that the HUPs discovered in this way yield better performance. All of the algorithms and databases used here are available from https://github.com/wuc567/Pattern-Mining/tree/master/HUP-Miner.
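
The position dictionary mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_position_dictionary`, the toy database, and the nested layout (item, then sequence index, then list of positions) are assumptions made for illustration. The idea shown is only that a single pass over the database records where each item occurs, so later steps can look up occurrences without rescanning the sequences.

```python
from collections import defaultdict


def build_position_dictionary(database):
    """Hypothetical helper: map each item to {sequence index: [positions]}.

    `database` is assumed to be a list of sequences, each an iterable of items.
    """
    positions = defaultdict(dict)
    for seq_id, sequence in enumerate(database):
        for pos, item in enumerate(sequence):
            # Record every occurrence, including repeats within one sequence.
            positions[item].setdefault(seq_id, []).append(pos)
    return dict(positions)


# Toy database (assumed for illustration): two short sequences of single-character items.
database = ["abcab", "bca"]
pos_dict = build_position_dictionary(database)
print(pos_dict["a"])
```

Here `pos_dict["a"]` holds the positions of item `a` in each sequence, so repeated occurrences within one sequence are kept rather than collapsed into a single presence flag.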
