TIST 2026

Mining High Average Utility Nonoverlapping Patterns from Sequential Database

Journal Article · Artificial Intelligence · Intelligent Systems

Abstract

As a crucial aspect of data mining, high average utility sequential pattern mining (SPM) aims to discover patterns (subsequences) in sequence data that have low frequency but high average utility. Most existing high average utility SPM methods overlook the repeated occurrences of a pattern within each sequence, so some important patterns are missed. To address this issue, we focus on the problem of mining high average utility nonoverlapping patterns (HUPs) from a sequential database, and propose the HUP-Miner algorithm. To avoid repeatedly scanning the original database, we use a position dictionary to record the occurrence information of each item. To reduce the number of candidate patterns generated, we adopt a pattern join strategy and explore four pruning strategies. To calculate the average utility of a pattern efficiently, we propose the SPC algorithm, which utilizes the occurrence positions of sub-patterns. Experimental results on 14 databases show that HUP-Miner gives superior results when compared with 12 competitive algorithms. Furthermore, we use information gain as the utility of each item, and find through a clustering analysis that the HUPs discovered in this way yield better performance. All of the algorithms and databases used here are available from https://github.com/wuc567/Pattern-Mining/tree/master/HUP-Miner.
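
To make the idea of a position dictionary concrete, the following minimal Python sketch records the positions of each item in a single sequence and then counts occurrences of a pattern from those positions alone, without rescanning the sequence. It is an illustration under stated assumptions, not the HUP-Miner or SPC implementation: the function names are hypothetical, "nonoverlapping" is assumed to mean that two occurrences never use the same sequence position at the same pattern index, the count is a simple greedy leftmost one, and gap constraints, utilities, pruning, and pattern joining are all omitted.

```python
from bisect import bisect_right
from collections import defaultdict


def build_position_dict(sequence):
    """Map each item to the sorted list of positions where it occurs."""
    positions = defaultdict(list)
    for index, item in enumerate(sequence):
        positions[item].append(index)
    return dict(positions)


def count_nonoverlapping(pattern, positions):
    """Greedily count occurrences of `pattern` using only the position lists.

    Assumption (not taken from the paper): two occurrences may not use the
    same sequence position at the same pattern index.
    """
    if not pattern:
        return 0
    # last_used[j] is the position consumed at pattern index j by the
    # previous occurrence; -1 means nothing has been used yet.
    last_used = [-1] * len(pattern)
    count = 0
    while True:
        previous = -1
        chosen = []
        for j, item in enumerate(pattern):
            occurrences = positions.get(item, [])
            # The next position must follow the previous item of this
            # occurrence and must not reuse a position at index j.
            lower_bound = max(previous, last_used[j])
            k = bisect_right(occurrences, lower_bound)
            if k == len(occurrences):
                return count
            previous = occurrences[k]
            chosen.append(previous)
        last_used = chosen
        count += 1


positions = build_position_dict("abab")
print(positions)
print(count_nonoverlapping("ab", positions))
```

Because every lookup goes through the per-item position lists, the sequence itself is read only once, when the dictionary is built.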

Context

Venue
ACM Transactions on Intelligent Systems and Technology
Archive span
2010-2026
Indexed papers
1415
Paper id
1038391303665225190