AAAI Conference 1997 Conference Paper
Pattern Discovery in Distributed Databases
- Raj Bhatnagar
Most algorithms for learning and pattern discovery in data assume that all the needed data is available on one computer at a single site. This assumption does not hold when a number of independent databases reside on geographically distributed nodes of a computer network. These databases cannot be moved to a single site because of size, security, privacy, and data-ownership concerns, but together they constitute the dataset in which patterns must be discovered. Some pattern discovery algorithms can be adapted to such situations, while others become inefficient or inapplicable. In this paper we show how a decision-tree induction algorithm may be adapted for distributed data situations. We also discuss some general issues relating to the adaptability of other pattern discovery algorithms to distributed data situations.