ADVANCES IN KNOWLEDGE DISCOVERY IN DISTRIBUTED DATABASES
Authors: Pupezescu Valentin
Volume 1 (2015) | DOI: 10.12753/2066-026X-15-046 | Pages: 311-319

Abstract
Knowledge Discovery in Distributed Databases is the process of extracting useful information from a collection of data stored in distributed databases. A distributed database is a collection of data replicated over a number of different computers. The structures best suited to working with distributed databases are Distributed Committee-Machines (DCM). Distributed Committee-Machines are combinations of neural networks that work as a group, in a distributed manner, to obtain better performance than individual neural networks on data mining tasks within the KDD process. In this paper I study the interaction between Distributed Committee-Machines and distributed databases. Replication across multiple machines can become very slow as the number of machines in the replication topology grows. This behaviour is explained by the complex software used in real implementations of replication to make the same data available on multiple machines. As a result, working with Distributed Committee-Machines in a distributed environment can be very problematic. I propose a design that overcomes these disadvantages, together with a new approach to storing the neural networks: the developed system stores the entire neural network in a real relational database. All the neural structures used were multilayer perceptrons trained with the backpropagation algorithm.
The non-optimized architecture has the disadvantage of writing all the results to the master system. All the data (TR, the training set; TS, the testing set; and the results) are duplicated on each slave machine. This working method is time-consuming because of the internal functioning of the replication process. With this system, the distributed speedup is below one.
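For reference, the abstract does not spell out how speedup is computed. A conventional definition, assumed here, compares the running time on one machine with the running time on N machines; the symbols S, T_1 and T_N are introduced only for illustration:

```latex
S(N) = \frac{T_1}{T_N}
```

Under this assumed definition, S(N) < 1 means the distributed run takes longer than the single-machine run, while the ideal case is S(N) = N.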
The optimized DCM structure eliminates the problems inherited from replication by writing all results locally, in special tables that are not replicated to the other distributed machines. Whenever a neural network finds an optimal result, it writes all the important parameters to the database. Here I also use a new approach, which consists of storing the entire neural network in the table as a BLOB (Binary Large Object). The method may also benefit new eLearning techniques, such as adaptive eLearning methods that use neural networks.
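The idea of keeping a whole network in a local, non-replicated table as a BLOB can be sketched as follows. This is a minimal illustration, not the system described in the paper: SQLite stands in for the relational database, `pickle` is one possible serialization, and the table name `local_results`, its columns, and the helper functions are all hypothetical.

```python
import pickle
import sqlite3


def init_db(conn):
    # Hypothetical local table; in a replicated setup it would be
    # excluded from replication so that writes stay on this machine.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS local_results ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " test_error REAL NOT NULL,"
        " network BLOB NOT NULL)"
    )
    conn.commit()


def save_if_better(conn, network, test_error):
    """Store the serialized network only if it beats the best stored error."""
    best = conn.execute(
        "SELECT MIN(test_error) FROM local_results"
    ).fetchone()[0]
    if best is not None and test_error >= best:
        return False
    # Serialize the whole network object (here, nested lists of weights).
    blob = pickle.dumps(network, protocol=pickle.HIGHEST_PROTOCOL)
    conn.execute(
        "INSERT INTO local_results (test_error, network) VALUES (?, ?)",
        (test_error, sqlite3.Binary(blob)),
    )
    conn.commit()
    return True


def load_best(conn):
    """Return (network, test_error) for the lowest stored error, or None."""
    row = conn.execute(
        "SELECT network, test_error FROM local_results"
        " ORDER BY test_error ASC LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    return pickle.loads(row[0]), row[1]
```

A training loop on each machine could call `save_if_better` after every evaluation on the testing set, so only improved networks are written. Note that `pickle` data should be loaded only from a trusted database.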
With the optimized design of the DCM structures, the speedup in all experiments is almost equal to the number of distributed machines used.

Keywords
Knowledge Discovery in Distributed Databases, Data Mining, Distributed Databases, Artificial Intelligence, Distributed Committee-Machines, Neural Networks