USING THE MINIMUM SPANNING TREE PROBLEM FOR DATA MINING
DOI:
https://doi.org/10.32782/tnv-tech.2024.1.5Keywords:
graph, minimum spanning tree,Abstract
The article is devoted to an attempt to determine clusters using the optimization problem of a minimum spanning tree. The problem of the clustering problem is to group similar objects into many clusters in such a way that objects within one cluster are similar to each other, and objects between different clusters are different. One approach to solving this problem is to use the minimum spanning tree problem. The idea of using the minimum spanning tree problem for clustering is to build a tree connecting all the objects, where the weights of the edges correspond to the distances between the objects. Then, using some pruning technique, a certain number of edges can be removed to divide the tree into clusters. To do this, it is necessary to remove the longest edges in order to divide the tree into individual components corresponding to clusters. This approach can be especially effective in cases where it is necessary to minimize overlaps: a minimum spanning tree combines vertices with the smallest distances, that is, the vertices that are most similar to each other will be located next to each other. This helps reduce the number of matches in clusters and improve the quality of clustering. A minimum spanning tree can be easily interpreted because it shows the closest connections between vertices, which allows us to understand which groups of data tend to cluster together. It can also be useful in cases where we are dealing with large amounts of data, since minimum spanning tree algorithms have low computational complexity. Minimum Spanning Tree allows you to ignore noise in the data because it builds a connection between nearby points and noisy data will usually be far away from other points, and also allows you to naturally determine the number of clusters. The number of clusters corresponds to the number of edges in the minimum spanning tree after removing the longest edges. However, be aware that this approach may not always be the best for all types of data and requires careful analysis and customization of parameters for each specific case.
References
Поповський, В., Сабурова, С., Олійник, В., Лосєв, Ю., Агеєв, Д. Математичні основи теорій телекомунікаційних систем. Харків : ТОВ «Компанія СМІТ», 2006. 564 с.
Капітонова Ю.В. Основи дискретної математики. Київ : Наукова думка, 2002. 580 с.
Димова Г.О., Ларченко О.В. Моделі і методи інтелектуального аналізу даних: навчальний посібник. Херсон : Книжкове видавництво ФОП Вишемирський В. С., 2021. 142 с.
Phillips D.T., Garcia-Diaz A. Fundamentals of Network Analysis. Prentice-Hall, Inc., Englewood Cliffs, N.J., 1981. 496 p.
Dymova H. Development Of A Software Application Algorithm For Solving Computer Network Optimization Problems. Débats scientifiques et orientations prospectives du développement scientifique : c avec des matériaux de la VI conférence scientifique et pratique internationale, Paris, 1er Mars 2024. Paris-Vinnytsia : La Fedeltà & UKRLOGOS Group LLC, 2024. Pp. 226–229. DOI 10.36074/logos-01.03.2024.051.
Димова Г., Ларченко О. Розробка комп’ютерної програми розв’язання задач мережевої оптимізації. Науковий журнал «Комп’ютерно-інтегровані технології: освіта, наука, виробництво», (41), 2020. С. 142–150. DOI 10.36910/6775-2524-0560-2020-41-23.
Bhargava A. Grokking Algorithms. An Illustrated Guide For Programmers And Other Curious People. By Manning Publications Co. All Rights Reserved, 2016. 288 p.
Dymova H. Application of Characterization Analysis Methods to Investigation of Logical Networks Structures. Theoretical and Empirical Scientific Research: Concept and Trends with Proceedings of the V International Scientific and Practical Conference. Oxford, United Kingdom : European Scientific Platform, 2023. Pp. 124–128. DOI: 10.36074/logos-23.06.2023.34.