ONLINE LEARNING WITH SLIDING WINDOWS FOR TEXT CLASSIFIER ENSEMBLES
DOI:
https://doi.org/10.32782/tnv-tech.2024.6.6
Keywords:
disinformation, fake news, online learning, ensembles of classifiers
Abstract
In today's digital world, where information spreads at great speed, detecting fake news and disinformation has become a critically important task. In the Ukrainian-language information space, the task is even more pressing because of the hybrid war with Russia. In this study, the "Online Learning with Sliding Windows for Text Classifier Ensembles" (OLTW-TEC) method was developed and implemented to detect disinformation in Ukrainian-language text data. The goal is to increase the accuracy and adaptability of fake news identification, particularly in the Ukrainian-language information space, with a system that is fast and adapts to rapid changes in the information flow. OLTW-TEC uses machine learning and data analysis techniques to build an adaptive classification system that responds dynamically to such changes. The central element of the method is the integration of an ensemble of classifiers with the sliding window method, which makes it possible to update the model continuously on the latest data and so maintain high accuracy and adaptability to new forms of disinformation. The method includes the stages of data collection and pre-processing, sentiment and emotion analysis, and text vectorization, which allow deeper analysis and more effective detection of fake news that draws on the linguistic and cultural features of the Ukrainian language. To evaluate OLTW-TEC, a unique dataset of Ukrainian-language news containing both reliable and false news was used. The method identified disinformation with a classification accuracy of 93.26%. Analysis of the confusion matrix and other metrics, such as the F1 score, showed that OLTW-TEC is balanced and reliable in detecting fake news.
Compared with traditional classification methods, OLTW-TEC not only performs better on most metrics but also adapts to changes in the nature of the data. The size of the sliding window can be tuned to the specifics of the data, which gives the method additional flexibility and accuracy.
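The combination of a classifier ensemble with a sliding window can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three ensemble members (a naive Bayes model, a 1-nearest-neighbour model and a centroid model), whitespace tokenization, majority voting, refitting on every new item, the window size and the English placeholder headlines are all assumptions chosen so that the example stays self-contained.

```python
from collections import Counter, deque
import math


def tokens(text):
    # Assumption: plain whitespace tokenization stands in for real pre-processing.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""
    def fit(self, texts, labels):
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.prior}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokens(text))
        self.vocab_size = len(set().union(*self.counts.values()))
        return self

    def predict(self, text):
        def log_score(c):
            # One extra slot in the denominator covers tokens unseen in the window.
            total = sum(self.counts[c].values()) + self.vocab_size + 1
            return math.log(self.prior[c]) + sum(
                math.log((self.counts[c][t] + 1) / total) for t in tokens(text))
        return max(sorted(self.prior), key=log_score)


class NearestNeighbour:
    """1-NN using Jaccard similarity between token sets."""
    def fit(self, texts, labels):
        self.examples = [(set(tokens(t)), y) for t, y in zip(texts, labels)]
        return self

    def predict(self, text):
        query = set(tokens(text))

        def jaccard(example):
            union = query | example[0]
            return len(query & example[0]) / len(union) if union else 0.0
        return max(self.examples, key=jaccard)[1]


class Centroid:
    """Cosine similarity to the summed token counts of each class."""
    def fit(self, texts, labels):
        self.centroids = {}
        for text, label in zip(texts, labels):
            self.centroids.setdefault(label, Counter()).update(tokens(text))
        return self

    def predict(self, text):
        q = Counter(tokens(text))

        def cosine(label):
            c = self.centroids[label]
            dot = sum(q[t] * c[t] for t in q)
            norm = math.sqrt(sum(v * v for v in q.values())
                             * sum(v * v for v in c.values()))
            return dot / norm if norm else 0.0
        return max(sorted(self.centroids), key=cosine)


class SlidingWindowEnsemble:
    """Keeps the latest `window_size` labelled texts and refits all members on them."""
    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)  # oldest items drop out automatically
        self.members = []

    def learn(self, text, label):
        self.window.append((text, label))
        texts, labels = zip(*self.window)
        self.members = [m().fit(texts, labels)
                        for m in (NaiveBayes, NearestNeighbour, Centroid)]

    def predict(self, text):
        votes = Counter(m.predict(text) for m in self.members)
        return votes.most_common(1)[0][0]  # majority vote


# Hypothetical toy stream; the labels and headlines are illustrative only.
stream = [
    ("officials confirm budget vote", "real"),
    ("miracle cure hidden by doctors", "fake"),
    ("council publishes budget report", "real"),
    ("secret miracle cure revealed", "fake"),
]
model = SlidingWindowEnsemble(window_size=4)
for text, label in stream:
    model.learn(text, label)
print(model.predict("doctors hide miracle cure"))
```

Refitting every member on each new item keeps the sketch short but costs time proportional to the window size per update; a larger window gives more stable estimates, while a smaller one forgets outdated patterns faster, which is the trade-off behind the choice of window size.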