STYLOMETRIC CLASSIFICATION OF AI-GENERATED TEXTS: COMPARATIVE EVALUATION OF MACHINE LEARNING MODELS
Keywords:
AI-generated text, stylometry, text classification, machine learning, large language models
Abstract
The rapid proliferation of large language models (LLMs) such as ChatGPT and DeepSeek has made it increasingly difficult to distinguish AI-generated from human-written text. This study evaluates the effectiveness of stylometric analysis as a transparent and interpretable method for detecting synthetic content. A balanced dataset of 30,000 short-form responses (10,000 per class: Human, ChatGPT, DeepSeek) was constructed. While the Human and ChatGPT responses were sourced from an existing dataset, the DeepSeek responses were generated using standardized prompts to ensure consistency. Each response was transformed into a vector of 12 manually engineered features capturing lexical richness, syntactic structure, and readability. Five classifiers were evaluated: Logistic Regression, Support Vector Machine, Random Forest, Gradient Boosting, and Decision Tree. Each was trained and evaluated on multiclass and binary classification tasks, with randomized hyperparameter tuning applied to enhance performance. The tuned Random Forest achieved the highest results, with macro-averaged F1-scores of 0.84 (multiclass) and 0.86 (binary) and accuracy above 87%. Gradient Boosting and SVM showed comparably strong performance, confirming the robustness of ensemble and margin-based methods in this context. Features such as Simpson's Index, type-token ratio, and sentence length proved most informative. The results confirm that stylometric features, despite their simplicity, can reliably distinguish human from AI-generated text, and that this approach, when combined with other methods, can contribute effectively to identifying AI-generated content. Additionally, generating datasets with open-source models via the Ollama framework enables affordable and scalable experimentation without relying on commercial APIs. This is particularly beneficial for early-stage research and academic environments with limited resources.
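To make the feature engineering concrete, the sketch below computes three of the stylometric features named in the abstract (type-token ratio, Simpson's Index, and average sentence length) using standard formulas; the paper's exact definitions and tokenization may differ.

```python
import re
from collections import Counter

def stylometric_features(text):
    """Illustrative versions of three features from the 12-feature vector.

    Assumptions (not taken from the paper): words are alphabetic tokens,
    sentences are split on ., !, ?, and Simpson's Index is the probability
    that two tokens drawn without replacement are the same word type.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    n = len(tokens)
    counts = Counter(tokens)

    # Type-token ratio: distinct word types over total tokens
    ttr = len(counts) / n if n else 0.0
    # Simpson's Index: sum n_i(n_i - 1) / (N(N - 1))
    simpson = (sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
               if n > 1 else 0.0)
    # Average sentence length in tokens
    avg_sentence_length = n / len(sentences) if sentences else 0.0

    return {"ttr": ttr, "simpson": simpson,
            "avg_sentence_length": avg_sentence_length}
```

Applied to each of the 30,000 responses, functions like this yield the fixed-length numeric vectors the classifiers are trained on.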
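The randomized hyperparameter tuning of the Random Forest can be sketched as follows with scikit-learn. The synthetic data and search space here are placeholders; the paper does not specify its grid, and the real inputs are the 12-dimensional stylometric vectors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the dataset: 12 features, 3 classes
# (Human, ChatGPT, DeepSeek). Class means are shifted so the toy
# problem is learnable; in the study these are text-derived features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12)) + np.repeat(np.arange(3), 100)[:, None]
y = np.repeat(np.arange(3), 100)

# Hypothetical search space, sampled randomly rather than exhaustively.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [100, 200, 400],
        "max_depth": [None, 10, 20],
        "min_samples_split": [2, 5, 10],
    },
    n_iter=5,
    cv=3,
    scoring="f1_macro",  # macro-averaged F1, the metric reported above
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern applies to the other four classifiers; only the estimator and its parameter distributions change.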