Autoencoder-Based Anomaly Detection and Analysis in Log Data Generated in Cloud Systems Using Natural Language Processing

Date
2025
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
In this study, an Autoencoder-based model was developed to detect anomalies in log data obtained from cloud systems. The dataset consists of log records from the Blue Gene/L (BGL) supercomputer. In the preprocessing phase, log messages were vectorized using the TF-IDF method, and structural features (content length, word count, and the presence of component/type information) were extracted to build an enriched feature matrix. The model was trained to reconstruct each log entry, and a reconstruction error was computed for each record. Records were then classified as normal or anomalous using a threshold set at the 95th percentile of these errors. The model achieved an accuracy of 99.61%, together with strong precision, recall, and F1-score results. Additional evaluations using ROC-AUC and Precision-Recall curves further confirmed the model's robustness. The results demonstrate that the Autoencoder architecture can effectively detect anomalies in large and complex log datasets. The proposed model was also compared against recent approaches such as DeepLog, LogRobust, MLP, and LogEvent2Vec, and it outperformed all of them across all performance metrics. These findings highlight the Autoencoder-based method as a strong alternative in terms of both computational efficiency and anomaly detection capability. © 2025 IEEE.
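The structural features and the percentile-based decision rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the network architecture, the field names, or the exact error measure, so the function names, the use of mean squared error, and the linear-interpolation percentile below are all assumptions. The TF-IDF vectorization and the autoencoder itself are omitted; the reconstruction errors are assumed to come from any trained model.

```python
import math
from typing import List, Optional, Sequence, Tuple


def structural_features(content: str,
                        component: Optional[str],
                        log_type: Optional[str]) -> List[float]:
    """Hypothetical structural features for one log record:
    content length, word count, and presence flags for component/type."""
    return [
        float(len(content)),
        float(len(content.split())),
        1.0 if component else 0.0,
        1.0 if log_type else 0.0,
    ]


def reconstruction_error(original: Sequence[float],
                         reconstructed: Sequence[float]) -> float:
    """Mean squared error between an input vector and its reconstruction
    (assumed error measure; the abstract does not name one)."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0..100) using linear interpolation between ranks."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def flag_anomalies(errors: Sequence[float],
                   q: float = 95.0) -> Tuple[float, List[bool]]:
    """Flag records whose reconstruction error exceeds the q-th percentile."""
    threshold = percentile(errors, q)
    return threshold, [e > threshold for e in errors]
```

In a full pipeline, the output of `structural_features` would be concatenated with each record's TF-IDF vector before training, and `flag_anomalies` would be applied to the per-record errors produced by the trained model. Note that a fixed 95th-percentile threshold always flags roughly 5% of the records it is computed on, regardless of how many are truly anomalous.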
Keywords
Anomaly Detection, Autoencoder, Cloud Logs, Natural Language Processing, TF-IDF, Unstructured Data Analysis
WoS Q
N/A
Scopus Q
N/A
Source
9th International Symposium on Innovative Approaches in Smart Technologies, ISAS 2025 -- 2025-06-27 through 2025-06-28 -- Gaziantep -- 211342
