UDC 004.493
DOI: 10.36871/ek.up.p.r.2024.12.03.012

Authors

Aminat A. Albakova,
Diana D. Maigova,
Grozny State Petroleum Technical University named after Academician Millionshchikov, Grozny, Russia

Abstract

Every year, malware authors create increasingly sophisticated malware that can harm computers. Traditional methods based on searching for program signatures are no longer effective for malware detection. They are being replaced by automatic file analysis, a more promising approach to detecting suspicious files. Machine learning methods are increasingly used for this purpose. However, such solutions may require substantial computing resources. The task therefore arises of building a machine learning model that is optimal in terms of training speed and malware detection accuracy. In addition, a single data representation method is usually not enough to detect the malicious properties of a file. This article therefore describes two different methods: one based on the binary information of the file, the other based on the disassembled code of executable files.
The purpose of this work is to increase the effectiveness of malware detection by optimizing feature extraction methods and applying machine learning. The main research tasks are extracting features from executable files, building several machine learning models, and comparing them to determine the most effective one. The dataset used in this study was collected from various online sources and consists of 12,824 executable files in .exe format, of which 11,844 are malicious and 980 are benign.
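As a rough illustration of what features based on the binary information of a file can look like, the following minimal sketch computes a normalized byte histogram and its Shannon entropy from raw bytes. It is an assumption made for illustration only, not necessarily the feature set used in this work; the function names `byte_histogram` and `byte_entropy` and the sample bytes are hypothetical.

```python
import math


def byte_histogram(data: bytes) -> list[float]:
    """Return the relative frequency of each of the 256 byte values."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data)
    if total == 0:
        # An empty file has no byte distribution; return all zeros.
        return [0.0] * 256
    return [c / total for c in counts]


def byte_entropy(hist: list[float]) -> float:
    """Shannon entropy (in bits) of a normalized byte histogram."""
    return -sum(p * math.log2(p) for p in hist if p > 0)


# Hypothetical sample: six bytes starting with the "MZ" marker of a PE file.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x00, 0x00])
features = byte_histogram(sample)   # 256-dimensional feature vector
entropy = byte_entropy(features)    # single scalar feature
```

A vector of this kind has a fixed length regardless of file size, so it could be passed directly to a classifier; features derived from disassembled instructions would need a separate extraction step.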

Keywords

intrusion detection, PE format, feature extraction, disassembled instructions, support vector machine.