UDC 004.493
DOI: 10.36871/ek.up.p.r.2024.12.03.012
Authors
Aminat A. Albakova,
Diana D. Maigova,
Grozny State Petroleum Technical University named
after Academician Millionshchikov, Grozny, Russia
Abstract
Every year, malware authors create increasingly sophisticated malware that can
harm our computers. Traditional methods based on searching for program signatures are no longer effective
in solving the problem of malware detection. They are being replaced by automatic file analysis, which is a more
promising approach to detecting suspicious files. Machine learning methods are increasingly being used to
detect such malware. However, such solutions may require substantial computing resources.
Therefore, the task arises of creating a machine learning model that balances training speed
and malware detection accuracy. In addition, a single data representation method is usually not enough to
detect malicious file properties. Thus, this article describes two different methods: one is based on
the binary information of the file, the other on the disassembled code of executable files.
The purpose of this work is to increase the effectiveness of malware detection by optimizing feature extraction
methods and applying machine learning. The main tasks of the research include extracting features from executable
files, creating several machine learning models, and comparing them to determine the most effective one. The
dataset used in this study was collected from various online sources and consists of 12,824 executable
files in .exe format, of which 11,844 are malicious and 980 are benign.
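As a rough illustration of the two kinds of file representation named in the abstract, the sketch below computes a normalized byte histogram from raw file bytes and counts n-grams over a sequence of disassembled instruction mnemonics. The function names byte_histogram and opcode_ngrams, the choice of a 256-bin histogram, and the default n = 2 are assumptions made for this example; they are not taken from the article and may differ from the features the authors actually extract.

```python
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Return the relative frequency of each of the 256 possible byte values.

    Hypothetical example of a feature built from the binary content of a file.
    """
    if not data:
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields integers 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def opcode_ngrams(mnemonics: list[str], n: int = 2) -> Counter:
    """Count n-grams over a sequence of instruction mnemonics.

    Hypothetical example of a feature built from disassembled code; the
    mnemonics are assumed to come from an external disassembler.
    """
    return Counter(
        tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)
    )


if __name__ == "__main__":
    hist = byte_histogram(b"\x00\x00\xff\x01")
    print(hist[0], hist[255])
    print(opcode_ngrams(["push", "mov", "push", "mov"]))
```

Either feature vector could then be passed to a classifier such as the support vector machine listed in the keywords. Because the dataset is heavily skewed toward malicious samples (11,844 against 980), accuracy alone may overstate the quality of such a model.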
Keywords
intrusion detection, PE format, feature extraction, disassembled instructions, support vector machine.

