UDC 004.891.2
DOI: 10.36871/2618-9976.2024.03.006

Authors

Dmitry G. Rodionov,
Doctor of Sciences (Economics), Professor, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
Evgeny A. Konnikov,
Candidate of Sciences (Economics), Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
Polina A. Pashinina,
Specialist, St. Petersburg State Economic University, St. Petersburg, Russia
Sergey I. Shanygin,
Doctor of Sciences (Economics), Professor, St. Petersburg State University, St. Petersburg, Russia

Abstract

The paper investigates how text processing, topic modeling, and machine learning can support predicting the dynamics of company stock prices in the information environment. It finds that TF-IDF weighting and latent Dirichlet allocation (LDA) can identify the key topics that shape public discourse about companies in the media and entertainment industry. This makes it possible to assess the current state of news coverage and to predict the impact of thematic aspects on company characteristics. In addition, gradient boosting with the CatBoost algorithm, with hyperparameters tuned using GridSearchCV, shows potential for effectively predicting whether news articles belong to specific clusters. This is an important step in analyzing the impact of the information environment on companies' performance and on their standing in the media environment.

Keywords

Topic modeling, LDA, Machine learning, Gradient boosting, CatBoost, Media companies