UDC 004.051 ...
DOI: 10.36871/2618-9976.2021.02.006

Authors

Vildanov Timur Emilievich
Master's degree, Financial University under the Government of the Russian Federation, Moscow, Russia
Ivanov Nikita Sergeevich
Master's degree, Financial University under the Government of the Russian Federation, Moscow, Russia

Abstract

This article examines both popular and newer tools for extracting data from websites and converting it into a form suitable for analysis. The paper compares Python libraries, with performance as the key criterion. The results are grouped by site, tool, and number of iterations and are presented in graphical form. The scientific novelty of the research lies in the application domain of the data extraction tools: semi-structured data is retrieved from the websites of bookmakers and betting exchanges and then transformed. The article also describes newer tools that are not yet widely used for parsing and web scraping. As a result of the study, quantitative metrics were obtained for all the tools used, and the libraries best suited to rapidly extracting and processing large volumes of information were selected.

Keywords

Parsing
Web scraping
HTML
Python
Arbitrage