Open Access Paper
28 December 2022 Optimization and application of web crawler architecture
Kuncheng Li, Junqi Fei, Chunmei Fan
Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 125060N (2022) https://doi.org/10.1117/12.2661783
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China
Abstract
To monitor cutting-edge technologies using the large volume of data available on the internet, the pyspider framework is used to crawl a large number of websites on a regular schedule, and Python scripts periodically import the crawled data into the business application system. In the initial stage of the project, the distributed architecture officially recommended by pyspider was used. It was later reworked into a clustered architecture to suit the actual network environment and the characteristics of the crawler tasks. In testing, the improved architecture showed substantially better efficiency and stability than the original, and the expected results were achieved.
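The periodic import step mentioned in the abstract could look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' script: the `articles` table, its columns, the record keys (`url`, `title`, `updatetime`), and the function name `import_results` are all hypothetical, and SQLite stands in for whatever database the business application system actually uses.

```python
import sqlite3


def import_results(conn, records):
    """Load crawled records into a hypothetical `articles` table.

    Each record is assumed to be a dict with at least a "url" key.
    Rows are keyed by URL, so re-importing a page overwrites the old row
    instead of creating a duplicate. Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, title TEXT, fetched_at REAL)"
    )
    # Skip records that have no URL, since the URL is the primary key.
    rows = [
        (r["url"], r.get("title", ""), r.get("updatetime", 0.0))
        for r in records
        if r.get("url")
    ]
    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO articles (url, title, fetched_at) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)
```

Keying on the URL keeps repeated scheduled runs idempotent: a page crawled again on a later run updates its existing row, so the import script can be rerun safely after a failure.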
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Kuncheng Li, Junqi Fei, and Chunmei Fan "Optimization and application of web crawler architecture", Proc. SPIE 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022), 125060N (28 December 2022); https://doi.org/10.1117/12.2661783
KEYWORDS
Databases
Data communications
Data storage
Internet
Network architectures
Telecommunications
Data centers