Crawling, or web crawling as it is generally known, is the process of collecting links by browsing websites. The word is meant literally: a crawl reaches its goal through many small operations rather than a single one. Keep in mind that a crawl builds its resource out of content from many different sites. Because so many different sources are involved, the process has to be carried out with great care.
The crawling process should run according to a plan, and the system that performs it should not crash. For this reason, its parameters should be tuned carefully so that the resources it uses are not overwhelmed. The web crawler is a concept that emerged as search engines became widespread. Search engines quickly scan and index the links found on websites across the internet.
The goal is to deliver the information on these indexed sites to people quickly. In short, a web crawler can be described as an effort to reach the right information in a short time. The process is called a crawl because it reaches its target by crawling, just as a baby does. The web of links that is recorded and monitored along the way resembles the web spun by a spider.
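The earlier advice about planning a crawl and tuning its parameters can be made concrete with a small throttle. The following is only a sketch, assuming a fixed minimum delay per host is an acceptable policy; the class name `HostThrottle` and the `min_delay` value are illustrative and do not come from any particular crawler.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforces a minimum delay between requests to the same host (illustrative)."""

    def __init__(self, min_delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # seconds between hits to one host (assumed value)
        self._clock = clock         # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_hit = {}         # host -> time of the most recent request

    def wait(self, url):
        """Pause until it is polite to request `url`; return the seconds waited."""
        host = urlparse(url).netloc
        last = self._last_hit.get(host)
        waited = 0.0
        if last is not None:
            remaining = self.min_delay - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_hit[host] = self._clock()
        return waited
```

Calling `throttle.wait(url)` before each download keeps the crawler from hitting one site too quickly, while requests to other hosts are not delayed.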
How a Web Crawler Works
A web crawler first downloads a page and extracts the links it contains. After extracting the links, it also extracts the keywords on the page. It passes the page and keyword information it has prepared to the index. It then repeats the process on other sites, gradually building up a resource.
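The download, extract, index, and repeat loop described above can be sketched with the Python standard library. This is a minimal illustration, not how any particular search engine works: the names `PageParser`, `top_keywords`, and `crawl` are hypothetical, keyword extraction is reduced to naive word counting, and the download step is passed in as a `fetch` function so the example stays independent of the network.

```python
import re
from collections import Counter, defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects link targets and text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Note: this also picks up script and style text; fine for a sketch.
        self.text_parts.append(data)


def top_keywords(text, limit=3):
    """Naive keyword extraction: the most frequent words of four or more letters."""
    words = re.findall(r"[a-z]{4,}", text.lower())
    return [word for word, _ in Counter(words).most_common(limit)]


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl. `fetch(url)` returns HTML text, or None on failure."""
    index = defaultdict(set)   # keyword -> set of URLs containing it
    seen = {start_url}         # URLs already queued, to avoid loops
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = PageParser(url)
        parser.feed(html)
        for keyword in top_keywords(" ".join(parser.text_parts)):
            index[keyword].add(url)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, dict(index)
```

For a quick trial, `fetch` can be the `get` method of a dictionary that maps URLs to HTML strings; a real crawl would supply a function that downloads each page over HTTP.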
A web crawler also offers various categories of operation as it works. These categories are made available with the help of Scrapy. In-depth crawling falls into one category, and frame support into another. With a web crawler, you can browse a website over an HTTP connection and collect the links on the target site. Whether those links are gathered from a wide range of sites or from a single site is up to you.
The system that performs this link collection is called a web crawler, and the act of collecting links with it is called web crawling.
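The choice between a single-site and a wide-ranging crawl can be expressed as a small filter applied to every discovered link. The sketch below uses only the Python standard library rather than Scrapy; the function names `fetch_html` and `in_scope` and the user agent string are assumptions made for illustration.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download a page over HTTP(S) and return its text, or None on failure."""
    # The user agent string is a made-up placeholder.
    request = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return None


def in_scope(link, start_url, single_site=True):
    """Decide whether a discovered link should be followed."""
    target = urlparse(link)
    if target.scheme not in ("http", "https"):
        return False  # skip mailto:, javascript:, and similar links
    if not single_site:
        return True   # wide-ranging crawl: follow links to any host
    return target.netloc == urlparse(start_url).netloc
```

With `single_site=True` the crawler stays on the host it started from; setting it to `False` lets the crawl spread to every site it finds links to, which makes a page limit or depth limit more important.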