
Glossary

Web Crawling

Web crawling is the automated traversal of websites by following links to discover and index content. A crawler starts from one or more seed URLs, extracts the links on each page, follows them, and repeats, skipping URLs it has already visited, until it has mapped an entire site or web graph. Search engine bots like Googlebot are web crawlers.
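
The loop described above can be sketched as a breadth-first traversal. This is a minimal illustration, not a production crawler: the `crawl` function, the `LinkExtractor` class, and the in-memory `PAGES` dictionary are all hypothetical names introduced here, and the `fetch` argument stands in for a real HTTP client so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`, staying on the seed's host.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that maps a URL to its HTML, or returns None if the page is unavailable.
    Returns a dict: visited URL -> list of same-host links found on it.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}          # URLs already queued, so none is fetched twice
    site_map = {}
    while queue and len(site_map) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        links = []
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != host:
                continue   # skip links that leave the seed's host
            links.append(absolute)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        site_map[url] = links
    return site_map


# Hypothetical three-page site used in place of real HTTP requests.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.com/a": '<a href="/">Home</a> '
                             '<a href="https://other.example/x">External</a>',
    "https://example.com/b": '<a href="/a">A</a>',
}

site_map = crawl("https://example.com/", PAGES.get)
for page, links in site_map.items():
    print(page, "->", links)
```

Passing `fetch` in as a parameter keeps the traversal logic separate from networking; a real crawler would likely substitute an HTTP client and add rate limiting, error handling, and a politeness delay, none of which is shown here.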

Crawling is distinct from scraping: a crawler discovers URLs, while a scraper extracts data from them. For competitive intelligence or market research, crawling is typically the first phase, discovering all relevant URLs, before targeted scraping extracts specific data from each. A site's robots.txt file signals which paths crawlers should skip.
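
As a rough illustration of how a crawler might honor robots.txt, the sketch below uses `urllib.robotparser` from Python's standard library. The rules in `ROBOTS_TXT`, the `example-crawler` user agent, and the `allowed` helper are made up for this example; a real crawler would load the file from the target site instead of an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; normally fetched from <site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(url, user_agent="example-crawler"):
    """Return True if the parsed rules permit `user_agent` to request `url`."""
    return rules.can_fetch(user_agent, url)


print(allowed("https://example.com/blog/post-1"))
print(allowed("https://example.com/private/report"))
```

A crawler would typically call a check like `allowed` before queueing each discovered URL, so that disallowed paths are never requested.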