Fetch and parse any site's robots.txt. See crawler directives, sitemaps, allowed paths.
The robots.txt file lives at the root of your domain (e.g. https://yourdomain.co.ke/robots.txt) and tells web crawlers what they can and can't fetch. It's the first file Google, Bing and other search engines look at when they visit your site.
User-agent: which crawler this block applies to. * means all crawlers; Googlebot, Bingbot, etc. target specific bots.
Allow: explicitly permit a path.
Disallow: block a path from being crawled.
Sitemap: tell crawlers where to find your XML sitemap (one or more URLs).
Crawl-delay: request a delay between requests (Google ignores this; Bing/Yandex honour it).
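As a rough illustration of how these directives are read in practice, here is a minimal sketch using Python's standard-library urllib.robotparser. The sample file contents, the paths under /private/ and the summarize helper are hypothetical and only for illustration; this is not necessarily how this tool parses robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, made up for this example.
SAMPLE = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
Crawl-delay: 5

Sitemap: https://yourdomain.co.ke/sitemap.xml
"""


def summarize(robots_text, agent, url):
    """Parse robots.txt text and report what it says for one agent and URL."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_text.splitlines())
    return {
        "can_fetch": parser.can_fetch(agent, url),
        "crawl_delay": parser.crawl_delay(agent),
        "sitemaps": parser.site_maps(),  # available in Python 3.8+
    }


if __name__ == "__main__":
    for path in ("/blog/", "/private/secret.html", "/private/public-page.html"):
        url = "https://yourdomain.co.ke" + path
        print(path, summarize(SAMPLE, "Googlebot", url))
```

Note that Python's parser applies the first rule that matches, which is why the Allow line is placed before the broader Disallow in the sample. Some crawlers, Google among them, instead pick the most specific matching rule, so results can differ for files that rely on rule order.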