It is only a hint, much like sitemaps: if what you indicate turns out to be interesting to Google (it changes the HTML content), Google will use it anyway. Basically, it is only useful for indicating that campaign URLs should not be indexed and for helping a little with the listings. What this function will certainly not do is prevent robots from entering those URLs.
All directives not covered by the robots.txt definition are ignored
For example, the famous “Crawl-delay” is ignored. In theory it should indicate the time to wait between robot requests, but Google ignores it, so we can forget about this directive, at least for Google (other crawlers do pay attention to it).
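For illustration only (a made-up snippet, not a recommendation), a robots.txt carrying this directive could look like the lines below; Googlebot simply skips the Crawl-delay line, while crawlers that do support it read the value as seconds to wait between requests:

    User-agent: *
    Crawl-delay: 10    # ignored by Googlebot; supporting crawlers wait ~10 seconds between requests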
Any directives invented by third parties are also ignored
And finally, lines starting with “#” are also ignored, as they are understood as comments. However, they do count towards the maximum file size, so it is best not to overuse them. A tip for comments: when working with multiple projects or many sites, it is a good idea to include a note with the uploaded version as a comment.
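For example (the project name, version number and date here are invented), that version note can simply sit at the top of the file as a comment:

    # robots.txt - project-blog - v14 - uploaded 2021-03-02
    User-agent: *
    Disallow: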
What happens when Google can’t access your robots file, or finds strange things when accessing it?
We have already said that the robots.txt file is always looked for in the “/robots.txt” path of your domain. And if Google does not find it there, it can go up to a higher domain level (if one exists). For example, if it does not find it at subdomain.domain.com/robots.txt, it will go to domain.com/robots.txt
But let’s now see what happens when the file is requested
When a server receives the request for the robots.txt file, it returns a response code that tells the spiders whether the file has been found or not.
A “200” code means that the file does exist. Google will then read it and apply its rules. If the file is empty, or there are no “Disallow” guidelines for Googlebot, it will understand that it has access to everything and will walk straight into the kitchen of the site.
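As a minimal sketch of that check (just an illustration, not how Googlebot works internally; the domain is a placeholder and it assumes the third-party requests library is installed), you can see what response code your own server returns for robots.txt:

    import requests

    # Placeholder domain: replace with your own site.
    url = "https://example.com/robots.txt"
    response = requests.get(url, timeout=10)

    if response.status_code == 200:
        # The file exists, so a crawler would read it and apply its rules.
        if response.text.strip():
            print("200: robots.txt found, its rules will be applied")
        else:
            print("200 but empty: crawlers understand they can access everything")
    else:
        # Any other code is handled differently by the crawler.
        print("Server answered with code", response.status_code)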