In the end, what this situation causes is wasted crawl time on URLs that will never be indexed, so it is usually preferable to block access to them directly. Think of any such URL, even the destination URLs of Google forms, and once we have them in mind:
Let’s prevent the HTML from displaying links to them
Regardless of whether those links exist or not, let’s mark them with a disallow in robots.txt (see the example just after this list)
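As a minimal sketch, assuming hypothetical destination paths such as /thank-you/ or /form-sent/ (adapt them to the real URLs of your forms), the robots.txt block could look like this:

User-agent: *
Disallow: /thank-you/
Disallow: /form-sent/

With those lines in place, robots that respect robots.txt stop spending crawl time on those destination URLs.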
BONUS. It is possible to send a noindex from your server by creating a kind of robots.txt but for noindex and nofollow.
I don’t count this point among the 10 concepts because it actually talks more about indexing directives than about robots.txt, and it is a possibility that is not easy to implement for everyone (and in its simplest version it is not recommended and we don’t really know if it works).
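As a hedged sketch of what “sending a noindex from your server” can look like in its more reliable form, one common option is the X-Robots-Tag HTTP response header. On an Apache server with mod_headers enabled, a rule like the following (the PDF pattern is just a hypothetical example) would do it:

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

Every matching response then carries an X-Robots-Tag header telling robots not to index or follow it, without touching the HTML or the robots.txt file.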
We are talking about finding a way not to prohibit crawling but to prohibit indexing: the equivalent of the robots meta tag marked as “noindex” that we discussed earlier. You can read all sorts of things about this topic; the most commonly mentioned option is the “noindex:” directive within the robots.txt file.
This rule tells us that we could create noindex statements in the robots.txt file with the same nomenclature. This would tell the robot that it can browse and crawl category-1, but that the contents of the paginations in this category should not appear in the search index.
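As a sketch only, and keeping in mind what comes next about Google not supporting it, the nomenclature usually shown combines a normal block with noindex lines; category-1 and its /page/ paginations are hypothetical paths used purely for illustration:

User-agent: *
Allow: /category-1/
Noindex: /category-1/page/

In theory the Allow line would keep the category crawlable while the Noindex line would keep its paginated pages out of the index, but as explained below this is not a directive Google commits to honoring.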
It would be great if they let us do this, since, as I said before, blocking a URL does not imply deindexing it, and so we would have total control over everything. However, despite the fact that many SEOs mention it and even Deepcrawl measured it, Google said at the time that it does not recommend using it, and as long as they keep saying that, I think it makes no sense to do so. So we do not get to enjoy this possibility.