During the latest webmaster hangout, Google employee John Mueller answered a question about how the search engine feels about “complex” and “huge” robots.txt files.
The question concerned a file of about 1,500 lines with many disallow directives, a number that had grown over the years.
According to Mueller, a large robots.txt file does not directly harm a site’s search engine optimization. However, such a file is difficult to maintain, which can lead to unexpected problems.
To an additional question about whether problems are possible if you do not include the Sitemap in robots.txt, Mueller answered this way:
“No, for us, these different ways of submitting a Sitemap are equivalent.”
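In other words, a sitemap declared in robots.txt and one submitted through Search Console are treated the same. A minimal robots.txt declaration might look like the following sketch, where the domain and sitemap path are placeholders:

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```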
Mueller then answered a few more questions on the subject.
One follow-up question asked: “If you drastically reduce the robots.txt file — for example, remove all disallow directives — how does this affect SEO?” In the asker’s case, the disallow directives blocked the footer and header elements of pages that are not of interest to users.
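A setup like the one described might look like the following sketch, with hypothetical paths standing in for the site’s actual header and footer snippet URLs:

```
User-agent: *
Disallow: /includes/header/
Disallow: /includes/footer/
```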
It’s hard to say for sure what would happen once these snippets are crawled and indexed, SearchEngines notes. Here the best approach may be trial and error: open one of these snippets to crawling and watch the search results to see whether it causes a problem.
Mueller noted that it is very easy to block something in robots.txt, but maintaining large files takes a lot of time, so it is important to understand whether those blocks are really needed.
As for the size, Google has no specific recommendations, NIXSolutions notes. Some sites have large files, while others have small ones. The main thing is that they work.
Mueller also recalled that Google has an open-source robots.txt parser, available on GitHub. Specialists can ask their developers to run this parser against the site and check which URLs are actually blocked and how a proposed change would affect that. This makes it possible to test URLs before removing a crawl block.
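Google’s open-source parser is written in C++, but the same kind of check can be sketched with Python’s standard-library `urllib.robotparser`. The rules and URLs below are hypothetical, for illustration only:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules blocking header/footer snippet paths
rules = """User-agent: *
Disallow: /includes/header/
Disallow: /includes/footer/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Check whether specific URLs would be blocked for a generic crawler
for url in ("https://example.com/includes/header/nav.html",
            "https://example.com/articles/some-post"):
    allowed = parser.can_fetch("*", url)
    print(url, "allowed" if allowed else "blocked")
```

Running such a check against a list of candidate URLs before editing the file shows exactly which pages a given disallow rule affects.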
You can watch this part of the discussion in the video.