John Mueller of Google answers a question about using robots.txt to block special files such as .css and .htaccess.
This topic is discussed in the latest edition of the Ask Google Webmasters video series on YouTube.
The question submitted is:
“For robots.txt, should I ‘disallow: /*.css$’, ‘disallow: /php.ini’, or even ‘disallow: /.htaccess’?”
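In other words, the asker is considering a robots.txt along these lines (a hypothetical reconstruction of the rules named in the question, not a file shown in the video):

User-agent: *
Disallow: /*.css$
Disallow: /php.ini
Disallow: /.htaccess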
In response, Mueller said Google can’t stop site owners from disallowing those files, though it’s certainly not recommended:
“No, I can’t stop you from disallowing those files, but that sounds like a bad idea. You mention a few special cases, so let’s take a look.”
In some cases, blocking these special files is simply redundant, while in other cases it can seriously harm Googlebot’s ability to crawl a site.
Here’s what happens when each type of special file is blocked.
Related: How the Robots.txt file addresses security risks
Blocking CSS
Crawling CSS is very important to help Googlebot render your page properly.
Site owners may feel compelled to block CSS files so the files themselves don’t get indexed, but Mueller says that isn’t necessary.
Google needs to crawl CSS files regardless, and even if a CSS file does end up indexed, that does far less harm than blocking it would.
This is Mueller’s response:
“With ‘*.css’ you would block all CSS files. We need to be able to access CSS files so that we can render your pages properly.
This is important, for example, so that we can recognize when a page is mobile-friendly.
The CSS files themselves generally won’t get indexed, but we need to be able to crawl them.”
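For context, in Google’s robots.txt syntax ‘*’ matches any sequence of characters and a trailing ‘$’ anchors the rule to the end of the URL. That means a rule like the one below (a hypothetical example) blocks every stylesheet on a site, such as /style.css or /assets/theme.min.css:

User-agent: Googlebot
# Matches any URL ending in “.css”, anywhere on the site
Disallow: /*.css$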
Blocking php.ini
php.ini is a file that should never be publicly accessible in the first place, so there is no need to block it with robots.txt.
The file should be locked down, which prevents everyone from accessing it, including Googlebot. And that’s exactly how it should be.
As Mueller explains, disallowing php.ini in robots.txt is redundant.
“You also mention php.ini, which is PHP’s configuration file. In general, this file should be locked down, or kept in a special location, so that nobody can access it.
And if nobody can access it, that includes Googlebot. So, again, there is no need to disallow crawling of it.”
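What “locked down” looks like varies by server. As one illustration (our example, not something from the video), an Apache 2.4 configuration can deny all HTTP access to the file:

<Files "php.ini">
    # Refuse every HTTP request for this file, including Googlebot's
    Require all denied
</Files>

Keeping php.ini outside the web root altogether achieves the same result.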
Blocking .htaccess
Like php.ini, .htaccess is a locked-down file, meaning it cannot be accessed externally, even by Googlebot.
There’s no need to disallow it because it can’t be crawled in the first place.
“Finally, you mention .htaccess. This is a special control file that, by default, cannot be accessed externally. Like other locked-down files, you don’t need to explicitly disallow it from being crawled, since it can’t be accessed at all.”
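That default comes from the web server itself: Apache’s stock configuration ships with a rule along these lines (exact wording varies by version and distribution):

<Files ".ht*">
    # Block external access to .htaccess and .htpasswd by default
    Require all denied
</Files>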
Related: Best practices for setting meta robot tags and robots.txt
Mueller’s recommendation
Mueller concluded the video with advice on how site owners should approach creating a robots.txt file.
Site owners tend to run into problems when they copy another site’s robots.txt file and use it as their own.
Mueller advises against this. Instead, carefully consider which parts of your site you don’t want crawled, and disallow only those parts.
“My recommendation is to not reuse someone else’s robots.txt file and assume it will work. Instead, think about which parts of your site you really don’t want crawled, and disallow crawling of those.”
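As a hypothetical illustration of that advice, a deliberately written robots.txt lists only the paths the site owner has decided should stay out of the crawl (the paths below are made up for the example):

User-agent: *
# Only the sections this site actually wants kept out of the crawl
Disallow: /cart/
Disallow: /search-results/

Sitemap: https://www.example.com/sitemap.xml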