What is Robots.txt and its purpose in SEO
A robots.txt file tells search engine bots which areas of a website they are allowed to crawl. That is why it is called the robots.txt file: it manages how all the search engine robots (bots) crawl the website.
Purpose of robots.txt file
The two main purposes of the robots.txt file are:
1. It names the search engine bots (user agents) that may crawl the website and index it in their search results.
(Note: In most cases, you should allow all search engine bots in your robots.txt file. For that, use the asterisk (*) wildcard as the user agent, as sketched below.)
2. It defines which sections of the website the search engine bots may crawl and which sections are restricted.
(Note: It is very important to prevent search engine bots from crawling certain parts of your website.)
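For example, a file can address one bot by name and cover all the others with the wildcard. This is a minimal sketch; the /private/ path is hypothetical:
User-agent: Googlebot
Disallow: /private/
User-agent: *
Disallow:
An empty Disallow value, as in the second group, means those bots may crawl everything.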
Which Sections Should Be Indexed, and Which Should Not?
Everything important, such as pages, landing pages, blog posts, and custom post types like portfolio, courses, gallery, or work (depending on the website), should be allowed to be indexed, because these pages hold the content and information you want to reach the public.
Unimportant things like tags, categories, taxonomies, labels, thank-you pages, and backend administration pages should not be indexed. These exist to organize the content on your website or blog but are of little use to end users. You can also disallow posts or pages that you want to keep online but semi-private, available only to those who have the direct link, as shown in the sketch below.
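As a rough sketch, a WordPress-style blog might keep those housekeeping sections out of search results like this (the paths are hypothetical; adjust them to your own site structure):
User-agent: *
Disallow: /wp-admin/
Disallow: /tag/
Disallow: /category/
Disallow: /thank-you/
Crawling is blocked only for the listed paths; everything else stays open by default.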
Syntax and Structure of Robots.txt file
User-agent: *
Disallow: /
Allow: /
The asterisk (*) after "User-agent" means that the rules apply to all search engine bots that visit the site.
Each path after "Disallow" tells the bots not to visit that page or directory on the site; a bare slash ("Disallow: /") blocks the entire site.
Similarly, each path after "Allow" tells the bots that they may visit that page or directory, even inside an otherwise disallowed section. (The three lines above only illustrate the directives; in a real file you would not combine "Disallow: /" with "Allow: /".)
(Note: You can also add your website's sitemap here with a "Sitemap" directive. The sitemap lists all your important pages and posts.)
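Putting the pieces together, a complete beginner-friendly file might look like this sketch (the domain and paths are placeholders):
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.example.com/sitemap.xml
Here the Allow line carves a single file out of an otherwise disallowed directory, and the Sitemap line points bots to your list of important URLs.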
Noindex and Nofollow Directives for the Robots.txt File
Two other important aspects related to the robots.txt file are Noindex and Nofollow.
1. Noindex
Disallow only prevents search engine bots from crawling particular web pages. It doesn't actually prevent those pages from being indexed; a disallowed page can still appear in search results if other sites link to it. That is what the noindex directive is for: it tells search engine bots not to index specific web pages.
Some site owners used to add a noindex directive directly in the robots.txt file, in a format like this:
User-agent: *
Disallow: /
Allow: /
Noindex: /
However, Google never officially supported Noindex in robots.txt and stopped honoring it entirely in September 2019, so you should not rely on it.
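For pages you control, the reliable way to apply noindex is the robots meta tag described below. For non-HTML files such as PDFs, where no meta tag can be placed, search engines support the X-Robots-Tag HTTP response header instead, for example:
X-Robots-Tag: noindex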
2. Nofollow:
Nofollow works like the rel="nofollow" link attribute: it tells search engine bots not to follow the links on a specific web page. Like noindex, nofollow is not a supported robots.txt directive; instead, it is implemented through the robots meta tag in the head section of your webpage. The syntax and structure are as follows:
<meta name="robots" content="noindex,nofollow" />
Place this meta tag between the <head> and </head> tags.
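If you only want to nofollow a single link rather than a whole page, use the rel attribute on that link instead (the URL here is a placeholder):
<a href="https://www.example.com/some-page/" rel="nofollow">Anchor text</a>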
I hope this article helps you understand the robots.txt file and its purpose in SEO. Now you can easily create and edit a robots.txt file for your website according to your site structure.
If you have any suggestions or need any help regarding the same, let us know in the comment section below.
If you like this article, then please share it with your friends and follow us on social media. You can also subscribe to our newsletter to get all the latest updates directly in your inbox.