What is a Robots.txt file? And how to use it?


How are you? I hope all of you are fine; I am also fine, by your blessings. Today I will share with you what a robots.txt file is. There are many bloggers who don't understand the Robots.txt option or never think about it. They assume this option is not useful, so they leave it blank. To be honest, this option plays a very important role in search engine optimization (SEO).

Do You Know What a Robots.txt File Is?


If you continue blogging without search engine optimization, the chances of getting enough visitors to your blog decrease. But when you go to activate this option, if you copy someone else's Robots.txt file without understanding it, the result can be the opposite. So before activating it, you have to know the details about it.

What is a Robots.txt file?

Each search engine has its own web robot. You may be thinking this is something like the robot in a Rajinikanth movie; actually, it is nothing like that. A robot here is a kind of program that a search engine uses to examine the websites in its index. Through the Robots.txt file, that robot is instructed whether or not it may crawl and index your blog or website. Using the Robots.txt file you can give the robot permission to crawl and index, or withhold it; if you wish, you can even give permission to crawl and index only certain necessary posts and withhold it for the rest.


How does the Robots.txt file work?

The Robots.txt file is like the flight announcer at an airport. Just as the announcer tells passengers at the right time to board their flight, the Robots.txt file tells the search engine's robot, when it comes to crawl, to index the new posts on the blog. As a result, a newly published article reaches the search engine easily.

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://www.prozokti.com/feeds/posts/default?orderby=UPDATED

Most blogs' Robots.txt files are of this type. You may have used it on your blog, or you may still be using it without really understanding it. If you want to understand this matter clearly, then add it to your blog. I will try to explain it by dividing it into parts: first I will discuss these parts, and then I will also discuss the signs that appear in them.

  • User-agent: Mediapartners-Google:

First of all, robots are instructed through the User-agent line. Here Mediapartners-Google is the robot of Google AdSense. If you use Google AdSense on your blog, then you have to add this. If you set this option to Disallow, the AdSense robot will not get any idea about your blog for serving advertisements. If you don't use Google AdSense, then delete these two lines.

  • User-agent: *

All types of robots are meant by this. When you use the * sign after User-agent, understand that you are instructing all types of robots at once.

  • Disallow: /search:

Through this keyword, robots are instructed to disallow; in other words, they are being told not to crawl and index your blog's search links. If you look at a blog's label links, you will see the word search before each label link. So the robot is instructed not to crawl label links, because there is no need to index label links in the search engine.
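For example, on a Blogger blog a label page URL starts with /search while a normal post URL does not (www.yourblog.com and the post address below are placeholders):

User-agent: *
Disallow: /search
# blocks a label page such as http://www.yourblog.com/search/label/SEO
# but not a post such as http://www.yourblog.com/2017/05/my-post.html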

  • Allow: /

Through this keyword, the instruction is to allow. The / sign means the robot will crawl and index your blog's home page. You may notice after submitting to Google Webmaster Tools that it always shows one more indexed page than your number of posts. Actually it is not an extra post; it has counted your home page as well.
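To see how the two rules above work together, the pair can be read like this (comments added only for illustration):

Disallow: /search   # do not crawl search/label pages
Allow: /            # but crawl the home page and everything else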

  • Sitemap:

When you publish a new post, this tells the robot to index the new post. Every Blogger blog has a default sitemap, but by default it does not cover more than 25 posts. For this reason, the sitemap link has to be submitted in Google Webmaster Tools as well as added to the Robots.txt file.
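For example, the sitemap line could look like one of these; a commonly used workaround for the 25-post limit (assuming a placeholder address www.yourblog.com) is to point to Blogger's full sitemap or to the feed with a larger max-results value:

Sitemap: http://www.yourblog.com/sitemap.xml
Sitemap: http://www.yourblog.com/atom.xml?redirect=false&start-index=1&max-results=500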

The Robots.txt file is the file that tells the search engine which pages of a site it will crawl and which pages it will not. This robots.txt file stays in the root folder.

You may want some pages of your site not to appear in the search results, perhaps because work on that page is not finished yet, or for some other reason. For this, you make a robots.txt file and specify there which pages the search engine will not crawl. If you have a subdomain and you want some of its pages not to appear in the search results, then you have to make a separate robots.txt file for it. After making the robots.txt file, you have to upload it to the root folder.
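For example, each host gets its own file in its root (example.com is a placeholder domain):

http://www.example.com/robots.txt (robots.txt for the main site)
http://blog.example.com/robots.txt (a separate robots.txt for the subdomain)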

Making a robots.txt file:

This robots.txt file controls which pages of the site will or will not be seen by the search engines' crawlers and spiders. This control process is called the Robots Exclusion Protocol or Robots Exclusion Standard. Before making this file, let us learn about some of the signs used in it.

The Disallow field can contain a partial or a full URL. The robot will not visit whichever path is mentioned after the / sign. Like:

Disallow: /help
# disallows both /help.html and /help/index.html, whereas

Disallow: /help/
# would disallow /help/index.html but allow /help.html

Some examples:

All robots are allowed to visit all files (the wildcard * addresses all robots):

User-agent: *
Disallow:

No robot will visit any file:

User-agent: *
Disallow: /

Only Googlebot is allowed to visit; everyone else is kept out:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

Only Googlebot and Yahoo's Slurp are allowed to visit; everyone else is kept out:

User-agent: Googlebot
User-agent: Slurp
Disallow:

User-agent: *
Disallow: /

If you want to block the visits of one particular bot, then:

User-agent: *
Disallow:

User-agent: Teoma
Disallow: /

Even if you block the crawling of some URL or page of your site with this file, those pages can still appear somewhere for various reasons; for example, the URLs can show up in referral logs. Moreover, there are some search engines whose algorithms are not well developed; when a spider/bot is sent out to crawl from those engines, it ignores the instructions of the robots.txt file and all your URLs get crawled.

To avoid all these problems, another approach is to protect all this content with a password using the .htaccess file.
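As a rough sketch, password protection with an Apache .htaccess file could look like this (the AuthUserFile path and the realm name are placeholders you must change):

AuthType Basic
AuthName "Private Section"
AuthUserFile /home/yoursite/.htpasswd
Require valid-user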

You can inform Google or another search engine, by setting nofollow in a link's rel attribute, that it should not crawl that link. If your site is a blog or forum where people can comment, you can nofollow the comment section in this way; then no one can use the reputation of your blog or forum to raise their own site's rank. Many times people may post on your site the address of an objectionable site, which you don't want. They may also post links to sites that Google regards as spam, and that would destroy your site's reputation.

<a href="http://www.yourdomain.com" rel="nofollow">Comment spammer</a>

Instead of giving nofollow on each link, it will work the same if you give nofollow in the robots meta tag.

<html>
<head>
<title>Brandon's Baseball Cards – Buy Cards, Baseball News, Card Prices</title>
<meta name="description" content="Brandon's Baseball Cards provides a large selection of vintage and modern baseball cards for sale. We also offer daily baseball news and events in">
<meta name="robots" content="nofollow">
</head>
<body>

Thank you for reading this article. I hope this post works for you. If there is any mistake, please forgive me. If you face any problem, don't forget to comment. And if you think the article is helpful, do share it.

Thank You…