Tuesday 13 December 2016

All about the robots.txt file of a web site

A robots.txt file is useful for any web site. It gives instructions to web robots (crawlers, such as search engine bots) about which parts of the site they may visit. This is called the Robots Exclusion Protocol.
When a robot wants to visit www.abc.com/index.html, it first checks http://www.abc.com/robots.txt. If /index.html is allowed in robots.txt, the robot fetches the page; otherwise it skips it. The file is advisory: it only affects robots that choose to obey it, and it does not stop a person from opening the page in a browser. A sample robots.txt file:

User-agent: *
Disallow: /search
Allow: /search/about
Disallow: /sdch
Disallow: /groups
Disallow: /index.html?
Disallow: /?
Allow: /?hl=
Disallow: /?hl=*&
Allow: /?hl=*&gws_rd=ssl$
Disallow: /?hl=*&*&gws_rd=ssl
Allow: /?gws_rd=ssl$
Allow: /?pt1=true$
Disallow: /imgres
Disallow: /u/
Disallow: /preferences
Disallow: /setprefs
Disallow: /default
Disallow: /m?
Disallow: /m/
Allow:    /m/finance
Disallow: /wml?
Disallow: /wml/?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/?
Disallow: /pda/search?
Disallow: /sprint_xhtml
Disallow: /sprint_wml
Disallow: /pqa
Disallow: /palm
Disallow: /gwt/
Disallow: /purchases
Disallow: /local?
Disallow: /local_url
Disallow: /shihui?
Disallow: /shihui/
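
The check a robot makes before fetching a page can be sketched in Python with the standard library module urllib.robotparser. This is only a minimal sketch under some assumptions: the rules below are a made-up, simplified set for www.abc.com (not the full sample above), and the helper name may_visit is hypothetical. As far as I know, this module matches plain path prefixes, so wildcard rules with * and $ like some in the sample may not be interpreted the way a search engine would interpret them.

```python
from urllib import robotparser

# Hypothetical rules for www.abc.com, limited to plain path prefixes.
rules = """\
User-agent: *
Disallow: /groups
Disallow: /preferences
"""

parser = robotparser.RobotFileParser()
# Parse in-memory text, so the example needs no network access.
parser.parse(rules.splitlines())


def may_visit(url, agent="*"):
    """Return True if the parsed rules let this agent fetch the URL."""
    return parser.can_fetch(agent, url)


print(may_visit("http://www.abc.com/index.html"))  # True
print(may_visit("http://www.abc.com/groups"))      # False
```

A real crawler would load the live file instead, for example with set_url() and read() on the same parser object, and then call can_fetch() before every request.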

Create a /robots.txt file on your web site

Where to put it

In the top-level directory of your web server.

When a robot looks for the file, it takes the page URL, strips the path, and puts "/robots.txt" in its place. For example, for "http://www.abc.com/home/index.html" it removes "/home/index.html", replaces it with "/robots.txt", and ends up with "http://www.abc.com/robots.txt".
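
The URL rewriting described above can be sketched in a few lines of Python. The function name robots_url is an assumption made for this example; it keeps the scheme and host of the page URL and swaps the path for /robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Build the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.abc.com/home/index.html"))
# http://www.abc.com/robots.txt
```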

What to put in it

The "/robots.txt" file is a plain text file with one or more records. It usually contains a single record, such as:
User-agent: *
Disallow: /search
Allow: /search/about
Disallow: /sdch
Disallow: /groups
Disallow: /index.html?
Disallow: /?
Allow: /?hl=
Disallow: /?hl=*&
Allow: /?hl=*&gws_rd=ssl$
Disallow: /?hl=*&*&gws_rd=ssl
Allow: /?gws_rd=ssl$
Allow: /?pt1=true$
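
To make the record structure more concrete, here is a rough Python sketch that groups the lines of a robots.txt file into records. The name parse_records and the dictionary layout are assumptions made for illustration. The sketch only understands User-agent, Allow and Disallow lines, and it starts a new record whenever a User-agent line follows a rule line.

```python
def parse_records(text):
    """Group robots.txt lines into records of user agents and rules."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue  # skip blank lines and anything without a field name
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # A User-agent line that follows rules opens a new record.
            if current is None or current["rules"]:
                current = {"agents": [], "rules": []}
                records.append(current)
            current["agents"].append(value)
        elif field in ("allow", "disallow") and current is not None:
            current["rules"].append((field, value))
    return records


sample = """\
User-agent: *
Disallow: /search
Allow: /search/about
Disallow: /sdch
"""

records = parse_records(sample)
for record in records:
    print(record["agents"], record["rules"])
```

For the four sample lines this gives a single record: one user agent (*) and three rules kept in file order.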
