Tuesday, 13 December 2016

All about the robots.txt file of a web site

Robots.txt is a very useful file for any web site. It is used to guide web robots (crawlers): the file contains instructions telling robots which parts of the site they may visit. This convention is called the Robots Exclusion Protocol.
Suppose a robot wants to visit www.abc.com/index.html. Before it does, it first checks http://www.abc.com/robots.txt. If /index.html is allowed in robots.txt, the robot goes on to fetch /index.html; otherwise a well-behaved robot will not visit that page. The file is only a request to robots and does not block people browsing the site. An example robots.txt file:

User-agent: *
Disallow: /search
Allow: /search/about
Disallow: /sdch
Disallow: /groups
Disallow: /index.html?
Disallow: /?
Allow: /?hl=
Disallow: /?hl=*&
Allow: /?hl=*&gws_rd=ssl$
Disallow: /?hl=*&*&gws_rd=ssl
Allow: /?gws_rd=ssl$
Allow: /?pt1=true$
Disallow: /imgres
Disallow: /u/
Disallow: /preferences
Disallow: /setprefs
Disallow: /default
Disallow: /m?
Disallow: /m/
Allow:    /m/finance
Disallow: /wml?
Disallow: /wml/?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/?
Disallow: /pda/search?
Disallow: /sprint_xhtml
Disallow: /sprint_wml
Disallow: /pqa
Disallow: /palm
Disallow: /gwt/
Disallow: /purchases
Disallow: /local?
Disallow: /local_url
Disallow: /shihui?
Disallow: /shihui/
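
The check a robot makes against rules like these can be sketched with Python's standard urllib.robotparser module. This is only a minimal sketch: the robot name "MyBot" and the helper may_visit are made up for illustration, and the rules are a shortened copy parsed from a string instead of being downloaded. Only plain prefix rules are used here because, as far as I know, this module does not interpret the * and $ wildcards that appear in the longer file.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules above, held in a string so nothing is fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /groups
Disallow: /preferences
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_visit(url, user_agent="MyBot"):
    """Return True if the parsed rules let this (hypothetical) robot fetch url."""
    return parser.can_fetch(user_agent, url)


print(may_visit("http://www.abc.com/index.html"))  # True
print(may_visit("http://www.abc.com/groups"))      # False
```

A real robot would point the parser at the live file with set_url() and read() instead of parsing a string.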

Create a /robots.txt file on your web site

Where to put it

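In the top-level directory of your web server.

When a robot looks for the "/robots.txt" file for a URL, it strips the path component from the URL (everything from the first single slash) and puts "/robots.txt" in its place. For example, for "http://www.abc.com/home/index.html", it will remove "/home/index.html", replace it with "/robots.txt", and end up with "http://www.abc.com/robots.txt".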
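
The path replacement described above can be sketched in a few lines of Python. The function name robots_url is made up for illustration; the sketch simply keeps the scheme and host of a page URL and swaps everything after them for "/robots.txt".

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Keep the scheme and host; replace path, query and fragment with /robots.txt."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.abc.com/home/index.html"))
# prints http://www.abc.com/robots.txt
```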

What to put in it

The "/robots.txt" file is a text file with one or more records. It usually contains a single record, such as:
User-agent: *
Disallow: /search
Allow: /search/about
Disallow: /sdch
Disallow: /groups
Disallow: /index.html?
Disallow: /?
Allow: /?hl=
Disallow: /?hl=*&
Allow: /?hl=*&gws_rd=ssl$
Disallow: /?hl=*&*&gws_rd=ssl
Allow: /?gws_rd=ssl$
Allow: /?pt1=true$
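
How a record like this might be evaluated can be sketched as follows. The text above does not say how overlapping Allow and Disallow lines are resolved, so this sketch assumes the convention from RFC 9309: "*" matches any run of characters, a trailing "$" anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The names parse_rules, pattern_matches and is_allowed are hypothetical, and only a few lines of the record are used.

```python
import re

RECORD = """\
User-agent: *
Disallow: /search
Allow: /search/about
Disallow: /?
Allow: /?hl=
Disallow: /?hl=*&
Allow: /?hl=*&gws_rd=ssl$
"""


def parse_rules(text):
    """Return (allowed, pattern) pairs from the Allow/Disallow lines."""
    rules = []
    for line in text.splitlines():
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field in ("allow", "disallow") and value:
            rules.append((field == "allow", value))
    return rules


def pattern_matches(pattern, path):
    """Match from the start of path; '*' is any run, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


def is_allowed(rules, path):
    """Assumed precedence: longest pattern wins, Allow wins a tie, no match allows."""
    best = None
    for allowed, pattern in rules:
        if pattern_matches(pattern, path):
            key = (len(pattern), allowed)
            if best is None or key > best:
                best = key
    return True if best is None else best[1]


rules = parse_rules(RECORD)
print(is_allowed(rules, "/search"))        # False
print(is_allowed(rules, "/search/about"))  # True
print(is_allowed(rules, "/?hl=en&q=x"))    # False
```

Under these assumptions the more specific "Allow: /search/about" overrides the shorter "Disallow: /search", which is why the order of lines in the record does not matter in this sketch.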