Robots.txt Disallow /posts/ ???

Discussion in 'SEO, Traffic and Revenue' started by Sylvain, Oct 11, 2013.

  1. Sylvain

    Sylvain Regular Member

    Joined:
    Mar 15, 2013
    Messages:
    140
    Likes Received:
    17
    I use XenForo 1.2.2 and I've noticed that many XenForo forums disallow /posts/ in their robots.txt files. Why not let robots crawl /posts/?

    User-agent: *
    Disallow: /posts/
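
    For reference, here is a minimal sketch of what that rule does, using Python's standard urllib.robotparser. The example.com URLs are made up for illustration; the point is only that post permalinks under /posts/ are blocked while thread URLs stay crawlable.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /posts/
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs: a post permalink and a thread page.
post_url = "http://example.com/posts/1234/"
thread_url = "http://example.com/threads/some-thread.56/"

print(parser.can_fetch("*", post_url))    # blocked by Disallow: /posts/
print(parser.can_fetch("*", thread_url))  # no rule matches, so allowed
```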
     
  2. Programmers World

    Programmers World Regular Member

    Joined:
    Aug 9, 2013
    Messages:
    46
    Likes Received:
    3
    That's strange. I would think that it would be extremely useful to let robots crawl your posts.
     
  3. Sylvain

    Sylvain Regular Member

    Joined:
    Mar 15, 2013
    Messages:
    140
    Likes Received:
    17
    There must be a good reason.

    I also found that many forums disallow all pages in their robots.txt file.
     
  4. Cerberus

    Cerberus Admin Talk Staff

    Joined:
    May 3, 2009
    Messages:
    1,031
    Likes Received:
    500
    Copied and pasted from the one used here, which is a pretty good one:

    Code:
    User-agent: *
    Disallow: /misc/
    Disallow: /help/
    Disallow: /search/
    Disallow: /register/
    Disallow: /login/
    Disallow: /online/
    Disallow: /lost-password/
    Disallow: /account/
    Disallow: /admin.php
    Disallow: /events/birthdays/
    Disallow: /events/monthly
    Disallow: /events/weekly
    Disallow: /goto/
    Disallow: /help/
    Disallow: /login/
    Disallow: /media/keyword/
    Disallow: /media/user/
    Disallow: /media/service/
    Disallow: /media/submit/
    Disallow: /misc/style?*
    Disallow: /misc/quick-navigation-menu?*
    Disallow: /online/
    Disallow: /forums/7/
    Disallow: /forums/20/
    Disallow: /forums/70/
    Disallow: /forums/49/
    Disallow: /forums/155/
    Disallow: /forums/156/
    Disallow: /forums/184/
    Disallow: /forums/200/
    Disallow: /forums/188/
    Disallow: /forums/186/
    Disallow: /forums/187/
    Disallow: /forums/189/
    Disallow: /forums/191/
    
    Allow: /
    
    Sitemap: http://admin-talk.com/sitemap/sitemap.xml.gz
    Sitemap: http://dir.admin-talk.com/sitemap.xml
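
    A list this long tends to pick up repeats (here /help/, /login/ and /online/ each appear twice). They are harmless, but a rough sketch like the one below can flag them. The function name is made up, the sample is a shortened excerpt rather than the full file, and the check ignores User-agent grouping for simplicity.

```python
from collections import Counter


def duplicate_rules(robots_txt):
    """Return Allow/Disallow lines that appear more than once, in first-seen order."""
    rules = [
        line.strip()
        for line in robots_txt.splitlines()
        if line.strip().lower().startswith(("allow:", "disallow:"))
    ]
    counts = Counter(rules)
    return [rule for rule in counts if counts[rule] > 1]


# Shortened excerpt of the file above (not the full list).
SAMPLE = """\
User-agent: *
Disallow: /misc/
Disallow: /help/
Disallow: /search/
Disallow: /login/
Disallow: /online/
Disallow: /goto/
Disallow: /help/
Disallow: /login/
Disallow: /online/
"""

print(duplicate_rules(SAMPLE))
```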
     
  5. Sylvain

    Sylvain Regular Member

    Joined:
    Mar 15, 2013
    Messages:
    140
    Likes Received:
    17
    Pretty much similar to my file

    User-agent: *
    Disallow: /find-new/
    Disallow: /account/
    Disallow: /goto/
    Disallow: /login/
    Disallow: /admin.php
    Disallow: /search/
    Disallow: /search.php
    Disallow: /help/
    Disallow: /members/
    Disallow: /misc/
    Disallow: /online/
    Allow: /
    User-agent: ia_archiver
    Allow: /
    User-agent: BoardTracker
    Disallow: /
    User-agent: BoardReader
    Disallow: /
    User-agent: Baiduspider
    User-agent: Baiduspider-video
    User-agent: Baiduspider-image
    Disallow: /
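
    One thing worth knowing about per-bot groups: a crawler that matches a named group follows only that group and ignores the * rules. A minimal sketch with Python's urllib.robotparser, using a condensed version of the file above (blank lines added between groups, example.com URLs made up):

```python
from urllib.robotparser import RobotFileParser

# Condensed version of the file above, with a blank line between groups.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Disallow: /search/
Allow: /

User-agent: ia_archiver
Allow: /

User-agent: BoardReader
Disallow: /

User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Check a path on a hypothetical host for the given user agent."""
    return parser.can_fetch(agent, "http://example.com" + path)


print(allowed("Googlebot", "/threads/example.1/"))    # falls back to the * group
print(allowed("Googlebot", "/account/"))              # * group disallows /account/
print(allowed("ia_archiver", "/account/"))            # own group applies, * rules do not
print(allowed("Baiduspider", "/threads/example.1/"))  # blocked everywhere
```

    So in this layout ia_archiver is allowed into /account/ and /search/ as well, which may not be what was intended.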
     
  6. MyDigitalpoint

    MyDigitalpoint Regular Member

    Joined:
    Jun 5, 2013
    Messages:
    114
    Likes Received:
    30
    Location:
    Virtual World
    Well, there is always the chance to tweak robots.txt to suit particular needs. The problem could be that with every forum settings update, the robots file can be modified on the fly.

    I remember having a long, long disallow list including search engines, crawlers and bots, but sometimes I wonder whether all of them abide by the rules and refrain from crawling what is on the site.

    I read some time ago that the best way to prevent bots from crawling directories or files that we don't want crawled is to make no reference to them in robots.txt, so that bots never learn they exist... unless they are linked from other pages that are open to crawling.
     
  7. BamaStangGuy

    BamaStangGuy Administrator

    Joined:
    Jun 23, 2009
    Messages:
    769
    Likes Received:
    549
    Location:
    Huntsville, AL
  8. GTB

    GTB Regular Member

    Joined:
    Jun 30, 2009
    Messages:
    1,792
    Likes Received:
    270
    Well, the idea of a robots.txt file is to tell the bots that obey it not to waste time on files that don't need indexing, which speeds things up for them. I wouldn't go daft adding loads of entries, but some files and folders are obvious candidates for blocking: cache folders, and files like install and config.php files, etc.

    It can help a little to block those obvious files and folders from being indexed. You can also use it to limit crawl rates, which can save you bandwidth.
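
    As a rough sketch of the crawl-limiting part: the usual way is a Crawl-delay line. It is a non-standard directive that only some crawlers honour (Googlebot ignores it), so treat it as a hint. The /cache/ path and the 10-second value below are just examples; Python's urllib.robotparser (3.6+) can read the value back:

```python
from urllib.robotparser import RobotFileParser

# Example rules: block a cache folder and ask bots to wait between requests.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /cache/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds a polite crawler should wait between requests, or None if unset.
delay = parser.crawl_delay("*")
print(delay)
```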
     
    Last edited: Oct 28, 2013
  9. Sylvain

    Sylvain Regular Member

    Joined:
    Mar 15, 2013
    Messages:
    140
    Likes Received:
    17
  10. BamaStangGuy

    BamaStangGuy Administrator

    Joined:
    Jun 23, 2009
    Messages:
    769
    Likes Received:
    549
    Location:
    Huntsville, AL
    Yes. Though today I did spot an instance where /posts/ were being indexed. Will add it to that thread.
     
  11. deansaliba

    deansaliba Regular Member

    Joined:
    May 9, 2011
    Messages:
    51
    Likes Received:
    15
    Location:
    London, UK
    I have noticed that more and more web sites and blogs are using the robots.txt file recently. In the last month I have lost count of how many times I have looked for something in Google and found a site listed with a description saying that robots.txt had blocked the search engine from crawling the site. I didn't think they would index those sites if they were restricted by robots.txt. :confused:
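
    For context: robots.txt only stops crawling, not indexing, so a blocked URL that is linked from elsewhere can still show up as a bare listing. To keep a page out of the index it has to stay crawlable and send a noindex signal instead, for example an X-Robots-Tag response header. A minimal sketch, assuming a hypothetical helper that a web app would call when building responses (the path prefixes are made up):

```python
# Hypothetical path prefixes that should stay out of search results.
NOINDEX_PREFIXES = ("/search/", "/members/")


def extra_headers(path):
    """Return extra response headers for a request path.

    Pages under NOINDEX_PREFIXES get an X-Robots-Tag: noindex header.
    These paths must NOT also be disallowed in robots.txt, otherwise
    the crawler never fetches the page and never sees the header.
    """
    if path.startswith(NOINDEX_PREFIXES):
        return [("X-Robots-Tag", "noindex")]
    return []


print(extra_headers("/search/123/"))
print(extra_headers("/threads/example.1/"))
```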
     
  12. Sylvain

    Sylvain Regular Member

    Joined:
    Mar 15, 2013
    Messages:
    140
    Likes Received:
    17
    Some robots bypass the robots.txt file.
     
  13. pixelek

    pixelek Regular Member

    Joined:
    Oct 9, 2013
    Messages:
    229
    Likes Received:
    85
    Location:
    Torun, Poland
    It's not a very wise thing to do. Of course it's none of my business, but I wouldn't do that. Some of your users may post sensitive data and, by not using robots.txt (i.e. letting robots crawl your site), that data gets revealed. It's a strange policy of yours.
     
  14. BamaStangGuy

    BamaStangGuy Administrator

    Joined:
    Jun 23, 2009
    Messages:
    769
    Likes Received:
    549
    Location:
    Huntsville, AL
    I have no idea how a robots.txt is going to protect my members from themselves?
     
  15. MyDigitalpoint

    MyDigitalpoint Regular Member

    Joined:
    Jun 5, 2013
    Messages:
    114
    Likes Received:
    30
    Location:
    Virtual World
  16. Sylvain

    Sylvain Regular Member

    Joined:
    Mar 15, 2013
    Messages:
    140
    Likes Received:
    17
    I checked my competitors... some use robots.txt, others don't.
     
  17. cpvr

    cpvr Regular Member

    Joined:
    Aug 14, 2009
    Messages:
    3,219
    Likes Received:
    823
    I don't have any competitors, and when I did, most of them used robots.txt to block spiders from crawling their content.
     
  18. JoeyJ

    JoeyJ Regular Member

    Joined:
    Nov 29, 2013
    Messages:
    18
    Likes Received:
    10
    Disallowing directories also makes your site less vulnerable (to an extent). You'd be surprised how much information a robot can disclose about your website. It can index vulnerable files or sensitive directories, or even show parameters that could lead to MySQL injection.
     
