Some IIS Rewrite and htaccess Additions to Block Bad Bots

Discussion in 'Security and Legal' started by AWS, Jan 1, 2014.

  1. AWS

    AWS Administrator

    Joined:
    Feb 1, 2010
    Messages:
    1,616
    Likes Received:
    692
    Location:
    Joliet, IL U.S.A.
    First Name:
    Bob
    I found this somewhere online a few years ago, and I've updated it over the years to catch new spam bots as I see them. It catches about 90% of the baddies.

    Just add the following to your .htaccess file.
    Code:
    # MAKE SURE REWRITING IS ON (REQUIRES MOD_REWRITE)
    RewriteEngine On
    # IF THE USER AGENT STARTS WITH THESE
    RewriteCond %{HTTP_USER_AGENT} ^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(cheesebot|cherrypicker|chinaclaw|collector|baiduspider|copier|copyrightcheck) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|yandex|zeus|zyborg|anonymouse) [NC,OR]
    # STARTS WITH WEB
    RewriteCond %{HTTP_USER_AGENT} ^web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) [NC,OR]
    # ANYWHERE IN USER AGENT -- REGEX
    RewriteCond %{HTTP_USER_AGENT} ^.*(craftbot|baiduspider|baidu|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures).*$ [NC]
    # ISSUE 403 / SERVE ERRORDOCUMENT
    RewriteRule . - [F,L]
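
    The last rule issues the 403; if you'd rather blocked bots get a small static page instead of touching your application, you can point the ErrorDocument at one too. A minimal sketch (the path is just an example):

    Code:
    # OPTIONAL: SERVE THE 403 FROM A LIGHTWEIGHT STATIC PAGE SO BLOCKED
    # BOTS NEVER TOUCH THE APPLICATION (THE PATH IS JUST AN EXAMPLE)
    ErrorDocument 403 /errors/403.html

    A quick test: hit your site with curl -A "HTTrack" and you should get a 403 back.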
    
    If you use IIS and the built-in URL Rewrite module, add this to the web.config file inside the <rewrite><rules> section.

    Code:
        
    <rule name="Bad Bot" stopProcessing="true">
          <match url=".*" ignoreCase="false" />
          <conditions logicalGrouping="MatchAny">
            <!--# IF THE USER AGENT STARTS WITH THESE-->
            <add input="{HTTP_USER_AGENT}" pattern="^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(cheesebot|cherrypicker|chinaclaw|collector|baiduspider|copier|copyrightcheck)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker)" />
            <add input="{HTTP_USER_AGENT}" pattern="^(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|yandex|zeus|zyborg|anonymouse)" />
            <!--# STARTS WITH WEB-->
            <add input="{HTTP_USER_AGENT}" pattern="^web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack)" />
            <!--# ANYWHERE IN USER AGENT (REGEX)-->
            <add input="{HTTP_USER_AGENT}" pattern="^.*(craftbot|baiduspider|baidu|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures).*$" />
          </conditions>
          <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
        </rule>
    
    That should do it.
     
  2. GasMan320

    GasMan320 Regular Member

    Joined:
    Aug 30, 2012
    Messages:
    88
    Likes Received:
    71
    Location:
    Northern California
    Adding this stuff to .htaccess is fine if you're on shared hosting, but if you have a VPS or your own server, you really should put rules like this in your httpd vhost configuration files.

    Adding rules to .htaccess carries a real performance penalty: Apache has to make extra filesystem accesses for every single request. If a file sits in a directory five layers deep and each directory has an .htaccess file, that's ten extra filesystem calls per request (one stat() to check whether an .htaccess file exists and one open() to read it, for each directory).

    If you have access to httpd.conf, there's no reason to use .htaccess at all. You can get a significant performance boost by disabling .htaccess altogether with AllowOverride None, and as an added benefit your rewrite rules end up simpler too.
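
    For example, here's a minimal vhost sketch (the server name and paths are placeholders, and you'd carry over the full set of RewriteCond lines from the .htaccess version):

    Code:
    <VirtualHost *:80>
        ServerName example.com
        DocumentRoot /var/www/example

        RewriteEngine On
        # First condition from the .htaccess block; repeat the rest here,
        # keeping [NC,OR] on every line except the last
        RewriteCond %{HTTP_USER_AGENT} ^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) [NC]
        RewriteRule . - [F]

        <Directory "/var/www/example">
            # Apache 2.4 syntax; no per-request .htaccess lookups
            AllowOverride None
            Require all granted
        </Directory>
    </VirtualHost>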
     
  3. AWS

    AWS Administrator

    Joined:
    Feb 1, 2010
    Messages:
    1,616
    Likes Received:
    692
    Location:
    Joliet, IL U.S.A.
    First Name:
    Bob
    Along with that, on IIS you can put the rules in the main server-level config and they'll apply server-wide to every site on the box, as long as the sites are set to inherit.
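
    Roughly, that placement looks like the sketch below in applicationHost.config; treat the exact section as an assumption to verify against the URL Rewrite docs for your IIS version.

    Code:
    <!-- applicationHost.config (server level); sites inherit these
         distributed rules unless a site's own web.config overrides
         or clears them -->
    <system.webServer>
        <rewrite>
            <rules>
                <!-- paste the "Bad Bot" rule from above here -->
            </rules>
        </rewrite>
    </system.webServer>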
     
  4. Code Monkey

    Code Monkey Regular Member

    Joined:
    Apr 15, 2013
    Messages:
    230
    Likes Received:
    170
    I just block China instead. Seems to quiet everything down.

    But yeah, if you can put that in httpd.conf it's much better.
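
    For anyone curious, here's roughly what an IP-range block looks like in an Apache vhost. The CIDR below is a documentation placeholder (TEST-NET-3), not a real country allocation; actual country blocking needs a current GeoIP list or a module like mod_maxminddb.

    Code:
    <Location "/">
        <RequireAll>
            Require all granted
            # Placeholder range; substitute ranges from a GeoIP database
            Require not ip 203.0.113.0/24
        </RequireAll>
    </Location>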
     
  5. hasseb432

    hasseb432 Regular Member

    Joined:
    Nov 15, 2014
    Messages:
    1
    Likes Received:
    0
    I think that's done by design. I have seen the same thing and agree it's hard to read. The only thing you can do is switch to something else. Supposedly what G is doing will make reCAPTCHA harder to crack for spammers.
     
