Got a spambot or scraper constantly showing up in your server logs? Or maybe there’s another site that’s leeching all your bandwidth? Perhaps you just want to ban a user from a certain IP address? In this article, I’ll show you how to use .htaccess to do all of that and more!

Identifying bad bots

So you’ve noticed a certain user-agent keeps showing up in your logs, but you’re not sure what it is, or whether you want to ban it? There are a few ways to find out:
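One quick way to see which user-agents hit your site most often is to tally them from the access log. The sketch below is only an illustration: it assumes Apache's combined log format (where the user-agent is the last quoted field), and the function name and sample lines are made up for the example.

```python
import re
from collections import Counter

# In the combined log format the user-agent is the last double-quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(lines):
    """Return a Counter mapping each user-agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical log lines, for illustration only.
sample = [
    '203.0.113.5 - - [23/Jun/2007:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "BadBot/1.0"',
    '203.0.113.5 - - [23/Jun/2007:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "BadBot/1.0"',
    '198.51.100.7 - - [23/Jun/2007:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

# Print the busiest user-agents first.
for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```

To run it against a real log, pass an open file object instead of the sample list.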

Once you’ve determined that the bot is something you want to block, the next step is to add it to your .htaccess file.

Blocking bots with .htaccess

This example, and all of the following examples, can be placed at the bottom of your .htaccess file. If you don’t already have a file called .htaccess in your site’s root directory, you can create a new one.

#get rid of the bad bot
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^BadBot
RewriteRule ^(.*)$ http://go.away/

So, what does this code do? It’s simple: the above lines tell your webserver to check for any visitor whose user-agent string starts with "BadBot". When a request matches, the server redirects it to a non-existent site called "go.away".

Now, that’s great to start with, but what if you want to block more than one bot?

#get rid of bad bots
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^BadBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EvilScraper [OR]
RewriteCond %{HTTP_USER_AGENT} ^FakeUser
RewriteRule ^(.*)$ http://go.away/

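The code above does the same thing as before, but this time I’m blocking 3 different bots. Note the "[OR]" flag after the first two conditions: by default every RewriteCond must match (a logical AND), so "[OR]" tells the server that matching any one of them is enough. The last condition in the list doesn’t need it.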
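If you want to check which user-agent strings a set of patterns would catch before touching the live site, the matching can be approximated offline. This is a rough sketch: Python's re module stands in for Apache's regex engine (close enough for simple anchored patterns like these), and the function name is invented for the example.

```python
import re

# Patterns copied from the RewriteCond lines; each is anchored to the start of the string.
BLOCKED_PATTERNS = [r"^BadBot", r"^EvilScraper", r"^FakeUser"]

def is_blocked(user_agent, patterns=BLOCKED_PATTERNS):
    """Return True if any pattern matches, mirroring conditions chained with [OR]."""
    return any(re.search(p, user_agent) for p in patterns)

print(is_blocked("EvilScraper/2.1"))                   # True
print(is_blocked("Mozilla/5.0 (compatible; BadBot)"))  # False: "BadBot" is not at the start
```

The second call shows why the "^" anchor matters: a bot that buries its name in the middle of the string slips past these patterns.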

Blocking Bandwidth Leeches

Say there’s a certain forum that’s always hotlinking your images, and it’s eating up all your bandwidth. You could replace the image with something really gross, but in some countries that might get you sued! The best way to deal with this problem is simply to block the site, like so:

RewriteEngine on
# match the forum's host (with or without a subdomain) over http or https
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?somebadforum\.com [NC]
RewriteRule .* - [F]

This code returns a 403 Forbidden error for any request that arrives with somebadforum.com as its referrer, which includes every hotlinked image. The end result: users on that site will see a broken image, and your bandwidth is no longer being stolen.

Here’s the code for blocking more than one site:

RewriteEngine on
# match each host (with or without a subdomain) over http or https
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?somebadforum\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?example\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?lastexample\.com [NC]
RewriteRule .* - [F]
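To preview which Referer values a block list like this is meant to catch, the host comparison can be sketched offline. This is an illustration rather than what Apache does internally: it parses the URL and compares host names instead of running a regex, and the function name is hypothetical.

```python
from urllib.parse import urlsplit

# Block list mirroring the domains in the RewriteCond lines.
BLOCKED_DOMAINS = ["somebadforum.com", "example.com", "lastexample.com"]

def is_blocked_referer(referer, domains=BLOCKED_DOMAINS):
    """Return True if the referer's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(referer).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)

print(is_blocked_referer("http://www.SomeBadForum.com/thread/42"))        # True
print(is_blocked_referer("https://friendly.org/?from=somebadforum.com"))  # False
```

Comparing the host alone avoids blocking a friendly site that merely mentions a blocked domain somewhere in its URL.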

If you want to block hotlinking completely, so that no one can hotlink your files, take a look at my article on using .htaccess to block hotlinkers.

Banning An IP Address

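Sometimes you just don’t want a certain person (or bot) accessing your website at all. One simple way to block them is to ban their IP address. The directives below use Apache 2.2 syntax; Apache 2.4 accepts them only when mod_access_compat is loaded, and prefers the newer Require directive: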

order allow,deny
deny from 192.168.44.201
deny from 224.39.163.12
deny from 172.16.7.92
allow from all

The example above shows how to block 3 different IP addresses. Sometimes you might want to block a whole range of IP addresses:

order allow,deny
deny from 192.168.
deny from 10.0.0.
allow from all

The above code will block any IP address starting with "192.168." or "10.0.0." from accessing your site.
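A partial address like "192.168." covers the whole 192.168.0.0/16 network, and "10.0.0." covers 10.0.0.0/24. If you are unsure whether a particular visitor falls inside a range, you can check with a few lines of Python. This is only a sketch; the function name is made up for the example.

```python
import ipaddress

# "192.168." is equivalent to 192.168.0.0/16 and "10.0.0." to 10.0.0.0/24.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("10.0.0.0/24"),
]

def is_denied(address, networks=BLOCKED_NETWORKS):
    """Return True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in networks)

print(is_denied("192.168.44.201"))  # True
print(is_denied("10.0.1.5"))        # False: only 10.0.0.x is blocked
```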

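Finally, here’s the code to block a specific ISP from getting access. Keep in mind that matching on a hostname makes the server do a reverse DNS lookup on each visitor’s IP address, which can slow down requests: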

order allow,deny
deny from some-evil-isp.com
deny from subdomain.another-evil-isp.com
allow from all

Final notes on using .htaccess

As you can see, .htaccess is a very powerful tool for controlling who can do what on your website. Because it’s so powerful, it’s also fairly easy for things to go wrong. If you have any mistakes or typos in your .htaccess file, the server will return a 500 Internal Server Error page instead of showing your site, so be sure to back up your .htaccess file before making any changes.

If you’d like to learn more about writing .htaccess files, I recommend checking out the Definitive Guide to Mod_Rewrite. This book covers everything you need to know about Apache’s .htaccess rewrite system.

PS: If your webhost doesn’t support .htaccess, it’s time to get a better one! :)

Originally Posted by John on 2007-06-23 on http://blamcast.net/articles/block-bots-hotlinking-ban-ip-htaccess

There are many times when it would be useful to host a sub-domain of your main site on another server or with another hosting company altogether. Maybe your sub-domain has outgrown the main site and you want to host it elsewhere for a better deal. Maybe you’ve got some ultra cheap static file hosting elsewhere for serving up images and videos. Or maybe you use a company that runs an affiliate store for you and lets you use your own domain, and you want that store to live on a sub-domain of your main site to ensure consistency and customer trust and to retain your branding.

Whatever your need, the solution is simple, but it requires access to the DNS records for your domain. These will usually be found at your hosting company. If you use cPanel/WHM you can edit them there; if you have access to the zone files themselves, that works too.

This is completely different to how you’d normally set up a “soft” sub-domain, which is essentially a folder within your normal site. Before you start you’ll need to know the IP address of the server that will be hosting the sub-domain account. If you’re creating another account, say with cPanel on another host, go ahead and set up that new account. Where it asks you to enter your domain name, enter the sub-domain and set it up as normal. If you want to point your sub-domain at a site where someone else deals with the hosting, you’ll have to ask them for the IP address.

Once in possession of this, you then need to edit your DNS record. If you’re using WHM/Cpanel then you’ll see the option called “Edit DNS Zone” in the left hand menu. Then select the domain name for the zone you wish to edit.

Don’t be scared of the next screen. Just scroll down to where it says “Add New Entries Below this Line” and in the first column type the name of your sub-domain. Enter just the first part, not the fully qualified name: for example, if you are creating dominos.mygames.com, enter dominos. Leave the numeric value in the next column (the TTL) as it is, and make sure “A” is selected in the drop-down menu. In the next column, enter the IP address of the new server that will host the sub-domain. There may be extra blank boxes after this, but these can be ignored. Then click “save”.


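If you’re editing your own files then you’ll need to locate your zone file and, in the section where the other sub-domains are set up (localhost, mail, www, ftp), add an A record line for your sub-domain, of the form “dominos IN A” followed by the IP address of the new server. Remember to increase the serial number in the zone’s SOA record and reload the name server so the change is picked up. Zone files are often located in /var/named or /var/named/chroot/var/named.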

That’s it. If you previously set up a cPanel account for this sub-domain at another host, the site should soon be visible.
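To confirm that the new record has reached your own machine, you can compare what the name resolves to against the address you entered. The sketch below is only an illustration: the function name is invented, the host name reuses the dominos.mygames.com example, and 203.0.113.10 is a placeholder address. The resolver is passed in as a parameter so the check can be tried without network access.

```python
import socket

def points_to(hostname, expected_ip, resolve=socket.gethostbyname):
    """Return True if hostname currently resolves to expected_ip."""
    try:
        return resolve(hostname) == expected_ip
    except socket.gaierror:
        # The name does not resolve (yet) from this machine.
        return False

# Stand-in resolver for the example; drop the `resolve` argument to query real DNS.
fake_dns = {"dominos.mygames.com": "203.0.113.10"}
print(points_to("dominos.mygames.com", "203.0.113.10", resolve=fake_dns.__getitem__))  # True
```

A False result right after the change usually just means a cache between you and the name server still holds the old answer.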




Introduction to DNS

The Domain Name System, or DNS, is the means by which computers connected to the Internet get information about each other. The individual pieces of information are known as records; each record is of a certain type. Computers look up records for a domain by asking the name server for the domain about the records relevant to that domain.

IP addresses are the numbers which identify computers to each other.

Time Till Stable

This is an indication of the amount of time until you can be really sure that the change will be visible via all ISP DNS caches. The amount of time that Heart Internet’s DNS servers tell remote DNS caches to wait is much shorter than this. However, if you’ve just transferred the domain name in, several DNS caches will still be using the timeout that the old host’s DNS servers suggested, which is conventionally no longer than 24 hours.

If the TTS value is zero, that means that all remote caches should be up to date, so if you’re still seeing the old value there’s probably a problem (e.g. you entered invalid DNS records). If the TTS value is above zero and you’re still seeing the old value, it’s more likely that your ISP’s DNS cache just hasn’t updated yet.
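The idea behind a value like this can be sketched as simple arithmetic. Note that this is an assumption about how such a figure could be derived, not a description of how any control panel actually calculates it: it takes the longest timeout advertised before the change and subtracts the time that has passed since. The function name is hypothetical.

```python
from datetime import datetime, timedelta

def time_till_stable(changed_at, old_ttl, now):
    """Return how long until caches holding the old record must have expired.

    Assumes every cache honours `old_ttl`, the timeout advertised before the change.
    """
    remaining = changed_at + old_ttl - now
    return max(remaining, timedelta(0))

changed = datetime(2024, 1, 1, 12, 0)
# Six hours after the change, with a 24 hour old timeout: 18 hours to go.
print(time_till_stable(changed, timedelta(hours=24), datetime(2024, 1, 1, 18, 0)))
# A day and a half later the value has dropped to zero.
print(time_till_stable(changed, timedelta(hours=24), datetime(2024, 1, 3, 0, 0)))
```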

DNS Record Types

A records

These contain a mapping from a name to an IP address. An A record does not in itself mean that any particular service is available from the computer at that address; it just translates the name to the IP address.

CNAME records

These contain a mapping from one name (known as an alias) to another name (known as a CNAME, or canonical name). When a computer looks up records for the alias, it is given the records for the cname instead.

For example, if we set up a CNAME record for “managethisdomain.com” with alias “web” and cname “www”, then all queries for “web.managethisdomain.com” would be given the information for “www.managethisdomain.com”. A cname can be a name within the same domain, as in our example, or it can be a full name, like “www.google.com.”; the dot on the end shows that this is a full name.
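The alias lookup and the trailing-dot rule can be illustrated with a small sketch. It models a zone as two plain dictionaries, which is a simplification of what a real name server does; the function names and the 203.0.113.10 address are invented for the example.

```python
ORIGIN = "managethisdomain.com"

def qualify(name, origin=ORIGIN):
    """Expand a zone-relative name; a trailing dot marks a name that is already full."""
    if name.endswith("."):
        return name.rstrip(".")
    return f"{name}.{origin}"

def resolve(name, cnames, a_records, max_hops=8):
    """Follow CNAME aliases from `name` until an A record is found."""
    for _ in range(max_hops):
        if name in a_records:
            return a_records[name]
        if name not in cnames:
            return None
        name = cnames[name]
    return None  # give up rather than loop forever on circular aliases

# Hypothetical zone data: alias "web" points at cname "www".
cnames = {qualify("web"): qualify("www")}
a_records = {qualify("www"): "203.0.113.10"}

print(resolve("web.managethisdomain.com", cnames, a_records))  # 203.0.113.10
print(qualify("www.google.com."))                              # www.google.com
```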

MX records

These say where email for a domain is to be delivered. A domain can have several MX records; each one has a priority, given here as a number from 0 to 100 (the DNS protocol itself allows values up to 65535). Email is delivered to the one with the lowest number first, and to any others only if the first one cannot accept it. For example, there is an MX record for “managethisdomain.com” pointing to “mail.managethisdomain.com”, with priority 10. This causes our email to be delivered to “mail.managethisdomain.com”.
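The "lowest number first" rule amounts to a sort. Here is a minimal sketch; the function name and the backup host are made up for illustration, and only the priority 10 record comes from the example above.

```python
def delivery_order(mx_records):
    """Return mail hosts in the order they should be tried: lowest priority number first."""
    return [host for priority, host in sorted(mx_records)]

# Records as (priority, host); the backup host is hypothetical.
records = [
    (20, "backup.managethisdomain.com"),
    (10, "mail.managethisdomain.com"),
]
print(delivery_order(records))
```

A sending server works down this list and only moves to the next host when the previous one cannot accept the message.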

TXT records

These give miscellaneous textual information about a domain; the most common use of them is for Sender Policy Framework (SPF), which enables you to specify which computers are allowed to send email which claims to be from your domain. For more information about SPF see the SPF Project.
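An SPF policy is just a TXT value made of space-separated terms, so it is easy to pick apart. The sketch below splits one into its version tag and terms; it does not evaluate the policy, and the function name and the sample record (with the placeholder address 203.0.113.10) are assumptions for the example.

```python
def parse_spf(txt):
    """Split an SPF TXT value into its version tag and list of terms."""
    parts = txt.split()
    if not parts or parts[0] != "v=spf1":
        raise ValueError("not an SPF record")
    return parts[0], parts[1:]

# Hypothetical policy: allow the domain's MX hosts and one address, reject everything else.
version, terms = parse_spf("v=spf1 mx ip4:203.0.113.10 -all")
print(terms)  # ['mx', 'ip4:203.0.113.10', '-all']
```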

SRV records

These records allow applications to locate services by giving the address and port information required. They also allow the load to be shared among several different servers using the priority and weight values.
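How priority and weight share the load can be sketched as follows. This is a simplified illustration, not the full selection procedure from the SRV specification: it only looks at the lowest-priority group and picks within it in proportion to weight. The function name and the records are hypothetical.

```python
import random

def pick_srv_target(records, rng=random):
    """Choose a target from SRV records given as (priority, weight, port, host).

    Only the lowest-priority group is considered; within it, hosts are
    picked at random in proportion to their weight.
    """
    lowest = min(r[0] for r in records)
    group = [r for r in records if r[0] == lowest]
    weights = [r[1] for r in group]
    if sum(weights) == 0:
        chosen = rng.choice(group)  # all weights zero: pick uniformly
    else:
        chosen = rng.choices(group, weights=weights, k=1)[0]
    return chosen[3], chosen[2]

# Hypothetical records for illustration.
records = [
    (10, 60, 5060, "sip1.example.com"),
    (10, 40, 5060, "sip2.example.com"),
    (20, 0, 5060, "backup.example.com"),
]
print(pick_srv_target(records))
```

Over many lookups sip1 should receive roughly 60% of the traffic and sip2 the rest, while the backup host is only relevant when the priority 10 servers are unavailable.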
