A few days ago my partner, John, noticed that our website's content was being scraped.

It wasn't all that concerning at the time, but last night he did some keyword searches on Google to check our rankings and noticed that the site with our stolen content actually ranked higher than ours!

This was obviously a problem, so we immediately needed to figure out what steps to take. John sent a DMCA notice to Google while I put together a cease and desist letter to send to the site owner, the domain registrar, and their host.

During all of this, we were determining the IP address of the site. We did a whois on their domain name, which returned 3 different IPs, and when we pinged their domain we found a fourth. To block their address ranges we added the following to the .htaccess file:

    Order Deny,Allow
    Deny from 123.123.123.0/24

This will block access for any user with an address in the 123.123.123.0 to 123.123.123.255 range.

John then thought of a way to use this to our advantage. What if we detected any traffic from their IP range and, instead of blocking it, redirected it to our homepage so they become OUR visitors? We created a rewrite condition like:

    RewriteEngine On
    RewriteCond %{REMOTE_ADDR} ^123\.123\.123\.
    RewriteCond %{REQUEST_URI} !^/index\.php$
    RewriteRule .? /index.php [R=301,L]

The first condition tests %{REMOTE_ADDR}, the client's IP address, because the pattern is an IP prefix; %{HTTP_REFERER} holds a URL and would never match it. The second condition keeps the redirect from looping on index.php itself. This should redirect anyone from their IP range to our homepage. Now to just see if it works!

You may also want to stop people from linking to your images, JavaScript, SWF, and CSS files. This is known as HotLinking, and it costs you bandwidth when they do it. If you would like to prevent HotLinking, add the following to your .htaccess file:

    # START Prevent HotLinking
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^http://(www\.)?search-this\.com/.*$ [NC]
    RewriteRule \.(gif|jpg|js|css|swf)$ - [F]
    # END Prevent HotLinking

This will prevent HotLinking to your gif, jpg, js, css and swf files. Just remember that mod_rewrite must be enabled for this to work.

You may also decide you want to replace a HotLinked image with your own image. To do this, add the following to your .htaccess file:

    RewriteEngine On
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^http://(www\.)?search-this\.com/.*$ [NC]
    RewriteCond %{REQUEST_URI} !/images/hotlinked\.jpg$
    RewriteRule \.(gif|jpg)$ http://www.search-this.com/images/hotlinked.jpg [R,L]

Now when they link to one of your images it will display the alternate image that you provided. The REQUEST_URI condition stops the replacement image from being redirected to itself.

Hope this helps someone out there.

And finally, if you need to find any merchant account related information, don't visit the imposters; visit the Ultimate Merchant Account Resource.
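A note for readers on Apache 2.4 or later: the Order and Deny directives used for the IP block are deprecated there and only work through mod_access_compat. A minimal sketch of the same block in the newer Require syntax might look like the following. It assumes the same example range, 123.123.123.0 to 123.123.123.255, and a host that permits these directives in .htaccess (AllowOverride AuthConfig); neither assumption comes from our own setup, so test it before relying on it.

```apache
# Apache 2.4+ (mod_authz_core) sketch of the IP block described earlier.
# 123.123.123.0/24 is the example range from this post, not a real scraper.
<RequireAll>
    # Allow everyone by default...
    Require all granted
    # ...except clients whose address falls in the blocked range.
    Require not ip 123.123.123.0/24
</RequireAll>
```

Requests from the blocked range should then receive a 403 Forbidden response, while all other visitors are served normally.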

13 Responses to "Stop Site Scraping"


1 Ben Partch
March 6th, 2007 at 11:01 am

Hello. Thanks for this. The site in question had scraped my site also. This was very helpful. Also, to anyone who reads this: I notice that the site in question has scraped many, many sites just like mine and Sara's/John's, including, but not limited to, this site, Mark. Now I am off to do all my other sites.

2 Golgotha
March 6th, 2007 at 3:13 pm

Hey Ben, ST gets scraped by at least half a dozen sites. It typically doesn't affect us because our PageRank is higher than the scrapers'. It has become an epidemic though.

3 TOMAS
March 6th, 2007 at 3:21 pm

Thanks for the informative post. We need more posts like this that help us thwart the bad guys and keep our content safe and secure! *hint* *hint* Also, how did you figure out that the site scraping was going on in the first place? Was it through your visitor tracking software or Technorati?

4 John Conde
March 6th, 2007 at 3:26 pm

I actually found it while doing a routine ranking check for my site. My site is sandboxed (or however you wish to describe it) and appears for no quality keywords. But I check regularly anyway so I can tell when the dark cloud has lifted.

Anyway, one day I noticed I ranked moderately well for some decent search terms. I checked it out and it wasn't my site but theirs. It's bad enough I can't get my site ranked, but I don't need them to rank well from my hard work. I have enough of an uphill battle, and I don't need my site to be seen as duplicate content to a site scraper.

5 Golgotha
March 6th, 2007 at 3:37 pm

Yes Tomas, I found out through tracking software (AWStats) and Technorati. In addition, I have received e-mails from people telling me "so and so is scraping your site."

6 John Loch
March 7th, 2007 at 3:10 am

Is the site in question replicating your merchant site content, or this one's (as in search-this.com)?

7 ses5909
March 7th, 2007 at 4:27 am

Both. I wrote the post about my merchant account site, but Mark is explaining how he has had the same thing happen to him.

8 Dan Schulz
March 7th, 2007 at 10:47 pm

This is just crazy. I know it goes on, but people just need to pull their heads out of wherever they're shoving them and realize that they can't get away with this kind of garbage forever. Thanks for sharing the tip. Now to see if it works.

9 Karl Groves
March 19th, 2007 at 4:54 am

I hate to be the bearer of bad news, but these methods are not reliable. I decided to test to see if I could scrape this site. Typical methods with PHP and file() or file_get_contents() were blocked and simply returned a string that says "Stop Site Scraping". However, your methods do nothing against a scraper using cURL. With 8 lines of cURL, I was able to retrieve this site completely. All the scraper would need to do then would be their regular processing in order to put your data on their site. Naturally, I won't reproduce the code here, but if you know any cURL, it's just a simple GET request. At any rate, good luck. People stealing content suck.

10 Don't get stressed about blog scrapers stealing your content
May 15th, 2008 at 12:15 pm

[...] Blog Scraping: have you been a victim?; How to protect yourself against Blog Post Theft and Splogs!; Top 8 Excuses for Stealing Other People's Content; Six Steps to Prevent Content Theft and Combat Copyright Infringement on Your Business Blog; The 6 Steps to Stop Content Theft; How to deter thieves from stealing your images and server bandwidth; Blog Plagiarism Q&A; Stop Site Scraping [...]

11 Attention: WordPress Bloggers, Is Someone Scraping Your Blog's RSS Feeds?
July 26th, 2008 at 9:52 pm

[...] If you would like to stop people from HotLinking (linking to your images, JavaScript, SWF, and CSS files), just modify your .htaccess file with the following code. Code provided by: Search-This [...]

12 Attention: WordPress Bloggers, Is Someone Scraping Your Blog's RSS Feeds?
June 11th, 2009 at 11:08 pm

[...] If you have had enough of these people, you can stop them from HotLinking (linking to your images, JavaScript, SWF, and CSS files). Just modify your .htaccess file with the following code. Code provided by: Search-This [...]

13 Brett Wraight
March 3rd, 2011 at 7:45 am

Check out these guys: http://www.scrapestopper.com. They are able to stop sites from being scraped and protect a site from all forms of scraping. They offer a trial period. Awesome, and their system is so easy to use. In the reports they track down the scrapers for you. Very cool indeed, and worth a look.