Block AI Scraping with Robots.txt

AI is cool, but AI scraping and stealing your hard work is not. As an artist, writer, or any other content creator, there’s nothing worse than someone stealing your work and passing it off as their own. That’s what AI training has become.

Using copyrighted work to train AI

I’ve seen a lot of headlines lately about stolen works being used to train AI, like Meta using 80TB of pirated content to train its AI. More recently, Google called for weakened copyright and export rules so it can use stolen works to train AI.

Update: A few days after I made this post, I saw this pop up, and it talks about the trouble these AI crawlers are causing. Also, Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives.

It’s not fair to the people who pour countless hours into their work. But what can we do about it? Well, it’s not a perfect solution, but we can use robots.txt on our websites to mitigate some of the scraping.

How to mitigate AI scraping with robots.txt

The easy thing would be to put everything behind a login. But then you lose out on SEO, so people can’t find you as easily, and most people don’t want to make an account for every site they visit. So the next best thing is to try to stop the bots that honor a robots.txt file.

Here is a GitHub page I found where people are identifying AI crawling agents and building a robots.txt that asks them to stop. You may or may not know this, but robots.txt is an honor system: the crawler can ignore it. But a lot of legitimate crawlers do honor it.

So while this is not a perfect solution, it can at least keep your hard work from being sucked up by some of the bigger players out there.
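To give a rough idea of what such a file looks like, here is a minimal Python sketch that builds one. The user-agent list is a short illustrative sample I picked (the community list is much longer and changes over time), and the function name build_robots_txt is just a made-up helper, so treat this as a starting point and not the official list.

```python
# Short, illustrative sample of AI crawler user-agent tokens (an assumption,
# not the full community-maintained list).
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "PerplexityBot"]


def build_robots_txt(user_agents):
    """Return robots.txt text asking every listed crawler to stay off the whole site."""
    # Several User-agent lines can share one group of rules.
    lines = [f"User-agent: {agent}" for agent in user_agents]
    # "Disallow: /" covers every path on the site for the agents above.
    lines.append("Disallow: /")
    # Crawlers that are not listed keep their default access.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(AI_USER_AGENTS), end="")
```

Save the printed text as robots.txt in the root of your site. Remember this only asks crawlers to stay away; it does not enforce anything.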

My testing

Before I posted anything about this, I implemented it on my site with a few changes. I’ve watched it for a few months to see if it would hurt SEO or search traffic. It looks fine so far, so I think it should be safe for anyone. But adjust where you need to.
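If you want a quick sanity check before waiting months on traffic numbers, Python’s built-in urllib.robotparser can tell you how a well-behaved crawler would read your rules. This is only a sketch: the rules, the example.com URL, and the is_allowed helper are placeholders I made up, and it says nothing about crawlers that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules; paste in your own robots.txt contents instead.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url="https://example.com/blog/my-post"):
    """Return True if a rule-following crawler with this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Two AI crawlers from the rules above, plus a search crawler that should stay allowed.
    for agent in ("GPTBot", "CCBot", "Googlebot"):
        print(agent, "allowed" if is_allowed(ROBOTS_TXT, agent) else "blocked")
```

The thing to look for is that the AI crawlers come back blocked while the search engine crawlers you care about still come back allowed.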

Anubis to stop scrapers

Update: A friend told me about Anubis recently. I have not used it myself, but for the sake of giving you the best info I can, I wanted to go ahead and add it here. Anubis adds a proof-of-work challenge to websites before users can access them. I looked into it briefly and it seems like a cool idea. But it’s required to live outside the web server, so it’s not something I can drop in and test real quick. They even say “Anubis is a bit of a nuclear response”. So I’ll probably skip this for now.
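If proof of work is new to you, here is a generic Python sketch of the idea. This is not Anubis’s code and I am not claiming it matches how Anubis does it; the function names, the challenge string, and the difficulty value are all made up for illustration. The point is that the visitor has to burn some CPU finding an answer, while the server can check that answer with a single hash.

```python
import hashlib


def solve(challenge: str, difficulty: int) -> int:
    """Try nonces until sha256(challenge + nonce) starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Check a submitted nonce with a single hash, which is cheap for the server."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    # Hypothetical challenge string; a low difficulty keeps the demo fast.
    nonce = solve("example-challenge", 3)
    print("nonce:", nonce, "valid:", verify("example-challenge", nonce, 3))
```

One visitor barely notices the delay, but a scraper hitting thousands of pages has to pay that cost on every request, and that’s the whole trick.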

I hope this helps

I hope this is of some help to you. Keep creating and being awesome!
