A couple of months ago, ArtStation users were protesting the use of their artwork in AI training data sets used by services like Midjourney. These images were scraped from the site without the artists' consent, and ArtStation's response was underwhelming. In light of this, I decided to develop an AI-proof watermark. I failed, but I found some other, more effective methods, which this article discusses.
My first step was to figure out how watermark-removal AIs worked. After some research, I discovered this page, which outlined the general method they use:
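The gist is a three-stage pipeline: detect the watermark, reconstruct its full extent, then remove it by filling in the covered pixels. As a rough illustration, here is a crude classical-CV stand-in for that pipeline; this is my own sketch, not the learned models the page describes, and the thresholds and the assumption of a bright, semi-transparent mark are mine:

```python
import cv2
import numpy as np

def remove_watermark(image: np.ndarray) -> np.ndarray:
    # Detection: assume the mark is a bright, low-saturation overlay
    # and threshold for it in HSV space.
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    _, saturation, value = cv2.split(hsv)
    mask = cv2.inRange(value, 200, 255) & cv2.inRange(saturation, 0, 60)

    # Reconstruction: dilate the detected pixels so the mask covers
    # the watermark's full extent, including its soft edges.
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8), iterations=2)

    # Removal: inpaint the masked region from the surrounding pixels.
    return cv2.inpaint(image, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)

cleaned = remove_watermark(cv2.imread("watermarked.png"))
cv2.imwrite("cleaned.png", cleaned)
```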
Now that I knew the means by which the watermark-removal AI worked, I could go about creating a watermark that messed with as many of those processes as possible.
Taking inspiration from the work of @thatdogmagic on Tumblr, I set about designing my own version of the watermark. I knew that the complex patterns and bright colors would interfere with both the reconstruction phase and the removal phase, and I figured that placing the watermark randomly each time would interfere with the detection phase. Indeed, that last point was borne out by an article from Google.
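The random-placement idea is simple enough to sketch in a few lines of Python with Pillow; the file names, rotation range, and compositing details below are my assumptions, not a prescription:

```python
import random
from PIL import Image

def apply_random_watermark(art_path: str, mark_path: str, out_path: str) -> None:
    art = Image.open(art_path).convert("RGBA")
    mark = Image.open(mark_path).convert("RGBA")

    # Random rotation and position, so no two exports place the mark
    # identically. This is what frustrates the detection phase: the
    # detector can't learn a fixed template for where the mark will be.
    mark = mark.rotate(random.uniform(-30, 30), expand=True)
    x = random.randint(0, max(0, art.width - mark.width))
    y = random.randint(0, max(0, art.height - mark.height))

    art.alpha_composite(mark, dest=(x, y))
    art.convert("RGB").save(out_path)

apply_random_watermark("painting.png", "watermark.png", "protected.png")
```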
The result of all this research? Well, not much, since my testing methods were flawed (though if you want to see them, they can be found in my ArtStation post on the topic). I needed to find a different solution.
Glaze is a tool that applies subtle deformations to an image to mask its style from AI models. The cloaking works even if the visible patterns are removed with a denoising algorithm, albeit with reduced effectiveness. Of course, this doesn't stop people from trying to train models on your work, but it does reduce the effectiveness of that training.
There are HTML meta tags that you can incorporate into the header of your website to ask web crawlers and scrapers not to use your content (compliant crawlers will honor them). Just copy and paste the following into the header of each page:
<!-- The Common Crawl dataset. Used by GPT-3 (and GPT-3.5) and available for others. -->
<meta name="CCBot" content="nofollow">
<!-- Used by DeviantArt, ArtStation, etc. based on opt-in or opt-out -->
<meta name="robots" content="noai, noimageai">
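If you also control your site's robots.txt, Common Crawl's crawler (CCBot) can be blocked outright there as well; this two-line rule is my addition to the meta-tag approach above:

```
User-agent: CCBot
Disallow: /
```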