Protecting Your Art From AI

20 July 2023

Some Background

A couple of months ago, ArtStation users were protesting the use of their artwork in the training data sets of AI services like Midjourney. These images were scraped from the site without the artists' consent, and ArtStation's response was underwhelming. In light of this, I decided to develop an AI-proof watermark. I failed, but along the way I found some other, more effective methods, which this article discusses.

Making My Own Solution

Section 1: Knowing the Adversary

My first step was to figure out how watermark-removal AIs worked. After some research, I discovered this page, which outlined the general method used:

  1. The AI is trained on several watermarked images and constructs an average of them. If the watermark has been placed consistently across all of the images, this averaging recovers the base watermark, which can be used as a mask on the image to be processed (a rough sketch of this step follows the list).
  2. The mask constructed in step 1 is then used to remove the watermark from the input image.
  3. Once the watermark has been removed, a neural net reconstructs the parts of the image behind the now-removed watermark.
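
To make step 1 concrete, here is a minimal sketch of the averaging idea in Python. This is my own illustration, not the code from the page above; the file names, the assumption that all the images are the same size, and the variance threshold are all hypothetical.

      import numpy as np
      from PIL import Image

      # Hypothetical inputs: a set of same-sized images that all carry the
      # same watermark in the same place.
      paths = ["img_001.png", "img_002.png", "img_003.png"]
      stack = np.stack([
          np.asarray(Image.open(p).convert("RGB"), dtype=np.float64)
          for p in paths
      ])

      # Averaging blurs the varied artwork toward grey, while the constant
      # watermark survives intact.
      mean = stack.mean(axis=0)
      Image.fromarray(mean.astype(np.uint8)).save("estimated_watermark.png")

      # Pixels that barely change across the set belong to the watermark;
      # thresholding the per-pixel variance gives a rough removal mask.
      variance = stack.var(axis=0).mean(axis=-1)
      mask = variance < 100.0  # hypothetical threshold
      Image.fromarray((mask * 255).astype(np.uint8)).save("watermark_mask.png")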

Now that I knew how the watermark-removal AI worked, I could go about creating a watermark that interfered with as many of those processes as possible.

Section 2: Crafting the Watermark

Taking inspiration from the work of @thatdogmagic on Tumblr, I set about designing my own version of the watermark. I knew that complex patterns and bright colors would interfere with both the reconstruction phase and the removal phase, and I figured that placing the watermark randomly each time would interfere with the detection phase. Indeed, that last point was borne out in an article by Google.
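
To illustrate that last point, here is a minimal sketch of randomized placement using Pillow. This is my own example, not @thatdogmagic's method; the file names are hypothetical, and it assumes the watermark is smaller than the artwork.

      import random
      from PIL import Image

      art = Image.open("artwork.png").convert("RGBA")     # hypothetical file
      mark = Image.open("watermark.png").convert("RGBA")  # hypothetical file

      # Randomize rotation and position on every export, so no two published
      # copies share a placement that an averaging attack could latch onto.
      mark = mark.rotate(random.uniform(-30, 30), expand=True)
      x = random.randint(0, max(0, art.width - mark.width))
      y = random.randint(0, max(0, art.height - mark.height))
      art.alpha_composite(mark, dest=(x, y))
      art.convert("RGB").save("artwork_watermarked.jpg", quality=95)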

The result of all this research? Well, not much, since my testing methods were flawed (though if you want to see them, they can be found in my ArtStation post on the topic). I needed to find a different solution.


Glaze

Glaze is a tool from the University of Chicago's SAND Lab that applies subtle deformations to an image to mask the artist's style from AI training. The cloaking still works even if the visible patterns are smoothed away with a denoising algorithm, albeit with reduced effectiveness. Of course, this doesn't stop people from trying to train models on your work, but it does limit how much that training can learn from it.


HTML Meta Tags

There are HTML meta tags that you can incorporate into the header of your website to tell web crawlers and scrapers not to use your content (honoring these tags is voluntary on the crawler's part). Just copy and paste the following into the header of each page:

      <!-- The Common Crawl dataset. Used by GPT-3 (and GPT-3.5) and available for others. -->
      <meta name="CCBot" content="nofollow">
      <!-- Used by DeviantArt, ArtStation, etc. based on opt-in or opt-out -->
      <meta name="robots" content="noai, noimageai">