Data poisoning: how artists are sabotaging AI to take revenge on image generators

As AI developers indiscriminately suck up online content to train their models, artists are seeking ways to fight back.

  • gaiussabinus@lemmy.world
    English
    arrow-up
    44
    arrow-down
    3
    ·
    2 years ago

    This system runs on the assumptions that A) massive generalized scraping is still required, B) the metadata of the original image is maintained, and C) no transformation has been applied to the poisoned picture prior to training (Stable Diffusion is 512x512). Nowhere in the linked paper did they say they had conditioned the poisoned data to conform to the data set. This appears to be a case of fighting the last war.

  • Blaster M@lemmy.world
    English
    arrow-up
    35
    arrow-down
    3
    ·
    2 years ago

    Takes image, applies antialiasing and resize

    Oh, look at that, defeated by the completely normal process of preparing the image for training
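
    As a rough illustration of the point above, here is a toy sketch in plain Python. Everything in it is an assumption made for the example: the 8x8 "image", the hypothetical +/-4 checkerboard perturbation, and the 2x2 box filter standing in for an antialiased resize. It is not a test against any real poisoning tool, whose perturbations may be built to survive this kind of preprocessing.

```python
def box_downsample(image, factor):
    """Shrink a 2D grid of pixel values by averaging each factor x factor block."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [image[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def max_abs_diff(a, b):
    """Largest per-pixel difference between two equally sized grids."""
    return max(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


SIZE = 8
# Hypothetical clean image: a smooth gradient.
clean = [[100.0 + x + y for x in range(SIZE)] for y in range(SIZE)]
# Hypothetical perturbation: a pixel-level checkerboard of +4 / -4.
perturbed = [[clean[y][x] + (4.0 if (x + y) % 2 == 0 else -4.0)
              for x in range(SIZE)] for y in range(SIZE)]

before = max_abs_diff(clean, perturbed)
after = max_abs_diff(box_downsample(clean, 2), box_downsample(perturbed, 2))
print(before, after)
```

    In this toy case the perturbation is at the highest spatial frequency, so each 2x2 block contains two +4 and two -4 offsets and the average cancels them exactly. A lower-frequency perturbation would not vanish this way.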

  • qooqie@lemmy.world
    English
    arrow-up
    24
    arrow-down
    3
    ·
    2 years ago

    Unfortunately for them, there are a lot of jobs dedicated to cleaning data, so I’m not sure this would even be effective. Plus, there’s an overwhelming amount of data that isn’t “poisoned”, so the poisoned images would just get drowned out even if they were never caught.

  • Potatos_are_not_friends@lemmy.world
    English
    arrow-up
    17
    arrow-down
    1
    ·
    2 years ago

    Imagine if writers did the same thing by writing gibberish.

    At some point, it becomes pretty easy to devalue that content and create other systems to filter it.

    • books@lemmy.world
      English
      arrow-up
      1
      ·
      2 years ago

      I mean, isn’t that eventually going to happen? Isn’t AI eventually going to be trained on AI-generated datasets, so that small issues start to propagate exponentially?

      I just assume we have a clean pre-AI dataset and a messy, gross post-AI dataset… If it keeps learning from the latter, it will just get worse and worse, no?

      • General_Effort@lemmy.world
        English
        arrow-up
        3
        ·
        2 years ago

        Not really. It’s like with humans. Without the occasional reality check it gets weird, but what people choose to upload is a reality check.

        The pre-AI web was far from pristine, no matter how you define that. AI may improve matters by increasing the average quality.

  • KᑌᔕᕼIᗩ@lemmy.ml
    English
    arrow-up
    18
    arrow-down
    2
    ·
    2 years ago

    Artists and writers should be entitled to compensation when their works are used to train these models, just as with any other commercial use. But, you know, strict, brutal free-market capitalism for us, not the mega corps who are using it because “AI”.

    • kromem@lemmy.world
      English
      arrow-up
      11
      arrow-down
      3
      ·
      2 years ago

      Shhhhh.

      Let them keep doing the modern equivalent of “I do not consent for my MySpace profile to be used for anything” disclaimers.

      It keeps them busy on meaningless crap that isn’t actually doing anything but makes them feel better.

  • kromem@lemmy.world
    English
    arrow-up
    11
    arrow-down
    1
    ·
    edit-2
    2 years ago

    This doesn’t actually work. The ingestion pipeline doesn’t even need to do anything special to avoid it.

    Let’s say you draw cartoon pictures of cats.

    And your friend draws pointillist images of cats.

    If you and your friend don’t coordinate, it’s possible you’ll bias your cat images to look like dogs in the data but your friend will bias their images to look like horses.

    Now each of your biasing efforts becomes noise and not signal.

    Then you need to consider if you are also biasing ‘cartoon’ and ‘pointillism’ attributes as well, and need to coordinate with the majority of other people making cartoon or pointillist images.

    When you consider the number of different attributes that need to be biased for a given image, and the compounding number of coordinations that would need to be made at scale to be effective, this is just a nonsense initiative. It was an interesting research paper in lab conditions, but it is the equivalent of a mouse model or in vitro cancer cure being taken up by naturopaths as if it’s going to work in humans.
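
    The coordination argument above can be sketched with a toy model. All of it is assumed for illustration: the list of target labels, the 1000 artists, and the idea of treating each poisoned image as a unit "push" toward one target and measuring the length of the average push. It says nothing about how any real poisoning method or training run behaves.

```python
import math
import random


def mean_push_strength(targets, choices):
    """Length of the average one-hot push vector across all poisoned images.

    1.0 means every image pushes toward the same target; smaller values
    mean the pushes are spread over different targets and partly cancel.
    """
    counts = {t: 0 for t in targets}
    for choice in choices:
        counts[choice] += 1
    n = len(choices)
    return math.sqrt(sum((k / n) ** 2 for k in counts.values()))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical labels that artists might bias their "cat" images toward.
targets = ["dog", "horse", "bird", "fish", "car", "tree", "boat", "chair"]
n_artists = 1000

coordinated = ["dog"] * n_artists
uncoordinated = [rng.choice(targets) for _ in range(n_artists)]

coordinated_strength = mean_push_strength(targets, coordinated)
uncoordinated_strength = mean_push_strength(targets, uncoordinated)
print(coordinated_strength, uncoordinated_strength)
```

    With eight equally likely targets the uncoordinated strength sits near 1/sqrt(8), roughly 0.35, versus 1.0 when everyone agrees on "dog". Adding more attributes to bias (style, medium) would multiply the number of targets and shrink it further.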

  • RagingRobot@lemmy.world
    English
    arrow-up
    7
    arrow-down
    1
    ·
    2 years ago

    So it sounds like they are altering the image data to get this to work, and the image still looks the same; only the data is different. So couldn’t the AI companies take screenshots of the image to get around this?

  • Sabin10@lemmy.world
    English
    arrow-up
    5
    arrow-down
    15
    ·
    2 years ago

    Data poisoning isn’t limited to just AI stuff and you should be doing it at every opportunity.

  • Dr. Moose@lemmy.world
    English
    arrow-up
    13
    arrow-down
    24
    ·
    2 years ago

    Just don’t put your art out in public if you don’t want someone or something to learn from it. The clinging to relevance and this pompous self-importance is so cringe. So replacing blue-collar work is OK, but some shitty drawings somehow have higher ethical value?

    • Red_October@lemmy.world
      English
      arrow-up
      10
      arrow-down
      3
      ·
      2 years ago

      The idea that you would actually object to replacing labor with automation, but think replacing art with automation is fine, is genuinely baffling.

      • Dr. Moose@lemmy.world
        English
        arrow-up
        7
        arrow-down
        14
        ·
        2 years ago

        Except the “art” AI is replacing is labor. This snobby, ridiculous bullshit that some corporate drawings are somehow more important than other things is super cringe.

      • Flying Squid@lemmy.world
        English
        arrow-up
        6
        arrow-down
        3
        ·
        2 years ago

        Are you actually suggesting that if I post a drawing of a dog, Disney should be allowed to use it in a movie and not compensate me?

        • cm0002@lemmy.world
          English
          arrow-up
          4
          arrow-down
          5
          ·
          2 years ago

          Ofc not, that’s way different; that goes beyond public use.

          If I browse to your Instagram, look at some of your art, record some numbers about it, observe your style and then leave, that’s perfectly fine, right? If I then took my numbers and observations from your art and everybody else’s that I looked at and merged them together to make my own style, that would also be fine, right? Well, that’s AI. That’s all it does on a simple level.

          • Flying Squid@lemmy.world
            English
            arrow-up
            4
            arrow-down
            2
            ·
            2 years ago

            But they are still profiting off of it. Dall-E doesn’t make images out of the kindness of OpenAI’s heart. They’re a for-profit company. That really doesn’t make it different from Disney, does it?

            • cm0002@lemmy.world
              English
              arrow-up
              4
              arrow-down
              4
              ·
              2 years ago

              Sure, Dall-E has a profit motive, but then what about all the open source models that are trained on the same or similar data and artworks?

              • Flying Squid@lemmy.world
                English
                arrow-up
                5
                arrow-down
                2
                ·
                2 years ago

                You’ve strayed very far from:

                “if you post publicly, expect it to be used publicly”

                What is the difference between Dall-E scraping the art and an open source model doing it other than Dall-E making money at it? It’s still using it publicly.

                • cm0002@lemmy.world
                  English
                  arrow-up
                  3
                  arrow-down
                  3
                  ·
                  2 years ago

                  I didn’t really stray far. You brought up that Dall-E has a profit motive, and I acknowledged that yeah, that was true, but there are also open source models that don’t.