• tyler@programming.dev
    link
    fedilink
    English
    arrow-up
    2
    arrow-down
    3
    ·
    10 months ago

    I read the whole thing. I understand it’s for detecting use of nightshade, not bypassing it. What other even slightly ethical use for this is there besides trying to make sure you don’t train on a poisoned image? These models are clearly not asking for permission first, else you’d never need to do this, so they’re just taking an image, assuming they’re allowed to use it, and then using this tool to detect if it’s going to poison their model.

    • elliot_crane@lemmy.world
      link
      fedilink
      English
      arrow-up
      2
      ·
      10 months ago

      I don’t think most people are collecting images by hand and saying “ah yes I’m just gonna yoink this and use it in my model”. There are a plethora of sites for sharing repositories of training data, and therefore it’s pretty easy for someone training a model to unknowingly pull down some data they don’t actually have permission to use. It’s completely infeasible to check licensing by hand on what could be millions of images, so this tool makes it easy to simply not train on images that have gone through Nightshade. I fail to see how that’s unethical, as not training on the image is the whole reason the original image was put through Nightshade in the first place.

      • tyler@programming.dev
        link
        fedilink
        English
        arrow-up
        1
        arrow-down
        1
        ·
        10 months ago

        it’s completely infeasible

        Then it shouldn’t be done. That’s the unethical part. Trying to just avoid the problem by continuing to scrape large data sets for images that you shouldn’t be using is the entire problem. Either get permission for each image or don’t build your image model. Doing otherwise is unethical.

        • elliot_crane@lemmy.world
          link
          fedilink
          English
          arrow-up
          2
          ·
          10 months ago

          Again, in many instances, folks training models are using repositories of images that have been publicly shared. In many cases the person/people who assembled the image repositories are not the same person using them. I agree that reckless scraping is not responsible, but if you’re using a repository of images that’s presented as ok to use for AI training, I’d argue it’s even more ethical to strip out the Nightshaded images, because clearly the presence of Nigthshade means you shouldn’t use that one. I guess we’re just going to have to agree to disagree here, because I see this as a helpful tool to specifically avoid training on images you shouldn’t be.