Father, Hacker (Information Security Professional), Open Source Software Developer, Inventor, and 3D printing enthusiast

  • 2 Posts
  • 394 Comments
Joined 3 years ago
Cake day: June 23rd, 2023




  • but we can reasonably assume that Stable Diffusion can render the image on the right partly because it has stored visual elements from the image on the left.

    No, you cannot reasonably assume that. It absolutely did not store the visual elements. What it did was store some floating point values related to keywords the source image had been pre-classified with. When training, it will increase or decrease those floating point values by a small amount when it encounters further images tagged with those same keywords.
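
    To make that concrete, here’s a toy sketch (numpy, made-up numbers, a stand-in “gradient”, nothing to do with any real architecture) of what training on one image actually does to those values: compute an error signal, nudge every weight by a tiny amount, throw the image away.

    ```python
    import numpy as np

    # A model file is, at this level, just a huge pile of floating point values.
    weights = np.random.randn(1_000_000).astype(np.float32)

    # Pretend this is the error signal computed from one training image and its
    # keywords (in reality it comes from predicting the noise added to the image).
    fake_gradient = np.random.randn(1_000_000).astype(np.float32)

    learning_rate = 1e-4
    weights -= learning_rate * fake_gradient  # every value moves by roughly ±0.0001

    # The image's pixels are discarded; all that survives is this tiny nudge.
    print(np.abs(learning_rate * fake_gradient).max())
    ```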

    What the examples demonstrate is a lack of diversity in the training set for those very specific keywords. There’s a reason they chose Stable Diffusion 1.4 and not Stable Diffusion 2.0 (or later versions): the model was drastically improved after that. These sorts of problems (with not-diverse-enough training data) are considered flaws by the very AI researchers creating the models. It’s exactly the type of thing they don’t want to happen!
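
    For what it’s worth, the usual mitigation for that kind of memorization is boring data hygiene: deduplicate and diversify the training set before training ever starts. A rough sketch of near-duplicate detection with perceptual hashes (Pillow + the ImageHash library; the folder name and threshold below are made up for illustration):

    ```python
    from pathlib import Path

    import imagehash       # pip install ImageHash Pillow
    from PIL import Image

    seen = {}
    duplicates = []

    # Hypothetical folder of training images.
    for path in Path("training_images").glob("*.jpg"):
        h = imagehash.phash(Image.open(path))      # near-identical images hash alike
        if any(h - other <= 4 for other in seen):  # small Hamming distance
            duplicates.append(path)                # candidate to drop
        else:
            seen[h] = path

    print(f"Would drop {len(duplicates)} near-duplicates before training")
    ```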

    The article seems to be implying that this is a common problem that happens constantly and that the companies creating these AI models just don’t give a fuck. This is false. Flaws like this leave your model open to attack (and let competitors figure out your weights; not that it matters with Stable Diffusion, since that version is open source), not just copyright lawsuits!

    Here’s the part I don’t get: Clearly nobody is distributing copyrighted images by asking AI to do its best to recreate them. When you do this, you end up with severely shitty hack images that nobody wants to look at. Basically, if no one is actually using these images except to say, “aha! My academic research uncovered this tiny flaw in your model that represents an obscure area of AI research!” why TF should anyone care?

    They shouldn’t! The only reason why articles like this get any attention at all is because it’s rage bait for AI haters. People who severely hate generative AI will grasp at anything to justify their position. Why? I don’t get it. If you don’t like it, just say you don’t like it! Why do you need to point to absolutely, ridiculously obscure shit like finding a flaw in Stable Diffusion 1.4 (from years ago, before 99% of the world had even heard of generative image AI)?

    Generative AI is just the latest way of giving instructions to computers. That’s it! That’s all it is.

    Nobody gave a shit about this kind of thing when Star Trek was pretending to do generative AI in the Holodeck. Now that we’ve got the pre-alpha version of that very thing, a lot of extremely vocal haters are freaking TF out.

    Do you want the cool shit from Star Trek’s imaginary future or not? This is literally what computer scientists have been dreaming of for decades. It’s here! Have some fun with it!

    Generative AI uses less power/water per use than streaming YouTube or Netflix (yes, it’s true). So if you’re about to say it’s bad for the environment, I expect you’re just as vocal about streaming video, yeah?






  • The real problem here is that Xitter isn’t supposed to be a porn site (even though it’s hosted loads of porn since before Musk bought it). They basically deeply integrated a porn generator into their very publicly-accessible “short text posts” website. Anyone can ask it to generate porn inside of any post and it’ll happily do so.

    It’s like showing up at Walmart and seeing everyone naked (and many fucking), all over the store. That’s not why you’re there (though: Why TF are you still using that shithole of a site‽).

    The solution is simple: Everyone everywhere needs to classify Xitter as a porn site. It’ll get blocked by businesses and schools and the world will be a better place.





  • No, a .safetensors file is not a database. You can’t query a .safetensors file and there’s nothing like ACID compliance (it’s read-only).

    Imagine a JSON file where the keys are tensor names and the values are enormous arrays of floating point numbers. It’s basically gibberish until you run it through an inference process, feeding random numbers through it over and over again and whittling them down until you get a result that matches the prompt to a specified degree.
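
    You don’t have to take my word for it, either. The safetensors library will show you exactly what’s in one of these files: a flat list of tensor names, each mapping to a big array of floats. That’s the whole “database” (the filename below is just a placeholder; any .safetensors checkpoint works):

    ```python
    from safetensors import safe_open  # pip install safetensors torch

    # Open any diffusion checkpoint in .safetensors format (read-only).
    with safe_open("model.safetensors", framework="pt", device="cpu") as f:
        for name in f.keys():
            tensor = f.get_tensor(name)
            # Prints something like: <layer name> (320, 320) torch.float16
            print(name, tuple(tensor.shape), tensor.dtype)
    ```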

    How do the “turbo” models work to get a great result after one step? I have no idea. That’s like black magic to me haha.


  • Riskable@programming.dev to politics@lemmy.world · Who Controls AI Exactly? · 9 days ago

    Or, with AI image gen, it knows that when some one asks it for an image of a hand holding a pencil, it looks at all the artwork in it’s training database and says, “this collection of pixels is probably what they want”.

    This is incorrect. Generative image models don’t contain databases of artwork. If they did, they would be the most amazing fucking compression technology, ever.

    As an example model, FLUX.1-dev is 23.8GB:

    https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main

    It’s a general-use model that can generate basically anything you want. It’s not perfect and it’s not the latest & greatest AI image generation model, but it’s a great example because anyone can download it and run it locally on their own PC (and get vastly better results than ChatGPT’s DALL-E model).

    If you examine the data inside the model, you’ll see a bunch of metadata headers and then an enormous array of arrays of floating point values. Stuff like [0.01645, 0.67235, ...]. That is what a generative image AI model uses to make images. There’s no database to speak of.
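
    You can sanity-check that with napkin math. FLUX.1-dev is (as I understand it) roughly a 12-billion-parameter model stored in 16-bit floats, which accounts for essentially the entire file; there’s simply no room left to stash training images (the image count and per-image size below are purely illustrative):

    ```python
    params = 12_000_000_000   # FLUX.1-dev is roughly a 12B-parameter model
    bytes_per_param = 2       # bfloat16 = 2 bytes per weight

    print(params * bytes_per_param / 1e9, "GB")  # ~24 GB -- basically the whole file

    # Compare: even at a stingy 100KB per image, "storing" a few hundred million
    # training images would take tens of terabytes.
    images = 500_000_000
    print(images * 100_000 / 1e12, "TB")  # 50 TB
    ```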

    When training an image model, you need to download millions upon millions of public images from the Internet and pair them with metadata from an actual database like ImageNet. ImageNet contains lots of metadata about millions of images, such as their URLs, bounding boxes around parts of each image, and keywords associated with those bounding boxes.
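
    Roughly what a single record in that kind of metadata database looks like (field names illustrative, not the exact ImageNet schema):

    ```python
    record = {
        "image_url": "https://example.com/some_photo.jpg",
        "annotations": [
            {"bbox": [34, 120, 410, 560], "label": "golden retriever"},
            {"bbox": [0, 0, 640, 480], "label": "park"},
        ],
    }
    # The pixels live at the URL; the database only holds pointers and labels,
    # and training streams the actual images through the GPU once.
    ```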

    The training is mostly a linear process, so the images never really get loaded into a database; they just get read, along with their metadata, into a GPU where the machine learning math runs and produces those arrays of floating point values. Those values ultimately end up in the model file.

    It’s actually a lot more complicated than that (there’s pretraining steps and classifiers and verification/safety stuff and more) but that’s the gist of it.

    I see soooo many people who think image AI generation is literally pulling pixels out of existing images but that’s not how it works at all. It’s not even remotely how it works.

    When an image model is being trained, any given image might nudge each of those floating point values by something like ±0.01 at most. That’s it. That’s all training on a specific image does.

    I often rant about where this process goes wrong and how it can result in images that look way too much like specific images in the training data, but that’s a flaw, not a feature. It’s something that every image model has to deal with, and it will improve over time.

    At the heart of every AI image generation is a random number generator. Sometimes you’ll get something similar to an original work. Especially if you generate thousands and thousands of images. That doesn’t mean the model itself was engineered to do that. Also: A lot of that kind of problem happens in the inference step but that’s a really complicated topic…
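
    That randomness is right there in the tooling, too. With Hugging Face’s diffusers library you hand the pipeline an explicit seed; the same seed and prompt reproduce the same image, and a new seed builds a different one from different noise (model name as published on Hugging Face; you’d need a beefy GPU and the model license accepted to actually run this):

    ```python
    import torch
    from diffusers import FluxPipeline  # pip install diffusers transformers accelerate

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")

    prompt = "a hand holding a pencil"

    # Every generation starts from random noise; the seed controls that noise.
    for seed in (0, 1, 2):
        generator = torch.Generator("cpu").manual_seed(seed)
        pipe(prompt, generator=generator).images[0].save(f"pencil_{seed}.png")
    ```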