• Lambda@lemmy.ca · 4 hours ago

I’m not sure if this is the best place to put this, but I wanted to share my thoughts.

For background, I got an undergraduate degree in software engineering. I took a few electives on AI, but it was more primitive back then (state of the art was GPT-2 and early diffusion models, and the transformer architecture from “Attention Is All You Need” was still fairly new). Since then I’ve tried out the new stuff: made some generic profile pictures with “AI”, messed around with a few prompts on GPT-4, but I haven’t really used “AI” for anything productive. I haven’t found a single valid use case at my work for anything past MLPs (which, admittedly, work great).

My first reaction was: “wow, that’s actually pretty impressive, at least relative to what AI/ML could do a decade ago”. Then I noticed the size of the models, which led to my first question: how inefficient is simply scaling up model size? I know that MLPs hit serious diminishing returns well before the billions of parameters, and I understand that transformer architectures soak up a lot of additional parameters. Nonetheless, I almost expected more from statements like “480 billion parameters”. That’s significantly more parameters than the human brain has neurons (roughly 86 billion), though a parameter is closer to a synapse than a neuron, and the brain has on the order of 100 trillion of those; the model itself certainly has far fewer nodes.
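For a sense of where all those parameters come from, here’s a back-of-envelope count for a dense decoder-only transformer. The GPT-3 shape (d_model = 12288, 96 layers) is published; this sketch ignores embeddings, biases, and layer norms, which are comparatively small:

```python
def transformer_params(d_model: int, n_layers: int) -> int:
    """Rough parameter count for a dense decoder-only transformer."""
    attn = 4 * d_model * d_model       # Q, K, V, and output projections
    mlp = 2 * d_model * (4 * d_model)  # up- and down-projections, 4x hidden width
    return n_layers * (attn + mlp)     # ~12 * n_layers * d_model^2 overall

# GPT-3-sized config: prints ~174B, close to the published 175B figure
print(f"{transformer_params(12288, 96) / 1e9:.0f}B")
```

Since the count grows quadratically with width, hundreds of billions of parameters are only a few doublings of width and depth away from GPT-3.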

The next thing I realized was that there are a lot more models out there than I thought. I had of course heard of the GPTs and played around with LLaMA, and the news has made me aware of Gemini, Claude, and Grok, but I didn’t realize just how many models there are. That leads to my next question: given the cost (in both dollars and environmental impact) of training a model this size, why are so many companies training their own from scratch instead of somehow sharing training time on a common model? Surely the training sets can’t be that different at this scale. A handful of different models makes sense, but this looks like a massive waste of resources to me.
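To put very rough numbers on that waste: the usual rule of thumb is that training costs about 6 × parameters × tokens FLOPs. A minimal sketch, where everything except the parameter count is an assumption picked purely to show the order of magnitude:

```python
# Order-of-magnitude training cost via the ~6 * N * D FLOPs rule of thumb.
params = 480e9       # parameter count mentioned above
tokens = 10e12       # assumed number of training tokens
flops = 6 * params * tokens

peak = 1e15          # assumed peak throughput of one accelerator, FLOP/s
utilization = 0.4    # assumed fraction of peak actually sustained

gpu_hours = flops / (peak * utilization) / 3600
print(f"{flops:.1e} FLOPs ≈ {gpu_hours / 1e6:.0f} million GPU-hours")
```

Tens of millions of GPU-hours per frontier run, multiplied by every lab doing its own run, adds up fast.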

Finally, I realized that I really am a bit out of date on AI usage (though not too out of date on AI research). I haven’t tried interrogating the weights of any modern model, and I didn’t even know you could reasonably ask modern models to dump their output probabilities like that, though of course it makes sense. So my final question: are there any FOSS models using state-of-the-art techniques (e.g. mixture-of-experts routing and LoRA)?
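On the output-probabilities point, for anyone else who didn’t know it was this easy: here’s a minimal sketch using the Hugging Face transformers library. gpt2 is used only because it’s small and freely downloadable; any causal LM on the Hub works the same way.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence, vocab)

# Softmax over the vocabulary at the final position gives the
# next-token probability distribution.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}: {p.item():.3f}")
```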