I don’t understand why today’s LLMs are so large. Some of the smallest models getting coverage sit at 2.7B parameters, but even that seems pretty big to me.
If you need generalizability, I totally get it. Things like chat applications require a high level of semantic awareness, and the model has to respond in a manner that’s convincing to its users. In cases where you want the LLM to produce something human-like, it makes sense that the brains would need to be a little juiced up.