For coding AI, it could make sense to specialize models along different axes: system architecture, functional/array-style solutions versus loop-based ones, or just ask 4 separate small models and then use a judge model to pick the best parts of each answer. A rough sketch of that last idea is below.
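
A minimal sketch of that specialist-plus-judge setup. Everything here is hypothetical: the specialist "models" are stub callables standing in for real model endpoints, and the judge is a trivial scoring function standing in for an actual judge model.

```python
from typing import Callable

# Hypothetical specialist "models": each is just a callable that takes a
# prompt and returns a candidate solution string. In practice these would
# be calls to four separately fine-tuned small models.
SPECIALISTS: dict[str, Callable[[str], str]] = {
    "architecture": lambda p: f"[architecture-focused answer to: {p}]",
    "functional":   lambda p: f"[functional/array-style answer to: {p}]",
    "imperative":   lambda p: f"[loop-based answer to: {p}]",
    "tests":        lambda p: f"[test-first answer to: {p}]",
}

def judge(prompt: str, candidates: dict[str, str]) -> str:
    """Stand-in judge: picks the longest candidate. A real judge would
    score candidates via compile checks, test pass rate, or LLM grading."""
    return max(candidates.values(), key=len)

def ensemble_answer(prompt: str) -> str:
    # Fan the prompt out to every specialist, then let the judge choose.
    candidates = {name: model(prompt) for name, model in SPECIALISTS.items()}
    return judge(prompt, candidates)

print(ensemble_answer("write a function that flattens a nested list"))
```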

  • hoshikarakitaridia@lemmy.world · 9 hours ago

    Yeah that tracks from what I’ve seen. There were some very interesting new approaches that could improve the base framework of all generative AIs but at this time MoE is the one important improvement that Deepseek pioneeredfor LLMs. I wonder if throwing knowledge at the problem might actually net us a bit more of an elegant solution but MoE is kind of the only thing that helps us scale LLMs.