Recent DeepSeek, Qwen, and GLM models have posted impressive benchmark results. Do you use them through their own chatbots? Do you have any concerns about what happens to the data you put in there? If so, what do you do about it?
I am not trying to start a flame war around the China subject; it just so happens that these models are developed in China. My concerns with using the frontends, which are also developed in China, stem from:
- A pattern of many Chinese apps having been found, in the past, to have minimal security
- The fact that, as far as I can tell, none of the three chatbots listed above let you opt out of having your prompts used for model training
I am also not claiming that non-China-based chatbots don’t have privacy concerns, or that simply opting out of training gets you much on the privacy front.
I believe the full-size DeepSeek-R1 requires about 1,200 GB of VRAM, but there are many configurations that need much less: quantization, exploiting the MoE structure, and other tricks. I don't have much experience with MoE, but I find that quantization tends to degrade quality noticeably, at least with models from Mistral.
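For rough intuition, weight memory scales linearly with bits per parameter. Below is a minimal back-of-the-envelope sketch, assuming DeepSeek-R1's published 671B total parameter count and counting weights only (no KV cache, activations, or framework overhead):

```python
def weight_vram_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate GB needed just to hold the weights at a given precision."""
    return num_params * bits_per_param / 8 / 1e9

PARAMS = 671e9  # DeepSeek-R1 total parameters (MoE; only ~37B active per token)

for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_vram_gb(PARAMS, bits):,.0f} GB")
```

That gives roughly 1,340 GB at FP16 (in line with the ~1,200 GB figure), ~670 GB at FP8, and ~340 GB at 4-bit, which is why heavily quantized builds fit on far smaller setups, at some cost in quality.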