• 0 Posts
  • 4 Comments
Joined 20 hours ago
Cake day: August 6th, 2025

  • VRAM vs RAM:

    VRAM (Video RAM):
    - Dedicated memory on your graphics card/GPU
    - Used specifically for graphics processing and AI model computations
    - Much faster for GPU operations
    - Critical for running LLMs locally

    RAM (System Memory):
    - Main system memory used by the CPU for general operations
    - Slower access for GPU computations
    - Can be used as a fallback, but with a performance penalty
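    For example, here's a small sketch (assuming PyTorch with a CUDA GPU and the psutil package installed) that reports how much of each you have:

    ```python
    # Report total GPU VRAM (via PyTorch/CUDA) and total system RAM (via psutil).
    import torch
    import psutil

    if torch.cuda.is_available():
        vram = torch.cuda.get_device_properties(0).total_memory
        print(f"GPU VRAM: {vram / 1024**3:.1f} GiB")
    else:
        print("No CUDA GPU detected")

    ram = psutil.virtual_memory().total
    print(f"System RAM: {ram / 1024**3:.1f} GiB")
    ```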

    So, for running a basic 7B-parameter LLM locally, you typically need:

    Minimum: 8-12 GB VRAM
    - Can run basic inference tasks
    - May require quantization (4-bit/8-bit)

    Recommended: 16+ GB VRAM
    - Smoother performance
    - Handles larger context windows
    - Runs without heavy quantization

    Quantization means reducing the precision of the model’s weights and calculations to use less memory. For example, instead of storing numbers with full 32-bit precision, they’re compressed to 4-bit or 8-bit representations. This significantly reduces VRAM requirements but can slightly reduce model quality and accuracy.
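    For a rough sense of the numbers, here's a back-of-the-envelope sketch for the weights of a 7B-parameter model at different precisions (weights only; activations, the KV cache, and framework overhead add more on top):

    ```python
    # Rough weight-memory estimate for a 7B-parameter model at different precisions.
    PARAMS = 7e9

    for name, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]:
        gib = PARAMS * bits / 8 / 1024**3
        print(f"{name}: ~{gib:.1f} GiB")
    # fp32: ~26.1 GiB, fp16: ~13.0 GiB, int8: ~6.5 GiB, int4: ~3.3 GiB
    ```

    That's why a 7B model that won't fit on an 8-12 GB card at full precision can still run there once quantized to 8-bit or 4-bit.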

    Options if you have less VRAM:
    - CPU-only inference (very slow)
    - Offload part of the model to system RAM
    - Use smaller models (3B-4B parameters)
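    As one example of the offloading route (a minimal sketch, assuming the Hugging Face transformers, accelerate, and bitsandbytes libraries are installed; the model id here is a hypothetical placeholder), loading in 4-bit and letting layers spill to system RAM might look like this:

    ```python
    # Minimal sketch: load a 7B model in 4-bit and let accelerate split it
    # across GPU VRAM and system RAM. The model id is a placeholder.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "some-org/some-7b-model"  # hypothetical placeholder

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # fill GPU VRAM first, offload the rest to RAM
    )

    inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```

    Anything offloaded to RAM or running on the CPU will be noticeably slower than layers kept in VRAM, which is why smaller models are often the more practical option on low-VRAM machines.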



  • Absolutely. China’s models are very advanced - especially Qwen. I don’t code professionally - just for personal projects. If I used these tools as an employee at a company, I’d defer to that company’s wishes. Regardless of whether it’s the U.S. or China, these models are using and probably storing my data for further training. As they should! I don’t care - they’re providing me with powerful tools. I use free tiers and jump between them all: ChatGPT, Perplexity (general search), DeepSeek, Qwen (advanced programming), Google AI Studio, and others. I’m grateful, not fearful.