Recent DeepSeek, Qwen, and GLM models have posted impressive benchmark results. Do you use them through their own chatbots? Do you have any concerns about what happens to the data you enter there? If so, what do you do about it?

I am not trying to start a flame war about China; it just so happens that these models are developed there. My concerns with using the frontends, which are also developed in China, stem from:

  • A pattern of Chinese apps being found, in the past, to have minimal security
  • I don’t think any of the three chatbots listed above let you opt out of having your prompts used for model training

I am also not claiming that non-China-based chatbots don’t have privacy concerns, or that simply opting out of training gets you much on the privacy front.

  • greplinux@programming.dev · 10 hours ago

    VRAM vs RAM:

    VRAM (Video RAM): dedicated memory on your graphics card/GPU, used for graphics processing and AI model computations. The GPU can read it far faster than system RAM, which is why it is the critical resource for running LLMs locally.

    RAM (system memory): the main memory used by the CPU and general workloads. The GPU can only reach it over a much slower path, so it works as a fallback for model weights, but with a real performance penalty.
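
    If you want to see what your own machine has, here is a rough Python sketch (assuming an NVIDIA/CUDA GPU and that torch and psutil are installed; adjust for Apple Silicon or AMD):

        # Rough sketch: report system RAM vs GPU VRAM (assumes torch + psutil)
        import psutil
        import torch

        ram_gb = psutil.virtual_memory().total / 1024**3
        print(f"System RAM: {ram_gb:.1f} GB")

        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)
            vram_gb = props.total_memory / 1024**3
            print(f"GPU 0 ({props.name}): {vram_gb:.1f} GB VRAM")
        else:
            print("No CUDA GPU detected - inference would be CPU-only.")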

    So, for a basic 7B-parameter LLM run locally, you typically need:

    Minimum: 8-12 GB VRAM. Enough for basic inference and tasks, though you may need quantization (4-bit/8-bit).

    Recommended: 16+ GB VRAM. Smoother performance, room for larger context windows, and no need for heavy quantization.
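
    As a sanity check, you can estimate VRAM needs from the parameter count and the bits per weight; the ~20% overhead factor below for activations and KV cache is just a guess, not a measured number:

        # Back-of-the-envelope VRAM estimate: weight bytes plus ~20% overhead
        def estimate_vram_gb(params_billion, bits_per_weight, overhead=1.2):
            weight_bytes = params_billion * 1e9 * bits_per_weight / 8
            return weight_bytes * overhead / 1024**3

        for bits in (16, 8, 4):
            print(f"7B model at {bits}-bit: ~{estimate_vram_gb(7, bits):.1f} GB")
        # prints roughly 15.6, 7.8 and 3.9 GB, which lines up with the numbers above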

    Quantization means reducing the precision of the model’s weights and calculations to use less memory. For example, instead of storing numbers with full 32-bit precision, they’re compressed to 4-bit or 8-bit representations. This significantly reduces VRAM requirements but can slightly reduce model quality and accuracy.
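
    Here is a toy sketch of what 8-bit quantization does to a single weight matrix (real schemes such as GPTQ, AWQ, or GGUF k-quants are more elaborate, but the core idea is the same):

        # Toy symmetric int8 quantization of one fp32 weight matrix (numpy)
        import numpy as np

        weights = np.random.randn(4096, 4096).astype(np.float32)  # fake fp32 weights

        scale = np.abs(weights).max() / 127.0            # one scale factor for the tensor
        q = np.round(weights / scale).astype(np.int8)    # 4 bytes per value -> 1 byte
        dequant = q.astype(np.float32) * scale           # what the model uses at runtime

        print(f"fp32: {weights.nbytes / 2**20:.0f} MiB, int8: {q.nbytes / 2**20:.0f} MiB")
        print(f"mean absolute error: {np.abs(weights - dequant).mean():.5f}")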

    Options if you have less VRAM: CPU-only inference (very slow), offloading part of the model to system RAM, or using smaller models (3B/4B parameters). A sketch of partial offloading follows below.
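
    For example, partial offloading with llama-cpp-python looks roughly like this; the model filename and the n_gpu_layers value are placeholders you would tune to your own hardware:

        # Sketch of partial GPU offloading (pip install llama-cpp-python)
        from llama_cpp import Llama

        llm = Llama(
            model_path="./qwen2.5-7b-instruct-q4_k_m.gguf",  # placeholder GGUF file
            n_gpu_layers=20,  # layers kept in VRAM; the rest run from system RAM (slower)
            n_ctx=4096,       # context window; bigger values need more memory
        )

        out = llm("Explain VRAM vs RAM in one sentence.", max_tokens=64)
        print(out["choices"][0]["text"])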