• brucethemoose@lemmy.world

    OK, so I just checked the page:

    https://lumo.proton.me/guest

    Looks like a generic Open Web UI instance, much like Qwen’s: https://openwebui.com/

    Based on this support page, they are using open models and possibly finetuning them:

    https://proton.me/support/lumo-privacy

    The models we’re using currently are Nemo, OpenHands 32B, OLMO 2 32B, and Mistral Small 3

    But this information is hard to find, and they aren’t particularly smart models, even for 32B-class ones.

    Still… the author is incorrect on one point: Proton does specify how long requests are kept:

    When you chat with Lumo, your questions are sent to our servers using TLS encryption. After Lumo processes your query and generates a response, the data is erased. The only record of the conversation is on your device if you’re using a Free or Plus plan. If you’re using Lumo as a Guest, your conversation is erased at the end of each session. Our no-logs policy ensures we keep no logs of what you ask, or what Lumo replies. Your chats can’t be seen, shared, or used to profile you.

    But it also mentions that, as is a necessity now, prompts are decrypted on the GPU servers for processing. Theoretically they could hack the input/output layers and the tokenizer into a pseudo-E2E encryption scheme, but I haven’t heard of anyone doing this yet… And it would probably be incompatible with their serving framework (likely vLLM) without some crack CUDA and Rust engineers, as you’d need to scramble the text and tokenize/detokenize it uniquely, with scrambled LLM outer layers, for each request.
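To make the idea concrete, here is a toy sketch of the token-scrambling part only, in Python. Everything here is hypothetical (the vocabulary, the seed-as-key): the client and server share a secret permutation of token IDs, the client sends scrambled IDs instead of plaintext, and the server’s embedding/unembedding matrices would have to be row-permuted to match. This is not a real cryptographic scheme — it hides token identities, not sequence lengths or frequencies:

```python
import random

rng = random.Random(42)  # shared secret seed; a toy stand-in for a real key

VOCAB = ["hello", "world", "lumo", "proton"]  # toy 4-token vocabulary
perm = list(range(len(VOCAB)))
rng.shuffle(perm)                           # secret token-ID permutation
inv = {s: i for i, s in enumerate(perm)}    # inverse mapping, scrambled -> real

def tokenize(words):
    return [VOCAB.index(w) for w in words]

def scramble(ids):
    # Client side: map real token IDs to scrambled IDs before sending.
    return [perm[i] for i in ids]

def unscramble(ids):
    # Client side: recover real token IDs from the server's scrambled output.
    return [inv[i] for i in ids]

ids = tokenize(["hello", "lumo"])
assert unscramble(scramble(ids)) == ids  # round-trips on the client
```

Even this toy version shows why it clashes with a stock serving stack: the server would need a distinctly permuted embedding table per client (or per request), which vLLM’s shared-weight batching isn’t built for.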

    They are right about one thing: Proton all but advertises Lumo as E2E, and that is a lie. Per its usual protocol, Open Web UI sends the chat history for that particular chat to the server with each request, where it is decoded and tokenized. If the GPU server were hacked, it could absolutely be logged and intercepted.
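For anyone unfamiliar with how these frontends work, here is a sketch of the shape of a typical OpenAI-style chat-completion request body (model name and messages are made up, not Lumo’s actual traffic). The point is that the full plaintext history rides along on every call:

```python
import json

# Prior turns of the conversation, stored client-side.
history = [
    {"role": "user", "content": "What is Lumo?"},
    {"role": "assistant", "content": "A Proton chat assistant."},
]
new_turn = {"role": "user", "content": "Is it end-to-end encrypted?"}

# Each request re-sends the entire history plus the new message,
# all in plaintext once TLS is terminated at the server.
payload = {
    "model": "mistral-small-3",          # placeholder model name
    "messages": history + [new_turn],
}

# The inference server must decode and tokenize this to run the model,
# so a compromised GPU host could log everything in `payload`.
print(json.dumps(payload, indent=2))
```

So "erased after processing" is a retention policy, not a cryptographic guarantee: the server sees everything in the clear at inference time, every time.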