SSE sucks I’m just going to cut to the chase here. SSE as a transport mechanism for LLM tokens is naff. It’s not that it can’t work, obviously it can, because people are using it and SDKs are built around it. But it’s not a great fit for the problem space.
The basic SSE flow goes something like this:
Client makes an HTTP POST request to the server with a prompt Server responds with a 200 OK and keeps the connection open Server streams tokens back to the client as they are generated, using the SSE format Client processes the tokens as they arrive on the long-lived HTTP connection Sure the approach has some benefits, like simplicity and compatibility with existing HTTP infrastructure. But it still sucks.