I’m skeptical of this claim. Signal doesn’t seem like it’d be very compute-heavy, text and voice don’t seem like they’d be very network-heavy, and I don’t think video is used very much. If us-east-1 going down took out most of their services, it doesn’t seem like they’re leveraging AWS’s multi-region features very well anyway. It wouldn’t be too hard to just rent or co-locate hardware in multiple non-hyperscaler data centers around the world and run a multi-zone, highly available k8s cluster. That would probably be cheaper and more robust too. I don’t have experience with multi-zone k8s, but I was the sole person responsible for deploying and maintaining a highly available single-datacenter k8s cluster on rented hardware, and it wasn’t even my primary job (I was a full-stack engineer and team lead). If I could do it, I don’t think they’d need to hire the world’s top experts or anything. Coincidentally, the provider was UpCloud, a European company, and in 8 years of using them, I don’t recall a single one of our nodes becoming unresponsive for more than 5 minutes, and I’m not even sure those incidents were on UpCloud’s end.
It’s routing-heavy. That’s latency-sensitive and really needs distributed components when users are distributed. And it gets more complicated when you’re using many different local providers.