Not really, we are not in the eighties anymore; modern supercomputers are mainly a bunch of off-the-shelf servers connected together.
I mean, what the first person said is true…
… and what you have just said is true.
There is no tension between these concepts.
Nearly all servers run on Linux, nearly all supercomputers are some kind of locally networked cluster… that runs Linux.
There’s… there’s no conflict here.
In fact, this kind of multi-computer paradigm for Linux is the core of why X11 is weird and fucky in the context of a modern, self-contained PC, and why Wayland is a thing nowadays.
X11 is built around a paradigm where you have a whole bunch of hardware units doing the actual calcs of some kind, and then some teeny tiny hardware that is basically just a display and input device… well, that’s the only thing that even needs to load any display- or input-related code/software/library.
You also don’t really need to worry so much about security in the display/input framework itself, because your only potential threat is basically a rogue employee at your lab, and everyone working there is some kind of trained expert.
This makes sense if your scenario is a self contained computer research facility that is only networked to what is in its building…
… it makes less sense and has massive security problems if you have a single machine that can do all of that, and that single machine is also networked to millions of remote devices (via the modern internet), in a world where computer viruses and malware are a multi-billion-dollar industry… and the average computer user is roughly as intelligent and knowledgeable as a 6th grader.
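To make that paradigm concrete, here’s a rough sketch (assuming Python with Tk on the compute host; the hostname is made up): an X client just draws on whatever X server the DISPLAY variable points at, even one across the network.

```python
# Rough illustration of X11's network transparency: this program runs on the
# "big" compute host, but renders on whatever X server DISPLAY names,
# e.g. a thin display-only terminal. "terminal:0" is a made-up hostname.
import os
import tkinter as tk

os.environ.setdefault("DISPLAY", "terminal:0")  # only applies if DISPLAY isn't already set

root = tk.Tk()  # opens a connection to the X server named by DISPLAY
tk.Label(root, text="Computed over here, drawn over there").pack()
root.mainloop()
```

The client happily pushes its drawing commands over the wire, which is exactly the multi-machine setup X11 was designed around… and exactly what a single self-contained PC never needs.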
Soooo many raspberry pis…
They still probably need a ton of customization and tuning at the driver level and beyond, which open source allows for.
I am sure there is plenty of existing “super computer”-grade software in the wild already, but a majority of it probably needs quite a bit of hacking to get running smoothly on newer hardware configurations.
As a matter of speculation, the engineers and scientists that build these things are probably hyper-picky about how some processes execute and need extreme flexibility.
So, I would say it’s a combination of factors that make Linux a good choice.
Surprisingly, not a lot of ‘exciting tuning’; a lot of these systems are exceedingly conservative when it comes to tuning. From a software perspective, the most common “weird” thing in these systems is the affinity for diskless boot, and that mostly comes from a history of hard drives being a frequent cause of node downtime (yes, the stateless nature of diskless boot continues to be desired, but the community would likely never have bothered if not for OS HDD failures). They also sometimes like managing the OS kind of like a common chroot, to oversimplify, but that’s mostly about running hundreds of thousands of copies of what should be the exact same thing over and over again, rather than anything exotic about their workload.
Linux is largely the choice by virtue of how this market evolved: it was largely Unix-based, but most of the applications they used were open source out of necessity, so that they could bid out, say, Sun versus IBM versus SGI and still keep working regardless of who was awarded the business. In that time frame, Windows NT wasn’t even an idea, and most of these institutions wouldn’t touch ‘freeware’ for such important tasks.
In the 90s Linux happened and, critically for this market, Red Hat and SUSE happened. Now they could have a much more vibrant and fungible set of hardware vendors, with a credible commercial software vendor that could support all of them. Bonus that you could run the distributions or clones for free, which helped a lot of the smaller academic institutions get a reasonable shot without diverting money from hardware to software. Sure, some aggressively exotic things might have been possible versus the prior norm of proprietary Unix, but mostly it was about the improved vendor-to-vendor consistency.
Microsoft tried to get into this market in the late 2000s, but no one asked for them. They had poor compatibility with any existing code, were more expensive, and much worse at managing at scale in the context of headless, multi-user compute nodes.
So is it just hundreds of servers, each running their own OS and coordinating on tasks?
This is called a “Cluster” and it precedes Linux by a decade or two, but yes.
And what else would the supercomputers run on? Windows? You won’t get into the TOP500 if half your computers are bluescreening while the other half is busy updating…
The times when supercomputers were batch-oriented machines, where your calculation was the only thing running on the hardware and your software basically included the OS (or at least the parts you needed), are long over.
Some have thousands, but yes. On most of these systems:
- Process launch and scheduling is done by a resource manager (SLURM is common).
- Inter-process communication uses an MPI implementation (like OpenMPI).
- Inter-node communication runs over a low-latency (and high-bandwidth) network, a space dominated by InfiniBand from Nvidia (formerly Mellanox).
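As a minimal sketch of what a job on such a system tends to look like (assuming mpi4py is installed; the script name and node count are just examples, and under SLURM it would typically be launched with something like srun -N 4 python pi_mc.py):

```python
# Minimal MPI sketch: every rank does an independent slice of the work,
# then the partial results are combined across nodes by the MPI library
# (over InfiniBand or whatever fabric it was built against).
import random
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID within the job
size = comm.Get_size()   # total number of processes in the job

# Each rank throws darts for a Monte Carlo estimate of pi.
samples = 1_000_000
hits = sum(1 for _ in range(samples)
           if random.random() ** 2 + random.random() ** 2 <= 1.0)

# Sum the per-rank counts on rank 0; this is the inter-node communication.
total_hits = comm.reduce(hits, op=MPI.SUM, root=0)

if rank == 0:
    print(f"pi ~= {4.0 * total_hits / (samples * size):.5f} "
          f"across {size} processes")
```

The resource manager decides which nodes the ranks land on; the program itself only ever talks to MPI.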
What’s really peculiar by modern IT standards is that these systems often use old-school Unix multi-user management: users connect to the system through SSH with their own username, use a POSIX filesystem, and their processes are executed under their own usernames.
There are kernel knobs to pay attention to, but generally standard RHEL kernels are used.
That’s it!
I think that the software is specialized, but the hardware is not. They use some smart algorithms to distribute computation over a huge number of workers.