"These price increases have multiple intertwining causes, some direct and some less so: inflation, pandemic-era supply crunches, the unpredictable trade policies of the Trump administration, and a gradual shift among console makers away from selling hardware at a loss or breaking even in the hopes that game sales will subsidize the hardware. And you never want to rule out good old shareholder-prioritizing corporate greed.

But one major factor, both in the price increases and in the reduction in drastic “slim”-style redesigns, is technical: the death of Moore’s Law and a noticeable slowdown in the rate at which processors and graphics chips can improve."

  • Aceticon@lemmy.dbzer0.com
    link
    fedilink
    English
    arrow-up
    1
    ·
    edit-2
    5 hours ago

    When two processing devices try and access the same memory there are contention problems as the memory cannot be accessed by two devices at the same time (well, sorta: parallel reads are fine, it’s when one side is writing that there can be problems), so one of the devices has to wait, so it’s slower than dedicated memory but the slowness is not constant since it depends on the memory access patterns of both devices.

    There are ways to improve this: for example, if you have multiple channels on the same memory module then contention issues are reduced to the same memory block, which depends on the block-size, though this also means that parallel processing on the same device - i.e. multiple cores - cannot use the channels being used by a different device so it’s slower.

    There are also additional problems with things like memory caches in the CPU and GPU - if an area of memory cached in one device is altered by a different device that has to be detected and the cache entry removed or marked as dirty. Again, this reduces performance versus situations where there aren’t multiple processing devices sharing memory.

    In practice the performance impact is highly dependent on if an how the memory is partitioned between the devices, as well as by the amount of parallelism in both processing devices (this latter because of my point from above that memory modules have a limited number of memory channels so multiple parallel accesses to the same memory module from both devices can lead to stalls in cores of one or both devices since not enough channels are available for both).

    As for the examples you gave, they’re not exactly great:

    • First, when loading models into the GPU memory, even with SSDs the disk read is by far the slowest part and hence the bottleneck, so as long as things are being done in parallel (i.e. whilst the data is loaded from disk to CPU memory, already loaded data is also being copied from CPU memory to GPU memory) you won’t see that much difference between loading to CPU memory and then from there to GPU memory and direct loading to GPU memory. Further, the manipulation of models in shared memory by the CPU introduces the very performance problems I was explaining above, namely contention problems from both devices accessing the same memory blocks and GPU cache entries getting invalidated because the CPU altered that data in the main memory.
    • Second, if I’m not mistaken tone mapping is highly parallelizable (as pixels are independent - I think, but not sure since I haven’t actually implemented this kind of post processing), which means that the best by far device at parallel processing - the GPU - should be handling it in a shader, not the CPU. (Mind you, I might be wrong in this specific case if the algorithm is not highly parallelizable. My own experience with doing things via CPU or via shaders running in the GPU - be it image shaders or compute shaders - is that in highly parallelizable stuff, a shader in the GPU is way, way faster than an algorithm running in the CPU).

    I don’t think that direct access by the CPU to manipulate GPU data is at all a good thing (by the reasons given on top) and to get proper performance out of a shared memory setup at the very least the programming must done in a special way that tries to reduce collisions in memory access, or the whole thing must be setup by the OS like it’s done on PCs with integrated graphics, were a part of the main memory is reserved for the GPU by the OS itself when it starts and the CPU won’t touch that memory after that.