Chips aren’t improving like they used to, and it’s killing game console price cuts

j4p@lemm.ee · 1 year ago

Aceticon@lemmy.dbzer0.com · edit-2 1 year ago

When two processing devices try to access the same memory there are contention problems, as the memory cannot be accessed by two devices at the same time (well, sorta: parallel reads are fine, it’s when one side is writing that there can be problems). One of the devices has to wait, so it’s slower than dedicated memory, but the slowdown is not constant since it depends on the memory access patterns of both devices.

There are ways to improve this: for example, if you have multiple channels on the same memory module then contention is limited to accesses that hit the same memory block (how often that happens depends on the block size), though this also means that parallel processing on the same device - i.e. multiple cores - cannot use the channels being used by a different device, so it’s slower.

There are also additional problems with things like memory caches in the CPU and GPU - if an area of memory cached in one device is altered by a different device, that has to be detected and the cache entry removed or marked as invalid. Again, this reduces performance versus situations where there aren’t multiple processing devices sharing memory.
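To make the cost of that invalidation concrete, here is a toy sketch in Python. It assumes a very simplified write-through, write-invalidate scheme; the `Device` class and its fields are made up for illustration, and real coherence hardware is far more elaborate than this.

```python
class Device:
    """Toy processing device with a private cache in front of shared memory.

    Hypothetical model: write-through with invalidation of peer caches.
    """

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory  # shared dict: address -> value
        self.cache = {}       # private copies: address -> value
        self.peers = []       # other devices sharing the same memory
        self.misses = 0

    def read(self, address):
        if address not in self.cache:
            # Miss: fetch from the (slow) shared memory.
            self.misses += 1
            self.cache[address] = self.memory[address]
        return self.cache[address]

    def write(self, address, value):
        # Write through to shared memory and keep our own copy fresh.
        self.memory[address] = value
        self.cache[address] = value
        # Every peer's copy is now stale, so it has to be dropped.
        for peer in self.peers:
            peer.cache.pop(address, None)


memory = {0: 0}
cpu = Device("CPU", memory)
gpu = Device("GPU", memory)
cpu.peers.append(gpu)
gpu.peers.append(cpu)

# GPU reads the same address three times with nobody else writing.
for _ in range(3):
    gpu.read(0)
misses_undisturbed = gpu.misses

# Now the CPU writes before each GPU read.
for value in range(1, 4):
    cpu.write(0, value)
    gpu.read(0)
misses_with_writes = gpu.misses - misses_undisturbed

print(misses_undisturbed, misses_with_writes)
```

In this model the GPU only misses once when left alone, but misses on every read once the CPU keeps writing to the same address, which is the slowdown described above.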

In practice the performance impact is highly dependent on if and how the memory is partitioned between the devices, as well as on the amount of parallelism in both processing devices (the latter because of my point from above that memory modules have a limited number of memory channels, so multiple parallel accesses to the same memory module from both devices can lead to stalls in cores of one or both devices since not enough channels are available for both).

As for the examples you gave, they’re not exactly great:

First, when loading models into the GPU memory, even with SSDs the disk read is by far the slowest part and hence the bottleneck, so as long as things are being done in parallel (i.e. whilst the data is loaded from disk to CPU memory, already loaded data is also being copied from CPU memory to GPU memory) you won’t see that much difference between loading to CPU memory and then from there to GPU memory and direct loading to GPU memory. Further, the manipulation of models in shared memory by the CPU introduces the very performance problems I was explaining above, namely contention problems from both devices accessing the same memory blocks and GPU cache entries getting invalidated because the CPU altered that data in the main memory.
Second, if I’m not mistaken tone mapping is highly parallelizable (as pixels are independent - I think, but not sure since I haven’t actually implemented this kind of post processing), which means that by far the best device at parallel processing - the GPU - should be handling it in a shader, not the CPU. (Mind you, I might be wrong in this specific case if the algorithm is not highly parallelizable. My own experience with doing things via CPU or via shaders running in the GPU - be it image shaders or compute shaders - is that in highly parallelizable stuff, a shader in the GPU is way, way faster than an algorithm running in the CPU).
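The first point, about disk reads dominating load time, can be sketched with some back-of-the-envelope arithmetic in Python. The chunk count and per-chunk timings below are invented purely for illustration, and the functions are a simple two-stage pipeline model rather than a measurement of any real system.

```python
def staged_serial(chunks, read_ms, copy_ms):
    """Read each chunk from disk, then copy it to GPU memory, with no overlap."""
    return chunks * (read_ms + copy_ms)


def staged_pipelined(chunks, read_ms, copy_ms):
    """Copy chunk i to GPU memory while chunk i+1 is being read from disk.

    Two-stage pipeline: the first read fills the pipeline, the slower stage
    sets the pace for the rest, and the last copy drains it.
    """
    return read_ms + (chunks - 1) * max(read_ms, copy_ms) + copy_ms


def direct(chunks, read_ms):
    """Read straight into memory the GPU can use, with no extra copy."""
    return chunks * read_ms


# Made-up numbers: 100 chunks, 10 ms to read one from disk, 1 ms to copy it.
CHUNKS, READ_MS, COPY_MS = 100, 10, 1

serial_total = staged_serial(CHUNKS, READ_MS, COPY_MS)
pipelined_total = staged_pipelined(CHUNKS, READ_MS, COPY_MS)
direct_total = direct(CHUNKS, READ_MS)

print(serial_total, pipelined_total, direct_total)
```

With these assumed numbers the pipelined staged load takes 1001 ms against 1000 ms for the direct load, so the extra copy only costs something noticeable when it isn’t overlapped with the disk reads.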

I don’t think that direct access by the CPU to manipulate GPU data is at all a good thing (for the reasons given above) and to get proper performance out of a shared memory setup at the very least the programming must be done in a special way that tries to reduce collisions in memory access, or the whole thing must be set up by the OS like it’s done on PCs with integrated graphics, where a part of the main memory is reserved for the GPU by the OS itself when it starts and the CPU won’t touch that memory after that.

sp3ctr4l@lemmy.dbzer0.com · 1 year ago

Can you explain to me what the person you are replying to meant by ‘integrated memory on a desktop pc’?

I tried to explain why this phrase makes no sense, but apparently they didn’t like it.

…Standard GPUs and CPUs do not share a common kind of RAM that gets balanced between space reserved for CPU-ish tasks and GPU-ish tasks… that only happens with an APU that uses LPDDR RAM… which isn’t at all a standard desktop PC.

It is as you say, a hierarchy of assets being called into the DDR RAM by the CPU, then streamed or shared into the GPU and its GDDR RAM…

But the GPU and CPU are not literally, directly using the actual same physical RAM hardware as a common shared pool.

Yes, certain data is… shared… in the sense that it is or can be, to some extent, mirrored, parallelized, between two distinct kinds of RAM… but… not in the way they seem to think it works, with one RAM pool just being directly accessed by both the CPU and GPU at the same time.

… Did they mean ‘integrated graphics’ when they … said ‘integrated memory?’

L1 or L2 or L3 caches?

???

I still do not understand how any standard desktop PC has ‘integrated memory’.

What kind of ‘memory’ on a PC… is integrated into the MoBo, unremovable?

???

Aceticon@lemmy.dbzer0.com · edit-2 1 year ago

Hah, now you made me look that stuff up. I was talking anchored on my knowledge of systems with multiple CPUs and shared memory, since that was my expectation about the style of system architecture of the PS5, as in the past that’s how they did things.

So, for starters I never mentioned “integrated memory”, I wrote “integrated graphics”, i.e. the CPU chip comes together with a GPU, either as two dies in the same chip package or even both on the same die.

I think that when people talk about “integrated memory” what they mean is main memory which is soldered on the motherboard rather than coming as discrete memory modules. From the point of view of systems architecture it makes no difference, however from the point of view of electronics, soldered memory can be made to run faster because soldered connections are much closer to perfect than the mechanical contact connections you have for memory modules inserted in slots.

(Quick explanation: at very high clock frequencies the electronics side starts to behave in funny ways as the frequency of the signal travelling on the circuit board gets so high and hence the wavelength gets so small that it’s down to centimeters or even millimeters - around the scale of the length of circuit board lines - and you start getting effects like signal reflections and interference between circuit lines - because they’re working as mini antennas so can induce effects on nearby lines - hence it’s all a lot more messy than if the thing was just running at a few MHz. Wave reflections can happen in connections which aren’t perfect, such as the mechanical contact of memory modules inserted into slots, so at higher clock speeds the signal integrity of the data travelling to and from the memory is worse than it is with soldered memory whose connections are much closer to perfect).
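As a rough illustration of those wavelength scales, the sketch below computes the wavelength of a signal on a circuit board as propagation velocity divided by frequency. The effective permittivity of 4 is an assumed ballpark for a common board material, not a figure from this thread; the real value depends on the material and the trace geometry.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s in vacuum


def wavelength_m(frequency_hz, effective_permittivity=4.0):
    """Wavelength of a signal travelling along a board trace, in metres.

    effective_permittivity=4.0 is an assumed rough value; signals travel
    slower than in vacuum by a factor of sqrt(effective_permittivity).
    """
    velocity = SPEED_OF_LIGHT / math.sqrt(effective_permittivity)
    return velocity / frequency_hz


slow = wavelength_m(10e6)  # a 10 MHz signal
fast = wavelength_m(3e9)   # a 3 GHz signal

print(f"10 MHz: {slow:.2f} m, 3 GHz: {fast * 100:.1f} cm")
```

Under that assumption a 10 MHz signal has a wavelength of about 15 m, far longer than any trace, while at 3 GHz it is about 5 cm, which is comparable to trace lengths on a motherboard.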

As far as I know nowadays L1, L2 and L3 caches are always part of the CPU/GPU die, though I vaguely remember that in the old days (80s, 90s) memory cache might be in the form of dedicated SRAM modules on the motherboard.

As for integrated graphics, here’s some reference for an Intel SoC (system on a chip, in this case with the CPU and GPU together in the same die). If you look at page 5 you can see a nice architecture diagram. Notice how memory access goes via the memory controller (lower right, inside the System Agent block) and then the SoC Ring Interconnect which is an internal bus connecting everything to everything (so quite a lot of data channels). The GPU implementation is the whole left side, the CPU is top and there is a cache slice (at first sight an L4 cache) at the bottom shared by both.

As you see there, in integrated graphics the memory access doesn’t go via the CPU. Rather, there is a single memory controller (and in this example a memory cache) for both: memory access for both the CPU and the GPU cores goes through that controller and shares that cache (but lower level caches are not shared: notice how the GPU implementation contains its own L3 cache - bottom left, labelled “L3$”).

With regards to the cache invalidation problems I mentioned in the previous post, at least that higher level (L4) cache is shared, so instead of cache entries being made invalid because of the main memory being changed outside of it, what you get is a different performance problem where there is competition for cache usage between the areas of memory used by the CPU and areas of memory used by the GPU (as the cache is much smaller than the actual main memory, it can only contain copies of part of the main memory, and if two devices are using different areas of the main memory they’re both causing those areas to get cached but the cache can’t fit both, so depending on the usage pattern it might constantly be ejecting entries for one area of memory to make room for entries for the other area of memory and back, which in practice makes it as slow as not having any cache there - there are lots of tricks to make this less of a problem but it’s still slower than if there was just one processing device using that cache, such as you get with each processing device having its own cache and its own memory).
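That back-and-forth ejection can be reproduced with a small Python simulation. This is a deliberately pessimistic sketch: it assumes a fully associative cache with strict least-recently-used replacement and two devices that take turns sweeping their own region, none of which is claimed to match the Intel design. The sizes and the `LRUCache` class are made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fully associative LRU cache that only tracks which lines are present."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.lines[address] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # eject the least recently used line


def hit_rate(capacity, regions, passes=10):
    """Sweep each region in turn, `passes` times, and return the cache hit rate."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        for region in regions:
            for address in region:
                cache.access(address)
    return cache.hits / (cache.hits + cache.misses)


# Made-up sizes: a 128-line cache and two 96-line working sets.
cpu_region = range(0, 96)
gpu_region = range(1000, 1096)

alone = hit_rate(128, [cpu_region])
shared = hit_rate(128, [cpu_region, gpu_region])

print(alone, shared)
```

On its own, the 96-line region fits in the cache and the hit rate is 90% (only the first sweep misses). With both regions competing for the same 128 lines the hit rate in this model drops to zero, i.e. as slow as having no cache at all. Real caches use smarter replacement policies, so this is the worst case rather than the typical one.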

As for contention problems, there are generally way more data channels in an internal interconnect such as the one you see there than in the data bus to the main memory modules, plus that internal interconnect will be way faster, so the contention in memory access will be lower for cached memory. With cache misses (memory locations not in cache, which hence have to be loaded from main memory) that architecture will still suffer from two devices sharing the main memory, and hence that memory’s data channels having to be shared.

sp3ctr4l@lemmy.dbzer0.com · edit-2 1 year ago

addie said:

Integrated memory on a desktop computer is more “partitioned” than shared

Then I wrote my own reply to them, as you did.

And then I also wrote this, under your reply to them:

Can you explain to me what the person you are replying to meant by ‘integrated memory on a desktop pc’?

And now you are saying:

So, for starters I never mentioned “integrated memory”, I wrote “integrated graphics”, i.e. the CPU chip comes together with a GPU, either as two dies in the same chip package or even both on the same die.

I mean, I do genuinely appreciate your detailed, technical explanations of these systems and hardware and their inner functions…

But also, I didn’t say you said integrated memory.

I said the person you are replying to, addie, said integrated memory.

I was asking you to perhaps be able to explain what they meant… because they don’t seem to know what they’re trying to say.

But now you have misunderstood what I said, what I asked, lol.

You replied to addie … I think, as if they had written ‘integrated graphics’. But they didn’t say that. They said ‘integrated memory’.

And… unless I am … really, really missing something… standard desktop PCs… do not have any kind of integrated memory, beyond like… very, very small areas where the mobo bios is stored, but that is almost 100% irrelevant to discussion about video game rendering capabilities.

As you say, you have to go back 20+ years to find desktop PCs with Mobos that have their own SRAM… everything else is part of the GPU or CPU die, and thus … isn’t integrated. As GPUs and CPUs are removable, swappable, on standard desktop PCs.

Either way, again, I do appreciate your in-depth technical info!

Aceticon@lemmy.dbzer0.com · edit-2 1 year ago

Well, I wasn’t sure if you meant that I did say that or if you just wanted an explanation, so I both clarified what I said and I gave an explanation to cover both possibilities :)

I think the person I was replying to just got confused when they wrote “integrated memory” since as I explained when main memory is “integrated” in systems like these, that just means it’s soldered on the motherboard, something which really makes no difference in terms of architecture.

There are processing units with integrated memory (pretty much all microcontrollers), which means they come with their own memory (generally both flash and SRAM) in the same integrated circuit package or even the same die, but that’s at the very opposite end of processing power of a PC or PS5 and the memory amounts involved tend to be very small (a few MB or less).

As for the “integrated graphics” bit, that’s actually the part that matters when it comes to performance of systems with dedicated CPU and GPU memory vs systems with shared memory (integrated in the motherboard or otherwise, since being soldered on the motherboard or coming as modules doesn’t really change the limitations of each architecture) which is what I was talking about back in the original post.

sp3ctr4l@lemmy.dbzer0.com · 1 year ago

Sorry, I … well, I was recently diagnosed with PTSD.

And a significant part of that… is I am so, so, very used to people just misinterpreting what I actually said, then in their heads, they heard /something else/, and then they respond to /something else/, they continue to believe I said /something else/, even after I explain to them that isn’t what I said, and then they tell everyone else that I said /something else/.

(there are many other things that go into the PTSD, but they are waaaay outside of the scope of this discussion)

I, again, realize and appreciate that you responded to both interpretations…

But I am just a bit triggered.

I am so, so, very used to being gaslit by… most of my family, and many, many other people in my life, who just seemingly willfully misinterpret me consistently, or are literally incapable of hearing/reading without just inventing and inserting their own interpretation.

… Whole lot of my family has very serious mental health disorders, and I’ve also happened to have a very bad run of many bosses and former friends and ex partners who just do the same thing, all the time.

Took me a long time to just… get away from all these toxic situations, and finally be able to pursue mental health evaluation/treatment on my own accord.

I’m not saying you ‘intentionally triggered me’ or anything like that, that would be a ridiculous judgement from me, and you have been very polite, and informative… I’m just trying to explain myself, lol.

As to the actual technical info: yes, everything you are saying lines up with my understanding, it’s nice to know I know what these words and terms mean in this context, and my understanding is … in line with reality.

Aceticon@lemmy.dbzer0.com · 1 year ago

Well, this being the Internet it’s natural to expect less than impeccable truth from strangers here: because a lot of people just want to feel like they “won” the argument no matter what so they’ll bullshit their way into a “win”, because most people aren’t really trained in the “trying to be as complete and clear as possible” mental processes like Engineers and Scientists are (so there’s a lot of “I think this might be such” being passed as “it is such”) and because it simply feels bad to be wrong so most people don’t want to accept it when somebody else proves them wrong and react badly to it.

I’m actually a trained Electronics Engineer but since I don’t actually work in that domain and studied it decades ago, some of what I wrote are informed extrapolations based on what I learned and stuff I read over the years rather than me being absolutely certain that’s how things are done nowadays (which is why looking up and reading that Intel spec was very interesting, even if it turned out things are mainly as I expected).

Also I’m sorry for triggering you, you don’t need to say sorry for your reaction and I didn’t really take it badly: as I said, this is the Internet and a lot of people are argumentative for the sake of “winning” (probably the same motivation as most gaslighters) so I expect everybody to be suspicious of my motivations, same as they would be for all other people since from their point of view I’m just another random stranger ;)

Anyways, cheers for taking the trouble of explaining it and making sure I was okay with our interaction - that’s far nicer and more considerate than most random internet strangers.