Needless to say, this is a fantastic article. Great job and thank you for going so in-depth.
It's interesting that in my time in software engineering, I never really had to learn how GPUs work in depth. I wish I had. I have a friend working on deep learning over at Nvidia, and he seemingly operates at a different level of technicality than I do. At the same time, I try to remind myself that my expertise and experience are mostly in hyperscale distributed systems, and what I work on is probably foreign to him.
Regardless, I feel like at least GPU basics should be common knowledge for ambitious software engineers, especially as the world moves toward GPU-powered computing thanks to AI.
Thank you, Leonardo.
Although I also never had to work with GPUs directly (apart from running deep learning models), there have been a few instances in my career where we wondered if we could use GPUs for a problem. But the lack of a basic understanding of how they operate made things difficult.
Ciao Abhinav, greetings from Italy. I really enjoy and admire your posts. I have written to you via LinkedIn, hope that's okay.
Hi Tony, thank you so much. (already connected with you on LinkedIn) :-)
Nice article, refreshed my 2016 memory of CUDA programming.
Keep up the great job 👏
Thanks, Nat :)
Excellent fundamentals about GPU, Great post, thanks a lot
Abhinav, good article! I’m wondering if we can translate your blog into Chinese and post it in the Chinese community. We will highlight your name and keep the original link at the top of the translated version. Thank you!
A bit of history:

1st: ‘Computers’ were people trained to solve complex problems (see the movie “Hidden Figures”).

2nd: Computers used to be big and expensive. They needed specially trained people to operate them. Then came mini-computers, which took up less space but had limited function and were used to control machines (look up the Naked Mini). Then Intel decided, rather than use a complex custom circuit, to use a programmable system to, if my memory serves, run a mainframe disk drive. A couple of hobbyists wrote an article for Popular Electronics using that processor and a bunch of 100-pin connectors (because they found those on sale for the 50 or so kits they thought they would sell). They got a thousand orders within a week and the hobby computer craze started. A couple of kids in California thought they could sell pre-built hobby computers for those who didn’t want to solder their own, and allowed others to write programs for them (you may have heard of that company… Apple).
Up until this point, if you wanted more than one person to have access to a computer (rather than waiting in line for hours to access the mainframe), you had to connect using a simple keyboard and monitor setup.
While we now can hold in our hands computers which are far more capable than the old mainframes, there are some things your desktop can’t provide: reliability and scale. A computer center (whether it’s company-owned, shared, or a cloud center) has redundant communications links, 24/7 power with regularly tested backups, and reliable hot-swappable components. On a mainframe you can replace processors, memory, and mass storage without shutting anything down. On a smaller scale, what looks like a tower computer, if it’s a server, will have redundant power supplies, error-correcting memory, and mass storage that can survive a disk failure, with a replacement disk able to be swapped in and the system restored in the background. What happens when, say, the redundancy fails? Major airlines have had to stand down operations for several days. What happens if a bank’s data center fails? All their ATMs stop issuing money.
Time: A Dynamic Computer Processor
This analogy explains why the flow of time is not constant but varies depending on the local informational environment of the Field.
The Analogy: Think of the universe as a vast, parallel computer processor—the Adelic Computronium. Each region of space is like an individual processing core, and the local "flow of time" is its clock speed. A region of high Gnosis and complexity, like the accretion disk of a GreyHole, is a core running an incredibly demanding, focused calculation. To handle this "computational load," the system dedicates immense resources, and the core's clock appears to slow down relative to the rest of the processor. This is Chromatic Time Dilation.

Conversely, a region of high Dissonance, like a vast cosmic void, is like an idle core filled with random, chaotic background static. To quickly clear this "informational noise" and return to an optimal, low-energy state, the system temporarily "overclocks" the core, causing its clock to run faster. This is Gravitational Anti-Time-Dilation.
Your comment is valid. The article was written for an audience that may not have much background in parallel computing, and the mention of Little's law was meant to provide better intuition. However, I didn't spend enough words elaborating on it, because that would have taken too much space and diluted other parts of the article. I've removed the mention of Little's law; it really wasn't needed to explain how GPUs work.
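For anyone curious about the intuition being discussed: Little's law relates average concurrency, throughput, and latency (L = λ·W), and the usual GPU reading is that to sustain a target memory bandwidth you need enough requests in flight to cover the memory latency. Here is a minimal back-of-the-envelope sketch; the bandwidth, latency, and request size are illustrative assumptions, not figures from the article or any specific GPU.

```python
# Sketch: Little's law (concurrency = throughput * latency) applied to hiding
# GPU memory latency. All numbers are illustrative assumptions only.

target_bandwidth_gbs = 900   # assumed memory bandwidth in GB/s (1 GB/s == 1 byte/ns)
memory_latency_ns = 400      # assumed average memory access latency in ns
request_size_bytes = 128     # assumed size of one memory request

# Bytes that must be in flight at any instant to keep the memory bus busy:
# concurrency (bytes) = throughput (bytes/ns) * latency (ns)
bytes_in_flight = target_bandwidth_gbs * memory_latency_ns

# Number of outstanding requests that concurrency corresponds to.
outstanding_requests = bytes_in_flight / request_size_bytes

print(f"Bytes in flight needed: {bytes_in_flight:,.0f}")          # ~360,000 bytes
print(f"Outstanding {request_size_bytes}-byte requests: {outstanding_requests:,.0f}")
```

With these assumed numbers you'd need on the order of a few thousand memory requests in flight at once, which is the kind of concurrency that motivates running many threads per GPU core rather than relying on a few fast ones.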