10 Comments
Oct 25, 2023 · edited Oct 25, 2023 · Liked by Abhinav Upadhyay

Thank you for writing this post and for the useful GPU overview.

However, please consider dropping the explanation of latency tolerance based on Little's law and all subsequent references to it, or consult a queuing theory specialist and update the text as needed. As it is currently used, I believe it gives a confusing explanation of something simple: you can have relatively large individual instruction latencies and still get very high throughput by executing lots of things in parallel, which is what GPUs do.

As stated, the explanation based on Little's law moves from arrival throughput (in the equation) to a target throughput (which typically would be the measured average *output* throughput), introducing a conservation of flow constraint which might not be fulfilled by the underlying queuing system when you just plug numbers into the equation (as opposed to measuring a real system).

As I read it, the current explanation essentially says that, for a fixed average latency, larger average queue sizes always lead to better average throughput. However, this is clearly not always true: having millions of items waiting to be processed does not increase your throughput by magic. On the other hand, having lots of processing units (SPs in this case) and processing more things in parallel does.

In other words, the magic making things work is not in the queuing but in the parallelism. So going to Little's law is both tricky to get right and unnecessary.
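To make the argument concrete, here is a toy discrete-time simulation. It is a hypothetical sketch, not taken from the article: `simulate`, its parameters, and the fixed-latency, fixed-arrival-rate model are all illustrative assumptions. Each of `num_units` parallel units holds one item for `latency` ticks; items that arrive while every unit is busy simply wait in a queue.

```python
def simulate(num_units: int, latency: int, arrivals_per_tick: int, ticks: int) -> tuple[float, int]:
    """Toy model (assumption): fixed latency per item, fixed arrivals per tick.

    Returns (average completions per tick, items still waiting at the end).
    """
    waiting = 0          # items queued but not yet started
    in_flight = []       # remaining ticks of work for each item on a unit
    completed = 0
    for _ in range(ticks):
        waiting += arrivals_per_tick
        # Start as many waiting items as there are free units.
        started = min(num_units - len(in_flight), waiting)
        waiting -= started
        in_flight.extend([latency] * started)
        # Every in-flight item advances one tick; those that reach 0 finish.
        in_flight = [r - 1 for r in in_flight]
        completed += in_flight.count(0)
        in_flight = [r for r in in_flight if r > 0]
    return completed / ticks, waiting


for units, arrivals in [(8, 2), (8, 10), (16, 10)]:
    throughput, queued = simulate(units, latency=4, arrivals_per_tick=arrivals, ticks=1000)
    print(f"units={units:2d} arrivals/tick={arrivals:2d} "
          f"throughput={throughput:.3f} still waiting={queued}")
```

With 8 units and a 4-tick latency, capacity is 8 / 4 = 2 items per tick. Raising the arrival rate from 2 to 10 per tick leaves 8000 items waiting after 1000 ticks, yet throughput stays at 2.0; doubling the units to 16 roughly doubles it. The queue only grows, while the extra parallelism is what raises throughput.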

author

Your comment is valid. The article was written for an audience that may not have much background in parallel computing, and I used Little's law to provide better intuition. However, I didn't spend enough words elaborating on it, because that would have taken too much space and diluted other parts of the article. I've removed the mention of Little's law; it was really not needed to explain how GPUs work.

Oct 23, 2023 · Liked by Abhinav Upadhyay

Ciao Abhinav, greetings from Italy. I really enjoy and admire your posts. I have written to you via LinkedIn; I hope that's okay.

author

Hi Tony, thank you so much. (already connected with you on LinkedIn) :-)


Needless to say, this is a fantastic article. Great job and thank you for going so in-depth.

It's interesting that in my time as a software engineer, I never really had to learn how GPUs work in depth. I wish I had. I have a friend working on deep learning over at Nvidia, and he seemingly operates at a different level of technicality than I do. At the same time, I try to remind myself that my expertise and experience are mostly in hyperscale distributed systems, and what I work on is probably foreign to him.

Regardless, I feel that at least GPU basics should be common knowledge for ambitious software engineers, especially as the world moves toward GPU-powered computing thanks to AI.

author

Thank you, Leonardo.

I also never had to work with GPUs directly (apart from running deep learning models), but there have been a few instances in my career where we wondered whether we could use GPUs for a problem. The lack of a basic understanding of how they operate made things difficult.

Oct 22, 2023 · Liked by Abhinav Upadhyay

Nice article; it refreshed my memories of CUDA programming from 2016.

Oct 20, 2023 · Liked by Abhinav Upadhyay

Keep up the great job 👏

author

Thanks, Nat :)


Abhinav, good article! I’m wondering whether we could translate your blog into Chinese and post it in the Chinese community. We will credit your name prominently and keep the original link at the top of the translated version. Thank you!
