<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Confessions of a Code Addict: Live Sessions]]></title><description><![CDATA[Archive of all the live sessions for the premium subscribers and whoever purchases them]]></description><link>https://blog.codingconfessions.com/s/live-sessions</link><image><url>https://substackcdn.com/image/fetch/$s_!lstI!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe440a724-cff0-437a-8361-d7699406ac22_500x500.png</url><title>Confessions of a Code Addict: Live Sessions</title><link>https://blog.codingconfessions.com/s/live-sessions</link></image><generator>Substack</generator><lastBuildDate>Thu, 23 Apr 2026 05:51:01 GMT</lastBuildDate><atom:link href="https://blog.codingconfessions.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Abhinav Upadhyay]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[codeconfessions@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[codeconfessions@substack.com]]></itunes:email><itunes:name><![CDATA[Abhinav Upadhyay]]></itunes:name></itunes:owner><itunes:author><![CDATA[Abhinav Upadhyay]]></itunes:author><googleplay:owner><![CDATA[codeconfessions@substack.com]]></googleplay:owner><googleplay:email><![CDATA[codeconfessions@substack.com]]></googleplay:email><googleplay:author><![CDATA[Abhinav Upadhyay]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Invite your friends to read Confessions of a Code Addict]]></title><description><![CDATA[Thank you for reading Confessions of a Code Addict &#8212; your support allows me to keep doing this 
work.]]></description><link>https://blog.codingconfessions.com/p/invite-your-friends-to-read-confessions</link><guid isPermaLink="false">https://blog.codingconfessions.com/p/invite-your-friends-to-read-confessions</guid><dc:creator><![CDATA[Abhinav Upadhyay]]></dc:creator><pubDate>Thu, 06 Jun 2024 07:26:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!lstI!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe440a724-cff0-437a-8361-d7699406ac22_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Thank you for reading Confessions of a Code Addict &#8212; your support allows me to keep doing this work.</p><p>If you enjoy Confessions of a Code Addict, it would mean the world to me if you invited friends to subscribe and read with us. If you refer friends, you will receive benefits that give you special access to Confessions of a Code Addict.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.codingconfessions.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Confessions of a Code Addict is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><strong>How to participate </strong></p><p><strong>1. Share Confessions of a Code Addict. 
</strong>When you use the referral link below, or the &#8220;Share&#8221; button on any post, you'll get credit for any new subscribers. Simply send the link in a text, email, or share it on social media with friends.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.codingconfessions.com/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.codingconfessions.com/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>2.<strong> Earn benefits.</strong> When more friends use your referral link to subscribe (free or paid), you&#8217;ll receive special benefits.</p><ul><li><p>Get a 1 month comp for 3 referrals</p></li><li><p>Get a 3 month comp for 10 referrals</p></li><li><p>Get a 6 month comp for 25 referrals</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.codingconfessions.com/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Visit the leaderboard&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.codingconfessions.com/leaderboard?&amp;utm_source=post"><span>Visit the leaderboard</span></a></p><p>To learn more, check out <a href="https://support.substack.com/hc/en-us/articles/16142857300372">Substack&#8217;s FAQ</a>.</p><p>Thank you for helping get the word out about Confessions of a Code Addict!</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.codingconfessions.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Confessions of a Code Addict is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Recording: Comparing CPUs, GPUs & LPUs]]></title><description><![CDATA[Hi everyone,]]></description><link>https://blog.codingconfessions.com/p/recording-comparing-cpus-gpus-and</link><guid isPermaLink="false">https://blog.codingconfessions.com/p/recording-comparing-cpus-gpus-and</guid><dc:creator><![CDATA[Abhinav Upadhyay]]></dc:creator><pubDate>Mon, 18 Mar 2024 08:23:25 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/142714534/0713e78f4828ba02628d1667bbaf850f.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Hi everyone,</p><p>Thanks for joining this session. Please enjoy the recording.</p><p>The following is an AI-generated summary of the session (I felt it was decent enough to use for a quick review).</p><h2><strong>Meeting summary for Comparing CPUs, GPUs and LPUs + AMA (03/17/2024)</strong></h2><h3><strong>Quick recap</strong></h3><p>Abhinav discussed the advances and challenges in language processing units (LPUs) and their impact on large language models. He also presented a detailed breakdown of the TSP (Tensor Streaming Processor) hardware, explaining its design, architecture, and functionality. The role of the compiler in distributed systems, the error correction mechanism, and results from Groq&#8217;s papers were also discussed. 
The talk also included a discussion of clock synchronization, data flow, and the complexities of compiling and running machine learning (ML) code on specific hardware configurations.</p><div><hr></div><h3><strong>Summary</strong></h3><h4><strong>Language Processing Units and Large Language Models</strong></h4><p>Abhinav discussed the advances and challenges in language processing units (LPUs) and their impact on large language models. He highlighted the achievements of a company that broke all records for inference on large language models. He further explained the architecture and functioning of the LPU, emphasizing its predictability and stability. Abhinav ended the conversation by encouraging questions and discussions on the topic.</p><h4><strong>CPU and GPU: Limitations and Fixes</strong></h4><p>Abhinav discussed the limitations of using CPUs and GPUs, emphasizing that control over instruction scheduling and execution lies with the hardware, not the compiler. He highlighted issues of instruction latency, non-deterministic architecture, and the inability to guarantee program execution time. Charles questioned the relevance of determinism in achieving high resource utilization, and Abhinav clarified that large-scale distributed systems need to avoid any source of non-determinism.</p><h4><strong>TSP hardware and throughput discussion</strong></h4><p>Abhinav presented a detailed breakdown of the TSP (Tensor Streaming Processor) hardware, explaining its unique design and functionality. He highlighted the TSP's ability to perform vector and matrix operations, and how it works in 'slices' for efficient data processing. Charles expressed surprise at the significant improvement in throughput, with Abhinav explaining that the TSP can achieve up to three times the throughput of other processors. 
He also discussed the potential complications and capital expenditure required to implement this technology.</p><h4><strong>TSP System Architecture and Efficiency Discussion</strong></h4><p>Abhinav and Charles discussed the architecture and functionality of the TSP system, focusing on its computational efficiency, potential for improvement, and data flow. They explored the complexities of writing programs for a distributed system and investigated the use of FP2 for arithmetic operations. The conversation also touched on the challenges of clock synchronization in a distributed system and the cost implications of widespread use of SRAM, leading to questions about the potential benefits of using DRAM.</p><h4><strong>Clock speed, synchronization, and HAC counter</strong></h4><p>Charles, Abhinav, and Sirish discussed the clock speed and synchronization of a typical system, with Abhinav explaining the lower power consumption due to the slower clock speed. They also explored the concept of HAC counters to solve the problem of synchronization between TSPs. Abhinav elaborated on the process related to hardware and system instructions, focusing on the alignment and execution of two TSPs, the role of a counter, and the periodic comparison of the TSP values. Charles expressed his appreciation for the detailed explanation, while acknowledging his limited knowledge of the subject.</p><h4><strong>Role of Compiler in Distributed Systems</strong></h4><p>Abhinav discussed the important role of the compiler in distributed systems, emphasizing its function in managing data flow, preventing issues such as back pressure, and optimizing resource utilization. He highlighted that the compiler efficiently distributes tasks, anticipates data transfer issues, and schedules data flows to meet demand. 
Additionally, he touched on the complexities of networked systems, focusing on data encoding and the need for strategies to handle potential failures, although he noted that these strategies can introduce non-determinism into data flows.</p><h4><strong>Error correction mechanisms and task scheduling</strong></h4><p>Abhinav discussed the error correction mechanism in LPUs. He explained that the system uses single-bit error correction, which can handle errors in data transmission. He also mentioned that if an error occurs, the system switches to a standby node. Aadhaar asked about the scheduling of tasks, and Abhinav clarified that everything is scheduled in advance and runs in parallel, with each step executing only after the previous ones have finished. He also discussed the use of parity bits for error detection.</p><h4><strong>Compiling and Executing Machine Learning Code</strong></h4><p>Abhinav explained the process of compiling and executing programs, using existing models written in TensorFlow/PyTorch as an example, and discussed how the program is decomposed into smaller tasks that are assigned to different TSPs and executed sequentially. Charles raised concerns about dynamic behavior at the batch level and its impact on throughput and latency in the language model (LM). They discussed the unique challenges of compiling and running machine learning (ML) code on specific hardware configurations. Charles explained that, unlike on GPUs, program execution on an LPU requires precise knowledge of the hardware setup. The conversation concluded that the compilation phase is tightly coupled to the hardware, and any changes to the configuration of the system would require a new compilation.</p><h4><strong>Discussion on Results from Groq&#8217;s LPU Paper and Challenges</strong></h4><p>Abhinav discussed the results from Groq&#8217;s LPU paper, focusing on the performance and resource utilization of distributed matrix multiplication. 
He said that resource usage remained stable, unlike on other hardware, which showed fluctuations. He also discussed the challenges of training and maintaining data in the system, highlighting issues with data storage and overhead. Finally, he noted the cost-effectiveness and power efficiency of the system.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://docs.google.com/presentation/d/1p10CiXb7p4tNDV4tzB7h3lP--Xjwvp6Sr67mik05Ex0/edit?usp=sharing&quot;,&quot;text&quot;:&quot;Slides&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://docs.google.com/presentation/d/1p10CiXb7p4tNDV4tzB7h3lP--Xjwvp6Sr67mik05Ex0/edit?usp=sharing"><span>Slides</span></a></p><p>If you have any questions, feel free to reach out to me.</p>]]></content:encoded></item><item><title><![CDATA[Recording: Live Session on Performance Optimization Using 1BRC as a Case Study]]></title><description><![CDATA[Hi, Thank you for signing up for this live session and if you attended it live, then thank you for that as well.]]></description><link>https://blog.codingconfessions.com/p/recording-live-session-on-performance-ae9</link><guid isPermaLink="false">https://blog.codingconfessions.com/p/recording-live-session-on-performance-ae9</guid><dc:creator><![CDATA[Abhinav Upadhyay]]></dc:creator><pubDate>Mon, 19 Feb 2024 16:45:30 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/141822745/045ae3efbb21842755c078a874f4853a.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Hi,</p><p>Thank you for signing up for this live session and if you attended it live, then thank you for that as well.</p><p>This post provides the recording of the session. I hope you find it useful. 
If you have any questions, feel free to reach out to me.</p>]]></content:encoded></item><item><title><![CDATA[Recording: Performance Engineering Techniques Behind 1BRC]]></title><description><![CDATA[This is the recording of the live session covering some of the performance engineering techniques behind problems like 1BRC.]]></description><link>https://blog.codingconfessions.com/p/recording-performance-engineering</link><guid isPermaLink="false">https://blog.codingconfessions.com/p/recording-performance-engineering</guid><dc:creator><![CDATA[Abhinav Upadhyay]]></dc:creator><pubDate>Mon, 29 Jan 2024 11:18:47 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/141150845/8897ef8d08c495ba6f326a078f82e012.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>This is the recording of the live session covering some of the performance engineering techniques behind problems like 1BRC. We tried to cover a lot of ground:</p><ul><li><p>Background on the x86 ISA and some assembly code patterns</p></li><li><p>Compiler-level optimizations</p></li><li><p>Microarchitecture-level features of x86</p><ul><li><p>Instruction-level parallelism</p></li><li><p>Branch prediction</p></li><li><p>Caches</p></li></ul></li><li><p>Finally, we went through a 1BRC implementation, stepping through successively improved versions, each building on the previous one by fixing one of the performance bottlenecks identified from the flamegraph.</p></li></ul><p>There was a lot more to cover and very limited time. I would have loved to spend more time on the 1BRC code, or even cover some more optimization techniques.</p><h2>Resources</h2><p>You can access the slides <a href="https://docs.google.com/presentation/d/1aeQo1nYloeMn0nSsqxqd_ovJuewABVBYUfSXONbugHU/edit?usp=sharing">here</a>, and the 1BRC code that we discussed <a href="https://github.com/abhinav-upadhyay/1BRC_Workshop">here</a>.</p>]]></content:encoded></item></channel></rss>