Thursday, May 25, 2023

Andrej Karpathy's Explanations of GPT

To deeply understand GPT models like the ones from OpenAI, I think no one explains them better than Andrej Karpathy. He was part of the original group at OpenAI, left to become director of AI at Tesla, and as of February 2023 is back at OpenAI. He also makes YouTube videos explaining the underlying technology.

I want to point out two recent videos of his that are essential if you want to really understand how something like ChatGPT works.

First is the talk he just gave at Microsoft Build. There were some important AI announcements at MS Build 2023, and I encourage you to check them out, but Andrej's talk, even though it wasn't an announcement talk, should be really valuable for anyone using a GPT-based tool. At a high level, he explains how GPT works, and in the second part of the talk he explains why different types of prompts work.


I generally have a problem with the term “prompt engineering,” as it isn't really engineering, and getting what you want from an AI is often just common sense. But admittedly it does involve understanding how a GPT-based AI assistant differs from a human assistant. Andrej explains prompt strategies like “chain of thought,” using an ensemble of multiple attempts, reflection, and simulating how humans reason toward an optimal solution. He also talks a bit about AutoGPT and the hype surrounding it, calling it “inspirationally interesting.” He also mentions a paper that just came out called “Tree of Thought,” which applies a tree search to prompting.
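To make the chain-of-thought and ensemble ideas concrete, here is a minimal sketch of self-consistency voting: sample several answers to a "think step by step" prompt and take a majority vote. The `fake_model` function is a hypothetical stand-in I made up for illustration; a real version would call an LLM API with a nonzero sampling temperature.

```python
from collections import Counter

def fake_model(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for an LLM call, returning a final answer.
    It simulates noisy sampling: most samples agree, one is an outlier."""
    answers = ["42", "42", "41"]
    return answers[seed % len(answers)]

def chain_of_thought_prompt(question: str) -> str:
    # Appending a "think step by step" instruction nudges the model to
    # produce intermediate reasoning before its final answer.
    return f"{question}\nLet's think step by step."

def self_consistency(question: str, n_samples: int = 5) -> str:
    """Ensemble strategy: sample several reasoning chains and
    majority-vote over the final answers."""
    prompt = chain_of_thought_prompt(question)
    votes = Counter(fake_model(prompt, seed=i) for i in range(n_samples))
    answer, _ = votes.most_common(1)[0]
    return answer

print(self_consistency("What is 6 x 7?"))  # the majority answer, "42"
```

The point of the vote is that independent reasoning chains tend to agree on correct answers more often than on any particular wrong one, so the ensemble is more reliable than a single sample.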

The second video is from Andrej's YouTube channel and is called “Let's build GPT: from scratch, in code, spelled out.” His channel also has the “Makemore” series, which is really good too, but this video is THE BEST explanation of how a transformer/attention-based model like GPT works. All of today's models are what they are because of the seminal Google paper “Attention Is All You Need,” along with other refinements like reinforcement learning.

But if you want to understand how these models actually work and how they are built, and you have found other explanations too general, or found the usual architecture diagrams baffling or unsatisfactory, then this is the explanation for you.



This is especially true if the way you understand something is to see actual working code built up from scratch. Following his video does take knowledge of Python and some knowledge of PyTorch, but even if you haven't done much in PyTorch you can follow along, and it's the perfect opportunity to build something in PyTorch if you haven't before. His explanations are extremely clear. He builds a simple GPT model step by step, which you can do regardless of how powerful your machine is or whether you have access to a GPU.
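The heart of what the video builds is causal self-attention. As a taste of it, here is a minimal sketch of one attention head in plain NumPy (the video itself uses PyTorch); the variable names and shapes are my own choices for illustration:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention with a causal mask,
    the core operation in a GPT-style transformer block.
    x: (T, C) sequence of token embeddings."""
    T, _ = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv            # project to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])     # (T, T) token-to-token affinities
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                      # a token cannot attend to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v                          # weighted sum of value vectors

rng = np.random.default_rng(0)
T, C, H = 4, 8, 8  # sequence length, embedding dim, head dim
x = rng.standard_normal((T, C))
Wq, Wk, Wv = (rng.standard_normal((C, H)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # one H-dimensional output per input position
```

The causal mask is what makes this a language model rather than an encoder: position t can only mix in information from positions 0..t, so the model can be trained to predict the next token.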

So the first talk I'm recommending gives a high-level understanding of GPT, while the second is more technical if you want a deeper understanding of the engineering. Both are excellent. I know it's a bold statement, because a lot of people are really good at this, but I think that besides being a brilliant engineer, Andrej Karpathy is the best educator in AI right now.

"Superhuman" Forecasting?

This just came out from the Center for AI Safety  called Superhuman Automated Forecasting . This is very exciting to me, because I've be...