No Priors Ep. 80 | With Andrej Karpathy from OpenAI and Tesla - https://www.youtube.com/watch?v=hM_h0UA7upI Hi, listeners. Welcome back to No Priors. Today we're hanging out with Andrej Karpathy, who needs no introduction. Andrej is a renowned researcher, beloved AI educator, and YouTuber, an early team member at OpenAI, the former lead for Autopilot at Tesla, and now working on AI for education. We'll talk to him about the state of research, his new company, and what we can expect from AI. Thanks a lot for joining us today. It's great to have you here. Thank you. I'm happy to be here. You led Autopilot at Tesla, and now we actually have fully self-driving passenger vehicles on the road. How do you read that in terms of where we are in the capability set, and how quickly we should see increased capability or pervasive passenger vehicles? Yes, I spent maybe five years in the self-driving space, and I think it's a fascinating space. I draw a lot of analogies to AGI from self-driving, and maybe that's just because I'm familiar with it, but I kind of feel like we've reached AGI a little bit in self-driving, because there are systems today that you, as a paying customer, can basically ride around in here. Waymo in San Francisco is, of course, very common. You've probably taken a Waymo. I've taken it a bunch, and it's amazing. It can drive you all over the place, and you're paying for it as a product. What's interesting with Waymo is that the first time I took one was actually a decade ago, almost exactly, 2014 or so. A friend of mine who worked there gave me a demo, and it drove me around the block ten years ago, and it was basically a perfect drive ten years ago. And it took ten years to go from a demo that I had to a product I can pay for that's at city scale and is expanding, et cetera. How much of that do you think was regulatory versus technology? Like, when do you think the technology was ready? I think it's technology. You're just not seeing it in a single demo drive of 30 minutes. You're not running into all the stuff that they had to deal with for a decade. And so there's a massive gap between a demo and a product, and I think a lot of it is also regulatory, et cetera. But I do think that we've sort of achieved AGI in the self-driving space in that sense, a little bit. And yet what's really fascinating about it is that the globalization hasn't happened at all. So you have a demo and you can take it and so on, but the world hasn't changed yet, and that's going to take a long time. Going from a demo to an actual globalization of it, I think there's a big gap there. That's how it's related, I would say, to AGI, because I suspect it will look similar for AGI when we sort of get it. And then, staying for a minute in the self-driving space: I think people think that Waymo is ahead of Tesla. I think, personally, Tesla is ahead of Waymo. I know it doesn't look like that, but I'm still very bullish on Tesla and its self-driving program. I think that Tesla has a software problem and Waymo has a hardware problem, is the way I put it, and I think software problems are much easier. Tesla has the deployment of all these cars on earth, at scale, and I think Waymo needs to get there.
And so the moment Tesla gets to the point where they can actually deploy this and it actually works, I think it's going to be really incredible. The latest builds, I just drove yesterday, and it's just driving me all over the place now. They've made really good improvements very recently. Yeah, I've been using it a lot recently, and it actually works quite well. It did some miraculous driving for me yesterday, so I'm very impressed with what the team is doing. And so I still think that Tesla mostly has a software problem and Waymo mostly has a hardware problem. So Waymo looks like it's winning right now, but I think when we look in ten years at who's actually at scale and where most of the revenue is coming from, I still think Tesla is ahead in that sense. How far away do you think we are from the software problem turning the corner, in terms of getting to some equivalency? Because obviously, to your point, if you look at a Waymo car, it has a lot of very expensive lidar and other sensors built into the car so it can do what it does; they sort of help support the software system. And so if you can just use cameras, which is the Tesla approach, then you effectively get rid of enormous cost and complexity, and you can do it in many different types of cars. When do you think that transition happens? In the next few years? I mean, I'm hoping, you know, something like that. But actually, what's really interesting about that is I'm not sure people appreciate that Tesla actually does use a lot of expensive sensors. They just do it at training time. So there are a bunch of cars that drive around with lidars, they do a bunch of stuff that doesn't scale, they have extra sensors, et cetera, and they do mapping and all this stuff. You do it at training time, and then you distill that into a test-time package that gets deployed to the cars and is vision only. It's like an arbitrage on sensors and expense. And so I think it's actually kind of a brilliant strategy that I don't think is fully appreciated, and I think it's going to work out well, because the pixels have the information, and I think the network will be capable of doing that. And yes, at training time I think these sensors are really useful, but I don't think they're as useful at test time. It seems like the one other transition that's happened is basically a move from a lot of edge-case-designed heuristics to end-to-end deep learning. That's the other shift that's happened recently. Do you want to talk a little bit about that? Yeah, I think that was always the plan from the start at Tesla, as I was talking about: the neural net can eat through the stack. When I joined, there was a ton of C code, and now there's much, much less C code in the test-time package that runs in the car (there's still a ton of stuff in the back end that we're not talking about). The neural net eats through the system. So first it just does detection on the image level, then multiple images give you a prediction, then multiple images over time give you a prediction, and you're discarding C code, and eventually you're just giving steering commands. And so I think Tesla is kind of eating through the stack.
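As a rough illustration of that eating-through-the-stack progression, here is a hedged sketch (not Tesla's actual code; the module names, shapes, and heads are invented) of the general pattern: a shared vision backbone is first trained on densely supervised intermediate tasks, then reused as pretraining for a small end-to-end policy head that maps camera frames to driving commands.

```python
import torch
import torch.nn as nn

# Hypothetical sketch only: architecture, shapes, and heads are illustrative.
class VisionBackbone(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, frames):           # frames: (B, 3, H, W)
        return self.net(frames)          # (B, feat_dim)

backbone = VisionBackbone()

# Stage 1: intermediate, densely supervised heads (detections, drivable space,
# depth, ...) provide many bits of supervision to shape the backbone features.
detection_head = nn.Linear(256, 10)      # stand-in for a detection head

# Stage 2: reuse the pretrained backbone and fine-tune a small policy head
# end to end on human driving, where supervision is only a few bits per frame.
policy_head = nn.Linear(256, 2)          # [steering, throttle]

def drive(frames):
    return policy_head(backbone(frames))

frames = torch.randn(4, 3, 128, 256)     # a fake batch of camera frames
commands = drive(frames)                 # (4, 2) predicted commands
```

The staging reflects the point made above: imitation alone supplies too little signal to train billions of parameters from scratch, so the intermediate targets do most of the representation learning first.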
My understanding is that current Waymos are actually not that; they've tried, but they ended up not doing it, is my current understanding. But I'm not sure, because they don't talk about it. But I do fundamentally believe in this approach, and I think that's the last piece to fall, if you want to think about it that way. And I do suspect that the end-to-end system for Tesla in, say, ten years is just a neural net: the videos stream into a neural net and commands come out. But you have to build up to it incrementally and do it piece by piece. And even all the intermediate predictions and all these things that we've done, I don't think they've actually misled development. I think they're part of it, because there are a lot of solid reasons for this. With actual end-to-end driving, when you're just imitating humans and so on, you have very few bits of supervision to train a massive neural net, and it's too few bits of signal to train so many billions of parameters. And so these intermediate representations and so on help you develop the features and the detectors for everything, and then it becomes a much easier problem for the end-to-end part of it. And so I suspect, although I don't know because I'm not part of the team, that there's a ton of pretraining happening so that you can do the fine-tuning for end to end. And so basically, I feel like it was necessary to eat through it incrementally, and that's what Tesla has done. I think it's the right approach, and it looks like it's working, so I'm really looking forward to it. If you had started end to end, you wouldn't have had the data anyway. That makes sense. Yeah. So you worked on the Tesla humanoid robot before you left. I have so many questions, but one is, starting here: what transfers? Basically, everything transfers, and I don't think people appreciate it. Okay, that's a big claim. I think it's like a very different problem. It's basically robots. When you actually look at it, cars are robots. And Tesla, I don't think, is a car company. I think that's misleading. This is a robotics company, a robotics-at-scale company, because I would say "at scale" is also a whole separate variable. They're not building a single thing; they're building the machine that builds the thing, which is a whole separate thing. And so I think a robotics-at-scale company is what Tesla is. And I think in terms of the transfer from cars to humanoids, it was not that much work at all. And in fact, the early versions of Optimus, the robot, thought it was a car, because it had the exact same computer, it had the exact same cameras. It was really funny, because we were running the car networks on the robot, but it's walking around the office and so on. Oh, nice. And it's trying to recognize drivable space, but it's all just walking space now, I suppose. And it actually kind of generalized a little bit, and there's some fine-tuning necessary and so on. But it thought it was driving, when actually it's moving through an environment. That's a reasonable way to think of it: actually, it's a robot. Many things transfer, but you're just missing, for example, actuation and action data. Yeah, you definitely miss some components.
And the other part, I would say, is that so much transfers. Like, the speed with which Optimus was started, I think, to me, was very impressive, because the moment Elon said, we're doing this, people just showed up with all the right tools, and the stuff just showed up so quickly, and all these CAD models and all the supply chain stuff. And I just felt like, wow, there's so much expertise for building robotics at Tesla, and it's all the same tools, and they're just being reconfigured from a car, like in Transformers, the movie. They're just being reconfigured and reshuffled, but it's the same thing, and you need all the same components. You need to think about all the same kinds of stuff, both on the hardware side, on the scale stuff, and also on the brains. And so for the brains, there was also a ton of transfer, not just of the specific networks, but also of the whole approach and the labeling team and how it all coordinates and the approaches people are taking. I just think there's a ton of transfer. What do you think are the first application areas for humanoid robotics or human-form stuff? I think a lot of people have this vision of it doing your laundry, et cetera. I think that will come late. I don't think B2C is the right starting point, because I don't think we can have a robot, like, crush Grandma, is how I put it, sort of. I think it's too much legal liability. It just gives too strong a hug, or it's just going to fall over or something like that. These things are not perfect yet, and they require some amount of work. So I think the best customer is yourself first, and I think probably Tesla is going to do this. I'm very bullish on Tesla here, as people can tell. The first customer is yourself, and you incubate it in the factory and so on, doing maybe a lot of material handling, et cetera. This way you don't have to create contracts working with third parties; it's all really heavy, there are lawyers involved, et cetera. You incubate it, and then you go, I think, B2B second, and you go to other companies that have massive warehouses: we can do material handling, we're going to do all this stuff, contracts get drafted up, fences get put around, all this kind of stuff. And then once you incubate it in multiple companies, I think that's when you start to go into the B2C applications. I do think we'll see B2C robots also; like, Unitree and so on are starting to come up with robots that I really want. I got one. You did? Yeah. Okay. Yeah. The G1. Yeah. So I will probably buy one of those, and there's probably going to be an ecosystem of people building on those platforms too. But I think in terms of what wins at scale, I would expect that kind of approach. But in the beginning, it's a lot of material handling, and then going toward more and more specific things. One that I'm really excited about is the Nat Friedman challenge of the leaf blower. I would love for an Optimus to walk down the street, tiptoe down the street, and pick up individual leaves so that we don't need leaf blowers. And I think this will work, and it's an amazing task. And so I would hope that that's one of the first applications. Even just raking. Yeah, that should work too. Just very quietly. Yeah, just quiet raking. That's cute. I mean, they do actually have a machine that's working. It's just not a humanoid. Can we talk about the humanoid thesis for a second?
The simplest version of this is: the world is built for humans, and you build one set of hardware, and the right thing to do is build a model that can do an increasing set of tasks on this set of hardware. I think there's another camp that believes, well, humans are not optimal for any given task. You can make them stronger or bigger or smaller or whatever, and why shouldn't we do superhuman things? How do you think about this? I think people are maybe under-appreciating the complexity of the fixed cost that goes into any single platform. There's a large fixed cost you're paying for any single platform, and so I think it makes a lot of sense to centralize that and have a single platform that can do all the things. I would say the humanoid aspect is also very appealing because people can teleoperate it very easily, and so it's a data collection thing that is extremely helpful, because people will obviously be able to teleoperate it very easily. I think that's usually overlooked. There's, of course, the aspect you mentioned, which is the world being designed for humans, et cetera, so I think that's also important. I mean, I think we'll have some variations on the humanoid platform, but I think there is a large fixed cost to any platform you train. And then I would say one last dimension of it is that you benefit a ton from the transfer learning between the different tasks. And in AI, you really want the single neural net that is multitasking, doing lots of things. That's where you're getting all the intelligence and the capability from. And that's also why language models are so interesting: you have a single regime, like a text domain, multitasking all these different problems, and they're all sharing knowledge between each other, and it's all coupled in a single neural net. And I think you want that kind of a platform, and you want all the data you collect for leaf picking to benefit all the other tasks. If you're building a special-purpose thing for any one task, you're not going to benefit from a lot of the transfer between all the other tasks, if that makes sense. Yeah. I think there's one argument that the G1 is 30 grand, but it seems hard to build a very capable humanoid robot under a certain BOM. And if you wanted to put an arm on wheels that can do things, maybe there are cheaper approaches to a general platform at the beginning. Does that make sense to you? Cheaper approaches to a general platform from a hardware perspective? Yeah, I think that makes sense. You put a wheel on it instead of feet, et cetera. I do wonder if that's going down a local minimum a little bit. I just feel like: pick a platform and make it perfect is the pretty good long-term bet. And then the other thing, of course, is I just think it will be kind of familiar to people, and I think people will understand it, and maybe you want to talk to it. And I feel like the psychological aspect of it also, I think, favors possibly the human platform, unless people are scared of it and would actually prefer a platform that is more abstract. But then, if you have some abstract thing doing stuff around you, I don't know if that's better. It's interesting that the other form factor for the Unitree is a dog, right? And it's almost friendlier, more familiar. Yeah.
But then people watch Black Mirror, and suddenly the dog flips to a scary thing, so it's hard to think through. I just think psychologically, it will be easy for people to understand what's happening. What do you think is missing in terms of technological milestones for progress, relative to substantiating this future for robotics? For robotics, yeah, or the humanoid robot or anything else human-form. Yeah. I don't know that I have a really good window into it. I do think it is kind of interesting that in a humanoid form factor, for example, for the lower body, I don't know that you want to do imitation learning from demonstration, because for the lower body it's a lot of inverted pendulum control and stuff like that. It's for the upper body that you need a lot of teleoperation and data collection, and end to end, et cetera. And so I think everything becomes very hybrid in that sense, and I don't know how those systems interact. When I talk to people working on this, a lot of what they focus on is actuation and manipulation, and sort of digital manipulation and things like that. Yeah. I do expect in the beginning it's a lot of teleoperation for getting stuff off the ground and imitating it, and getting something that works 95% of the time, and then talking about human-to-robot ratios, and gradually having people who are supervisors of robots instead of doing the task directly. And all this kind of stuff is going to happen over time, and pretty gradually. I don't know that there are any individual impediments that I'm really familiar with. I just think it's a lot of grunt work. A lot of the tools are available. Transformers are this beautiful blob of tissue that you can get to do arbitrary tasks, and you just need the data, you need to put it in the right form, you need to train it, you need to experiment with it, you need to deploy it and iterate on it. Just a lot of groundwork. I don't know that I have a single individual thing that is holding us back, technically. Where are we in the state of large blob research? Large blob research, yeah, we're in a really good state. So I think, and I'm not sure it's fully appreciated, the transformer is much more amazing than just another neural net. It's an amazing neural net, extremely general. So, for example, when people talk about the scaling laws in neural networks, the scaling laws are actually, to a large extent, a property of the transformer. Before the transformer, people were playing with LSTMs and stacking them, et cetera, and you don't actually get clean scaling laws; the thing doesn't actually train and doesn't actually work. The transformer was the first thing that actually just kind of scales, and you get scaling laws and everything makes sense. So it's just like a general-purpose trainable computer. I think of it as kind of a computer, but a differentiable computer: you can just give it inputs and outputs, and billions of them, and you can train it with backpropagation, and it actually kind of arranges itself into a thing that does the task. And so I think it's actually kind of a magical thing that we've stumbled on in the algorithm space. And I think there are a few individual innovations that went into it. So you have the residual connections; that was a piece that existed. You have the layer normalizations that need to slot in. You have the attention block. You have the lack of saturating non-linearities like tanh and so on.
Those are not present in the transformer because they kill gradient signals. So there are a few, like four or five, innovations that all existed and were put together into this transformer, and that's what Google did with their paper. And this thing actually trains, and suddenly you get scaling laws, and suddenly you have this piece of tissue that just trains, to a very large extent. And so it was a major unlock. You feel like we are not near the limit of that unlock, right? Because there is a discussion, of course, of the data wall and how expensive another generation of scale would be. How do you think about that? That's where you start to get into it. I don't think the neural network architecture is holding us back fundamentally anymore. It's not the bottleneck, whereas before the transformer it was a bottleneck. Now we're talking a lot more about what the loss function is and where the dataset is; we're talking a lot more about those, and those have become the bottlenecks, almost. It's not the general piece of tissue that reconfigures itself based on whatever you want it to be. And so that's where I think a lot of the activity has moved, and that's why a lot of the companies and so on who are applying this technology are not thinking about the transformer much; they're not thinking about the architecture. If you look at the Llama release, the transformer hasn't changed that much. We've added RoPE, the rotary positional encodings; that's the major change. Everything else doesn't really matter too much; it's like plus 3% on a few small things. Really, RoPE is the only thing that's slotted in, and that's how the transformer has changed over the last five years or so. So there hasn't been that much innovation on that. Everyone just takes it for granted: let's train it, et cetera. And then everyone's just innovating on the dataset, mostly, and the loss function details. So that's where all the activity has gone. Right. But what about the argument that, in that domain, it was easier when we were taking Internet data, and we're out of Internet data, and so the questions are really around synthetic data or more expensive data collection? So I think that's a good point. That's where a lot of the activity is now in LLMs. The Internet data is not the data you want for your transformer. It's like a nearest neighbor that actually gets you really far, surprisingly. But the Internet data is a bunch of Internet web pages, right? What you want is the inner thought monologue of your brain. Yeah, that's the idea. The trajectories in your brain. The trajectories in your brain as you're doing problem solving. If we had a billion of those, AGI is here, roughly speaking, to a very large extent, and we just don't have that. So a lot of the activity now, I think, is: we have the Internet data that actually gets you really close, because it just so happens that the Internet has enough reasoning traces in it and a bunch of knowledge, and the transformer just makes it work. Okay, so I think a lot of activity now is around refactoring the dataset into these inner monologue formats. And I think there's a ton of synthetic data generation that's helpful for that. So what's interesting about that also is the extent to which the current models are helping us create the next generation of models.
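To make the architecture discussion above concrete, here is a minimal, hedged sketch of a pre-norm transformer block showing the ingredients named: residual connections, layer normalization, attention, and a non-saturating non-linearity (GELU rather than tanh), plus a simplified version of the rotary positional encoding (RoPE) mentioned as the main change since the original paper. The hyperparameters are arbitrary and this is an illustration, not any particular production model.

```python
import torch
import torch.nn as nn

# Minimal pre-norm transformer block: residuals, layer norm, attention, GELU.
class Block(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),                       # non-saturating non-linearity
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                    # x: (batch, seq_len, d_model)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                     # residual connection 1
        x = x + self.mlp(self.ln2(x))        # residual connection 2
        return x

# Simplified rotary positional embedding: rotate pairs of feature dimensions by
# a position-dependent angle so attention becomes sensitive to relative position.
# (Real RoPE is applied to queries/keys inside attention; this is a sketch.)
def rope(x):                                 # x: (batch, seq_len, dim), dim even
    b, t, d = x.shape
    half = d // 2
    freqs = 10000 ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()    # (t, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

x = torch.randn(2, 16, 256)
y = Block(256, 8)(rope(x))                   # (2, 16, 256)
```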
And so it's kind of like a staircase, where each model helps you create the next generation. How much do you think synthetic data matters, or how far does that get us? Right? Because to your point, each model helps you train the subsequent model better, or at least create tools for it, data labeling, whatever, and part of that is synthetic data. How important do you think the synthetic data piece is? Because when I talk to people, the sense is this is the only way we can make progress: we have to make it work. I think with synthetic data, you just have to be careful, because these models are silently collapsed, which is one of the major issues. So if you go to ChatGPT and you ask it to give you a joke, you'll notice that it only knows, like, three jokes. It gives you one joke, I think, most of the time, and sometimes it gives you three jokes, and it's because the models are collapsed, and it's silent. When you're looking at any single individual output, you're just seeing a single example, but when you actually look at the distribution, you'll notice that it's not a very diverse distribution. It's silently collapsed. When you're doing synthetic data generation, this is a problem, because you actually really want that entropy. You want the diversity and the richness in your dataset. Otherwise you're getting collapsed datasets, and you can't see it when you look at any individual example, but the distribution has lost a ton of entropy and richness, and so it silently gets worse. And so you have to be very careful, and you have to make sure that you maintain the entropy in your dataset, and there are a ton of techniques for that. As an example, someone released this persona dataset: a dataset of a billion personalities, like human backgrounds. Oh, yes, I saw this. Yeah: I'm a teacher, or I'm an artist, I live here, I do this, et cetera. Little paragraphs of fictitious human background. And what you do when you do synthetic data generation is say not only "complete this task and do it in this way," but also "imagine you're describing it to this person," and you put in this information, and now you're forcing it to explore more of the space, and you're getting some entropy. So I think you have to be very careful to inject the entropy and maintain the distribution, and that's the hard part that I think maybe people aren't sufficiently appreciating in general. So I think, basically, synthetic data: absolutely the future. We're not going to run out of data, is my impression. I just think you have to be careful. What do you think we are learning now about human cognition from this research? I don't know if we're learning. One could argue that figuring out the shape of the reasoning traces we want, for example, is instructive for actually understanding how the brain works. I would be careful with those analogies, because in general I do think it's a very different kind of thing. But I do think there are some analogies you can draw. So, as an example, I think transformers are actually better than the human brain in a bunch of ways. I think they're actually a much more efficient system, and the reason they don't work as well as the human brain is mostly a data issue, roughly speaking, as the first-order approximation, I would say. As an example: transformers memorizing sequences. They're so much better at that than humans.
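Going back to the persona-conditioned synthetic data idea for a moment, here is a hedged sketch of what injecting entropy that way might look like. `call_llm`, the inline personas, and the prompt format are hypothetical stand-ins for illustration, not any specific dataset's actual pipeline.

```python
import random

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM API you actually call."""
    raise NotImplementedError

# In practice this would be a very large file of fictitious backgrounds;
# a few inline examples stand in for it here.
personas = [
    "I'm a high-school physics teacher in a small coastal town.",
    "I'm a retired electrician who likes woodworking.",
    "I'm an art student who has never taken a science class.",
]

task = "Explain why the sky is blue."

def generate_diverse_samples(n: int):
    samples = []
    for _ in range(n):
        persona = random.choice(personas)  # entropy is injected here
        prompt = (
            f"{task}\n\n"
            "Imagine you are describing this to the following person, "
            f"and tailor the explanation to them:\n{persona}"
        )
        samples.append({"persona": persona, "response": call_llm(prompt)})
    return samples

# Without the persona conditioning, a collapsed model tends to emit the same
# few responses; conditioning on a random background forces it to cover more
# of the output distribution, which is the entropy you want in synthetic data.
```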
If you give a transformer a sequence and you do a single forward-backward pass on that sequence, then if you give it the first few elements, it will complete the rest of the sequence. It memorized that sequence, and it's so good at it. If you gave a human a single presentation of a sequence, there's no way you could remember it. And so with transformers, I do think there's a good chance that gradient-based optimization, the forward-backward update that we do all the time for training neural nets, is actually more efficient than the brain in some ways. And these models are better; they're just not yet ready to shine. But in a bunch of cognitive aspects, I think, with the right inputs, they will come out better. That's generically true of computers for all sorts of applications, right? Compute, memory, to your point. Yeah, exactly. And I think human brains just have a lot of constraints. The working memory is very small. I think transformers have a lot bigger working memory, and this will continue to be the case. They are much more efficient learners. The human brain functions under all kinds of constraints. It's not obvious that the brain does backpropagation; it's not obvious how that would work. It's a very stochastic, dynamic system, it has all these constraints, it has to work under ambient conditions, et cetera. So I do think that what we have is actually potentially better than the brain, and it's just not there yet. How do you think about human augmentation with different AI systems over time? Do you think that's a likely direction? Do you think that's unlikely? Augmentation, augmentation of people with AI models? Oh, of course. I mean, but in what sense, maybe? I think in general, absolutely, because there's the abstract version of it, where you're using it as a tool; that's the external version. There's the merger scenario a lot of people end up talking about. I mean, we're already kind of merging. The thing is, there's the I/O bottleneck, but for the most part it's at your fingertips if you have any of these models. Yeah, but that's a little bit different, because people have been making that argument for, I think, 40, 50 years, that technological tools are just extensions of human capabilities, right? Yeah. The computer is the bicycle for the human mind, et cetera. Exactly. But there's a subset of the AI community that thinks that, for example, the way that we subsume some potential conflict with future AI or something else would be through some form of merger. Yeah, like the Neuralink pitch, et cetera. Exactly. Yeah. I don't know what this merger looks like yet, but I can definitely see that you want to decrease the I/O to tool use. And I see this as kind of like an exocortex we're building on top of our neocortex, right? It's just the next layer, and it just turns out to be in the cloud, et cetera, but it is the next layer of the brain. Yeah. The Accelerando book from the early two thousands has a version of this where basically everything is substantiated in a set of goggles that are computationally attached to your brain, that you wear, and then if you lose them, you feel like you're losing a part of your persona or memory. I think that's very likely, yeah. And today the phone is already almost that, and I think it's going to get worse. When you put your techno stuff away from you, you're just a naked human in nature. Well, you lose part of your intelligence. It's very anxiety-inducing. A very simple example of that is just maps, right?
So a lot of people now, I've noticed, can't actually navigate their city very well anymore, because they're always using turn-by-turn directions. And if we have this, for example, universal translator, which I don't think is too far away, you'll lose the ability to speak to people who don't speak English if you put your stuff away. I'm very comfortable repurposing that part of my brain to do further research. I don't know if you saw the video of the kid that has a magazine and is trying to swipe on the magazine. Yeah. What's fascinating to me about it is that this kid doesn't understand what comes with nature and what's technology on top of nature, because it's become so transparent. And I think this might look similar, where people will just start assuming the tools, and then when you take them away, you realize, I guess, people don't know what's technology and what's not. If you're wearing this thing that's always translating everyone, or doing stuff like that for you, then maybe people lose basic cognitive abilities that may no longer exist by nature. We're gonna specialize. You can't understand people who speak Spanish. Like, what the hell? Or when you go to objects, like in Disney, all the objects are alive. And I think we are potentially going to come to that kind of a world where, why can't I talk to things? Like, already today, you can talk to Alexa, and you can ask her for things and so on. Yeah, yeah. I've seen some toy companies like that, where they're basically trying to embed an LLM in a toy that can interact with a child. Yeah. Like, isn't it strange that when you go to a door, you can't just say "open"? Like, what the hell? Another favorite example of that: I don't know if you saw either Demolition Man or I, Robot. People make fun of the idea that you can't just talk to things, and what the hell? We're talking about an exocortex. That feels like a pretty fundamentally important thing to democratize access to. How do you think about the current market structure of what's happening in LLM research, where there's a small number of large labs that actually have a shot at progressing the next generation of training? How does that translate to what people have access to in the future? So what you're kind of alluding to, maybe, is the state of the ecosystem, right? So we have kind of an oligopoly of a few closed platforms, and then we have an open platform that is kind of behind, so, like, Meta's Llama, et cetera. And this is kind of mirroring the open-source ecosystem. I do think that when we start to think of this stuff as an exocortex... there's a saying in crypto, which is: not your keys, not your coins. Yeah. Is it the case that if it's not your weights, it's not your brain? That's interesting, because the company is effectively controlling your exocortex, and therefore a big part of it starts to feel kind of invasive. If this is my exocortex, I think people will care much more about ownership. Yes. Like, yeah, you realize you're renting your brain. It seems like a lot, to rent your brain. The thought experiment was: are you willing to give up ownership and control to rent a better brain? Because I am. Yeah. So I think that's the trade-off. We'll see how that works. But maybe it's possible to, by default, use the closed versions because they're amazing, but you have a fallback in various scenarios.
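As a concrete sketch of that default-plus-fallback pattern (the function names and behavior here are hypothetical placeholders, not any specific vendor's API):

```python
# Hypothetical sketch of "closed model by default, open weights as fallback".
# query_closed_api and query_local_open_model are placeholders for whatever
# hosted API and locally served open-weights model you actually use.

def query_closed_api(prompt: str) -> str:
    raise NotImplementedError  # e.g. a hosted frontier model behind an API

def query_local_open_model(prompt: str) -> str:
    raise NotImplementedError  # e.g. an open-weights model you run yourself

def exocortex(prompt: str) -> str:
    try:
        return query_closed_api(prompt)      # best quality when available
    except Exception:
        # API outage, rate limit, policy change, etc.: fall back to weights
        # you control, trading some capability for ownership and availability.
        return query_local_open_model(prompt)
```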
And I think that's kind of the way things are shaping up today. Even when APIs go down at some of the closed-source providers, people start to implement fallbacks to the open ecosystems, for example, that they fully control, and they feel empowered by that. So maybe that's just the extension of what it looks like for the brain: you fall back on the open-source stuff should anything happen, but most of the time you actually use the closed stuff. So it's quite important that the open-source stuff continues to progress. I think so, 100%, and this is not an obvious point or something that people maybe agree on right now, but I think so, 100%. I guess one thing I've been wondering about a little bit is: what is the smallest performant model that you can get to, in some sense, either in parameter size or however you want to think about it? I'm a little bit curious about your view. Have you thought a lot about distillation, small models? I think it can be surprisingly small, and I do think that the current models are wasting a ton of capacity remembering stuff that doesn't matter. Like, they remember SHA hashes, they remember ancient stuff, because the dataset is not curated the best. Yeah, exactly. And I think this will go away, and I think we just need to get to the cognitive core. And I think the cognitive core can be extremely small; it's just this thing that thinks, and if it needs to look up information, it knows how to use different tools. Is that, like, 3 billion parameters? Is that 20 billion parameters? I think even a billion suffices. We'll probably get to that point, and the models can be very, very small. And I think the reason they can be very small is fundamentally, I think, that distillation works. Maybe the only thing I would say is that distillation works surprisingly well. Distillation is where you get a really big model, or a huge amount of compute or something like that, supervising a very small model, and you can actually stuff a lot of capability into a very small model. Is there some sort of mathematical representation of that, or some information-theoretic formulation of it? Because it almost feels like you should be able to calculate that. Yeah, maybe one way to think about it is to go back to the Internet dataset, which is what we're working with. The Internet is, like, 0.001% cognition and 99.99% information that's just, like, garbage. Yeah, and I think most of it is not useful to the thinking part. I guess maybe another way to frame the question is: is there a mathematical representation of cognitive capability relative to model size, or how do you capture cognition in terms of, here's the min or max relative to what you're trying to accomplish? Maybe there's no good way to represent that. I think maybe a billion parameters gets you a good cognitive core. Probably, right. I think even 1 billion is too much. I don't know, we'll see. It's very exciting, given, well, it's a question of on an edge device versus in the cloud, and also the raw cost of using the model and everything. Yeah, it's very exciting. Right, because at less than a billion parameters, I have my exocortex on a local device as well. Yeah. And then probably it's not a single model, right? Like, it's interesting to me to think about how this will actually play out, because I do think you want to benefit from parallelization. You don't have a sequential process; you want to have a parallel process.
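For reference, a minimal sketch of the distillation setup described above, where a large teacher's full output distribution supervises a much smaller student. The model sizes, temperature, and loss details here are placeholders rather than any lab's actual recipe.

```python
import torch
import torch.nn.functional as F

# Distillation sketch: a big "teacher" supervises a small "student" by having
# the student match the teacher's output distribution (soft targets), which
# carries far more signal per example than a single hard label.
vocab, d_teacher, d_student = 1000, 512, 64
teacher = torch.nn.Sequential(torch.nn.Embedding(vocab, d_teacher),
                              torch.nn.Linear(d_teacher, vocab))
student = torch.nn.Sequential(torch.nn.Embedding(vocab, d_student),
                              torch.nn.Linear(d_student, vocab))

def distill_loss(tokens, temperature=2.0):
    with torch.no_grad():                       # teacher is frozen supervision
        teacher_logits = teacher(tokens) / temperature
    student_logits = student(tokens) / temperature
    # KL divergence between teacher and student next-token distributions.
    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean") * temperature ** 2

tokens = torch.randint(0, vocab, (8, 32))       # fake batch of token ids
loss = distill_loss(tokens)
loss.backward()                                  # gradients flow into the student only
```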
And I think companies, to some extent, are also kind of a parallelization of work, but there's a hierarchy in a company, because that's one way to handle the information processing and the reductions that need to happen within an organization. So I think we'll probably end up with something like companies of LLMs. It's not unlikely, to me, that you have models of different capabilities specialized to various unique domains. Maybe there's a programmer, et cetera, and it will actually start to resemble companies to a very large extent. So you'll have the programmer and the program manager and similar kinds of roles of LLMs working in parallel and coming together and orchestrating computation on your behalf. So maybe it's not correct to think about it as one model; it's more like a swarm. I would say it feels like an ecosystem, like a biological ecosystem, where there are specialized roles and niches, and I think it will start to resemble that. You have automatic escalation to other parts of the swarm depending on the difficulty of the problem. So maybe the CEO is a really brilliant cloud model, but the workers can be a lot cheaper, maybe even open-source models or whatnot. And my cost function is different from your cost function. Yeah. So that could be interesting. You left OpenAI. You're working on education. You've always been an educator. Why do this? I would start with: I've always been an educator, and I love learning and I love teaching, so it's kind of just a space that I've been very passionate about for a long time. And then the other thing is, one macro picture that's kind of driving me is that I think there's a lot of activity in AI, and I think most of it is to kind of replace or displace people. I would say it's in the theme of sidelining the people. But I'm always more interested in anything that empowers people. And I feel like, at a high level, I'm on team human, and I'm interested in things that AI can do to empower people. I don't want a future where people are kind of left on the side of the automation. I want people to be in a very empowered state, and I want them to be amazing, even much more amazing than today. And then another aspect I find very interesting is: how far can a person go if they have the perfect tutor for all the subjects? And I think people could go really far if they had the perfect curriculum for anything. I think we see that with, you know, some rich people who have tutors, and they do actually go really far. And so I think we can approach that with AI, or even surpass it. There's very clear literature on that, actually, from the eighties, right, where one-on-one tutoring, I think, helps people get one standard deviation better. Bloom, or is it two? Yeah, it's the Bloom stuff. Yeah, exactly. There's a lot of really interesting precedent on that. How do you actually view that as substantiating through the lens of AI? Or what are the first types of products that will really help with that? Because there are books like The Diamond Age, where they talk about the Young Lady's Illustrated Primer and all that kind of stuff. So I'm definitely inspired by aspects of it. So in practice, what I'm doing is trying to currently build a single course, and I want it to be just like the course you would go to if you want to learn AI. The thing is, basically, I've already taught courses. I taught CS231n at Stanford, and that was the first deep learning class there and was pretty successful.
But the question is, how do you actually really scale these classes? How do you make it so that your target audience is maybe 8 billion people on earth, and they're all speaking different languages, and they're all at different capability levels, et cetera, and a single teacher doesn't scale to that audience? The question is, how do you use AI to do the scaling of a really good teacher? The way I'm thinking about it is that the teacher is doing a lot of the course creation and the curriculum, because at the current AI capability, I don't think the models are good enough to create a good course, but I think they're good enough to become the front end to the student and interpret the course for them. And so basically, the teacher doesn't go to the people, and the teacher is not the front end anymore. The teacher is on the back end, designing the materials and the course, and the AI is the front end, and it can speak all the different languages, and it kind of takes you through the course. Should I think of that as the TA-type experience, or is that not a good analogy here? That is one way I'm thinking about it: it's an AI TA. I'm mostly thinking of it as this front end to the student. It's the thing that's actually interfacing with the student and taking them through the course. And I think that's tractable today, and it just doesn't exist, and I think it can be made really good. And then over time, as the capability increases, you would potentially refactor the setup in various ways. I like to find things that fit the AI capability of today, and to have a good model of it, because I think a lot of companies don't quite understand intuitively where the capability is today, and then they end up building things that are too far ahead of what's available, or maybe not ambitious enough. And so I do think that this is kind of a sweet spot of what's possible, and also really interesting and exciting. I want to go back to something you said that I think is very inspiring, especially coming from your background and understanding of where exactly we are in research, which is essentially: we do not know what the limits of human performance from a learning perspective are, given much better tooling. And I think there's a very easy analogy: we just had the Olympics a month ago, right? And a runner's very best mile time, or pick any sport, is much better today than it was ten years ago, putting aside performance-enhancing drugs, just because you start training earlier, you have a very different program, we have much better scientific understanding, we have technique, we have gear. The fact that you believe we can get much better as humans if we're starting with the tooling, the curriculum, is amazing. Yeah, I think we haven't even scratched the surface of what's possible at all. So I think there are basically two dimensions to it. Number one is the globalization dimension: I want everyone to have really good education. But the other one is: how far can a single person go? I think both of those are very interesting and exciting. Usually when people talk about one-on-one learning, they talk about the adaptive aspect of it, where you're challenging a person at the level that they're at. Do you think you can do that with AI today, or is that something for the future? And is it more that today it's about reach and multiple languages?
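As a rough sketch of the "teacher authors the course, AI is the front end" arrangement described above (purely illustrative: `call_llm`, the stub course material, and the prompt are hypothetical stand-ins, not the actual product design):

```python
def call_llm(system_prompt: str, messages: list[dict]) -> str:
    """Hypothetical stand-in for whatever chat-completion API you actually use."""
    raise NotImplementedError

# Authored by the human teacher on the back end; a stub stands in for it here.
course_material = """
Module 1: Backpropagation
- The chain rule on computation graphs
- Worked example: gradient of a two-layer MLP
- Exercise: derive the gradient of softmax + cross-entropy
"""

def tutor_turn(history: list[dict], student_message: str,
               language: str = "English") -> str:
    # The AI is the student-facing front end; the course content stays fixed.
    system_prompt = (
        "You are the student-facing front end for a course. Teach only from the "
        f"material below, answer in {language}, adapt your explanations to the "
        "student's questions, and quiz them before moving on.\n\n"
        "--- COURSE MATERIAL ---\n" + course_material
    )
    messages = history + [{"role": "user", "content": student_message}]
    return call_llm(system_prompt, messages)

# Example turn:
# reply = tutor_turn([], "Can you explain the chain rule with a small example?")
```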
And globally, I think the low-hanging fruit is things like, for example, different languages; that's super low-hanging fruit. I think the current models are actually really good at translation, basically, and can target the material and translate it on the spot. So I think a lot of things are low-hanging fruit. This adaptability to a person's background, I think, is not the lowest-hanging fruit, but I don't think it's too high up or too far away. But that is something you definitely want, because not everyone is coming in with the same background. And also, what's really helpful is that if you're familiar with some other discipline from the past, then it's really useful to make analogies to the things you know, and that's extremely powerful in education. So that's definitely something you want to take advantage of, but I think that starts to get to the point where it's not obvious and needs some work. I think the easy version of it is not too far away, where you can imagine just prompting the model, like, oh, hey, I know physics, or I know this, and you'd probably get something. But I guess what I'm talking about is something that actually works, not something that you can demo and that works sometimes. So I just mean it actually really works, in the way a person would do it. Yeah, and that's the reason I was asking about adaptability, because also people learn at different rates, or there are certain things they find challenging that others don't, or vice versa. And so it's a little bit of, how do you modulate relative to that context? And I guess you could have some reintroduction of what the person is good or bad at into the model over time. That's the thing with AI: a lot of these capabilities are just kind of a prompt away. So you always get demos, but do you actually get a product? You know what I mean? So in this sense, I would say the demo is near, but the product is far. So one thing we were talking about earlier, which I think is really interesting, is the sort of lineages that happen in the research community, where you come from certain labs, and everybody gossips about being from each other's labs. I think a very high proportion of Nobel laureates actually used to work in a former Nobel laureate's lab. So there's some propagation of, whether it's culture or knowledge or branding or what. In an AI-education-centric world, how do you maintain lineage? Or does it not matter? Or how do you think about those aspects of propagation of network and knowledge? I don't actually want to live in a world where lineage matters too much, right? So I'm hoping that AI can help you destroy that structure a little bit. It feels like kind of gatekeeping by some finite, scarce resource, which is like, oh, there's a finite number of people who have this lineage, et cetera. So I feel like it's a little bit of that aspect, and I'm hoping AI can destroy that. It's definitely one piece actual learning, one piece pedigree, right? Yeah. Well, it's also an aggregation; it's a cluster effect, right? It's like, why is all of, or much of, the AI community in the Bay Area? Or why is most of the fintech community in New York? And so I think a lot of it is also just that you're clustering really smart people with common interests and beliefs, and then they kind of propagate from that common core, and then they share knowledge in an interesting way. Though a lot of that behavior has shifted online to some extent, particularly for younger people.
I think one aspect of it is kind of the educational aspect, where if you're part of a community today, you're getting a ton of education and apprenticeship, et cetera, which is extremely helpful and gets you to an empowered state in that area. I think the other piece of it is the cultural aspect of what you're motivated by and what you want to work on. What does the culture prize, what does it put on a pedestal, what does it kind of worship, basically? So in the academic world, for example, it's the h-index. Everyone cares about the h-index, the number of papers you publish, et cetera. And I was part of that community and I saw that, and I feel like now I've come to different places, and there are different idols in all the different communities. And I think that has a massive impact on what people are motivated by, where they get their social status, and what actually matters to them. I also was, I think, part of different communities: growing up in Slovakia, also a very different environment; being in Canada, also a very different environment. What mattered there? Hockey. Sorry. Thank you. Hockey, yeah, hockey. As an example, I would say in Canada, I was at the University of Toronto, and Toronto, I don't think, is a very entrepreneurially pilled environment. It doesn't even occur to you that you should be starting companies. I mean, it's not something that people are doing. You don't know friends who are doing it. You don't know that you're supposed to be looking up to it. People aren't reading books about all the founders and talking about them. It's just not a thing you aspire to or care about. And what everyone is talking about is, oh, where are you getting your internship, where are you going to work afterwards? And it's just accepted that there's a fixed set of companies that you're supposed to pick from and just align yourself with one of them, and that's what you look up to, or something like that. So these cultural aspects are extremely strong, and maybe actually the dominant variable, because I almost feel like today, already, the education aspects are the easier one. Like, a ton of stuff is already available, et cetera. So I think mostly it's about the cultural aspect that you're part of. So on this point, one thing you and I were talking about a few weeks ago, and I think you also posted online about this, is that there's a difference between learning and entertainment, and learning is actually supposed to be hard. And I think it relates to this question of status; status is a great motivator, like who the idol is. How much do you think you can change in terms of motivation through systems like this, if that's a blocking factor? Are you focused on giving people the resources such that they can get as far as possible for their own capability, further than at any point in history, which is already inspirational? Or do you actually want to change how many people want to learn, or at least bring themselves down the path? "Want" is a loaded word. I would say I want to make it much easier to learn, and then maybe it is possible that some people don't want to learn. I mean, today, for example, people want to learn for practical reasons, right? Like, they want to get a job, et cetera, which makes total sense. So in a pre-AGI society, education is useful, and I think people will be motivated by that.
They're climbing up the ladder economically, et cetera, I think. But in a post-AGI society, we're all just going to, I think, treat education as entertainment to a much larger extent. Including successful-outcomes education, right? Not just letting the content wash over you. Yes, I think so. Outcomes being, like, understanding, learning, being able to contribute new knowledge, or however you define it. I think it's not an accident that if you go back 200 years, 300 years, the people who were doing science were nobility or people of wealth. We will all be nobility. Learning with Andrej. Yeah, I do think I see it very much as equivalent to your quote earlier. I feel like learning something is kind of like going to the gym, but for the brain, right? It feels like going to the gym. I mean, going to the gym is fun; people like to lift, et cetera. Some people don't go to the gym. No, no, some people do, but it takes effort. Yeah, yeah, it takes effort; it's effortful, but it's also kind of fun, and you also have a payoff: you feel good about yourself in various ways, right? And I think education is basically equivalent to that. So that's what I mean when I say education should not be fun, et cetera. I mean, it is kind of fun, but it's a specific kind of fun, I suppose. Right. I do think that maybe in a post-AGI world, what I would hope happens is that people actually do go to the gym a lot, not just physically but also mentally, and it's something that we look up to, being highly educated, and also, you know... yeah. Can I ask you one last question about Eureka? Just because I think it would be interesting to people: who is the audience for the first course? The audience for the first course. I'm mostly thinking of this as an undergrad-level course. So if you're doing an undergrad in a technical area, I think that would be kind of the ideal audience. I do think that what we're seeing now is we have this antiquated concept of education where you go through school, and then you graduate and go to work, right? Obviously, this will totally break down, especially in a society that's turning over so quickly, and people are going to come back to school a lot more frequently as the technology changes very, very quickly. So it is kind of undergrad level, but I would say anyone at that level, at any age, is kind of in scope. I think it will be very diverse in age, as an example, but I think it is mostly people who are technical and mostly actually want to understand it to a good amount. When can they take the course? I was hoping it would be late this year. I do have a lot of distractions that are piling on, but I think probably early next year is the timeline. Yeah, I'm trying to make it very, very good. It just takes time to get there. I have one last question, actually, that's pseudo-related to that. If you have little kids today, what do you think they should study in order to have a useful future? There's a correct answer in my mind, and the correct answer is mostly, I would say, math, physics, CS, those kinds of disciplines. And the reason I say that is because I think it helps for just thinking skills. It's just the best thinking-skills core, in my opinion. Of course, I have a specific background, et cetera, so I would think this, but that's just my view on it. I think me taking physics classes and all these other classes just shaped the way I think.
And I think it's very useful for problem solving in general. And so if we're in this world where, pre-AGI, this is going to be useful, and post-AGI, you still want empowered humans who can function in any arbitrary capacity, then I just think that this is the correct answer for people and what they should be doing and taking, and either way it's useful and it's good. And I think a lot of the other stuff you can tack on a bit later. But the critical period, where people have a lot of time and a lot of attention, I think should be mostly spent on these kinds of symbol-manipulation-heavy tasks and workloads, not memory-heavy tasks and workloads. Yeah, I did a math degree, and I felt like there was a new groove being carved into my brain doing that, and it's a harder groove to carve later. And I would, of course, put in a bunch of other stuff as well. Like, I'm not opposed to all the other disciplines, et cetera. I think it's actually beautiful to have a large diversity of things, but I do think 80% of it should be something like this. We're not efficient memorizers compared to our tools. Thank you for doing this. It's so much fun. Yeah. Yes, great to be here. Find us on Twitter @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.