Translation: No Priors Ep. 80 | With Andrej Karpathy from OpenAI and Tesla

Transcript Translation

No Priors Ep. 80 | With Andrej Karpathy from OpenAI and Tesla - https://www.youtube.com/watch?v=hM_h0UA7upI

Hi, listeners. Welcome back to No Priors. Today we're hanging out with Andrej Karpathy, who needs no introduction. Andrej is a renowned researcher, beloved AI educator, and cuber, an early team member from OpenAI, the lead for Autopilot at Tesla, and now working on AI for education. We'll talk to him about the state of research, his new company, and what we can expect from AI. Thanks a lot for joining us today. It's great to have you here. Thank you. I'm happy to be here. You led Autopilot at Tesla, and now, like, we actually have fully self-driving cars, passenger vehicles on the road. How do you read that in terms of where we are in the capability set, how quickly we should see increased capability or pervasive passenger vehicles? Yes, I spent maybe five years in the self-driving space. I think it's a fascinating space. And basically what's happening in the field right now is, well, I do also think that I draw a lot of analogies, I would say, to AGI from self-driving, and maybe that's just because I'm familiar with it, but I kind of feel like we've reached AGI a little bit in self-driving, because there are systems today that you can basically take around, and as a paying customer, can take around here. So Waymo in San Francisco here is, of course, very common. Probably you've taken Waymo. I've taken it a bunch, and it's amazing. And it can drive you all over the place, and you're paying for it as a product. What's interesting with Waymo is the first time I took Waymo was actually a decade ago, almost exactly, 2014 or so. And it was a friend of mine who worked there, and he gave me a demo, and it drove me around the block ten years ago, and it was basically a perfect drive ten years ago. And it took ten years to go from, like, a demo that I had to a product I can pay for that's at city scale and is expanding, et cetera. How much of that do you think was regulatory versus technology? Like, when do you think the technology was ready? I think it's technology.
You're just not seeing it in a single demo drive of 30 minutes. You're not running into all the stuff that they had to deal with for a decade. And so demo and product, there's a massive gap there, and I think a lot of it is also regulatory, et cetera. But I do think that we've sort of achieved AGI in the self-driving space in that sense, a little bit. And yet I think what's really fascinating about it is the globalization hasn't happened at all. So you have a demo and you can take it in SF, but the world hasn't changed yet. And that's going to take a long time. Going from a demo to an actual globalization of it, I think there's a big gap there. That's how it's related, I would say, to AGI, because I suspect it looks a similar way for AGI when we sort of get it. And then, staying for a minute in the self-driving space: I think people think that Waymo is ahead of Tesla. I think personally, Tesla is ahead of Waymo. And I know it doesn't look like that, but I'm still very bullish on Tesla and its self-driving program. I think that Tesla has a software problem, and I think Waymo has a hardware problem, is the way I put it. And I think software problems are much easier. Tesla has the deployment of all these cars on earth, like at scale. And I think Waymo needs to get there. And so the moment Tesla sort of, like, gets to the point where they can actually deploy this and it actually works, I think it's going to be really incredible. The latest builds I just drove yesterday, I mean, it's just driving me all over the place now. They've made, like, really good improvements, I would say very recently. Yeah, I've been using it a lot recently, and it actually works quite well. It did some miraculous driving for me yesterday. So I'm very impressed with what the team is doing. And so I still think that Tesla mostly has a software problem, and Waymo mostly has a hardware problem. And so I think Waymo looks like it's winning kind of right now.
But I think when we look in ten years at who's actually at scale and where most of the revenue is coming from, I still think Tesla is ahead in that sense. How far away do you think we are from the software problem turning the corner in terms of getting to some equivalency? Because obviously, to your point, if you look at a Waymo car, it has a lot of very expensive lidar and other sorts of sensors built into the car, so it can do what it does. It sort of helps support the software system. And so if you can just use cameras, which is the Tesla approach, then you effectively get rid of enormous cost and complexity, and you can do it in many different types of cars. When do you think that transition happens? I mean, in the next few years? I mean, I'm hoping, you know, like something like that. But actually, what's really interesting about that is I'm not sure that people are appreciating that Tesla actually does use a lot of expensive sensors. They just do it at training time. So there are a bunch of cars that drive around with lidars. They do a bunch of stuff that, like, doesn't scale, and they have extra sensors, et cetera, and they do mapping and all this stuff. You're doing it at training time, and then you're distilling that into a test-time package that gets deployed to the cars and is vision only. And it's like an arbitrage on sensors and expense. And so I think it's actually kind of a brilliant strategy that I don't think is fully appreciated. And I think it's going to work out well because the pixels have the information, and I think the network will be capable of doing that. And yes, at training time, I think these sensors are really useful, but I don't think they're as useful at test time. And I think you can... It seems like the one other transition that's happened is basically a move from a lot of edge-case-designed heuristics to end-to-end deep learning. And that's one other shift that's happened recently.
Do you want to talk a little bit about that and sort of what that...? Yeah, I think that was always, like, the plan from the start, I would say, at Tesla, as I was talking about how the neural net can eat through the stack. Because when I joined, there was a ton of C++ code, and now there's much, much less C++ code in the test-time package that runs in the car, because there's still a ton of stuff in the backend that we're not talking about. The neural net eats through the system. So first it just does detection on the image level, then multiple images give you a prediction, then multiple images over time give you a prediction, and you're discarding C++ code, and eventually you're just giving steering commands. And so I think Tesla is kind of eating through the stack. My understanding is that current Waymos are actually, like, not that, but that they've tried, but they ended up, like, not doing that, is my current understanding. But I'm not sure because they don't talk about it. But I do fundamentally believe in this approach. And I think that's the last piece to fall, if you want to think about it that way. And I do suspect that the end-to-end system for Tesla in, like, say, ten years, it is just a neural net. I mean, the videos stream into a neural net and commands come out. You have to sort of build up to it incrementally and do it piece by piece. And even all the intermediate predictions and all these things that we've done, I don't think they've actually misled development. I think they're part of it, because there's a lot of solid reasons for this. So actually, with end-to-end driving, when you're just imitating humans and so on, you have very few bits of supervision to train a massive neural net. And it's too few bits of signal to train so many billions of parameters. And so these intermediate representations and so on help you develop the features and the detectors for everything. And then it makes it a much easier problem for the end-to-end part of it.
And so I suspect, although I don't know, because I'm not part of the team, but there's a ton of pre-training happening so that you can do the fine-tuning for end to end. And so basically, I feel like it was necessary to eat through it incrementally. And that's what Tesla has done. I think it is the right approach, and it looks like it's working. So I'm really looking forward. If you had started end to end, you wouldn't have had the data anyway. That makes sense. Yeah. So you worked on the Tesla humanoid robot before you left. I have so many questions, but one is, like, starting here. What transfers? Basically, everything transfers, and I don't think people appreciate it. Okay. That's a big claim. I think it's like a very different problem. It's basically robots. When you actually look at it, cars are robots. And Tesla, I don't think, is a car company. I think this is misleading. This is a robotics company. A robotics-at-scale company, because I would say at scale is also like a whole separate variable. They're not building a single thing. They're building the machine that builds the thing, which is a whole separate thing. And so I think a robotics-at-scale company is what Tesla is. And I think in terms of the transfer from cars to humanoids, it was not that much work at all. And in fact, like, the early versions of Optimus, the robot, it thought it was a car, because it had the exact same computer, it had the exact same cameras. It was really funny because we were running the car networks on the robot, but it's walking around the office and so on. Oh, nice. And it's trying to recognize drivable space, but it's all just walking space now, I suppose. But it actually kind of generalized a little bit, and there's some fine-tuning necessary and so on. But it thought it was driving, but it's actually, like, moving through an environment. It's a reasonable way to think of this as, like, actually it's a robot.
Many things transfer, but you're just missing, for example, actuation and action data. Yeah, you definitely miss some components. And the other part, I would say, is, like, so much transfers. Like, the speed with which Optimus was started, I think, to me, was very impressive, because the moment Elon said, we're doing this, people just showed up with all the right tools, and the stuff just showed up so quickly, and all these CAD models and all the supply chain stuff, and I just felt like, wow, there's so much expertise for building robotics at Tesla, and it's all the same tools, and they're just like, okay, they're being reconfigured from a car, like Transformers, the movie. They're just being reconfigured and reshuffled, but it's like the same thing, and you need all the same components. You need to think about all the same kinds of stuff, both on the hardware side, on the scale stuff, and also on the brains. And so for the brains, there was also a ton of transfer, not just of the specific networks, but also all of the approach and the labeling team and how it all coordinates and the approaches people are taking. I just think there's a ton of transfer. What do you think are the first application areas for humanoid robotics or human form stuff? I think a lot of people have this vision of it, like doing your laundry, et cetera. I think that will come late. I don't think B2C is the right start point, because I don't think we can have a robot, like, crush Grandma, is how I put it, sort of. I think it's, like, too much legal liability. It's just like, I don't have a very porky hug. I was just going to fall over or something like that. These things are not perfect yet, and they require some amount of work. So I think the best customer is yourself first. And I think probably Tesla is going to do this. I'm very bullish on Tesla, if people can tell.
The first customer is yourself, and you incubate it in the factory and so on, doing maybe a lot of material handling, et cetera. This way you don't have to create contracts working with third parties. It's all really heavy. There's lawyers involved, like, et cetera. You incubate it, then you go, I think, B2B second, and you go to other companies that have massive warehouses. We can do material handling. We're going to do all this stuff. Contracts get drafted up, fences get put around, all this kind of stuff. And then once you incubate it in multiple companies, I think that's when you start to go into the B2C applications. I do think we'll see B2C robots also. Like, Unitree and so on are starting to come up with robots that I really want. I got one. You did? Yeah. Okay. Yeah. The G1. Yeah. So I will probably buy one of those. And there's probably going to be an ecosystem of people building on those platforms too. But I think in terms of what wins at scale, I would expect that kind of an approach. But in the beginning, it's a lot of material handling and then going towards more and more HKC things that are more specific. One that I'm really excited about is the Nat Friedman challenge of the leaf blower. I would love for an Optimus to walk down the street, tiptoe down the street, and pick up individual leaves so that we don't need leaf blowers. And I think this will work, and it's an amazing task. And so I would hope that that's one of the first applications. Or just... even raking. Yeah, that should work too. Just very quietly. Yeah, just quiet raking. That's cute. I mean, they do actually have a machine that's working. It's just not a humanoid. Can we talk about the humanoid thesis for a second? The simplest version of this is, like, the world is built for humans, and you build one set of hardware. The right thing to do is build a model that can do an increasing set of tasks in this set of hardware.
I think there's another camp that believes, well, humans are not optimal for any given task. You can make them stronger or bigger or smaller or whatever, and why shouldn't we do superhuman things? How do you think about this? I think people are maybe underappreciating the complexity of any fixed cost that goes into any single platform. I think there's a large fixed cost you're paying for any single platform. And so I think it makes a lot of sense to centralize that and have a single platform that can do all the things. I would say the humanoid aspect is also very appealing because people can teleoperate it very easily. And so it's a data collection thing that is extremely helpful, because people will be able to, obviously, very easily teleoperate it. I think that's usually overlooked. There's, of course, the aspect you mentioned, which is, like, the world is designed for humans, et cetera. So I think that's also important. I mean, I think we'll have some variations on the humanoid platform, but I think there is a large fixed cost to any platform. And then I would say also one last dimension of it is you benefit a ton from the transfer learning between the different tasks. And in AI, you really want the single neural net that is multitasking, doing lots of things. That's where you're getting all the intelligence and the capability from. And that's also why language models are so interesting: because you have a single regime, like a text domain, multitasking all these different problems, and they're all sharing knowledge between each other, and it's all coupled in a single neural net. And I think you want that kind of a platform, and you want all the data you collect for leaf picking to benefit all the other tasks. If you're building a special-purpose thing for any one thing, you're not going to benefit from a lot of the transfer between all the other tasks, if that makes sense.
Yeah, I think there's one argument: the G1 is 30 grand, but it seems hard to build a very capable humanoid robot under a certain BOM. And if you wanted to put an arm on wheels that can do things, like, maybe there are cheaper approaches to a general platform at the beginning. Does that make sense to you? Cheaper approaches to a general platform from a hardware perspective? Yeah, I think that makes sense. Yeah. You put wheels on it instead of feet, et cetera. I do feel like, I wonder if it's taking down like a local minimum a little bit. I just feel like pick a platform, make it perfect, is like the long-term pretty good bet. And then the other thing, of course, is I just think it will be kind of familiar to people, and I think people will understand that maybe you want to talk to it. And I feel like the psychological aspect also of it, I think, favors possibly the human platform, unless people are, like, scared of it and would actually prefer a platform that is more abstract, like some... But then, I don't know, if this is a real monster doing stuff, then I don't know if that's, like, more... It's interesting that I think that the other form factor for the Unitree is a dog. Right. And it's almost a friendlier, more familiar one. Yeah. But then people watch Black Mirror, and suddenly the dog flips to, like, a scary thing, so it's hard to think through. I just think psychologically, it will be easy for people to understand what's happening. What do you think is missing in terms of technological milestones for progress, relative to substantiating this future for robotics? For robotics, yeah. Or the humanoid robot or anything else human form? Yeah. I don't know that I have a really good window into it. I do think that it is kind of interesting that in a humanoid form factor, for example, for the lower body, I don't know that you want to do imitation learning from demonstration, because for the lower body, it's all a lot of inverted pendulum control and stuff like that.
It's for the upper body that you need a lot of teleoperation and data collection and end to end, et cetera. And so I think everything becomes, like, very hybrid in that sense. And I don't know how those systems interact. When I talk to people working in the field, a lot of what they focus on is actuation and manipulation, and sort of digital manipulation and things like that. Yeah. I do expect in the beginning, it's a lot of, like, teleoperation for getting stuff off the ground and imitating it and getting something that works 95% of the time. And then talking about human-to-robot ratios, and gradually having people who are supervisors of robots instead of doing the task directly. And all this kind of stuff is going to happen over time, and pretty gradually. I don't know that there's any individual impediments that I'm really familiar with. I just think it's a lot of grunt work. A lot of the tools are available. Transformers are this beautiful blob of tissue. You can just give it arbitrary tasks. And you just need the data, you need to put it in the right form. You need to train it. You need to experiment with it. You need to deploy it, iterate on it. Just a lot of grunt work. I don't know that I have a single individual thing that is holding us back, technically. Where are we in the state of large blob research? Large blob research, yeah, we're in a really good state. So I think, I'm not sure if it's fully appreciated, but the transformer is much more amazing. It's not just another neural net. It's an amazing neural net, extremely general. So, for example, when people talk about the scaling laws in neural networks, the scaling laws are actually, to a large extent, a property of the transformer. Before the transformer, people were playing with LSTMs and stacking them, et cetera. You don't actually get, like, clean scaling laws. And this thing doesn't actually train and doesn't actually work.
It's the transformer that was the first thing that actually just kind of, like, scales, and you get scaling laws and everything makes sense. So it's just like a general-purpose training computer. I think of it as kind of a computer, but it's like a differentiable computer, and you can just give it inputs and outputs, and billions of them, and you can train with backpropagation. It actually kind of, like, arranges itself into a thing that does the task. And so I think it's actually kind of like a magical thing that we've stumbled on in the algorithm space. And I think there's a few individual innovations that went into it. So you have the residual connections, that was a piece that existed. You have the layer normalizations that need to slot in, you have the attention block, you have the lack of these saturating nonlinearities like tanhs and so on. Those are not present in the transformer because they kill gradient signals. So there's a few, like, there's four or five innovations that all existed and were put together into this transformer, and that's what Google did with their paper. And this thing actually trains, and suddenly you get scaling laws, and suddenly you have this piece of tissue that just trains to a very large extent. And so it was a major unlock. You feel like we are not near the limit of that unlock, right? Because I think there is a discussion, of course, of the data wall and how expensive another generation of scale would be. How do you think about that? That's where you start to get into... I don't think that the neural network architecture is holding us back fundamentally anymore. It's not the bottleneck, whereas I think before the transformer, it was a bottleneck, but now it's not the bottleneck. Now we're talking a lot more about what is the loss function, where's the dataset? We're talking a lot more about those. And those have become the bottlenecks almost. It's not the general piece of tissue that reconfigures based on whatever you want it to be.
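Editor's note: the remark above about saturating nonlinearities "killing gradient signals", and about residual connections as one of the ingredients, can be illustrated with a toy calculation. The sketch below is not a transformer and is not taken from the episode. It assumes a stack of scalar layers whose pre-activations are all fixed at the same value, and the function names (`tanh_grad`, `relu_grad`, `chain_gradient`) are hypothetical.

```python
import math


def tanh_grad(x):
    """Derivative of tanh at x; close to 0 when |x| is large (saturation)."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    """Derivative of ReLU at x; exactly 1 for any positive input."""
    return 1.0 if x > 0 else 0.0


def chain_gradient(pre_activations, local_grad, residual):
    """Product of per-layer derivatives through a stack of scalar layers.

    Plain layer:    y = f(x)      -> local derivative f'(x)
    Residual layer: y = x + f(x)  -> local derivative 1 + f'(x)
    Assumption: the pre-activation seen by each layer is given directly,
    which is a simplification for illustration only.
    """
    grad = 1.0
    for x in pre_activations:
        d = local_grad(x)
        grad *= (1.0 + d) if residual else d
    return grad


# Ten layers, each sitting at a pre-activation of 3.0 (deep in tanh saturation).
depth_inputs = [3.0] * 10

plain_tanh = chain_gradient(depth_inputs, tanh_grad, residual=False)
residual_tanh = chain_gradient(depth_inputs, tanh_grad, residual=True)
plain_relu = chain_gradient(depth_inputs, relu_grad, residual=False)

print("plain tanh stack:   ", plain_tanh)     # vanishingly small
print("residual tanh stack:", residual_tanh)  # stays at or above 1
print("plain ReLU stack:   ", plain_relu)     # exactly 1.0 for positive inputs
```

In this toy setting the gradient through the plain tanh stack shrinks multiplicatively with depth, while the identity path of the residual form keeps it from vanishing. Real networks have vector-valued layers and learned weights, so this only conveys the direction of the effect.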
And so that's where I think a lot of the activity has moved, and that's why a lot of the companies and so on who are applying this technology, like, they're not thinking about the transformer much, they're not thinking about the architecture. Take the Llama release: the transformer hasn't changed that much. We've added RoPE, the rotary positional encodings; that's, like, the major change. Everything else doesn't really matter too much. It's like plus 3% on a few small things, but really, it's like RoPE is the only thing that's slotted in, and that's how the transformer has changed over the last five years or something. So there hasn't been that much innovation on that. Everyone just takes it for granted, let's train it, et cetera. And then everyone's just innovating on the data set mostly, and the loss function details. So that's where all the activity has gone to. Right. But what about the argument, like, in that domain, that that was easier when we were taking Internet data, and we're out of Internet data, and so the questions are really around, like, synthetic data or more expensive data collection? So I think that's a good point. So that's where a lot of the activity is now in LLMs. So the Internet data is, like, not the data you want for your transformer. It's like a nearest neighbor that actually gets you really far, surprisingly. But the Internet data is a bunch of Internet web pages, right? It's just like, what you want is the inner thought monologue of your brain. Yeah, that's the idea. The trajectories in your brain. The trajectories in your brain as you're doing problem solving. If we had a billion of that, like, AGI is here, roughly speaking, I mean, to a very large extent, and we just don't have that.
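Editor's note: the rotary positional encoding (RoPE) mentioned above rotates pairs of vector components by an angle proportional to the token position, so that the dot product between a rotated query and a rotated key depends only on their relative offset. The sketch below is a minimal pure-Python illustration under stated assumptions: it uses the adjacent-pair layout and the commonly used base of 10000, and the function names `rope` and `dot` are hypothetical. Production implementations typically operate on tensors and may pair components differently.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate adjacent pairs (vec[i], vec[i+1]) by position * base**(-i/d).

    Assumptions: even dimension d, adjacent-pair layout, base 10000.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# Made-up query and key vectors, purely for illustration.
query = [0.3, -1.2, 0.7, 0.5]
key = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (2) at two different absolute positions.
score_near_start = dot(rope(query, 5), rope(key, 3))
score_later = dot(rope(query, 12), rope(key, 10))

print(score_near_start, score_later)  # equal up to floating-point error
```

Because each pair undergoes a pure rotation, vector norms are preserved, and attention scores computed this way carry relative rather than absolute position information.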
So where a lot of activity is now, I think, is we have the Internet data that actually gets you really close, because it just so happens that the Internet has enough reasoning traces in it and a bunch of knowledge, and the transformer just makes it work. Okay, so I think a lot of activity now is around refactoring the data set into these inner monologue formats. And I think there's a ton of synthetic data generation that's helpful for that. So what's interesting about that also is, like, the extent to which the current models are helping us create the next generation of models. And so it's kind of like the staircase of... How much do you think synthetic data is, or how far does that get us? Right? Because to your point, each model helps you train the subsequent model better, or at least create tools for it. Data labeling, whatever. Maybe part of it is synthetic data. How important do you think the synthetic data piece is? Because when I talk to people, I think this is the only way we can make progress, is we have to make it work. I think with synthetic data, you just have to be careful, because these models are silently collapsed; that's, like, one of the major issues. So if you go to ChatGPT and you ask it to give you a joke, you'll notice that it only knows, like, three jokes. That's, like, the only... It gives you, like, one joke, I think, most of the time. And sometimes it gives you, like, three jokes, and it's because the models are collapsed and it's silent. So when you're looking at any single individual output, you're just seeing a single example. But when you actually look at the distribution, you'll notice that it's not a very diverse distribution. It's silently collapsed. When you're doing synthetic data generation, this is a problem because you actually really want that entropy. You want the diversity and the richness in your data set. Otherwise, you're getting collapsed data sets, and you can't see it when you look at any individual example.
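Editor's note: one simple way to see the "silent collapse" described above is to look at the distribution of many sampled outputs rather than any single one, and measure its Shannon entropy. The sketch below is illustrative only. The sample lists are made up (they are not real model outputs), and `empirical_entropy_bits` is a hypothetical helper name.

```python
import math
from collections import Counter


def empirical_entropy_bits(samples):
    """Shannon entropy (in bits) of the empirical distribution of samples."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical outputs from asking for a joke 100 times.
# Collapsed: one joke dominates, two others appear occasionally.
collapsed = ["joke A"] * 90 + ["joke B"] * 5 + ["joke C"] * 5
# Diverse: every request returns a different joke.
diverse = [f"joke {i}" for i in range(100)]

collapsed_bits = empirical_entropy_bits(collapsed)
diverse_bits = empirical_entropy_bits(diverse)

print(f"collapsed: {collapsed_bits:.3f} bits")
print(f"diverse:   {diverse_bits:.3f} bits (max for 100 samples is log2(100))")
```

Each individual sample in `collapsed` looks like a perfectly reasonable joke, which is why the problem is invisible at the level of single outputs; only the aggregate statistic exposes it. Exact-match counting is a crude proxy, and measuring diversity of free-form text in practice usually needs some notion of semantic similarity.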
But the distribution has lost a ton of entropy and richness, and so it silently gets worse. And so that's why you have to be very careful, and you have to make sure that you maintain the entropy in your dataset, and there's a ton of techniques for that. As an example, someone released this Persona dataset. The Persona dataset is a dataset of 1 billion personalities. Like humans, like backgrounds. Oh, yes, I saw this. Yeah, I'm a teacher or I'm an artist. I live here, I do this, et cetera, and it's like little paragraphs of fictitious human background. And what you do when you do synthetic data generation is not only "complete this task and do it in this way", but also "imagine you're describing it to this person", and you put in this information, and now you're forcing it to explore more of the space and you're getting some entropy. So I think you have to be just very careful to inject the entropy, maintain the distribution. And that's the hard part that I think maybe people aren't sufficiently appreciating in general. So I think, basically, synthetic data is absolutely the future. We're not going to run out of data, is my impression. I just think you have to be careful. What do you think we are learning now about human cognition from this research? I don't know if we're learning. One could argue that figuring out the shape of reasoning traces we want, for example, is instructive to actually understand how the brain works. I would be careful with those analogies, but in general, I do think that it's a very different kind of thing. But I do think that there are some analogies you can draw. So, as an example, I think transformers are actually better than the human brain in a bunch of ways. I think they're actually a lot more efficient system. And the reason they don't work as well as the human brain is mostly a data issue, roughly speaking, is the first-order approximation, I would say.
And actually, as an example, transformers memorizing sequences is so much better than humans. If you give it a sequence and you do a single forward-backward pass on that sequence, then if you give it the first few elements, it will complete the rest of the sequence. It memorized that sequence, and it's so good at it. If you gave a human a single presentation of a sequence, there's no way that you can remember that. And so the transformers, actually, I do think there's a good chance that the gradient-based optimization, the forward-backward update that we do all the time for training neural nets, is actually more efficient than the brain in some ways. And these models are better. They're just not yet ready to shine. But in a bunch of cognitive sort of aspects, I think they might come out ahead. With the right inputs, they will be better. That's generically true of computers for all sorts of applications. Right? Like memory, to your point. Yeah, exactly. And I think human brains just have a lot of constraints. The working memory is very small. I think transformers have a lot bigger working memory, and this will continue to be the case. They are much more efficient learners. The human brain functions under all kinds of constraints. It's not obvious that the human brain does backpropagation. It's not obvious how that would work. It's a very stochastic, dynamic system. It has all these constraints. It works under ambient conditions, et cetera. So I do think that what we have is actually potentially better than the brain, and it's just not there yet. How do you think about human augmentation with different AI systems over time? Do you think that's a likely direction? Do you think that's unlikely? Augmentation, augmentation of people with AI models? Oh, of course. I mean, but in what sense? Maybe. I think in general, absolutely. Because, I mean, there's the abstract version of it, where you're using it as a tool. That's the external version. There's the merger scenario a lot of people end up talking about.
I mean, we're already kind of merging. The thing is, like, there's the I/O bottleneck, but for the most part, it's at your fingertips if you have any of these models. Yeah, but that's a little bit different, because, I mean, people have been making that argument for, I think, 40, 50 years, where technological tools are just an extension of human capabilities. Right. Yeah. The computer is the bicycle for the human mind, et cetera. Exactly. But there's a subset of the AI community that thinks that, for example, the way that we subsume some potential conflict with future AI or something else would be through some form of... Yeah. Like the Neuralink pitch, et cetera. Exactly. Yeah. I don't know what this merger looks like yet, but I can definitely see that you want to decrease the I/O to tool use. And I see this as kind of like an exocortex we're building on top of our neocortex. Right. And it's just the next layer, and it just turns out to be in the cloud, et cetera. But it is the next layer of the brain. Yeah. The book Accelerando, from the early 2000s, has a version of this where basically everything is substantiated in a set of goggles that are computationally attached to your brain that you wear. And then if you lose them, you must feel like you're losing a part of your persona or memory. I think that's very likely, yeah. And today, the phone is already almost that, and I think it's going to get worse. When you put your techno stuff away from you, you're just, like, a naked human in nature. Well, you lose part of your intelligence. It's very anxiety-inducing. A very simple example of that is just maps. Right. So a lot of people now, I've noticed, can't actually navigate their city very well anymore because they're always using turn-by-turn directions. And if we have this, for example, like, universal translator, which I don't think is too far away, like, you'll lose the ability to speak to people who don't speak English if you just put your stuff away.
I'm very comfortable repurposing that part of my brain to do further research. I don't know if you saw the video of, like, the kid that has a magazine and is trying to, like, swipe on the magazine. Yeah. What's fascinating to me about it is, like, this kid doesn't understand what comes with nature and what's technology on top of nature. Yeah. Because it's made so transparent. And I think this might look similar, where people will just start assuming the tools, and then when you take them away, you realize, like, I guess, like, people don't know what's technology and what's not. If you're wearing this thing that's always translating everyone, or, like, doing stuff like that for you, then maybe people, like, lose track that these basic cognitive abilities may not exist by nature. We're gonna specialize. You can't understand people who speak Spanish. Like, what the hell? Or, like, when you go to objects, like, in Disney, all the objects are alive. And I think we are gonna potentially come to that kind of a world where, why can't I talk to things? Like, already today, you can talk to Alexa, and you can ask her for things and so on. Yeah, yeah. I've seen some toy companies like that, where they're basically trying to embed an LLM in a toy that can interact with a child. Yeah. Like, isn't it strange that when you go to a door, you can't just say "open"? Like, what the hell? Another favorite example of that: I don't know if you saw either Demolition Man or I, Robot. People make fun of the idea that you can't just talk to things. And what the hell? We're talking about an exocortex. That feels like a pretty fundamentally important thing to democratize access to. How do you think about the current market structure of what's happening in LLM research? There's a small number of large labs that actually have a shot at the next generation progressing training. How does that translate to what people have access to in the future?
So what you're kind of alluding to, maybe, is the state of the ecosystem. Right. So we have kind of, like, an oligopoly of a few closed platforms, and then we have an open platform that is kind of, like, behind. So, like, Meta's Llama, et cetera. And this is kind of, like, mirroring the open source kind of ecosystem. I do think that when this stuff starts to. When we start to think of it as, like, an exocortex. So there's a saying in crypto, which is, like, not your keys, not your tokens. Yeah. Like, is it the case that if it's, like, not your weights, not your brain? That's interesting, because the company is effectively controlling your exocortex, and therefore a big part of it starts to feel kind of invasive. If this is my exocortex, I think people will care much more about ownership. Yes. Like, yeah, you realize you're renting your brain. Like, it seems strange to rent your brain. The thought experiment was like, are you willing to give up ownership and control to rent a better brain? Because I am. Yeah. So I think that's the trade-off. I think we'll see how that works. But maybe it's possible to, by default, use the closed versions because they're amazing. But you have a fallback in various scenarios. And I think that's kind of like the way things are shaping up today. Even when APIs go down on some of the closed source providers, people start to implement fallbacks to the open ecosystems, for example, that they fully control, and they feel empowered by that. So maybe that's just the extension of what it looks like for the brain, as you fall back on the open source stuff should anything happen. But most of the time you actually. So it's quite important that the open source stuff continues to progress. I think so, 100%, and this is not like an obvious point or something that people maybe agree on right now, but I think 100%. 
I guess one thing I've been wondering about a little bit is what is the smallest performant model that you can get to in some sense, either in parameter size or however you want to think about it, and I'm a little bit curious about your view. Have you thought a lot about both distillation and small models? I think it can be surprisingly small, and I do think that the current models are wasting a ton of capacity remembering stuff that doesn't matter. Like, they remember SHA hashes, they remember, like the ancient, because the data set is not curated the best. Yeah, exactly. And I think this will go away. And I think we just need to get to the cognitive core. And I think the cognitive core can be extremely small, and it's just this thing that thinks, and if it needs to look up information, it knows how to use different tools. Is that like 3 billion parameters? Is that 20 billion parameters? I think even a billion, a billion suffices. We'll probably get to that point. And the models can be very, very small. And I think the reason they can be very small is fundamentally, I think, just that distillation works. Maybe the only thing I would say is that distillation works surprisingly well. Distillation is where you get a really big model or a huge amount of compute or something like that, supervising a very small model, and you can actually stuff a lot of capability into a very small model. Is there some sort of mathematical representation of that or some information-theoretic formulation of that? Because it almost feels like you should be able to calculate that in terms of what's the. Yeah, maybe like, one way to think about it is like we go back to the Internet data set, which is what we're working with. The Internet is like 0.001% cognition and 99.99% of information. Just like garbage. 
Yeah, and I think most of it is not useful to the thinking part, and it's like, yeah, I guess maybe another way to frame the question is, is there a mathematical representation of cognitive capability relative to model size, or how do you capture cognition in terms of, here's the min or max relative to what you're trying to accomplish, and maybe there's no good way to represent that. I think maybe a billion parameters gets you sort of like a good cognitive core. I think probably right. I think even 1 billion is too much. I don't know. We'll see. It's very exciting, given if you think about, well, it's a question of on an edge device versus on the cloud, and also this raw cost of using the model and everything. Yeah, it's very exciting. Right. But at less than a billion parameters, I have my exocortex on a local device as well. Yeah. And then probably it's not a single model. Right? Like, it's interesting to me to think about what this will actually play out like, because I do think you want to benefit from parallelization. You don't want to have a sequential process. You want to have a parallel process. And I think companies to some extent are also kind of like a parallelization of work, but there's a hierarchy in a company because that's one way to, you know, you have the information processing and the reductions that need to happen within an organization for information. So I think we'll probably end up with companies of LLMs. I think it's not unlikely to me that you have models of different capabilities specialized to various unique domains. Maybe there's a programmer, et cetera, and it will actually start to resemble companies to a very large extent. So you'll have the programmer and the program manager and similar kinds of roles of LLMs working in parallel and coming together and orchestrating computation on your behalf. So maybe it's not correct to think about. It's more like a swarm. I wouldn't say it feels like an ecosystem. It's like a biological ecosystem. 
We have specialized roles and niches, and I think we'll start to resemble that. You have automatic escalation to other parts of the swarm, depending on the difficulty of the problem. So maybe the CEO is like a really brilliant cloud model, but the workers can be a lot cheaper, maybe even open source models or whatnot. And my cost function is different from your cost function. Yeah. So that could be interesting. You left OpenAI. You're working on education. You've always been an educator. Like, why do this? I would start with, I've always been an educator, and I love learning and I love teaching. And so it's kind of just like a space that I've been very passionate about for a long time. And then the other thing is, I think one macro picture that's kind of driving me is I think there's a lot of activity in AI, and I think most of it is to kind of replace or displace people. I would say it's in the theme of sidelining the people. But I'm always more interested in anything that empowers people. And I feel like, at a high level, I'm team human, and I'm interested in things that AI can do to empower people. And I don't want the future where people are kind of on the side of automation. I want people to be in a very empowered state, and I want them to be amazing, even much more amazing than today. And then another aspect that I find very interesting is like, how far can a person go if they have the perfect tutor for all the subjects? And I think people could go really far if they had the perfect curriculum for anything. And I think we see that with, you know, if some rich people maybe have tutors and they do actually go really far. And so I think we can approach that with AI or even, like, surpass it. There's very clear literature on that, actually, from the eighties, right, where one-on-one tutoring, I think, helps people get one standard deviation better. Bloom. Is it two? Yeah, it's the Bloom stuff. Yeah, exactly. 
There's a lot of really interesting precedents on that. How do you actually view that as substantiating through the lens of AI? Or what are the first types of products that will really help with that? Because there's books like The Diamond Age where they talk about the Young Lady's Illustrated Primer and all that kind of stuff. So I'm definitely inspired by aspects of it. So in practice, what I'm doing is trying to currently build a single course, and I want it to be just like the course you would go to if you want to learn AI. I think the problem, basically, is I've already taught courses. I taught CS231n at Stanford, and that was the first deep learning class and was pretty successful. But the question is, how do you actually really scale these classes? How do you make it so that your target audience is maybe 8 billion people on earth, and they're all speaking different languages, and they're all at different capability levels, et cetera? And a single teacher doesn't scale to that audience. The question is, how do you use AI to do the scaling of a really good teacher? The way I'm thinking about it is the teacher is doing a lot of the course creation and the curriculum, because currently, at the current AI capability, I don't think the models are good enough to create a good course, but I think they're good enough to become the front end to the student and interpret the course to them. And so basically, the teacher doesn't go to the people, and the teacher is not the front end anymore. The teacher is on the back end designing the materials and the course, and the AI is the front end, and it can speak all the different languages, and it kind of like takes you through the course. Should I think of that as the TA-type experience? Or is that not a good analogy here? That is like one way I'm thinking about it, it's an AI TA. I'm mostly thinking of it as like this front end to the student. And it's the thing that's actually interfacing with the student and taking them through the course. 
And I think that's tractable today and it just doesn't exist. And I think it can be made really good. And then over time, as the capability increases, you would potentially refactor the setup in various ways. I like to find things that fit the AI capability of today, and to have a good model of it. And I think a lot of companies maybe don't quite understand intuitively where the capability is today, and then they end up kind of like building things that are either too far ahead of what's available or maybe not ambitious enough. And so I do think that this is kind of a sweet spot of what's possible and also really interesting and exciting. I want to go back to something you said that I think is very inspiring, especially coming from your background and understanding of where exactly we are in research, which is essentially like, we do not know what the limits of human performance from a learning perspective are, given much better tooling. And I think there's like a very easy analogy to, we just had the Olympics like a month ago, right? And, you know, for a runner, the very best mile time, or pick any sport, is much better today than it was, putting aside performance-enhancing drugs, like ten years ago. Just because, like, you start training earlier, you have a very different program. We have much better scientific understanding. We have technique, we have gear. The fact that you believe, like, we can get much better as humans if we're starting with, like, the tooling and the curriculum is amazing. Yeah, I think we haven't even scratched, like, what's possible at all. So I think there's like two dimensions basically to it. Number one is the globalization dimension of like, I want everyone to have really good education, but the other one is like, how far can a single person go? I think both of those are very interesting and exciting. 
Usually when people talk about one-on-one learning, they talk about the adaptive aspect of it, where you're challenging a person at the level that they're at. Do you think you can do that with AI today, or is that something for the future, and today it's more about reach and multiple languages, globally? I think the low-hanging fruit is things like, for example, different languages, super low-hanging fruit. I think the current models are actually really good at translation basically, and can target the material and translate it, like, on the spot. So I think a lot of things are low-hanging fruit. This adaptability to a person's background I think is like not at the low-hanging fruit, but I don't think it's like too high up or too far away. But that is something you definitely want, because not everyone is coming in with the same background. And also what's really helpful is like if you're familiar with some other disciplines from the past, then it's really useful to make analogies to the things you know, and that's extremely powerful in education. So that's definitely a dimension you want to take advantage of. But I think that starts to get to the point where it's like not obvious and needs some work. I think the easy version of it is not too far, where you can imagine just prompting the model. It's like, oh, hey, I know physics or I know this, and you probably get something. But I guess what I'm talking about is something that actually works, not something that you can demo and that works sometimes. So I just mean it actually really works. And in the way a person would. Yeah, and that's the reason I was asking about adaptability, because also people learn at different rates, or certain things they find challenging that others don't, or vice versa. And so it's a little bit of how do you modulate relative to that context? And I guess you could have some reintroduction of what the person is good or bad at into the model over time as you. That's the thing with AI. 
A lot of these capabilities are just kind of a prompt away. So you always get, like, demos, but, like, do you actually get a product? You know what I mean? So in this sense, I would say the demo is near, but the product is far. So one thing we were talking about earlier, which I think is really interesting, is sort of the lineages that happen in the research community, where you come from certain labs, and everybody gossips about being from each other's labs. I think a very high proportion of Nobel laureates actually used to work in a former Nobel laureate's lab. So there's some propagation of, I don't know if it's culture or knowledge or branding or what. In an AI-education-centric world, how do you maintain lineage? Or does it not matter? Or how do you think about those aspects of propagation of network and knowledge? I don't actually want to live in a world where lineage matters too much. Right. So I'm hoping that AI can help you destroy that structure a little bit. It feels like kind of gatekeeping by some finite scarce resource, which is like, oh, there's a finite number of people who have this lineage, et cetera. So I feel like it's a little bit of that aspect. So I'm hoping it can destroy that. It's definitely one piece, like, actual learning, one piece pedigree, right? Yeah. Well, it's also the aggregation of, it's a cluster effect, right? It's like, why is all of the, or much of the AI community in the Bay Area? Or why is most of the fintech community in New York? And so I think a lot of it is also just that you're clustering really smart people with common interests and beliefs, and then they kind of propagate from that common core, and then they share knowledge in an interesting way. And I guess a lot of that behavior has shifted online to some extent, particularly for younger people. 
I think one aspect of it is kind of like the educational aspect, where, like, if you're part of a community today, you're getting a ton of education and apprenticeship, et cetera, which is extremely helpful and gets you to an empowered state in that area. I think the other piece of it is like the cultural aspect of what you're motivated by and what you want to work on. What does the culture prize, and what do they put on a pedestal, and what do they kind of like worship, basically? So in the academic world, for example, it's the h-index. Everyone cares about the h-index, the number of papers you publish, et cetera. And I was part of that community and I saw that, and I feel like now I've come to different places and there's different idols in all the different communities. And I think that has a massive impact on what people are motivated by and where they get their social status and what actually matters to them. I also was, I think, part of different communities, like growing up in Slovakia, also a very different environment. Being in Canada, also a very different environment. What mattered there? Hockey. Sorry. Thank you. Hockey, yeah, hockey. I would say, as an example, in Canada I was at the University of Toronto, and Toronto, I don't think it's a very entrepreneurial environment. It doesn't even occur to you that you should be starting companies. I mean, it's not something that people are doing. You don't know friends who are doing it. You don't know that you're supposed to be looking up to it. People aren't reading books about all the founders and talking about them. It's just not a thing you aspire to or care about. And what everyone is talking about is, oh, where are you getting your internship, where are you going to work afterwards? And it's just accepted that there's a fixed set of companies that you are supposed to pick from and just align yourself with one of them. 
And that's like what you look up to or something like that. So these cultural aspects are extremely strong and maybe actually the dominant variable, because I almost feel like today already, the education aspects, I think, are the easier one. Like a ton of stuff is already available, et cetera. So I think mostly it's the cultural aspect that you're part of. So on this point, like, one thing you and I were talking about a few weeks ago is, and I think you also posted online about this, there's a difference between learning and entertainment, and learning is actually supposed to be hard. And I think it relates to this question of, like, you know, status, and, like, status is a great motivator, like who the idol is. How much do you think you can change in terms of motivation through systems like this, if that's like a blocking factor? Are you focused on giving people the resources such that they can get as far as possible in the sequence for their own capability, further than at any other point in history, which is already inspirational? Or do you actually want to change how many people want to learn, or at least bring themselves down the path? Want is a loaded word. I would say, like, I want to make it much easier to learn. And then maybe it is possible that maybe people don't want to learn. I mean, today, for example, people want to learn for practical reasons, right? Like they want to get a job, et cetera, which makes total sense. So in the pre-AGI society, education is useful, and I think people will be motivated by that because they're climbing up the ladder economically, et cetera. But in the post-AGI society we're all going to, I think education is entertainment to a much larger extent, including successful-outcomes education. Right. Not just letting the content wash over you. Yes, I think so. Outcomes being like understanding, learning, being able to contribute new knowledge, or however you define it. 
I think it's not an accident that if you go back 200 years, 300 years, the people who were doing science were nobility or people of wealth. We will all be nobility. Learning with Andrej. Yeah, I do think that I see it as very much equivalent to your quote earlier. I feel like learning something is kind of like going to the gym, but for the brain. Right? Like it feels like going to the gym. I mean, going to the gym is fun. People like to lift, et cetera. Some people don't go to the gym. No, no, some people do, but it takes effort. Yeah, yeah, it takes effort. It's effortful, but it's also kind of fun. And you also have a payoff, like you feel good about yourself in various ways. Right. And I think education is basically equivalent to that. So that's what I mean when I say education should not be fun, et cetera. I mean, it is kind of fun, but it's like a specific kind of fun, I suppose. Right. I do think that maybe in a post-AGI world, what I would hope happens is people actually do go to the gym a lot. Not just physically, but also mentally. And it is something that we look up to, being highly educated, and also, you know, just. Just. Yeah. Can I ask you one last question about Eureka? Just because I think it would be interesting to people: who is the audience for the first course? The audience for the first course. I'm mostly thinking of this as like an undergrad-level course. So if you're doing undergrad in a technical area, I think that would be kind of the ideal audience. I do think that what we're seeing now is we have this, like, antiquated concept of education where you go through school and then you graduate and go to work. Right. Obviously, this will totally break down, especially in a society that's turning over so quickly, so that people are gonna come back to school a lot more frequently as the technology changes very, very quickly. 
So it is kind of like undergrad level, but I would say, like, anyone at that level, at any age, is kind of, like, in scope. I think it will be very diverse in age, as an example, but I think it is mostly, like, people who are technical and mostly actually want to understand it to a good amount. When can they take the course? I was hoping it would be late this year. I do have a lot of distractions that are piling on, but I think probably early next year is the timeline. Yeah, I'm trying to make it very, very good. It just takes time to get there. I have one last question, actually, that's pseudo-related to that. If you have little kids today, what do you think they should study in order to have a useful future? There's a correct answer in my mind, and the correct answer is mostly, like, I would say, like, math, physics, CS kinds of disciplines. And the reason I say that is because I think it helps for just thinking skills. It's just like the best thinking-skill core, is my opinion. Of course I have a specific background, et cetera, so I would think this, but that's just my view on it. I think me taking physics classes and all these other classes just shaped the way I think. And I think it's very useful for problem solving in general. And so if we're in this world where pre-AGI, this is going to be useful, post-AGI, you still want empowered humans who can function in any arbitrary capacity. And so I just think that this is just the correct answer for people and what they should be doing and taking, and it's either useful or it's good. And so I just think it's the right answer. And I think a lot of the other stuff you can tack on a bit later, but the critical period where people have a lot of time and they have a lot of kind of, like, attention, I think it should be mostly spent on doing these kinds of symbol-manipulation-heavy tasks and workloads, not memory-heavy tasks and workloads. 
Yeah, I did a math degree, and I felt like there was a new groove being carved into my brain that was doing that, and it's a harder groove to carve later. And I would, of course, put in a bunch of other stuff as well. Like, I'm not opposed to all the other disciplines, et cetera. I think it's actually beautiful to have a large diversity of things, but I do think 80% of it should be something like this. We're not efficient memorizers compared to our tools. Thank you for doing this. It's so much fun. Yeah. Yes. Great to be here. Find us on Twitter @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.

No Priors Ep. 80 | With Andrej Karpathy from OpenAI and Tesla - https://www.youtube.com/watch?v=hM_h0UA7upI

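Hi, listeners. Welcome back to No Priors. Today we're hanging out with Andrej Karpathy, who needs no introduction. Andrej is a renowned researcher, beloved AI educator, and cuber, an early team member at OpenAI, the lead for Autopilot at Tesla, and now working on AI for education. We'll talk to him about the state of research, his new company, and what we can expect from AI. Thanks a lot for joining us today. It's great to have you here. Thank you. I'm happy to be here. You led Autopilot at Tesla, and now we actually have fully self-driving cars, passenger vehicles, on the road. How do you read that in terms of where we are in the capability set, and how quickly we should see increased capability or pervasive passenger vehicles? Yes, I spent maybe five years in the self-driving space. I think it's a fascinating space. And basically what's happening in the field right now is, well, I do also draw a lot of analogies to AGI from self-driving, and maybe that's just because I'm familiar with it, but I kind of feel like we've reached AGI a little bit in self-driving, because there are systems today that you can basically take around as a paying customer. Waymo in San Francisco is, of course, very common. You've probably taken Waymo. I've taken it a bunch, and it's amazing. It can drive you all over the place, and you're paying for it as a product. What's interesting with Waymo is that the first time I took Waymo was actually a decade ago, almost exactly, 2014 or so. A friend of mine who worked there gave me a demo, and it drove me around the block ten years ago, and it was basically a perfect drive ten years ago. And it took ten years to go from a demo that I had to a product I can pay for that's at city scale and is expanding. How much of that do you think was regulatory versus technology? When do you think the technology was ready? I think it's technology. You just don't see it in a single 30-minute demo drive. You don't run into all the stuff they had to deal with for a decade. So demo versus product, there's a massive gap there, and I think a lot of it is also regulatory, et cetera. But I do think that we've sort of achieved AGI in the self-driving space in that sense. And yet what's really fascinating is that the globalization hasn't happened at all. You have a demo and you can take a ride in it, but the world hasn't changed yet. And it's going to take a long time to go from a demo to actual globalization. I think there's a big gap there. That's how I would say it's related to AGI, because I suspect it looks similar for AGI. And staying for a minute in the self-driving space, I think people think that Waymo is ahead of Tesla. Personally, I think Tesla is ahead of Waymo. It doesn't look like that, but I'm still very bullish on Tesla and its self-driving program. The way I put it is that Tesla has a software problem and Waymo has a hardware problem. And I think software problems are much easier. Tesla has all these cars deployed on earth at scale. And I think Waymo needs to get there. So the moment Tesla gets to the point where they can actually deploy this and it actually works, I think it's going to be really incredible. The latest build, I just drove it yesterday, I mean, it's driving me all over the place now. They've made really good improvements, very recently. Yeah, I've been using it a lot recently, and it actually works quite well. It did some miraculous driving for me yesterday. So I'm very impressed with what the team is doing. 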
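So I still think that Tesla mostly has a software problem and Waymo mostly has a hardware problem. And so I think Waymo looks like it's kind of winning right now. But when we look in ten years at who's actually at scale and where most of the revenue is coming from, I still think Tesla is ahead in that sense. How far away do you think we are from turning the corner on the software problem, in terms of getting to some equivalency? Because obviously, to your point, if you look at a Waymo car, it has very expensive lidar and other sorts of sensors built into the car so that it can do what it does. That helps support the software system. And so if you can just use cameras, which is the Tesla approach, you effectively get rid of enormous cost and complexity and can do it in many different types of cars. When do you think that transition happens? I mean, in the next few years? I'm hoping it's something like that, but what's really interesting is that I'm not sure people appreciate that Tesla actually does use a lot of expensive sensors. They just do it at training time. So there are a bunch of cars that drive around with lidars. They do a bunch of stuff that doesn't scale, and they have extra sensors, et cetera, and they do mapping and all this stuff. They do it at training time, and then they distill that into a test-time package that is deployed to the cars and is vision only. And it's like an arbitrage on sensors and cost. So I think it's actually a brilliant strategy that I don't think is fully appreciated. And I think it's going to work out well, because the pixels have the information and I think the network will be capable of doing that. And yes, at training time I think these sensors are really useful, but I don't think they're as useful at test time. It seems like the other transition that's happened is basically a move from a lot of edge-case-designed heuristics to end-to-end deep learning. That's the other shift that's happened recently. Do you want to talk a little bit about that? Yeah, I think that was always the plan from the start. When I was at Tesla, I was talking about how the neural net can eat through the stack, because when I joined there was a ton of C++ code, and now there's much, much less C++ code in the test-time package that runs in the car, even though there's still a ton of stuff on the back end that we're not talking about. The neural net kind of takes over the system. So first it just does detection at the image level, then it takes multiple images and gives you a prediction, then multiple images over time give you a prediction, and you're discarding C++ code, and eventually you're just giving steering commands. And so I think Tesla is kind of eating through the stack. My understanding is that current Waymo is actually not that, that they've tried but ended up not doing it. But I'm not sure, because they don't talk about it. But I do fundamentally believe in this approach. And I think that's the last piece to fall, if you want to think about it that way. And I do suspect that the end-to-end system for Tesla in, say, ten years is just a neural net. I mean, the videos stream into a neural net and commands come out. You have to build up to it incrementally and do it piece by piece. And even all the intermediate predictions and all these things that we've done, I don't think they've actually misled the development. I think they're part of it, because there are a lot of subtle reasons for this. Actually, end-to-end driving, when you're just imitating humans and so on, gives you very few bits of supervision to train a massive neural net. And it's too few bits of signal to train so many billions of parameters. So these intermediate representations and so on help you develop the features and the detectors for everything. 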
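And then it makes the end-to-end part of the problem much easier. So I suspect, although I don't know because I'm not part of the team, that there's a ton of pre-training happening so that you can do the fine-tuning for end to end. So basically I feel like it was necessary to eat through it incrementally. And I think what Tesla has done is the right approach, and it looks like it's working. So I'm really looking forward to it. If you had started end to end, you wouldn't have had the data anyway. That makes sense. Yeah. So you worked on the Tesla humanoid robot before you left. I have so many questions, but one is starting here: what transfers? Basically everything transfers, and I don't think people appreciate it. Okay. That's a big claim. It seems like a very different problem. I think cars are basically robots when you actually look at them. And I don't think Tesla is a car company. I think that's misleading. It's a robotics company. A robotics-at-scale company, I would say, because scale is also a whole separate variable. They're not building a single thing. They're building the machine that builds the thing, which is a whole separate problem. So I think robotics at scale is what Tesla is. And in terms of the transfer from cars to humanoids, it was not that much work at all. In fact, in the early versions of Optimus, the robot thought it was a car, because it had the exact same computer and the exact same cameras. And it was really funny, because we were running the car networks on the robot, but it's walking around the office and so on. Oh, cool. And it's trying to recognize drivable space, but now it's all just walkable space, I suppose. But it actually kind of generalized a little bit, and there's some fine-tuning necessary and so on. But it thought it was driving, when it's actually moving through an environment. Is a reasonable way to think about this that it's a robot and many things transfer, but you're missing, for example, actuation and action data? Yeah, you're definitely missing some components. And the other part, I would say, is that so much transfers, like the speed with which Optimus was started. The moment Elon said we're doing this, people just showed up with all the right tools, and things showed up so quickly, all these CAD models and all the supply chain stuff. And I felt like there's so much in-house expertise for building robotics at Tesla, and it's all the same tools, and they're just reconfigured from the car, like in the Transformers movie. They're just being reconfigured and reshuffled, but it's the same thing, and you need all the same components. You have to think about all the same kinds of things on the hardware side, on the scale side, and also on the brain side. And for the brain too, there was a ton of transfer, not just of the specific networks, but also of the whole approach and the labeling team and how it all coordinates and the approach people are taking. I just think there was a ton of transfer. What do you think are the first application areas for humanoid robots or human-form things? I think a lot of people have this vision of it doing your laundry, et cetera. I think that will come late. I don't think B2C is the right starting point, because I don't think we can have a robot, like, crush grandma, is how I put it. I think it's too much legal liability. Like, it could just fall over or something. These things are not perfect yet, and they require some amount of work. So I think the best customer is yourself first. And I think probably Tesla is going to do this. 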
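I'm very bullish on Tesla. As people can tell, the first customer is yourself, and you incubate it in the factory, doing maybe a lot of material handling, et cetera. This way you don't have to create contracts working with third parties, which is all really heavy, with lawyers involved, et cetera. You incubate it, and then second, I think, you go B2B, to other companies that have massive warehouses. We can do material handling. We're going to do all this stuff, contracts get drafted up, fences get put in, all this kind of stuff. And only once you've incubated it in multiple companies do I think you start going into B2C applications. I do think we'll see B2C robots too. Unitree and so on are starting to come out with robots that I really want. I got one. You do? Yeah. Okay. Yeah. The G1. Yeah. So I'll probably buy one of those. And there will probably be an ecosystem of people building on that platform too. But in terms of what wins at scale, I would expect that kind of approach. In the beginning it's a lot of material handling, and then it goes toward more and more specific applications. One that I'm really excited about is the Nat Friedman challenge of the leaf blower. I would love for an Optimus to walk down the street, kind of tiptoeing, and pick up individual leaves so that we don't need leaf blowers. I think this will work, and it's an amazing task. So I hope that's one of the first applications. Or even just raking. Yeah, that should work too. Very quietly. Yeah, just raking quietly. Cute. I mean, they do actually have machines that work. It's just not a humanoid. Can we talk about the humanoid thesis for a second? The simplest version of this is that the world is built for humans, so you build one set of hardware, and the right thing to do is build a model that can do an increasingly large set of tasks on this set of hardware. I think there's another camp that believes that humans are not optimal for any given task. You can make them stronger or bigger or smaller, and why shouldn't we do superhuman things? How do you think about this? I think people are maybe under-appreciating the complexity of the fixed cost that goes into any single platform. I think there's a large fixed cost you're paying for any single platform. And so I think it makes a lot of sense to centralize and have a single platform that can do all the things. I would also say that the humanoid aspect is very appealing because people can teleoperate it very easily. And so it's very helpful for data collection, because people are able to teleoperate it very easily. I think that's usually overlooked. There's of course the aspect you mentioned, which is the world being designed for humans, et cetera. So I think that's also important. I mean, I think we'll have some variations on the humanoid platform, but I think there's a large fixed cost to any platform. And then the last dimension, I would say, is that you benefit a lot from the transfer learning between the different tasks. In AI, you really want a single neural net that is multitasking, doing lots of things. That's where you get all the intelligence and capability from. And that's also why language models are so interesting, because you have a single regime, the text domain, multitasking all these different problems, and they're all sharing knowledge between each other, and it's all coupled in a single neural net. You want that kind of platform, and you want all the data you collect for leaf picking to benefit all the other tasks. If you build a special-purpose thing for any one thing, you're not going to benefit from a lot of transfer between all the other tasks. Does that make sense? Yeah, I think there's one argument, which is that the G1 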
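is about $30,000, but it seems hard to build a very capable humanoid robot under a certain BOM. And if you wanted to put an arm on wheels that can do things, maybe there are cheaper approaches to a general platform at the beginning. Does that make sense to you? Cheaper approaches to a general platform from a hardware perspective? Yeah, I think that makes sense. Yeah. You put wheels on it instead of feet, et cetera. I do wonder if that's chasing down a local minimum a little bit. I feel like picking a platform and making it perfect is a pretty good bet in the long run. And then the other thing, of course, is that it's going to be kind of familiar to people, and I think people will understand that you might want to talk to it. And I feel like the psychological aspect also maybe favors the human platform, unless people are scared of it and would actually prefer a more abstract platform. But I don't know if that would then feel like some kind of monster doing stuff. It's interesting that Unitree's other form factor is the dog, right? And it's almost a more friendly and familiar form factor. Yeah. But then people watch Black Mirror, and suddenly the dog flips into a scary thing, so it's hard to think through. I just think psychologically it will be easy for people to understand what's happening. What do you think is missing in terms of technical milestones for progress, in terms of proving out the future of robotics? For robotics? Or for humanoid robots or any other human form? Yeah. I don't know that I have a really good window into it. I do think it's kind of interesting that in the humanoid form factor, for example, for the lower body, I don't know that you want to do imitation learning from demonstration. For the lower body, it's all a lot of inverted-pendulum control and that kind of stuff. It's for the upper body that you need a lot of teleoperation and data collection and end to end, et cetera. So I think everything becomes very hybrid in that sense. And I don't know how those systems interact. When I talk to people working in the field, I get the sense that much of what they're focused on is actuation and manipulation, and sort of digit manipulation and things like that. Yeah. I do expect that in the beginning there's a lot of teleoperation for getting things off the ground, imitating it, and getting something that works 95% of the time. And then you talk about human-to-robot ratios, and gradually you have people who are supervisors of robots instead of doing the task directly. And all this kind of stuff is going to happen over time, pretty gradually. I don't know that there are any individual impediments that I'm really familiar with. I just think it's a lot of grunt work. A lot of the tools are available. The Transformer is this beautiful blob of tissue. You can get it to do arbitrary tasks. You just need the data, and you need to put it in the right form. You need to train it. You need to experiment with it. You need to deploy it and iterate on it. It's just a lot of grunt work. I don't know that I have a single individual thing that is holding us back technically. Where are we in the state of large-blob research? Large-blob research, yeah, we're in a really good state. I think so. I'm not sure it's fully appreciated, but the Transformer is much more amazing than people think. It's not just another neural net. It's an amazing neural net, and extremely general. For example, when people talk about the scaling laws of neural networks, the scaling laws are actually to a large extent a property of the Transformer. Before the Transformer, people were playing with LSTMs, stacking them, et cetera. You don't actually get clean scaling laws. 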
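And that thing doesn't actually train and doesn't actually work. The transformer was the first thing that actually just scales: you get scaling laws and everything makes sense. So it's like a general-purpose training computer. I think of it as a kind of computer, but a differentiable computer: you can just give it inputs and outputs, billions of them, and train it with backpropagation. It actually arranges itself into a thing that does the task. So I think it's a kind of magical thing that we stumbled on in the algorithm space. And I think a few individual innovations went into it. You have the residual connection, which was a piece that already existed. You have layer normalization, which needs to slot in. You have the attention block. And you have the lack of saturating nonlinearities like tanh and so on. Those are not present in the transformer because they kill the gradient signal. So there are four or five innovations that all existed and were put together into the transformer, and that's what Google did with their paper. And this thing actually trains, and suddenly you get scaling laws, and suddenly you have this piece of tissue that just trains over a very large range. So it was a major unlock. You feel like we are not near the limit of that unlock, right? Because I think there's discussion of the data wall and of how expensive another generation of scale would be. How do you think about that? That's where you start to get into it. I don't think the neural network architecture is holding us back fundamentally anymore. It's not the bottleneck, whereas before the transformer it was. Now we're talking a lot more about what the loss function is and where the dataset is. Those have almost become the bottleneck. It's not the general tissue that reconfigures itself into whatever you want it to be. So that's where a lot of the activity has moved, and that's why a lot of the companies applying this technology aren't thinking much about the transformer or about the architecture. Take the Llama release: the transformer hasn't changed that much. We've added RoPE, the rotary positional encodings, and that's the major change. Everything else doesn't really matter too much. It's like plus 3% on a few small things. But really RoPE is the only thing that's been slotted in, and that's what has changed in the transformer over the last five years or so. So there hasn't been much innovation there. Everyone takes it for granted and says, let's just train it. And everyone is mainly innovating on the dataset and the details of the loss function. So that's where all the activity has gone. Right. But what about the argument that, in that domain, it was easier when we were taking internet data, and now we're out of internet data? So the questions are around synthetic data or more expensive data collection? I think that's a good point. That's where a lot of the activity is now in LLMs. The internet data is not the data you want for your transformer. It's like a nearest neighbor that, surprisingly, actually gets you really far. But the internet data is a bunch of internet web pages, right? What you want is the inner thought monologue of your brain. Yeah, that's the idea. The trajectories in your brain. The trajectories in your brain as you're solving a problem. If we had a billion of those, AGI would be here, roughly speaking, I mean, to a very large extent. We just don't have that. So where a lot of the activity is now, I think, is this: we have the internet data, which actually gets you really close, because it just so happens that the internet has enough reasoning traces in it and a lot of knowledge, and the transformer just makes it work.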
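As an illustration of the rotary positional encoding (RoPE) mentioned above, here is a minimal pure-Python sketch. It is not taken from any Llama release; the function names `rope` and `dot`, the example vectors, and the base of 10000 are assumptions chosen for the example. Each pair of dimensions is rotated by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative offset.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that grow with `pos` (illustrative)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        # Each pair gets its own frequency; later pairs rotate more slowly.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query and key vectors.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Both pairs of positions are 2 apart, so the attention scores should match.
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 10), rope(k, 8))
```

In this sketch `score_a` and `score_b` agree up to floating-point error, which is the relative-position property that makes the encoding attractive.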
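Okay, so I think a lot of the activity now is around refactoring the dataset into these inner-monologue formats. And I think there's a lot of synthetic data generation that helps with that. What's interesting about that is the extent to which the current models are helping us create the next generation of models. So how much do you think synthetic data matters, or how far does it get us? Right? Because, as you said, each model helps you train the subsequent model better, or at least create tools for it. Data labeling, whatever; part of it may be synthetic data. How important do you think the synthetic data piece is? When I talk to people, they say it's the only way we can make progress, so we have to make it work. With synthetic data you just have to be careful, because one of the major issues is that these models are silently collapsed. If you go to ChatGPT and ask it for a joke, you'll notice it only knows about three jokes. That's all. Most of the time it gives you one joke, I think. Sometimes it's three jokes, and that's because the models are collapsed, and it's silent. When you look at any individual output, you're just seeing a single example. But when you actually look at the distribution, you'll notice that it's not a very diverse distribution. It's silently collapsed. When you're doing synthetic data generation this is a problem, because you actually really want that entropy. You want diversity and richness in your dataset. Otherwise you get a collapsed dataset, and you can't see it when you look at any individual example. But the distribution has lost a lot of entropy and richness, and so it silently gets worse. So you have to be very careful, and you have to make sure you maintain the entropy in your dataset, and there are a lot of techniques for that. For example, someone released this persona dataset. The persona dataset is a dataset of a billion personalities, like humans, like backgrounds. Oh yeah, I saw this. Yeah: I'm a teacher, or I'm an artist, I live here, I do this, and so on, and it's little paragraphs of fictitious human backgrounds. And what you do when you do synthetic data generation is not only say "complete this task and do it in this way" but also "imagine you are describing it to this person", and you put in this information, and now you're forcing it to explore more of the space and you're getting some entropy. So I think you have to be very careful to inject the entropy and maintain the distribution. And that's the hard part that I think people generally don't appreciate enough. So basically I think synthetic data is absolutely the future. We're not going to run out of data; that's my impression. I just think you have to be careful. What do you think we're learning about human cognition from this research? I don't know if we're learning anything. One could argue, for example, that figuring out the shape of the reasoning traces we want is actually instructive for understanding how the brain works. I would be careful with those analogies; in general I think it's a very different kind of thing. But I do think there are some analogies you can draw. For example, I think transformers are actually better than the human brain in a number of ways. I think they're actually a much more efficient system. And the reason they don't work as well as the human brain is mostly a data issue, roughly speaking, to a first-order approximation, I would say. For example, transformers are so much better than humans at memorizing sequences. If you give one a sequence and do a single forward-backward pass on that sequence, then give it the first few elements, it will complete the rest of the sequence. It memorized that sequence, and it's very good at it. If you show a human a sequence a single time, there's no way they can remember it.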
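The "silent collapse" point made earlier in this exchange can be made concrete with a small sketch: any single sample looks fine, but the entropy of the whole sample set reveals the lack of diversity. Everything here is an assumption for illustration, including the toy joke data, the tiny `PERSONAS` list (a stand-in for the much larger persona dataset mentioned in the conversation), and the helper names `shannon_entropy` and `persona_prompt`.

```python
import math
import random
from collections import Counter

def shannon_entropy(samples):
    """Shannon entropy in bits of the empirical distribution over `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy stand-ins for 100 model outputs: one collapsed set, one diverse set.
collapsed = ["joke A"] * 90 + ["joke B"] * 7 + ["joke C"] * 3
diverse = [f"joke {i}" for i in range(100)]

collapsed_bits = shannon_entropy(collapsed)  # low: almost always the same joke
diverse_bits = shannon_entropy(diverse)      # high: every output is different

# Hypothetical persona list used to inject entropy into generation prompts.
PERSONAS = ["a high-school teacher", "a painter", "a nurse on night shift"]

def persona_prompt(task, rng):
    """Prefix the task with a randomly chosen persona to push outputs apart."""
    persona = rng.choice(PERSONAS)
    return f"Imagine you are explaining this to {persona}. {task}"

prompt = persona_prompt("Tell a joke about compilers.", random.Random(0))
```

Measuring entropy over exact strings is a crude proxy; a real pipeline would more likely compare outputs by meaning rather than by exact match.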
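So I actually think there's a good chance that gradient-based optimization, the forward-backward update that we do all the time to train neural nets, is in some ways more efficient than the brain. And these models are better. They're just not yet ready to shine. But in a lot of cognitive aspects, I think they could come out ahead with the right inputs. They will get better. That's generally true of computers for all kinds of applications, right? Take memory, to your point. Yeah, exactly. And I think human brains have a lot of constraints. Working memory is very small. I think transformers have a much bigger working memory, and that will continue to be the case. They're much more efficient learners. The human brain functions under all kinds of constraints. It's not obvious that the human brain does backpropagation. It's not obvious how that would work. It's a very stochastic, dynamic system. It has all these constraints. It works under ambient conditions and so on. So I do think that what we have is potentially better than the brain; it's just not there yet. How do you think about human augmentation with different AI systems over time? Do you think that's a likely direction? Do you think it's unlikely? Augmentation of people with AI models? Of course. But in what sense, maybe? I think in general, absolutely. Because, I mean, there's the abstract version of it, where you use it as a tool. That's the external version. Then there's the merge scenario, which a lot of people end up talking about. I mean, we're already kind of merged. The thing is, there's the I/O bottleneck, but for the most part it's at your fingertips if you have any of these models. Yeah, but that's a little bit different, because people have been arguing for 40 or 50 years that technological tools are just an extension of human capabilities. Right. Yeah. The computer is the bicycle for the human mind, and so on. Exactly. But there's a subset of the AI community that thinks, for example, that the way we subsume some potential conflict with a future AI or something else would be through some form of merge. Yeah. Like the Neuralink pitch and so on. Exactly. Yeah. I don't know yet what this merge looks like, but I can definitely see that you want to reduce the I/O to tool use. And I see this as an exocortex, built on top of our neocortex. Right. It's just the next layer, and it just turns out to be in the cloud and so on. But it is the next layer of the brain. Yeah. The book Accelerando, from the early 2000s, has a version of this where basically everything is embodied in a set of goggles that you wear and that are computationally attached to your brain. And if you lose them, you feel like you're losing part of your persona or memory. I think that's very likely. Today the phone is already almost that, and I think it's going to get worse. If you put your tech stuff away, you're just a naked human in nature. Well, you lose part of your intelligence. It's very anxiety-inducing. A very simple example of this is just maps. Right. I've noticed that a lot of people now can't navigate their own city very well anymore, because they're always using turn-by-turn directions. And if we have, for example, a universal translator, which I don't think is very far away, then if you put your stuff away you'll lose the ability to talk to people who don't speak English. I'm very comfortable with repurposing that part of my brain for further research. I don't know if you saw the video of the kid who has a magazine and is trying to swipe on the magazine. Right.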
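What's fascinating to me is that this kid doesn't understand what comes from nature and what is technology on top of nature. Right. Because it's been made so transparent. And I think this might look similar: people will start assuming the tools, and then when you take the tools away, you realize that people don't know what is technology and what isn't. If you're wearing this thing that's always translating everyone for you, or doing things like that for you, then people may lose the basic awareness that this doesn't exist natively. We'll specialize. "I can't understand people who speak Spanish. What the hell?" Or, like at Disney, where you go up to objects and all the objects are alive. I think we could potentially get to a world like that. Why can't I talk to things? Already today you can talk to Alexa and ask her for things. Right, right. I've seen some toy companies like that. They're basically trying to embed an LLM in a toy that can interact with a kid. Right. Isn't it strange that when you go to a door, you can't just say "open"? What the hell? Another favorite example of that: I don't know if you saw Demolition Man or I, Robot. People make fun of the idea that you can't just talk to things. Like, what the hell? We're talking about an exocortex. That feels like a fundamentally important thing to democratize access to. How do you think about the current market structure of what's happening in LLM research? There's a small number of large labs that actually have a shot at the next generation of training. How does that translate into what people have access to in the future? So what you're alluding to is maybe the state of the ecosystem. Right. We have a kind of oligopoly of a few closed platforms, and then we have the open platforms that are somewhat behind, so Meta's Llama and so on. And this kind of mirrors the open source ecosystem. I do think that when this stuff starts to... when we start to think of it as an exocortex. There's a saying in crypto: not your keys, not your coins. Yeah. Is it the case that if it's not your weights, it's not your brain? That's interesting, because a company is effectively controlling your exocortex, and therefore a big part of it starts to feel kind of invasive. If this is my exocortex, I think people will care much more about ownership. Yeah. Yeah, you realize you're renting your brain. Renting your brain seems like a lot. The thought experiment is: are you willing to give up ownership and control to rent a better brain? Because I am. Yeah. So I think that's the trade-off. We'll see how that works. But maybe it's possible to use the closed versions by default, because they're amazing, and have a fallback for various scenarios. And I think that's kind of how things are shaping up today. When the APIs of some of the closed-source providers go down, people start to implement fallbacks to, for example, the open ecosystem that they fully control, and they feel empowered by that. So maybe that's just an extension of what it will look like for the brain: you fall back on open source should anything happen. But most of the time, actually... So it's quite important that open source continues to progress. I think so, 100%. It's not an obvious point, or something people would necessarily agree on right now, but I think so, 100%. One thing I've been wondering about a bit is: what is the smallest performant model you can get to, in some sense, whether in parameter size or however you want to think about it? And I'm a little curious about your view.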
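Have you thought a lot about distillation and small models? I think it can be surprisingly small, and I think the current models are wasting a ton of capacity remembering things that don't matter. For example, they remember SHA hashes, they remember ancient things, because the dataset is not curated as well as it could be. Yeah, exactly. And I think this will go away. I think we just need to get to the cognitive core. And I think the cognitive core can be extremely small; it's just this thing that thinks, and if it needs to look up information, it knows how to use different tools. Is that like 3 billion parameters? Is it 20 billion parameters? I think even 1 billion, 1 billion suffices. We'll probably get to that point. And the models can be very, very small. And I think the reason they can be very small is, fundamentally, that distillation works; that may be the one thing I would say, that distillation works surprisingly well. Distillation is where you take a really big model, or a huge amount of compute or something like that, and have it supervise a very small model, and you can actually stuff a lot of capability into a very small model. Is there some mathematical representation of that, or an information-theoretic formulation? Because it feels like you should be able to calculate what that is. Yeah, one way to think about it is to go back to the internet dataset that we're working with. The internet is like 0.001% cognition and 99.99% information. Like garbage. Yeah, and I think most of it is not useful to the thinking part. And yeah, I guess another way to frame the question is whether there's a mathematical representation of cognitive capability relative to model size, or how you capture cognition in terms of a min or max relative to what you're trying to accomplish, and maybe there's no good way to represent that. I think maybe a billion parameters gives you a sort of good cognitive core. I think that's probably right. I think even 1 billion is too much. I don't know. We'll see. It's very exciting when you think about it, with the question of edge devices versus the cloud, and also the raw cost of using the model and everything. Yeah, it's very exciting. Right. But at less than a billion parameters, I have my exocortex on a local device as well. Yeah. And then it's probably not a single model. Right? It's interesting to me to think about how this will actually play out, because I think you want to benefit from parallelization. You don't want a sequential process. You want a parallel process. And I think companies are, to some extent, also a parallelization of work, but there's a hierarchy in a company because that's one way to get the information processing and the reductions that need to happen within an organization. So I think we'll probably end up with companies of LLMs. I think it's likely that you'll have models of different capabilities specialized to various unique domains. Maybe there's a programmer and so on, and it will actually start to resemble companies to a very large extent. So you'll have the programmer and the program manager and similar kinds of roles, LLMs working in parallel, coming together and orchestrating computation on your behalf. So maybe it's not correct to think about it that way. It's more like a swarm. I'd say it feels like an ecosystem; it's like a biological ecosystem, where you have specialized roles and niches, and I think it will start to resemble that. You have automatic escalation to other parts of the swarm depending on the difficulty of the problem. So maybe the CEO is a really smart cloud model, but the workers can be a lot cheaper, maybe even open source models or whatever. And my cost function is different from your cost function. Yeah.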
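To make the distillation idea discussed above more concrete, here is a minimal sketch of one common formulation, soft-target distillation, where a small student is trained to match a large teacher's temperature-softened output distribution by minimizing a KL divergence. The conversation does not say which variant the speaker has in mind, so this choice, along with the names `softmax` and `distillation_loss`, the temperature of 2.0, and the example logits, is an assumption for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits over a four-token vocabulary.
teacher = [4.0, 1.5, 0.2, -1.0]
good_student = [3.8, 1.6, 0.1, -0.9]   # close to the teacher
poor_student = [-1.0, 0.2, 1.5, 4.0]   # ranks the tokens in reverse

loss_good = distillation_loss(teacher, good_student)
loss_poor = distillation_loss(teacher, poor_student)
```

The softened teacher distribution carries more signal per example than a single correct label, which is one commonly cited reason a small model can absorb a lot from a large one.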
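So that could be interesting. You left OpenAI. You're working on education. You've always been an educator. Why do this? I've always been an educator, and I love learning and I love teaching. So it's a space I've been very passionate about for a long time. The other thing is that one macro picture driving me is that there's a lot of activity in AI, and I think most of it is about replacing or displacing people. I'd say it's in the theme of sidestepping the people. But I'm always more interested in anything that empowers people. I feel like, at a high level, I'm on Team Human, and I'm interested in the things AI can do to empower people. I don't want a future where people are off to the side of automation. I want people to be in a very empowered state, and I want them to be amazing, much more amazing than today. The other aspect I find very interesting is how far a person can go if they have the perfect tutor for every subject. I think people could go really far if they had the perfect curriculum for anything. And I think we see that, you know, when some rich people have tutors and actually do go really far. So I think we can approach that with AI, or even surpass it. There's actually very clear literature on that from the eighties. One-on-one tutoring, I think, helps people get one standard deviation better; it's the Bloom result. Is it two? Yeah, it's the Bloom stuff. Yeah, exactly. There's a lot of really interesting precedent on that. How do you see that actually being substantiated through the lens of AI? Or what's the first type of product that will really help with that? There are books like The Diamond Age, which talks about the Young Lady's Illustrated Primer and all of that. So I'm definitely inspired by that aspect of it. In practice, what I'm doing is building a single course currently, and I want it to be the course you would go to if you want to learn AI. The problem, basically, is that I've already taught courses. I taught 231n at Stanford, and that was the first deep learning class and it was pretty successful. But the question is how you actually scale these classes. How do you make it so that your target audience is maybe 8 billion people on Earth, who all speak different languages and are all at different capability levels? A single teacher doesn't scale to that audience. So the question is how you use AI to scale a really good teacher. The way I'm thinking about it is that the teacher does a lot of the course creation and the curriculum. At current AI capabilities I don't think the models are good enough to create a good course, but I think they're good enough to become the front end to the student and interpret the course for them. So basically the teacher doesn't go to the people, and the teacher is not the front end anymore. The teacher is on the back end designing the materials of the course, and the AI is the front end; it can speak all the different languages and it takes you through the course. Should I think of that as a TA type of experience? Or is that not a good analogy here? That's one way I'm thinking about it: it's an AI TA. I'm mostly thinking of it as the front end to the student. It's the thing that's actually interacting with the student and taking them through the course. I think that's tractable today and it just doesn't exist. And I think it can be made really good. Then over time, as the capability increases, you can refactor the setup in various ways. I like to find things that match the AI capability of today, where you have a good model of it.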
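And I think a lot of companies maybe don't intuitively understand where the capability is today, and they end up building things that are either too far ahead of what's available or not ambitious enough. So I think this is a sweet spot of what's possible, and also really interesting and exciting. I want to go back to something you said, especially coming from your background and your understanding of exactly where we are in research: essentially, we don't know what the limits of human performance are, from a learning perspective, given much better tooling. And I think there's a very easy analogy. We just had the Olympics a month ago, right? And, you know, a runner's best mile time, or whatever sport you pick, is just much better today than it was ten years ago, putting aside performance-enhancing drugs, just because you start training earlier and you have a very different program. We have much better scientific understanding. We have technique and we have gear. The fact that you believe we can get much further as humans if we start with the tooling and the curriculum is amazing. Yeah, I think we haven't even scratched the surface of what's possible. So I think there are basically two dimensions to it. The first is the globalization dimension: I want everyone to have a really good education. The other is how far a single person can go. I think both are very interesting and exciting. Usually when people talk about one-on-one learning, they talk about the adaptive aspect, where you challenge a person at the level they're at. Do you think you can do that with AI today, or is that something for the future? And is today more about reach and multiple languages, about being global? I think the low-hanging fruit is things like, for example, different languages. That's very low-hanging fruit. I think the current models are basically really good at translation, and can target the material and translate it on the spot. So I think a lot of things are low-hanging fruit. This adaptivity to a person's background is not low-hanging fruit, I think, but it's not too high up or too far away either. It is something you definitely want, because not everyone comes in with the same background. And what's also really helpful is that if you're familiar with some other discipline from the past, it's really useful to make analogies to the things you know, and that's extremely powerful in education. So that's definitely a dimension you want to take advantage of. But I think that's starting to get to the point where it's not obvious and needs some work. I think the easy version of it is not too far off, where you can imagine just prompting the model: "Oh hey, I know physics", or "I know this", and you'll probably get something. But what I'm talking about is something that actually works, not something you can demo and that works sometimes. I mean that it actually really works, in the way a person would. Yeah, and that's why I was asking about adaptivity, because people also learn at different rates, or find certain things challenging that others don't, or vice versa. So it's a bit of a question of how you modulate relative to that context. And I guess over time you could reintroduce into the model what the person is good or bad at. That's the thing with AI. A lot of these capabilities are just a prompt away. So you always get demos, but do you actually get a product? You know what I mean? In this sense I'd say the demo is near but the product is far.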
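So one thing we talked about earlier that I think is really interesting is the kind of lineage that happens in the research community, where you come from certain labs and everyone gossips about being from each other's labs. I think a very high proportion of Nobel laureates actually worked in the lab of a former Nobel laureate. So there's some propagation of, I don't know if it's culture, or knowledge, or branding, or what. In an AI-education-centric world, how do you maintain lineage? Or does it not matter? Or how do you think about those aspects of the propagation of network and knowledge? I don't actually want to live in a world where lineage matters too much. Right. So I'm hoping that AI can help destroy that structure a little bit. It feels like gatekeeping by some kind of finite, scarce resource: oh, there's a finite number of people who have this lineage. I feel like it's a little bit of that aspect. So I hope it can destroy that. It's definitely one piece actual learning, one piece lineage, right? Yeah. Well, it's also an aggregation of cluster effects, right? Why is all of the AI community in the Bay Area? Or why is most of the fintech community in New York? So I think a lot of it is also just that you're clustering really smart people with common interests and beliefs, and then they propagate from that common core and share knowledge in an interesting way. A lot of that behavior has shifted online to some extent, especially for younger people. I think one aspect of it is kind of the educational aspect: if you're part of a community today, you get a ton of education and apprenticeship and so on, which is extremely helpful and gets you to an empowered state in that area. I think the other aspect is the cultural aspect of what you're motivated by and what you want to work on. What does the culture prize, what does it put on a pedestal, and what does it basically worship? In academia, for example, it's the h-index. Everyone cares about the h-index, the number of papers you publish, and so on. I was part of that community and I saw that. And I feel like I've now been to different places, and every community has different idols. I think that has a massive impact on what people are motivated by, where they get their social status, and what actually matters to them. I was also part of different communities growing up in Slovakia. That was also a very different environment. Being in Canada was also a very different environment. What mattered there was hockey. Sorry. Thank you. Hockey, yeah, hockey. As an example, in Canada I was at the University of Toronto, and I don't think it's a very entrepreneurial environment. It doesn't even occur to you that you should be starting a company. It's not something people are doing. You don't know friends who are doing it. You don't know that you're supposed to look up to it. People aren't reading books about all the founders and talking about them. It's just not something you aspire to or care about. What everyone talks about is where you're getting your internship and where you're going to work afterwards. And you just accept that there's a fixed set of companies that you're supposed to pick from and align yourself with one of them. That's what you look up to, or something like that. So these cultural aspects are extremely strong and may actually be the dominant variable, because I almost feel like the education aspect is already the easier one today. A lot of material is already available. So I think it's mostly the cultural aspect of what you're part of. So on this point, one of the things you and I were talking about a few weeks ago, and I think you also posted about this online, is that there's a difference between learning and entertainment, and that learning is actually supposed to be hard. And I think it relates to this question of status, with status being a great motivator, and of who the idols are.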
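How much do you think you can change in terms of motivation with these kinds of systems? If that's the blocker, are you focused on giving people the resources so that they can get as far in a sequence as they can for their own capabilities, in a way that's more inspiring than at any other point in history? Or do you actually want to change how many people want to learn, or at least lead them down that path themselves? "Want" is a loaded word. I'd say I want to make it a lot easier to learn. And then it's possible that people don't want to learn. For example, today people want to learn for practical reasons, right? They want to get a job and so on. That makes total sense. So in a pre-AGI society, education is useful, and I think people will be motivated by that, because they're climbing the ladder economically. But in a post-AGI society, which is where we're all headed, I think education is entertainment to a much larger extent. Including education with successful outcomes, right? Not just letting content wash over you. Yeah, I think so. The outcome being understanding, learning, being able to contribute new knowledge, or however you define it. I think it's not an accident that if you go back 200 or 300 years, the people doing science were nobility or people of wealth. We will all be nobility. Learning with Andrej. Yeah, I do see it as very analogous to your quote from earlier. I feel like learning something is kind of like going to the gym, but for the brain. Right? It feels like going to the gym. I mean, going to the gym is fun. People like to lift. Some people don't go to the gym. No, no, some people do, but it takes effort. Yeah, yeah, it takes effort. But it's also kind of fun. And you also get a payoff. You feel good about yourself in various ways. Right. And I think education is basically equivalent to that. So that's what I mean when I say education should not be fun. I mean, it is kind of fun, but it's a specific kind of fun, I suppose. Right. I think that in a post-AGI world, what I would hope happens is that people actually go to the gym a lot, not just physically but also mentally. And that being highly educated is something we look up to, and, you know, just... yeah. Can I ask you one last question about Eureka? Just because I think it will be interesting to people. Who is the audience for the first course? The audience for the first course: I'm mostly thinking of this as an undergraduate-level course. If you're doing undergrad in a technical area, I think that would be the ideal audience. I do think that what we're seeing now is an antiquated concept of education, where you go through school and then you graduate and go to work. Right. Obviously this will totally break down, especially in a society that's changing so rapidly that people will come back to school a lot more often as the technology changes very, very quickly. So it is kind of undergrad level, but I would say anyone at that level, at any age, is in scope. I think it will be very diverse in age, for example, but mostly people who are technical and who actually want to understand it to a good degree. When can they take the course? I was hoping it would be late this year. I do have a lot of distractions piling up, but I think early next year is probably the timeline. Yeah, I'm trying to make it very, very good. It just takes time to get there. I actually have one last question, maybe related to that. If you have little kids today, what do you think they should study in order to have a useful future?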
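I think there's a correct answer, and the correct answer is mostly math, physics, CS, those kinds of disciplines. The reason I say that is that I think it helps with thinking skills. It's the best core for thinking skills, in my opinion. Of course, I have a specific background and so on, so I would think that, but that's just my view. I feel like taking physics classes and all these other classes shaped the way I think. And I think it's very useful for problem solving in general. If we're in a pre-AGI world, this is going to be useful, and post-AGI you still want empowered humans who can function in any arbitrary capacity. So I just think this is the correct answer for people, what they should be doing and taking, and it's either useful or it's good. So I think it's the right answer. And I think a lot of the other stuff you can tack on a bit later, but in the critical period where people have a lot of time and a lot of attention, I think it should be spent mostly on tasks and workloads that are heavy on simple manipulation, not on tasks and workloads that are heavy on memory. Yeah, I did a math degree, and I felt like a new groove was being carved into my brain as I was doing it. And it's a groove that's harder to carve later. Of course I would put in a bunch of other things as well. I'm not opposed to all the other disciplines. I actually think it's beautiful to have a diversity of things, but I do think 80% of it should be something like this. We're not efficient memorizers compared to our tools. Thank you so much for doing this. It's so fun. Yeah. Yeah. It's great to be here. Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.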