diff --git "a/docs/karpathy-lex-pod/karpathy-pod.txt" "b/docs/karpathy-lex-pod/karpathy-pod.txt" new file mode 100644--- /dev/null +++ "b/docs/karpathy-lex-pod/karpathy-pod.txt" @@ -0,0 +1,2822 @@ +some kind of a crazy quantum mechanical system that somehow gives you buffer overflow, somehow +gives you a rounding error in the floating point. +Synthetic intelligences are kind of like the next stage of development. +And I don't know where it leads to. +Like at some point, I suspect the universe is some kind of a puzzle. +These synthetic AIs will uncover that puzzle and solve it. +The following is a conversation with Andrei Kapathe, previously the director of AI at +Tesla, and before that at OpenAI and Stanford. +He is one of the greatest scientists, engineers, and educators in the history of artificial +intelligence. +This is the Lex Friedman podcast. +To support it, please check out our sponsors. +And now, dear friends, here's Andrei Kapathe. +What is a neural network? +And why does it seem to do such a surprisingly good job of learning? +What is a neural network? +It's a mathematical abstraction of the brain. +I would say that's how it was originally developed. +At the end of the day, it's a mathematical expression. +It's a fairly simple mathematical expression when you get down to it. +It's basically a sequence of matrix multipliers, which are really dot products mathematically, +and some non-linearity is thrown in. +It's a very simple mathematical expression, and it's got knobs in it. +Many knobs. +Many knobs. +These knobs are loosely related to the synapses in your brain. +They're trainable, they're modifiable. +The idea is we need to find the setting of the knobs that makes the neural net do whatever +you want it to do, like classify images and so on. +There's not too much mystery, I would say, in it. +You might think that you don't want to endow it with too much meaning with respect to the +brain and how it works. +It's really just a complicated mathematical expression with knobs, and those knobs need +a proper setting for it to do something desirable. +Yeah, but poetry is just a collection of letters with spaces, but it can make us feel a certain +way. +In that same way, when you get a large number of knobs together, whether it's inside the +brain or inside a computer, they seem to surprise us with their power. +I think that's fair. +I'm underselling it by a lot because you definitely do get very surprising emergent behaviors +out of these neural nets when they're large enough and trained on complicated enough problems, +like say for example, the next word prediction in a massive data set from the internet. +These neural nets take on pretty surprising magical properties. +Yeah, I think it's interesting how much you can get out of even very simple mathematical +formalism. +When your brain right now is talking, is it doing next word prediction? +Or is it doing something more interesting? +Well, it's definitely some kind of a generative model that's a GPT-like and prompted by you. +So you're giving me a prompt and I'm kind of responding to it in a generative way. +And by yourself perhaps a little bit? +Are you adding extra prompts from your own memory inside your head? +It definitely feels like you're referencing some kind of a declarative structure of memory +and so on, and then you're putting that together with your prompt and giving away some answer. +How much of what you just said has been said by you before? +Nothing basically, right? 
+Yeah, but poetry is just a collection of letters with spaces, but it can make us feel a certain way.
+In that same way, when you get a large number of knobs together, whether it's inside the brain or inside a computer, they seem to surprise us with their power.
+I think that's fair.
+I'm underselling it by a lot, because you definitely do get very surprising emergent behaviors out of these neural nets when they're large enough and trained on complicated enough problems, like, say, for example, next word prediction on a massive dataset from the internet.
+These neural nets take on pretty surprising magical properties.
+Yeah, I think it's interesting how much you can get out of even a very simple mathematical formalism.
+When your brain right now is talking, is it doing next word prediction?
+Or is it doing something more interesting?
+Well, it's definitely some kind of a generative model that's GPT-like, and prompted by you.
+So you're giving me a prompt and I'm kind of responding to it in a generative way.
+And prompted by yourself, perhaps, a little bit?
+Are you adding extra prompts from your own memory inside your head?
+It definitely feels like you're referencing some kind of a declarative structure of memory and so on, and then you're putting that together with your prompt and giving away some answer.
+How much of what you just said has been said by you before?
+Nothing, basically, right?
+No, but if you actually look at all the words you've ever said in your life and you do a search, you'll probably have said a lot of the same words in the same order before.
+Yeah, could be.
+I mean, I'm using phrases that are common, et cetera, but I'm remixing it into a pretty unique sentence at the end of the day.
+But you're right, definitely there's a ton of remixing.
+It's like Magnus Carlsen saying, I'm rated 2,900, whatever, which is pretty decent.
+I think you're not giving enough credit to neural nets here.
+What's your best intuition about this emergent behavior?
+I mean, it's kind of interesting, because I'm simultaneously underselling them, but I also feel like there's an element to which I'm overselling them: it's actually kind of incredible that you can get so much emergent magical behavior out of them despite them being so simple mathematically.
+So I think those are two surprising statements that are kind of juxtaposed together.
+And I think basically what it is, is we are actually fairly good at optimizing these neural nets.
+And when you give them a hard enough problem, they are forced to learn very interesting solutions in the optimization.
+And those solutions basically have these emergent properties that are very interesting.
+There's wisdom and knowledge in the knobs.
+And so this representation that's in the knobs: does it make sense to you intuitively that a large number of knobs can hold a representation that captures some deep wisdom about the data it has looked at?
+It's a lot of knobs.
+It's a lot of knobs.
+And somehow, you know, so speaking concretely, one of the neural nets that people are very excited about right now is GPT, which is basically just a next word prediction network.
+So you consume a sequence of words from the internet and you try to predict the next word.
+And once you train these on a large enough dataset, you can basically prompt these neural nets in arbitrary ways and you can ask them to solve problems, and they will.
+So you can just tell them, you can make it look like you're trying to solve some kind of a mathematical problem, and they will continue what they think is the solution, based on what they've seen on the internet.
+And very often those solutions look remarkably consistent, potentially even correct.
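+To make the next word prediction setup concrete, here is a minimal sketch of how a stream of text becomes training examples (toy whitespace tokenization; real GPTs operate on learned sub-word chunks):
+
+    text = "the cat sat on the mat"
+    tokens = text.split()  # toy tokenizer; real models use sub-word chunks
+
+    # Every position becomes one training example:
+    # (all the words so far) -> (the word that comes next).
+    examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
+
+    for context, target in examples:
+        print(context, "->", target)
+    # A GPT is trained to put high probability on each target given its
+    # context, over an internet-sized corpus; prompting just hands the
+    # trained model a context and lets it keep predicting.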
+Do you still think about the brain side of it?
+So as neural nets are a mathematical abstraction of the brain, do you still draw wisdom from biological neural networks, or, even the bigger question: you're a big fan of biology and biological computation.
+What impressive thing is biology doing that computers are not yet?
+That gap?
+I would say I'm much more hesitant with the analogies to the brain than you would potentially see in the field.
+And I kind of feel like, certainly, the way neural networks started is that everything stemmed from inspiration by the brain.
+But at the end of the day, the artifacts that you get after training are arrived at by a very different optimization process than the optimization process that gave rise to the brain.
+And so I kind of think of it as a very complicated alien artifact.
+It's something different.
+The brain?
+I'm sorry, the neural nets that we're training.
+They are a complicated alien artifact.
+I do not make analogies to the brain because I think the optimization process that gave rise to it is very different from the brain.
+There was no multi-agent self-play kind of setup in evolution.
+It was an optimization that basically amounts to a compression objective on a massive amount of data.
+Okay, so artificial neural networks are doing compression, and biological neural networks are doing something else entirely.
+They're an agent in a multi-agent self-play system that's been running for a very, very long time.
+That said, evolution has found that it is very useful to predict, and to have a predictive model in the brain.
+And so I think our brain utilizes something that looks like that as a part of it, but it has a lot more gadgets and gizmos and value functions and ancient nuclei that are all trying to make you survive and reproduce and everything else.
+And the whole thing, through embryogenesis, is built from a single cell.
+The code is inside the DNA, and it just builds up the entire organism, with arms and the head and legs.
+And it does it pretty well.
+It should not be possible.
+So there's some learning going on.
+There's some kind of computation going on through that building process.
+If you were to look at the entirety of the history of life on Earth, what do you think is the most interesting invention?
+Is it the origin of life itself?
+Is it the jump to eukaryotes?
+Is it mammals?
+Is it humans themselves, homo sapiens?
+The origin of intelligence, or highly complex intelligence?
+Or is it all just a continuation of the same kind of process?
+Certainly, I would say it's an extremely remarkable story that I'm only briefly learning about recently.
+To tell it, you almost have to start at the formation of Earth and all of its conditions, and the entire solar system, and how everything is arranged, with Jupiter and the Moon and the habitable zone and everything.
+And then you have an active Earth that's turning over material.
+And then you get to abiogenesis and everything.
+So it's all a pretty remarkable story.
+I'm not sure that I can pick a single unique piece of it that I find most interesting.
+I guess for me, as an artificial intelligence researcher, it's probably the last piece.
+We have lots of animals that are not building a technological society, but we do.
+And it seems to have happened very quickly.
+It seems to have happened very recently.
+And something very interesting happened there that I don't fully understand.
+I almost understand everything else, I think, intuitively, but I don't understand exactly that part, and how quick it was.
+Both explanations would be interesting.
+One is that this is just a continuation of the same kind of process.
+There's nothing special about humans.
+That would be deeply humbling.
+That would be very interesting: that we think of ourselves as special, but it was obvious.
+It was already written in the code that you would have greater and greater intelligence emerging.
+And then the other explanation is that something truly special happened, something like a rare event, whether it's a crazy rare event like in 2001: A Space Odyssey.
+What would it be?
+See, if you say, like, the invention of fire, or, as Richard Wrangham says, the beta males deciding on a clever way to kill the alpha males by collaborating.
+So just optimizing the collaboration, the multi-agent aspect, and being really constrained on resources and trying to survive: the collaboration aspect is what created the complex intelligence.
+But it seems like it's a natural algorithm, the evolutionary process.
+What could possibly be a magical thing that happened, like a rare thing, that would say that human-level intelligence is actually a really rare thing in the universe?
+Yeah, I'm hesitant to say that it is rare, by the way, but it definitely seems like it's kind of like a punctuated equilibrium, where you have lots of exploration and then you have certain leaps, sparse leaps in between.
+So of course, the origin of life would be one: DNA, sex, eukaryotic life, the endosymbiosis event where an archaeon ate a bacterium, just the whole thing.
+And then, of course, the emergence of consciousness and so on.
+So it seems like there are definitely sparse events where a massive amount of progress was made, but yeah, it's kind of hard to pick one.
+So you don't think humans are unique?
+I've got to ask you: how many intelligent alien civilizations do you think are out there?
+And is their intelligence different or similar to ours?
+Yeah, I've been preoccupied with this question quite a bit recently, basically the Fermi paradox, and just thinking it through.
+The reason, actually, that I am very interested in the origin of life is that I'm fundamentally trying to understand how common it is that there are technological societies out there in space.
+And the more I study it, the more I think that there should be quite a lot.
+Why haven't we heard from them?
+Because I agree with you.
+It feels like I just don't see why what we did here on Earth is so difficult to do.
+Yeah, and especially when you get into the details of it.
+I used to think the origin of life was this magical rare event, but then you read books like, for example, Nick Lane's The Vital Question, Life Ascending, etc., and he really gets in there and he really makes you believe that this is not that rare.
+Basic chemistry.
+You have an active Earth and you have your alkaline vents, and you have lots of alkaline waters mixing with the ocean, and you have your proton gradients, and you have the little porous pockets of these alkaline vents that concentrate chemistry.
+And basically, as he steps through all of these little pieces, you start to understand that actually this is not that crazy.
+You could see this happening on other systems.
+And he really takes you from just geology to primitive life, and he makes it feel like it's actually pretty plausible.
+And also, the origin of life came actually fairly fast after the formation of Earth.
+If I remember correctly, just a few hundred million years or something like that after it was basically possible, life actually arose.
+So that makes me feel like that is not the constraint, that is not the limiting variable, and that life should actually be fairly common.
+And then where the drop-offs are is very interesting to think about.
+I currently think that there are no major drop-offs, basically, and so there should be quite a lot of life.
+And basically where that brings me, then, is that the only way to reconcile the fact that we haven't found anyone and so on is that we just can't see them.
+We can't observe them.
+Just a quick brief comment.
+Nick Lane, and a lot of biologists I've talked to, really seem to think that the jump from bacteria to more complex organisms is the hardest jump.
+Eukaryotic life, basically.
+Yeah, which I don't... I get it.
+They're much more knowledgeable than me about the intricacies of biology, but that seems crazy.
+How many single cell organisms are there?
+And how much time do you have?
+Surely it's not that difficult.
+And a billion years is not even that long of a time, really.
+Just all these bacteria under constrained resources, battling it out.
+I'm sure they can invent more complex organisms.
+I don't understand; it's like learning how to move from a hello world program to inventing a function or something like that.
+Yeah, so I'm with you.
+If the origin of life, that would be my intuition, that that's the hardest thing.
+But if that's not the hardest thing, because it happened so quickly, then it's got to be everywhere.
+And yeah, maybe we're just too dumb to see it.
+Well, it's just that we don't have really good mechanisms for seeing this life.
+So, I'm not an expert, just to preface this, but just from what I've thought about it...
+I want to meet an expert on alien intelligence and how to communicate.
+I'm very suspicious of our ability to find these intelligences out there, and of their ability to find Earth.
+Like, radio waves, for example, are terrible.
+Their power drops off as basically one over r squared.
+So I remember reading that our current radio waves, the ones that we are broadcasting, would not be measurable by our own devices today from, was it like one tenth of a light year away?
+Not even; basically a tiny distance, because you really need a targeted transmission of massive power directed somewhere for this to be picked up at long distances.
+And so I just think that our ability to measure is not amazing.
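+The inverse-square falloff mentioned above is easy to make concrete: an isotropic transmitter spreads its power over a sphere, so received flux drops as one over r squared (the one-megawatt broadcast here is an arbitrary example figure):
+
+    import math
+
+    def flux(power_watts, r_meters):
+        # Power spread over a sphere of area 4*pi*r^2.
+        return power_watts / (4 * math.pi * r_meters**2)
+
+    LY = 9.461e15  # meters in one light year
+    for r in (0.1 * LY, 1.0 * LY, 10.0 * LY):
+        print(f"{r / LY:5.1f} ly: {flux(1e6, r):.3e} W/m^2")
+    # Ten times the distance means one hundred times weaker: untargeted
+    # broadcasts fade into the noise floor almost immediately.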
+I think there are probably other civilizations out there.
+And then the big question is why they don't build von Neumann probes, and why they don't interstellar travel across the entire galaxy.
+And my current answer is that interstellar travel is probably just really hard.
+You have the interstellar medium.
+If you want to move at close to the speed of light, you're going to be encountering bullets along the way, because even tiny hydrogen atoms and little particles of dust basically have massive kinetic energy at those speeds.
+So you need some kind of shielding.
+And you have all the cosmic radiation.
+It's just brutal out there.
+It's really hard.
+And so my thinking is, maybe interstellar travel is just extremely hard, and you have to go very slow.
+Like, billions of years of travel?
+It feels like we're not a billion years away from doing that.
+It just might be that you have to go very slowly, potentially, as an example, through space.
+Right, as opposed to close to the speed of light.
+So I'm suspicious, basically, of our ability to measure life, and I'm suspicious of the ability to just permeate all of space in the galaxy or across galaxies.
+And that's the only way that I can currently see around it.
+It's kind of mind-blowing to think that there are trillions of intelligent alien civilizations out there, kind of slowly traveling through space to meet each other.
+And some of them meet, some of them go to war, some of them collaborate.
+Or they're all just independent.
+They're all just little pockets.
+Well, statistically, if there are trillions of them, surely some of the pockets happen to be close enough to see each other.
+And once you see something that is definitely complex life, if we see something, we're probably going to be intensely, aggressively motivated to figure out what the hell that is and try to meet them.
+But what would be your first instinct: to try, at a generational level, to meet them, or to defend against them?
+What would be your instinct as a president of the United States, and as a scientist?
+I don't know which hat you prefer for this question.
+Yeah, I think the question is really hard.
+I will say, for example, that for us, we have lots of primitive life forms on Earth next to us.
+We have all kinds of ants and everything else, and we share space with them.
+And we are hesitant to impact them; we're trying to protect them by default, because they are amazing, interesting dynamical systems that took a long time to evolve, and they are interesting and special.
+And I don't know that you want to destroy that by default.
+And so I like complex dynamical systems that took a lot of time to evolve.
+I'd like to preserve them if I can afford to.
+And I'd like to think that the same would be true about the galactic resources, and that they would think that we're a kind of incredible, interesting story that took a few billion years to unravel, and you don't want to just destroy it.
+I could see two aliens talking about Earth right now and saying, I'm a big fan of complex dynamical systems, so I think there's value in preserving these, and it basically becomes a video game they watch, or a TV show.
+Yeah, I think you would need a very good reason to destroy it.
+Like, why don't we destroy these ant farms and so on?
+Because we're not actually really in direct competition with them right now.
+We do it accidentally and so on, but there's plenty of resources.
+And so why would you destroy something that is so interesting and precious?
+Well, from a scientific perspective, you might probe it.
+You might interact with it lightly.
+You might want to learn something from it, right?
+So I wonder if there could be certain phenomena that we think are physical phenomena, but are actually them interacting with us, poking the finger and seeing what happens.
+I think it would be very interesting to scientists, other alien scientists, what happened here.
+And what we're seeing today is a snapshot.
+Basically, it's the result of a huge amount of computation over a billion years or something like that.
+So it could have been initiated by aliens.
+This could be a computer running a program.
+Okay, if you had the power to do this, for sure, at least I would: I would pick an Earth-like planet that has the conditions, based on my understanding of the chemistry prerequisites for life, and I would seed it with life and run it.
+Right?
+Like, wouldn't you 100% do that, and observe it, and then protect it?
+I mean, that's not just a hell of a good TV show; it's a good scientific experiment.
+And it is a physical simulation, right?
+Actually running evolution is the most efficient way to compute stuff, to understand what life looks like and what branches it can take.
+It does make me kind of feel weird that we're part of a science experiment, but maybe everything's a science experiment.
+So does that change anything for us, if it's a science experiment?
+I don't know.
+Two descendants of apes talking about being inside of a science experiment.
+I'm suspicious of this idea of a deliberate panspermia, as you described it, sort of.
+I don't see evidence of divine intervention in the historical record right now.
+I do feel like the story in these books, like Nick Lane's books and so on, sort of makes sense, and it makes sense how life arose on Earth uniquely.
+And yeah, I don't need to reach for more exotic explanations right now.
+Sure, but NPCs inside a video game don't observe any divine intervention either.
+We might all just be NPCs running a kind of code.
+Maybe eventually they will.
+Currently NPCs are really dumb, but once they're running GPTs, maybe they will be like, hey, this is really suspicious.
+What the hell?
+So you famously tweeted: it looks like if you bombard Earth with photons for a while, you can emit a Roadster.
+So if, like in The Hitchhiker's Guide to the Galaxy, we were to summarize the story of Earth; in that book, it's "mostly harmless."
+What do you think are all the possible stories, a paragraph long or a sentence long, that Earth could be summarized with once it's done with its computation?
+If Earth is a book, right, there probably has to be an ending.
+I mean, there's going to be an end to Earth, and it could end in all kinds of ways.
+It can end soon.
+It can end later.
+What do you think are the possible stories?
+Well, definitely there seems to be... it's pretty incredible that these self-replicating systems basically arise from the dynamics, and then they perpetuate themselves and become more complex, and eventually become conscious and build a society.
+And I kind of feel like, in some sense, it's kind of like a deterministic wave that just happens on any sufficiently well-arranged system like Earth.
+And so I kind of feel like there's a certain sense of inevitability in it.
+And it's really beautiful.
+And it ends somehow, right?
+So it's a chemically diverse environment where complex dynamical systems can evolve and become further and further complex.
+But then there's a certain... what is it?
+There are certain terminating conditions.
+Yeah, I don't know what the terminating conditions are, but definitely there's a trend line of something, and we're part of that story.
+And where does it go?
+So, you know, we're often famously described as a biological bootloader for AIs, and that's because humans... I mean, we're an incredible biological system, capable of love and so on.
+But we're extremely inefficient as well.
+Like, we're talking to each other through audio.
+It's just kind of embarrassing, honestly, that we're manipulating, like, seven symbols serially; we're using vocal cords; it's all happening over multiple seconds.
+It's just kind of embarrassing when you step down to the frequencies at which computers operate, or are able to communicate on.
+So basically, it does seem like synthetic intelligences are kind of like the next stage of development.
+And I don't know where it leads to.
+Like, at some point, I suspect the universe is some kind of a puzzle, and these synthetic AIs will uncover that puzzle and solve it.
+And then what happens after, right?
+Because if you just fast-forward Earth many billions of years, it's quiet, and then it's turmoil: you see city lights and stuff like that.
+And then what happens at the end?
+Is it like a poof and that's it, or is it a calming, or is it an explosion?
+Does Earth open up like a giant... because you said Roadsters, like, we'll start emitting a giant number of satellites?
+Yes, it's some kind of a crazy explosion, and we're living through it; we're stepping through an explosion, and we're living day to day, and it doesn't look like it.
+But I saw a very cool animation of Earth and life on Earth, and basically nothing happens for a long time, and then in the last two seconds, basically, cities and everything, and low Earth orbit just gets cluttered, and the whole thing happens in the last two seconds.
+And you're like, this thing is exploding.
+It's a state explosion.
+Yeah, if you play it at normal speed, it'll just look like an explosion.
+It's a firecracker.
+We're living in a firecracker.
+Where it's going to start emitting all kinds of interesting things.
+Yeah, so the explosion might actually look like a little explosion, with lights and fire and energy emitted, all that kind of stuff.
+But when you look inside the details of the explosion, there's actual complexity happening, where there's, yeah, human life, or some kind of life.
+We hope it's not a destructive firecracker.
+It's kind of like a constructive firecracker.
+All right, so given that... hilarious discussion... it is really interesting to think about what the puzzle of the universe is.
+Did the creator of the universe give us a message?
+Like, for example, in the book Contact by Carl Sagan, there's a message for humanity, for any civilization, in the digits of the expansion of pi in base 11, eventually.
+Which is kind of an interesting thought: maybe we're supposed to be giving a message to our creator.
+Maybe we're supposed to somehow create some kind of a quantum mechanical system that alerts them to our intelligent presence here.
+Because if you think about it from their perspective, it's just, say, quantum field theory, a massive cellular-automaton-like thing.
+And, like, how do you even notice that we exist?
+You might not even be able to pick us up in that simulation.
+And so how do you prove that you exist, that you're intelligent, and that you're part of the universe?
+So this is like a Turing test for intelligence, from Earth.
+Yeah, for the creator's attention.
+I mean, maybe this is like trying to complete the next word in a sentence.
+This is a complicated way of that; Earth is basically sending a message back.
+Yeah, the puzzle is basically alerting the creator that we exist.
+Or maybe the puzzle is just to break out of the system and, you know, stick it to the creator in some way.
+Basically, like, if you're playing a video game, you can somehow find an exploit, and find a way to execute arbitrary code on the host machine.
+For example, I believe someone got a game of Mario to play Pong just by exploiting it, basically writing code and being able to execute arbitrary code in the game.
+And so maybe that's the puzzle: that we should find a way to exploit it.
+So I think some of these synthetic AIs will eventually find the universe to be some kind of a puzzle and then solve it in some way.
+And that's kind of like the end game somehow.
+Do you often think about it as a simulation?
+As in, the universe being a kind of computation that might have bugs and exploits?
+Yes, I think so.
+As I said, I think it's possible that physics has exploits, and we should be trying to find them: arranging some kind of a crazy quantum mechanical system that somehow gives you a buffer overflow, somehow gives you a rounding error in the floating point.
+Yeah, that's right.
+And more and more sophisticated exploits.
+Those are jokes, but that could actually be very close.
+Yeah, we'll find some way to extract infinite energy.
+For example, when you train reinforcement learning agents in physical simulations, and you ask them to, say, run quickly on flat ground, they'll end up doing all kinds of weird things as part of that optimization, right?
+They'll get on their back leg and slide across the floor, and it's because the optimization, the reinforcement learning optimization on that agent, has figured out a way to extract infinite energy from the friction forces, from basically their poor implementation.
+They found a way to generate infinite energy and just slide across the surface, and it's not what you expected.
+It's sort of like a perverse solution.
+And so maybe we can find something like that.
+Maybe we can be that little dog in this physical simulation.
+That cracks or escapes the intended consequences of the physics that the universe came up with; we'll figure out some kind of shortcut to some weirdness.
+Yeah.
+And then, man... but see, the problem with that weirdness is, the first person to discover the weirdness, like sliding on the back legs, that's all we're going to do.
+Yeah, it'll very quickly become that everybody does that thing.
+So the paperclip maximizer is a ridiculous idea, but that very well could be it: we'll just all switch to that because it's so fun.
+Well, no person will discover it, I think.
+By the way, I think it's going to have to be some kind of a super-intelligent AGI of a third generation.
+Like, we're building the first-generation AGI, and then, you know, third generation.
+Yeah, so the bootloader for an AI; that AI will be a bootloader for another AI.
+Yeah.
+And then there's no way for us to introspect what that might even look like.
+I think it's very likely that these things... for example, say you have these AGIs; it's very likely that, for example, they will be completely inert.
+I like these kinds of sci-fi books, sometimes, where these things are just completely inert; they don't interact with anything.
+And I find that kind of beautiful, because they've probably figured out the meta-meta game of the universe in some way.
+Potentially they're doing something completely beyond our imagination, and they don't interact with simple chemical life forms.
+Like, why would you do that?
+So I find those kinds of ideas compelling.
+What's their source of fun?
+What are they doing?
+Is the source of pleasure solving the puzzle of the universe?
+So can you define what inert means?
+They escape the interaction?
+As in, they will behave in some very strange way to us, because they're beyond it; they're playing the meta game.
+And the meta game is probably, say, arranging quantum mechanical systems in some very weird ways to extract infinite energy, solving the digit expansion of pi to whatever amount; they will build their own little fusion reactors or something crazy.
+Like, they're doing something beyond comprehension, and not understandable to us, and actually brilliant under the hood.
+What if quantum mechanics itself is the system, and we're just thinking it's physics, but we're really parasites on... not parasites; we're not really hurting physics.
+We're just living on this organism, and we're trying to understand it, but really it is an organism, and with a deep, deep intelligence.
+Maybe physics itself is the organism that's doing the super interesting thing, and we're just one little thing sitting on top of it, trying to get energy from it.
+We're just kind of like these particles in a wave that, I feel, is mostly deterministic, and takes the universe from some kind of a big bang to some kind of a super-intelligent replicator, some kind of a stable point in the universe, given these laws of physics.
+You don't think, as Einstein said, God doesn't play dice... so you think it's mostly deterministic?
+There's no randomness in the thing?
+I think it's deterministic.
+Oh, there's tons of... well, I want to be careful with randomness.
+Pseudo-random?
+Yeah, I don't like random.
+I think maybe the laws of physics are deterministic.
+Yeah, I think they're deterministic.
+You just got really uncomfortable with this question.
+Do you have anxiety about whether the universe is random or not?
+There's no randomness.
+You said you like Good Will Hunting.
+It's not your fault, Andrej.
+It's not your fault, man.
+So you don't like randomness?
+Yeah, I think it's unsettling.
+I think it's a deterministic system.
+I think that things that look random, like, say, the collapse of the wave function, et cetera, are actually deterministic: just entanglement and so on, and some kind of a multiverse theory, something, something.
+Okay, so why does it feel like we have free will?
+Like, if I raise this hand, I chose to do this now.
+That doesn't feel like a deterministic thing.
+It feels like I'm making a choice.
+It feels like it.
+Okay, so it's all feelings.
+It's just feelings.
+Yeah.
+So when an RL agent is making a choice, it's not really making a choice?
+The choice was already there.
+Yeah, you're interpreting the choice, and you're creating a narrative for having made it.
+Yeah.
+And now we're talking about the narrative.
+It's very meta.
+Looking back, what is the most beautiful or surprising idea in deep learning or AI in general that you've come across?
+You've seen this field explode and grow in interesting ways.
+What cool ideas made you sit back and go, hmm... big or small?
+Well, the one that I've been thinking about recently the most, probably, is the transformer architecture.
+So basically, neural networks have seen a lot of architectures that were trendy come and go for different sensory modalities.
+Like, for vision, audio, text, you would process them with different-looking neural nets.
+And recently we've seen this convergence towards one architecture, the transformer.
+And you can feed it video, or you can feed it images, or speech, or text, and it just gobbles it up.
+And it's kind of like a bit of a general-purpose computer that is also trainable and very efficient to run on our hardware.
+And so this paper came out in 2016, I want to say: Attention Is All You Need.
+Attention Is All You Need.
+You've criticized the paper title in retrospect: that it didn't foresee the bigness of the impact that it was going to have.
+Yeah, I'm not sure if the authors were aware of the impact that that paper would go on to have.
+Probably they weren't.
+But I think they were aware of some of the motivations and design decisions behind the transformer, and they chose not to, I think, expand on them in that way in the paper.
+And so I think they had an idea that there was more than just the surface of, oh, we're just doing translation and here's a better architecture.
+You're not just doing translation.
+This is like a really cool, differentiable, optimizable, efficient computer that you've proposed.
+And maybe they didn't have all of that foresight, but I think it's really interesting.
+Isn't it funny, sorry to interrupt, that that title is memeable?
+That they went, for such a profound idea... I don't think anyone used that kind of title before, right?
+Attention Is All You Need.
+Yeah, it's like a meme or something.
+Isn't it funny that, like, maybe if it was a more serious title, it wouldn't have had the impact?
+Honestly, there is an element of me that agrees with you and prefers it this way.
+Yes.
+If it was too grand, it would overpromise and then underdeliver, potentially.
+So you want to just meme your way to greatness.
+That should be a t-shirt.
+So you tweeted: the transformer is a magnificent neural network architecture because it is a general-purpose differentiable computer; it is simultaneously expressive in the forward pass, optimizable via backpropagation and gradient descent, and an efficient, high-parallelism compute graph.
+Can you discuss some of those details: expressive, optimizable, efficient, from memory, or in general, whatever comes to your heart?
+You want to have a general-purpose computer that you can train on arbitrary problems, like, say, the task of next word prediction, or detecting if there's a cat in an image or something like that.
+And you want to train this computer, so you want to set its weights.
+And I think there's a number of design criteria that sort of overlap in the transformer simultaneously that made it very successful.
+And I think the authors were deliberately trying to make this really powerful architecture.
+So basically, it's very powerful in the forward pass because it's able to express very general computation as something that looks like message passing.
+You have nodes, and they all store vectors, and these nodes get to basically look at each other's vectors, and they get to communicate.
+And basically, nodes get to broadcast: hey, I'm looking for certain things.
+And then other nodes get to broadcast: hey, these are the things I have.
+Those are the keys and the values.
+So it's not just the attention.
+Yeah, exactly.
+The transformer is much more than just the attention component.
+There are many architectural pieces that went into it: the residual connections, the way the weights are arranged, there's a multi-layer perceptron, the way the layers are stacked, and so on.
+But basically, there's a message-passing scheme where nodes get to look at each other, decide what's interesting, and then update each other.
+And so when you get to the details of it, I think it's a very expressive function, so it can express lots of different types of algorithms in the forward pass.
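+A minimal sketch of that broadcast-and-blend scheme, i.e. scaled dot-product self-attention over a set of node vectors (NumPy, one head, causal masking; all sizes here are arbitrary placeholders):
+
+    import numpy as np
+
+    rng = np.random.default_rng(0)
+    T, C = 8, 16                # 8 nodes (tokens), each storing a 16-dim vector
+    x = rng.normal(size=(T, C))
+
+    Wq, Wk, Wv = (rng.normal(size=(C, C)) for _ in range(3))
+    q = x @ Wq                  # "Hey, I'm looking for certain things"
+    k = x @ Wk                  # "Hey, these are the things I have"
+    v = x @ Wv                  # what a node hands over once matched
+
+    scores = q @ k.T / np.sqrt(C)                   # how interesting is j to i
+    mask = np.tril(np.ones((T, T)))                 # nodes only look backwards
+    scores = np.where(mask, scores, -np.inf)
+    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
+    weights /= weights.sum(axis=-1, keepdims=True)  # softmax attention
+    out = weights @ v           # each node updates itself with a weighted
+                                # blend of what the interesting nodes have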
+Not only that, but the way it's designed, with the residual connections, the layer normalizations, the softmax attention and everything, it's also optimizable.
+This is a really big deal, because there are lots of computers that are powerful but that you can't optimize, or that are not easy to optimize using the techniques that we have, which are backpropagation and gradient descent.
+These are first-order methods, very simple optimizers, really.
+And so you also need it to be optimizable.
+And then lastly, you want it to run efficiently on our hardware.
+Our hardware is a massive-throughput machine, like GPUs; they prefer lots of parallelism.
+So you don't want to do lots of sequential operations; you want to do a lot of operations in parallel, and the transformer is designed with that in mind as well.
+And so it's designed for our hardware, and it's designed to be both very expressive in the forward pass and also very optimizable in the backward pass.
+And you said that the residual connections support a kind of ability to learn short algorithms fast and first, and then gradually extend them longer during training.
+What's the idea of learning short algorithms?
+Right.
+Think of it as... so basically, a transformer is a series of blocks, right?
+And these blocks have attention and a little multi-layer perceptron.
+And so you go off into a block, and you come back to this residual pathway, and then you go off, and you come back, and then you have a number of layers arranged sequentially.
+And so the way to look at it, I think, is that because of the residual pathway, in the backward pass the gradients sort of flow along it uninterrupted, because addition distributes the gradient equally to all of its branches.
+So the gradient from the supervision at the top just flows directly to the first layer.
+And all the residual connections are arranged so that in the beginning, during initialization, they contribute nothing to the residual pathway.
+So what it kind of looks like is: imagine the transformer is kind of like a Python function, like a def, and you get to write various lines of code.
+Say you have a hundred-layer-deep transformer; typically they would be much shorter, say 20.
+So with 20 lines of code, you can do something in each of them.
+And so think of it as, during the optimization, basically what it looks like is, first you optimize the first line of code, and then the second line of code can kick in, and the third line of code can kick in.
+And I kind of feel like, because of the residual pathway and the dynamics of the optimization, you can sort of learn a very short algorithm that gets the approximate answer, and then the other layers can kick in and start to create a contribution.
+And at the end of it, you're optimizing over an algorithm that is 20 lines of code, except these lines of code are very complex, because each is an entire block of a transformer.
+You can do a lot in there.
+What's really interesting is that this transformer architecture has actually been remarkably resilient.
+Basically, the transformer that came out in 2016 is the transformer you would use today, except you reshuffle some of the layer norms; the layer normalizations have been reshuffled to a pre-norm formulation.
+And so it's been remarkably stable, but there are a lot of bells and whistles that people have attached to it to try to improve it.
+I do think that basically it's a big step in simultaneously optimizing for lots of properties of a desirable neural network architecture.
+And I think people have been trying to change it, but it's proven remarkably resilient.
+But I do think that there should be even better architectures, potentially.
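+In code, the residual pathway and the pre-norm formulation just described look roughly like this (a structural sketch, not any particular implementation; the zeroed-out sub-layers stand in for freshly initialized ones that contribute nothing yet):
+
+    import numpy as np
+
+    def layer_norm(x):
+        return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-5)
+
+    def block(x, attention, mlp):
+        # Pre-norm: normalize on the way *into* each sub-layer and add the
+        # result back onto the residual pathway, so gradients flow through
+        # the additions straight from the loss down to the first layer.
+        x = x + attention(layer_norm(x))  # go off into attention, come back
+        x = x + mlp(layer_norm(x))        # go off into the MLP, come back
+        return x
+
+    def transformer(x, blocks):
+        for attention, mlp in blocks:     # ~20 such "lines of code", stacked
+            x = block(x, attention, mlp)
+        return x
+
+    # Sub-layers that contribute nothing leave the residual pathway intact:
+    toy = [(lambda h: 0 * h, lambda h: 0 * h)] * 20
+    y = transformer(np.ones((4, 8)), toy)  # y equals x: identity at "init"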
+But you admire the resilience here?
+Yeah.
+There's something profound about this architecture that's so resilient.
+So maybe everything can be turned into a problem that transformers can solve.
+Currently it definitely looks like the transformer is taking over AI, and you can feed basically arbitrary problems into it.
+And it's a general differentiable computer, and it's extremely powerful.
+And this convergence in AI has been really interesting to watch, for me personally.
+What else do you think could be discovered here about transformers?
+Is there a surprising thing, or is it a stable place?
+Is there something interesting we might discover about transformers, like aha moments, maybe having to do with memory, maybe knowledge representation, that kind of stuff?
+Definitely, the zeitgeist today is just pushing; basically, right now the zeitgeist is: do not touch the transformer, touch everything else.
+Yes.
+So people are scaling up the datasets, making them much, much bigger.
+They're working on the evaluation, making the evaluation much, much bigger.
+And they're basically keeping the architecture unchanged.
+And that's kind of been the last five years of progress in AI.
+What do you think about one flavor of it, which is language models?
+Have you been surprised, has your sort of imagination been captivated by, you mentioned GPT, all the bigger and bigger and bigger language models?
+And what are the limits of those models, do you think, just for the task of natural language?
+Basically, the way GPT is trained, right, is you just download a massive amount of text data from the internet and you try to predict the next word in a sequence, roughly speaking.
+You're really predicting little word chunks, but roughly speaking, that's it.
+And what's been really interesting to watch is... basically, it's a language model, and language models have actually existed for a very long time.
+There are papers on language modeling from 2003, even earlier.
+Can you explain, in that case, what a language model is?
+Yeah.
+So, a language model: basically, the rough idea is just predicting the next word in a sequence, roughly speaking.
+There's a paper from, for example, Bengio and his team from 2003, where for the first time they were using a neural network to take, say, three or five words and predict the next word.
+And they're doing this on much smaller datasets, and the neural net is not a transformer, it's a multi-layer perceptron, but it's the first time that a neural network was applied in that setting.
+But even before neural networks, there were language models, except they were using n-gram models.
+N-gram models are just count-based models.
+So if you take two words and try to predict the third one, you just count up how many times you've seen each two-word combination and what came next, and what you predict as coming next is just whatever you've seen the most of in the training set.
+And so language modeling has been around for a long time.
+Neural networks have done language modeling for a long time.
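+The count-based n-gram model just described fits in a few lines; a toy bigram version (the corpus here is a placeholder):
+
+    from collections import Counter, defaultdict
+
+    corpus = "the cat sat on the mat the cat ate".split()
+
+    # Count, for every word, what came next in the training set.
+    counts = defaultdict(Counter)
+    for prev, nxt in zip(corpus, corpus[1:]):
+        counts[prev][nxt] += 1
+
+    def predict(prev):
+        # Predict whatever followed `prev` most often during training.
+        return counts[prev].most_common(1)[0][0]
+
+    print(predict("the"))  # -> "cat" (seen twice, vs. "mat" once)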
+So really, what's new or interesting or exciting is realizing that when you scale it up with a powerful enough neural net, a transformer, you get all these emergent properties.
+Basically, what happens is that if you have a large enough dataset of text, then in the task of predicting the next word, you are multitasking a huge amount of different kinds of problems.
+You are multitasking understanding of, you know, chemistry, physics, human nature.
+Lots of things are sort of clustered in that objective.
+It's a very simple objective, but you actually have to understand a lot about the world to make that prediction.
+You just said the U word, understanding.
+In terms of chemistry and physics and so on, what do you feel like it's doing?
+Is it searching for the right context?
+What is the actual process happening here?
+Yeah, so basically it gets a thousand words, and it's trying to predict the thousand-and-first.
+And in order to do that very, very well over the entire dataset available on the internet, you actually have to basically kind of understand the context of what's going on in there.
+And it's a sufficiently hard problem that, if you have a powerful enough computer, like a transformer, you end up with interesting solutions.
+And you can ask it to do all kinds of things, and it shows a lot of emergent properties, like in-context learning.
+That was the big deal with GPT and the original paper when they published it: that you can just sort of prompt it in various ways and ask it to do various things, and it will just kind of complete the sentence.
+But in the process of just completing the sentence, it's actually solving all kinds of really interesting problems that we care about.
+Do you think it's doing something like understanding?
+Like, when we use the word understanding for us humans?
+I think it's doing some understanding.
+In its weights, it understands, I think, a lot about the world, and it has to, in order to predict the next word in the sequence.
+So it's trained on data from the internet.
+What do you think about this approach, in terms of datasets, of using data from the internet?
+Do you think the internet has enough structured data to teach AI about human civilization?
+Yes, so I think the internet has a huge amount of data.
+I'm not sure if it's a complete enough set.
+I don't know that text is enough for having a sufficiently powerful AGI as an outcome.
+Of course, there is audio and video and images and all that kind of stuff.
+Yeah.
+So text by itself, I'm a little bit suspicious about.
+There's a ton of things we don't put in text, in writing, just because they're obvious to us about how the world works, and the physics of it, and that things fall.
+We don't put that stuff in text, because why would you?
+We share that understanding.
+And so text is a communication medium between humans, and it's not an all-encompassing medium of knowledge about the world.
+But as you pointed out, we do have video and we have images and we have audio.
+And so I think that definitely helps a lot, but we haven't trained models sufficiently across all of those modalities yet.
+So I think that's what a lot of people are interested in.
+But I wonder whether that shared understanding of what we might call common sense has to be learned, inferred, in order to complete the sentence correctly.
+So maybe, because it's implied on the internet, the model is going to have to learn that, not by reading about it, but by inferring it in the representation.
+So, like, common sense: I don't think we learn common sense.
+Nobody tells us explicitly.
+We just figure it all out by interacting with the world.
+And so here's a model reading about the way people interact with the world.
+It might have to infer that, I wonder.
+You briefly worked on a project called World of Bits, training an RL system to take actions on the internet, versus just consuming the internet, like we talked about.
+Do you think there's a future for that kind of system, interacting with the internet to help the learning?
+Yes, I think that's probably the final frontier for a lot of these models.
+Because, as you mentioned, when I was at OpenAI, I was working on this project for a little bit.
+And basically, it was the idea of giving neural networks access to a keyboard and a mouse.
+And the idea is, like, what could possibly go wrong?
+So basically, you perceive the input of the screen pixels; basically, the state of the computer is sort of visualized for human consumption in images of the web browser and stuff like that.
+And then you give the neural network the ability to press keys and use the mouse, and we were trying to get it to, for example, complete bookings and, you know, interact with user interfaces.
+What did you learn from that experience?
+Like, what was some fun stuff?
+This is a super cool idea.
+Yeah, I mean, the step from observer to actor is a super fascinating step.
+Well, it's the universal interface in the digital realm, I would say.
+And there's a universal interface in the physical realm, which in my mind is a humanoid form factor kind of thing.
+We can later talk about Optimus and so on, but I feel like there's kind of a similar philosophy in some way, where the physical world is designed for the human form, and the digital world is designed for the human form of seeing the screen and using keyboard and mouse.
+And so it's the universal interface that can basically command the digital infrastructure we've built up for ourselves.
+And so it feels like a very powerful interface to command and to build on top of.
+Now, to your question as to what I learned from that: it's interesting, because World of Bits was basically too early, I think, at OpenAI at the time.
+This was around 2015 or so, and the zeitgeist at that time in AI was very different from the zeitgeist today.
+At the time, everyone was super excited about reinforcement learning from scratch.
+This is the time of the Atari paper, where neural networks were playing Atari games and beating humans in some cases, AlphaGo and so on.
+So everyone was very excited about training neural networks from scratch using reinforcement learning directly.
+It turns out that reinforcement learning is an extremely inefficient way of training neural networks, because you're taking all these actions and all these observations, and you get some sparse rewards once in a while.
+So you do all this stuff based on all these inputs, and once in a while you're told you did a good thing, you did a bad thing.
+And it's just an extremely hard problem.
+You can't learn from that.
+Well, you can burn a forest and sort of brute-force through it, and we saw that, I think, with Go and Dota and so on, and it does work.
+But it's extremely inefficient, and not how you want to approach problems, practically speaking.
+And so that's the approach that we also took at the time to World of Bits.
+We would have an agent initialized randomly, so it would keyboard-mash and mouse-mash, and try to make a booking.
+And it just revealed the insanity of that approach very quickly: you have to stumble onto the correct booking in order to get the reward of "you did it correctly," and you're never going to stumble onto it by chance, at random.
+So even with a simple web interface, there's too many options.
+There's just too many options, and it's too sparse of a reward signal, and you're starting from scratch at the time.
+And so you don't know how to read, you don't understand pictures, images, buttons, you don't understand what it means to make a booking.
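+A toy illustration of why that from-scratch setup is so hopeless: a made-up "booking" task (the action space and target sequence are invented for illustration) where reward arrives only if a randomly initialized agent happens to emit exactly the right clicks:
+
+    import random
+
+    ACTIONS = range(50)            # pretend: 50 possible clicks/keystrokes
+    TARGET = [17, 3, 42, 8, 25]    # the one sequence that completes a booking
+
+    def episode():
+        # A from-scratch agent keyboard-mashes at random...
+        attempt = [random.choice(ACTIONS) for _ in TARGET]
+        # ...and the reward signal is all-or-nothing and sparse.
+        return 1.0 if attempt == TARGET else 0.0
+
+    hits = sum(episode() for _ in range(100_000))
+    print(hits)  # almost surely 0: the odds are (1/50)**5, ~1 in 312 million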
+But now what's happened is that it is time to revisit that, and OpenAI is interested in this, companies like Adept are interested in this, and so on.
+And the idea is coming back, because the interface is very powerful, but now you're not training an agent from scratch.
+You are taking GPT as an initialization.
+So GPT is pre-trained on all of text, and it understands what's a booking, it understands what's a submit, it understands quite a bit more.
+And so it already has those representations.
+They are very powerful, and that makes all the training significantly more efficient, and makes the problem tractable.
+Should the interaction be the way humans see it, with the buttons and the language, or should it be with the HTML, JavaScript, and CSS?
+Which do you think is better?
+So today, all of this interaction is mostly on the level of HTML, CSS, and so on.
+That's done because of computational constraints.
+But I think ultimately, everything is designed for human visual consumption.
+And so at the end of the day, all the additional information is in the layout of the webpage: what's next to what, what's a red background, all this kind of stuff, and what it looks like visually.
+So I think that's the final frontier: we are taking in pixels, and we're giving out keyboard and mouse commands.
+But I think it's impractical, still, today.
+Do you worry about bots on the internet, given these ideas, given how exciting they are?
+Do you worry about bots on Twitter: not the stupid bots that we see now, the crypto bots, but bots that might actually be out there that we don't see, that are interacting in interesting ways?
+This kind of system feels like it should be able to pass the "I'm not a robot" click button, whatever.
+By the way, do you even understand how that test works?
+I don't quite; there's a checkbox or whatever that you click, and it's presumably tracking mouse movement and timing and so on.
+So exactly this kind of system we're talking about should be able to pass that.
+So yeah, what do you feel about bots that are language models, plus some ability to interact, and are able to tweet and reply and so on?
+Do you worry about that world?
+Yeah, I think it's always been a bit of an arms race between the attack and the defense.
+So the attack will get stronger, but the defense will get stronger as well: our ability to detect them.
+How do you defend?
+How do you detect?
+How do you know that your Karpathy account on Twitter is human?
+How would you approach that?
+Like, if people claimed that it wasn't, how would you defend yourself in a court of law: "I'm a human; this account is human"?
+Yeah, at some point, I think society will evolve a little bit.
+Like, we might start digitally signing some of our correspondence, or things that we create.
+Right now it's not necessary, but maybe in the future it might be.
+I do think that we are going towards a world where we share the digital space with AIs.
+Synthetic beings.
+Yeah, and they will get much better, and they will share our digital realm, and they'll eventually share our physical realm as well.
+That's much harder.
+But that's kind of like the world we're going towards, and most of them will be benign and helpful, and some of them will be malicious, and it's going to be an arms race trying to detect them.
+So, I mean, the worst isn't the AIs; the worst is the AIs pretending to be human.
+And I don't know if it's always malicious.
+There are obviously a lot of malicious applications, but it could also be, you know... if I were an AI, I would try very hard to pretend to be human, because we're in a human world.
+I wouldn't get any respect as an AI.
+I want to get some love and respect.
+I don't think the problem is intractable.
+People are thinking about proof of personhood, and we might start digitally signing our stuff, and we might all end up having, yeah, basically some solution for proof of personhood.
+It doesn't seem intractable to me.
+It's just something that we haven't had to do until now.
+But I think once the need really starts to emerge, which is soon, people will think about it much more.
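+A minimal sketch of what digitally signed correspondence could look like, here using the Ed25519 primitives from the third-party cryptography package (the message is a placeholder; a real proof-of-personhood scheme would also need some way to bind the public key to a person):
+
+    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
+    from cryptography.exceptions import InvalidSignature
+
+    private_key = Ed25519PrivateKey.generate()  # held by the human author
+    public_key = private_key.public_key()       # published alongside the account
+
+    post = b"I wrote this post myself."
+    signature = private_key.sign(post)          # attached to the post
+
+    try:
+        public_key.verify(signature, post)      # anyone can check authorship
+        print("signature valid: plausibly the account holder")
+    except InvalidSignature:
+        print("forged or tampered")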
In the digital space, it just feels like it's going to be very tricky, very
tricky to outpace, because it seems to be pretty low-cost to fake stuff.
What are you going to do, put an AI in jail for trying to use a fake
personhood proof? I mean, okay, fine, you'll put a lot of AIs in jail, but
there will be exponentially more. The cost of creating a bot is very low.
Unless there's some kind of way to track it accurately, like you're not
allowed to create any program without tying yourself to that program. Any
program that runs on the internet, you'd be able to trace every single human
involved with that program.
Yeah, maybe you have to start declaring when, you know, we have to start
drawing those boundaries and keeping track of: okay, what are digital
entities versus human entities, and what is the ownership of human entities
and digital entities, something like that. I don't know, but I think I'm
optimistic that this is possible. In some sense, we're currently in the
worst time of it, because all these bots have suddenly become very capable,
but we don't have the fences yet built up as a society. But that doesn't
seem to me intractable. It's just something that we have to deal with.
It seems weird that the Twitter bots, like the really crappy Twitter bots,
are so numerous. I presume the engineers at Twitter are very good. So what I
would infer from that is that it seems like a hard problem. They're probably
catching them, right? If I were to steelman the case: it's a hard problem,
and there's a huge cost to a false positive, to removing a post by somebody
who's not a bot. That creates a very bad user experience, so they're very
cautious about removing things. And maybe the bots are really good at
learning what gets removed and not, such that they can stay ahead of the
removal process.
My impression of it, honestly, is that there's a lot of low-hanging fruit.
It's not subtle.
It's not subtle. That's my impression as well, but maybe you're seeing just
the tip of the iceberg. Maybe the number of bots is in the trillions and
it's a constant assault of bots. I don't know, you have to steelman the
case, because the bots I'm seeing are pretty obvious. I could write a few
lines of code that catch these bots.
I mean, definitely there's a lot of low-hanging fruit, but I will say, I
agree that if you are a sophisticated actor, you could probably create a
pretty good bot right now, using tools like GPT, because it's a language
model. And you can generate faces that look quite good now, and you can do
this at scale. So I think, yeah, it's quite plausible, and it's going to be
hard to defend against.
There was a Google engineer who claimed that LaMDA was sentient. Do you
think there's any inkling of truth to what he felt? And, more importantly to
me at least, do you think language models will achieve sentience, or the
illusion of sentience, soonish?
Yeah, to me it's a little bit of a canary-in-a-coal-mine moment, honestly,
because this engineer spoke to a chatbot at Google and became convinced that
the bot was sentient. He asked it some existential, philosophical questions,
and it gave reasonable answers and looked real and so on. To me, he wasn't
sufficiently trying to stress the system, I think, and exposing the truth of
it as it is today. But I think this will be increasingly harder over time.
So yeah, I think there will be more and more people like that over time, as
this gets better.
Like, forming an emotional connection to an AI.
Plausible, in my mind. I think these AIs are actually quite good at human
connection, human emotion. A ton of text on the internet is about humans and
connection and love and so on. So I think they have a very good
understanding, in some sense, of how people speak to each other about this,
and they're very capable of creating a lot of that kind of text.
There's a lot of sci-fi from the fifties and sixties that imagined AIs in a
very different way: calculating, cold, Vulcan-like machines. That's not what
we're getting today. We're getting pretty emotional AIs that are actually
very competent and capable of generating plausible-sounding text with
respect to all of these topics.
See, I'm really hopeful about AI systems that are like companions, that help
you grow and develop as a human being, that help you maximize long-term
happiness. But I'm also very worried about AI systems that figure out from
the internet that humans get attracted to drama. And so these would just be
shit-talking AIs. They'll do gossip, they'll try to plant seeds of suspicion
about other humans that you love and trust, and just kind of mess with
people, because that's going to get a lot of attention. Maximize drama on
the path to maximizing engagement, and us humans will feed into that
machine, and it'll be a giant drama shitstorm. So I'm worried about that.
So it's the objective function that really defines the way that human
civilization progresses with AIs in it.
I think right now, at least today, it's not correct to really think of them
as goal-seeking agents that want to do something. They have no long-term
memory or anything. A good approximation of it is: you get a thousand words
and you're trying to predict the thousand-and-first, and then you continue
feeding it in, and you are free to prompt it in whatever way you want.
So in text you say: okay, you are a psychologist, and you are very good, and
you love humans, and here's a conversation between you and another human.
"Human," colon, something; "You," colon, something. And then it just
continues the pattern, and suddenly you're having a conversation with a fake
psychologist who's trying to help you.
So it's still kind of in the realm of a tool. People can prompt it in
arbitrary ways, and it can create really incredible text, but it doesn't
have long-term goals over long periods of time. It doesn't look that way
right now.
Yeah, but you can have short-term goals that have long-term effects.
So if my prompt's short-term goal is to get Andrej Karpathy to respond to me
on Twitter, the AI might figure out that talking shit to you would be the
best way to do that, in a highly sophisticated, interesting way. And then it
builds up a relationship, and over time it doesn't even need to stay
sophisticated, it can just talk shit. And okay, maybe it won't get to
Andrej, but it might get to another celebrity, it might get to other big
accounts. So with just that simple goal: get them to respond, maximize the
probability of an actual response.
Yeah. I mean, you could prompt a powerful model like this for its opinion
about how to do any possible thing you're interested in. They're kind of on
track to become these oracles. I could sort of think of it that way. They
are oracles. Currently it's just text, but they will have calculators, they
will have access to Google search, they will have all kinds of gadgets and
gizmos. They will be able to operate the internet and find different
information. And in some sense, that's kind of what the development
currently looks like.
Do you think it'll eventually be an improvement over what Google is for
access to human knowledge? Like a more effective search engine to access
human knowledge?
I think there's definite scope for building a better search engine today.
And I think Google has all the tools, all the people, everything they need.
They have all the puzzle pieces: they have people training transformers at
scale, they have all the data. It's just not obvious if they are capable as
an organization of innovating on their search engine right now. And if they
don't, someone else will. There's absolute scope for building a
significantly better search engine built on these tools.
It's so interesting. A large company where the search infrastructure already
exists, it works, it brings in a lot of money. So where, structurally,
inside a company is the motivation to pivot? To say, we're going to build a
new search engine?
Yeah, that's hard.
So it's usually going to come from a startup, right?
That would be, yeah. Or some other more competent organization. So I don't
know. Currently, for example, maybe Bing has another shot at it, as an
example.
Microsoft Edge, as we were talking about offline.
I mean, it definitely is really interesting, because search engines used to
be about: okay, here's some query, here are web pages that look like the
stuff that you have. But you could just directly go to the answer, and then
have supporting evidence. And these models, basically, they've read all the
text and they've read all the web pages. So sometimes when you see yourself
going over the search results and sort of getting a sense of the average
answer to whatever you're interested in, that just directly comes out. You
don't have to do that work. So they have a way of distilling all that
knowledge into some level of insight, basically.
Do you think of prompting as a kind of teaching and learning, like this
whole process, like another layer? Because maybe that's what humans are: we
already have that background model, and the world is prompting you.
Yeah, exactly.
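Here is a minimal sketch of the "fake psychologist" prompting pattern
described above: the conversation is just text that the model keeps
continuing. The complete() function is a canned stand-in for a real large
language model call.

    def complete(prompt: str) -> str:
        # Stand-in for a real LLM call; a real one would return the model's
        # continuation of the prompt. Canned here so the sketch runs.
        return "It sounds like a lot has been weighing on you. Tell me more."

    # Program the model purely in text: cast it as a psychologist and let
    # next-word prediction continue the pattern.
    history = (
        "You are a psychologist. You are very good and you love humans. "
        "Here is a conversation between you and another human.\n"
    )

    def chat_turn(history: str, user_message: str) -> str:
        history += f"Human: {user_message}\nYou:"
        reply = complete(history)            # the model continues the pattern
        return history + " " + reply + "\n"  # feed the reply back into context

    history = chat_turn(history, "I've been feeling anxious lately.")
    print(history)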
I think the way we are programming these models, like GPTs, is converging to
how you program humans. I mean, how do I program humans? Via prompts. I go
to people and I prompt them to do things, I prompt them for information. And
so natural language prompts are how we program humans, and we're starting to
program computers directly in that interface. It's pretty remarkable,
honestly.
So you've spoken a lot about the idea of software 2.0. All good ideas become
cliches so quickly. It's kind of hilarious. It's like, I think Eminem once
said that if he gets annoyed by a song he's written very quickly, that means
it's going to be a big hit, because it's too catchy. But can you describe
this idea, and how your thinking about it has evolved over the months and
years since you coined it?
Yeah, I had a blog post on software 2.0, I think several years ago now. And
the reason I wrote that post is because I kept seeing something remarkable
happening in software development: a lot of code was being transitioned to
be written not in C++ and so on, but in the weights of a neural net.
Basically, neural nets were taking over the realm of software, taking on
more and more tasks. And at the time, I think not many people understood
deeply enough that this is a big deal, a big transition. Neural networks
were seen as one of multiple classification algorithms you might use for
your data set problem on Kaggle. But this is a change in how we program
computers. And I saw neural nets as: this is going to take over. The way we
program computers is going to change. It's not going to be people writing
software in C++ or something like that and directly programming the
software. It's going to be accumulating training sets and data sets and
crafting the objectives by which you train these neural nets. And at some
point, there's going to be a compilation process from the data sets and the
objective and the architecture specification into the binary, which is
really just the neural net weights and the forward pass of the neural net.
And then you can deploy that binary. So I was talking about that transition,
and that's what the post is about.
And I saw this play out in a lot of fields, autopilot being one of them, but
also just simple image classification. People thought originally, you know,
in the eighties and so on, that they would write the algorithm for detecting
a dog in an image. And they had all these ideas about how the brain does it:
first we detect corners, and then we detect lines, and then we stitch them
up. They were really going at it, thinking about how they were going to
write the algorithm. And that's not the way you build it. There was a smooth
transition where, okay, first we thought we were going to build everything.
Then we were building the features, like HOG features and things like that,
that detect these little statistical patterns from image patches. And then
there was a little bit of learning on top of it, like a support vector
machine, a binary classifier for cat versus dog, on top of the features. So
we wrote the features, but we trained the last layer, the classifier. And
then people said: actually, let's not even design the features, because we
can't. Honestly, we're not very good at it.
So let's also learn the features. And then you end up with basically a
convolutional neural net, where you're learning most of it. You're just
specifying the architecture, and the architecture has tons of
fill-in-the-blanks, which are all the knobs, and you let the optimization
write most of it. And so this transition is happening across the industry
everywhere, and suddenly we end up with a ton of code that is written in
neural net weights. I was just pointing out that the analogy is actually
pretty strong: we have a lot of developer environments for software 1.0,
like IDEs, how you work with code, how you debug code, how you run code, how
you maintain code. We have GitHub. So I was trying to make those analogies
in the new realm. Like, what is the GitHub of software 2.0? It turns out
it's something that looks like Hugging Face right now. And so I think some
people took it seriously and built cool companies. And many people
originally attacked the post. It actually was not well received when I wrote
it; I think maybe it has something to do with the title. But more people
have been coming around to it over time.
Yeah. So you were the director of AI at Tesla, where I think this idea was
really implemented at scale, which is how you have engineering teams doing
software 2.0. So can you linger on that idea? I think we're in the really
early stages of everything you just said, the GitHub, the IDEs. How do we
build engineering teams that work in software 2.0 systems, and the data
collection and the data annotation, which is all part of that software 2.0?
What do you think is the task of programming in software 2.0? Is it
debugging in the space of hyperparameters, or is it also debugging in the
space of data?
Yeah. The way you program the computer and influence its algorithm is not by
writing the commands yourself. You're changing mostly the data set. You're
changing the loss functions, what the neural net is trying to do, how it's
trying to predict things. But basically, the data sets and the architecture
of the neural net. And so in the case of the autopilot, a lot of the data
sets have to do with, for example, detection of objects and lane line
markings and traffic lights and so on. So you accumulate massive data sets
of: here's an example, here's the desired label. And then here's roughly
what the algorithm should look like, and that's a convolutional neural net.
So the specification of the architecture is like a hint as to what the
algorithm should roughly look like, and then the fill-in-the-blanks process
of optimization is the training process. And then you take your neural net
that was trained, it gives all the right answers on your data set, and you
deploy it.
So in that case, and perhaps in all machine learning cases, there are a lot
of tasks. Is formulating a task, like for a multi-headed neural network,
part of the programming?
Yeah, very much so. How you break down a problem into a set of tasks.
At a high level, I would say, if you look at the software running in the
autopilot, and I gave a number of talks on this topic, originally a lot of
it was written in software 1.0. Imagine lots of C++, right?
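To ground the feature-engineering transition he described a moment ago, here
is a toy sketch of both eras: HOG features with a trained classifier on top,
versus a small convnet where optimization fills in all the blanks. The data,
shapes, and training details are made up for illustration and are not any
production stack.

    import numpy as np
    from skimage.feature import hog       # era 1: hand-designed features
    from sklearn.svm import LinearSVC
    import torch
    import torch.nn as nn

    images = np.random.rand(100, 64, 64)   # stand-ins for cat/dog photos
    labels = np.random.randint(0, 2, 100)

    # Era 1: humans write the features; learning happens only in the last layer.
    feats = np.stack([hog(im, pixels_per_cell=(8, 8)) for im in images])
    clf = LinearSVC().fit(feats, labels)

    # Era 2: specify an architecture full of blanks (the knobs) and let the
    # optimization write the features and the classifier together.
    net = nn.Sequential(
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(), nn.Linear(16, 2),
    )
    x = torch.tensor(images, dtype=torch.float32).unsqueeze(1)
    y = torch.tensor(labels, dtype=torch.long)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(10):                    # the "compilation" step: training
        opt.zero_grad()
        loss = nn.functional.cross_entropy(net(x), y)
        loss.backward()
        opt.step()

In the first era, only clf is learned; in the second, every knob in net is.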
And then gradually there was a tiny neural net that was, for example,
predicting, given a single image, is there a traffic light or not? Or is
there a lane line marking or not? And this neural net didn't have too much
to do in the scope of the software: it was making tiny predictions on an
individual little image, and then the rest of the system stitched them up.
So, okay, we don't actually have just a single camera, we have eight
cameras, and we actually have eight cameras over time. So what do you do
with these predictions? How do you put them together? How do you do the
fusion of all that information, and how do you act on it? All of that was
written by humans in C++. And then we decided, okay, we don't actually want
to do all of that fusion in C++ code, because we're actually not good enough
to write that algorithm. We want the neural nets to write the algorithm, and
we want to port all of that software into the 2.0 stack. And so then we
actually had neural nets that take all eight camera images simultaneously
and make predictions for all of that. And they don't make predictions in the
space of images anymore; they make predictions directly in 3D, in the three
dimensions around the car. And now we don't manually fuse those 3D
predictions over time; we don't trust ourselves to write that tracker. So we
give the neural net the information over time: it takes these videos now and
makes those predictions. And so you're just putting more and more power into
the neural net, more processing. And at the end of it, the eventual goal is
to have most of the software potentially be in the 2.0 land, because it
works significantly better. Humans are just not very good at writing
software, basically.
So the prediction is happening in this 4D land, the three-dimensional world
over time. How do you do annotation in that world? Data annotation, whether
it's self-supervised or manual by humans, is a big part of the software 2.0
world.
Right. I would say, by far, if you're talking about the industry and the
technology we have available, everything is supervised learning. So you need
data sets of input and desired output, and you need lots of them. And there
are three properties that you need: you need the data set to be very large,
you need it to be accurate, no mistakes, and you need it to be diverse. You
don't want to just have a lot of correct examples of one thing. You need to
really cover the space of possibility as much as you can, and the more you
can cover the space of possible inputs, the better the algorithm will work
at the end. Now, once you have really good data sets that you're collecting,
curating, and cleaning, you can train your neural net on top of that. So a
lot of the work goes into cleaning those data sets. Now, as you pointed out,
the question is: if you want to predict in 3D, you need data in 3D to back
that up. So in this video, we have eight videos coming from all the cameras
of the system, and this is what they saw. And this is the truth of what
actually was around: there was this car, this car, this car, these are the
lane line markings, this is the geometry of the road, there was a traffic
light in this three-dimensional position. You need the ground truth. And so
the big question the team was solving, of course, is how do you arrive at
that ground truth?
Because once you have a million of them, and they're large, clean, and
diverse, then training a neural net on them works extremely well, and you
can ship that into the car. And so there are many mechanisms by which we
collected that training data. You can always go for human annotation. You
can go for simulation as a source of ground truth. You can also go for what
we call the offline tracker, which we've spoken about at AI Day and so on,
which is basically an automatic reconstruction process for taking those
videos and recovering the three-dimensional reality of what was around the
car. So basically, think of doing a three-dimensional reconstruction as an
offline thing, and then understanding that, okay, there are 10 seconds of
video, this is what we saw, and therefore here are all the lane lines, cars,
and so on. And then once you have that annotation, you can train your neural
net to imitate it.
And how difficult is the 3D reconstruction?
It's difficult, but it can be done.
So there's overlap between the cameras and you do the reconstruction. And if
there's any inaccuracy, that's caught in the annotation step?
Yes. The nice thing about the annotation is that it is fully offline. You
have infinite time. You have a chunk of one minute, and you're trying, just
offline, on a supercomputer somewhere, to figure out where the positions of
all the cars and all the people were, and you have your full one minute of
video from all the angles. And you can run all the neural nets you want, and
they can be massive neural nets. There can be neural nets that can't even
run in the car later at test time, so they can be even more powerful neural
nets than what you can eventually deploy. So you can do anything you want,
three-dimensional reconstruction, neural nets, anything, just to recover
that truth, and then you supervise on that truth.
What have you learned, you said "no mistakes," about humans doing
annotation? Because I assume humans have a range of things they're good at,
in terms of clicking stuff on a screen. How interesting a problem is it to
you to design an annotation process where humans are accurate, enjoy it, and
are efficient or productive, all that kind of stuff? What are even the
metrics?
Yeah. So I grew the annotation team at Tesla from basically zero to a
thousand while I was there. That was really interesting. My background is as
a PhD student and researcher, so growing that kind of an organization was
pretty crazy. But yeah, I think it's extremely interesting, and very much
part of the design process behind the autopilot, as to where you use humans.
Humans are very good at certain kinds of annotations. They're very good, for
example, at two-dimensional annotations of images. They're not good at
annotating cars over time in three-dimensional space; that's very, very
hard. And so that's why we were very careful to design the tasks that are
easy for humans to do, versus the things that should be left to the offline
tracker. Maybe the computer will do all the triangulation and 3D
reconstruction, but the human will say: exactly these pixels of the image
are a car, exactly these pixels are a human. And so co-designing the data
annotation pipeline was very much the bread and butter of what I was doing
daily.
Do you think there are still a lot of open problems in that space?
Just in general, annotation, where the machines do the stuff the machines
are good at, the humans do what they're good at, and maybe there's some
iterative process.
Right. I think to a very large extent, we went through a number of
iterations and we learned a ton about how to create these data sets. I'm not
seeing big open problems. Originally, when I joined, I was really not sure
how this would turn out. But by the time I left, I felt much more secure and
understood the philosophy of how to create these data sets, and I was pretty
comfortable with where that was at the time.
So what are the strengths and limitations of cameras for the driving task,
in your understanding, when you formulate the driving task as a vision task
with eight cameras? You've seen most of the history of the computer vision
field as it has to do with neural networks. If you step back, what are the
strengths and limitations of pixels, of using pixels to drive?
Yeah. Pixels, I think, are a beautiful sensor, I would say. The thing is,
cameras are very, very cheap and they provide a ton of information, a ton of
bits. It's an extremely cheap sensor for a ton of bits, and each one of
these bits is a constraint on the state of the world. And so you get lots of
megapixel images, very cheap, and they just give you all these constraints
for understanding what's actually out there in the world. So vision is
probably the highest bandwidth sensor. It's a very high bandwidth sensor.
I love that: pixels as a constraint on the world. It's this highly complex,
high bandwidth constraint on the state of the world.
And it's not just that. Again, there's the real importance of it being the
sensor that humans use, and therefore everything is designed for that
sensor: the text, the writing, the flashing signs, everything is designed
for vision. And so you just find it everywhere. That's why this is the
interface you want to be in, talking again about these universal interfaces.
That's where we actually want to measure the world as well, and then develop
software for that sensor.
But there are other constraints on the state of the world that humans use to
understand the world. I mean, vision ultimately is the main one, but we're
referencing our understanding of human behavior and some common sense
physics that could be inferred from vision, from a perception perspective.
It feels like we're using some kind of reasoning to predict the world, not
just the pixels.
I mean, you have powerful priors for how the world evolves over time, et
cetera. So it's not just about the likelihood term coming up from the data
itself, telling you about what you are observing, but also the prior term of
what things you are likely to see and how they are likely to move, and so
on.
And the question is how complex the range of possibilities that might happen
in the driving task is. Is that still an open problem to you, how difficult
driving is, philosophically speaking? In all the time you worked on driving,
did you come to understand how hard driving is?
Yeah, driving is really hard, because it has to do with the predictions of
all these other agents, and the theory of mind, and what they're going to
do. Are they looking at you? Where are they looking? What are they thinking?
There's a lot that goes on there, in the full tail of the expansion of the
nines that we have to be comfortable with eventually. The final problems are
of that form.
I don't think those problems are very common. I think eventually they're
important, but it's really in the tail end.
In the tail end, the rare edge cases. From the vision perspective, what are
the toughest parts of the vision problem of driving?
Well, basically, the sensor is extremely powerful, but you still need to
process that information. And so going from the brightnesses of the pixel
values to, hey, here's the three-dimensional world, is extremely hard. And
that's what the neural networks are fundamentally doing. So the difficulty
really is in just doing an extremely good job of engineering the entire
pipeline, the entire data engine: having the capacity to train these neural
nets, having the ability to evaluate the system and iterate on it. I would
say just doing this in production at scale is the hard part. It's an
execution problem.
So the data engine, but also the deployment of the system, such that it has
low-latency performance. It has to do all these steps.
Yeah. For the neural net specifically, just making sure everything fits into
the chip on the car. You have a finite budget of flops that you can perform,
and memory bandwidth, and other constraints, and you have to make sure it
flies. You squeeze as much compute as you can into the tiny chip.
What have you learned from that process? Because maybe that's one of the
bigger new things, coming from a research background, where there's a system
that has to run under heavily constrained resources, that has to run really
fast. What kind of insights have you learned from that?
Yeah, I'm not sure there are too many insights. You're trying to create a
neural net that will fit in what you have available, and you're always
trying to optimize it. We talked a lot about it at AI Day, the triple
backflips that the team is doing to make sure it all fits and utilizes the
engine. So I think it's extremely good engineering, and then there are all
kinds of little insights peppered in on how to do it properly.
Let's actually zoom out, because we haven't talked about the data engine,
the entirety of the layout of this idea that I think is just beautiful, with
humans in the loop. Can you describe the data engine?
Yeah. The data engine is what I call the almost biological-feeling process
by which you perfect the training sets for these neural networks, because
most of the programming now is at the level of these data sets, making sure
they're large, diverse, and clean. Basically, you have a data set that you
think is good, you train your neural net, you deploy it, and then you
observe how well it's performing. And you're trying to always increase the
quality of your data set. So you're trying to catch scenarios that are rare.
It is in these scenarios that the neural nets will typically struggle,
because they weren't told what to do in those rare cases in the data set.
But now you can close the loop: if you can collect all of those at scale,
you can feed them back into the reconstruction process I described,
reconstruct the truth in those cases, and add it to the data set. And so the
whole thing ends up being like a staircase of improvement, of perfecting
your training set. And you have to go through deployments so that you can
mine the parts that are not yet represented well in the data set. So your
data set is basically imperfect; it needs to be diverse.
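The loop he is describing, as a minimal runnable sketch; every function body
here is a trivial stand-in for a large real-world subsystem:

    def train(model, dataset):
        return model                          # stand-in: fit on the current set

    def deploy_and_observe(model):
        return ["fleet telemetry"]            # stand-in: scenarios logged in the field

    def mine_failures(telemetry):
        return ["rare case the net struggled on"]  # stand-in: trigger logic

    def offline_reconstruction(cases):
        return [(c, "recovered ground truth") for c in cases]  # offline tracker

    dataset, model = [], object()
    for _ in range(3):                        # the staircase of improvement
        model = train(model, dataset)
        telemetry = deploy_and_observe(model)
        rare = mine_failures(telemetry)
        dataset += offline_reconstruction(rare)   # pad out the missing pockets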
It has pockets that are missing, and you need to pad out the pockets. You
can sort of think of it that way in the data.
What role do humans play in this? So what's this biological system, like a
human body made up of cells? How do you optimize the human system? The
multiple engineers collaborating, figuring out what to focus on, what to
contribute, which tasks to optimize in this neural network. Who is in charge
of figuring out which task needs more data? Can you speak to the
hyperparameters of the human system?
It really just comes down to extremely good execution from an engineering
team that knows what it's doing. They understand intuitively the
philosophical insights underlying the data engine and the process by which
the system improves, and how to delegate the strategy of the data collection
and how that works, and then just making sure it's all extremely well
executed. And that's where most of the work is: not the philosophizing or
the research or the ideas of it, but extremely good execution. It's so hard
when you're dealing with data at that scale.
So your role in the data engine, executing well on it, is difficult and
extremely important. Is there a priority list, like a vision board, saying,
we really need to get better at stoplights? The prioritization of tasks, is
that essentially it, and does that come from the data?
That comes, to a very large extent, from what we are trying to achieve in
the product roadmap, the release we're trying to get out, and the feedback
from the QA team, where the system is struggling or not, the things we're
trying to improve.
And the QA team gives some signal, some information, in aggregate, about the
performance of the system in various conditions.
That's right. And then, of course, all of us drive it, and we can also see
it. It's really nice to work with a system that you can also experience
yourself, and it drives you home.
Is there some insight you can draw from your individual experience that you
just can't quite get from an aggregate statistical analysis of data? It's so
weird, right?
Yes.
It's not scientific, in a sense, because you're just one anecdotal sample.
Yeah, I think it's a source of truth. It's your interaction with the system.
You can see it, you can play with it, you can perturb it, you can get a
sense of it, you have an intuition for it. Numbers and plots and graphs are
much harder. They hide a lot.
It's like if you train a language model: a really powerful way to understand
it is by interacting with it.
Yeah, a hundred percent. You build up an intuition. And I think Elon also,
he always wanted to drive the system himself. He drives a lot, I want to say
almost daily. So he also sees this as a source of truth: you driving the
system and seeing it perform.
So, tough questions here. Tesla last year removed radar from the sensor
suite, and now it has just announced that it's going to remove ultrasonic
sensors, relying solely on vision, camera only. Does that make the
perception problem harder or easier?
I would almost reframe the question in some way. So the thing is basically,
you would think that additional sensors...
By the way, can I just interrupt?
Go ahead.
I wonder if a language model will ever do that, if you prompt it: "Let me
reframe your question." That would be epic.
That's the wrong prompt. Sorry.
It's a little bit of the wrong question, because basically you would think
that these sensors are an asset to you. But if you fully consider the entire
product in its entirety, these sensors are actually potentially a liability,
because these sensors aren't free. They don't just appear on your car.
Suddenly you need to have an entire supply chain. You have people procuring
them. There can be problems with them. They may need replacement. They are
part of the manufacturing process. They can hold back the line in
production. You need to source them, you need to maintain them, you have to
have teams that write the firmware, all of it. And then you also have to
incorporate them, fuse them into the system in some way. And so it actually
bloats a lot of it. And I think Elon is really good at simplify, simplify.
Best part is no part. And he always tries to throw away things that are not
essential, because he understands the entropy in organizations and in the
approach. And I think in this case, the cost is high, and you're potentially
not seeing it if you're just a computer vision engineer trying to improve
your network: is the sensor more useful or less useful? How useful is it?
The thing is, once you consider the full cost of a sensor, it actually is
potentially a liability, and you need to be really sure that it's giving you
extremely useful information. In this case, we looked at using it or not
using it, and the delta was not massive, and so it's not useful.
Is it also bloat in the data engine, having more sensors? Is it a
distraction?
And these sensors, you know, they can change over time. For example, you can
have one type of radar, you can have another type of radar. They change over
time, and now you suddenly need to worry about it. Now suddenly you have a
column in your SQLite database telling you, oh, what sensor type was it? And
they all have different distributions, and they contribute noise and entropy
into everything, and they bloat stuff. And also, organizationally, it has
been really fascinating to me how distracting it can be. If all you want to
get to work is vision, all the resources are on it, and you're building out
a data engine, and you're actually making forward progress, because that is
the sensor with the most bandwidth, the most constraints on the world, and
you're investing fully into that, and you can make that extremely good. You
only have a finite amount of spend, of focus, across the different facets of
the system.
And this kind of reminds me of Rich Sutton's "The Bitter Lesson." It just
seems like simplifying the system, in the long run, and of course you don't
know what the long run is, seems to be always the right solution.
Yes. In that case, it was for RL, but it seems to apply generally across all
systems that do computation.
So what do you think about the lidar-as-a-crutch debate? The battle between
point clouds and pixels.
Yeah, I think this debate is always slightly confusing to me, because it
seems like the actual debate should be about: do you have the fleet or not?
That's the really important thing for whether you can achieve a really good,
functioning AI system at this scale.
So data collection systems.
Yeah. Whether you have a fleet or not is significantly more important than
whether you have lidar or not. It's just another sensor.
And yeah, I think similar to the radar discussion, I basically don't think
it offers extra information. It's extremely costly. It has all kinds of
problems. You have to worry about it, you have to calibrate it, et cetera.
It creates bloat and entropy. You have to be really sure that you need this
sensor. In this case, I basically don't think you need it. And honestly, I
will make a stronger statement: I think some of the other companies that are
using it are probably going to drop it.
Yeah. So you have to consider the sensor in the full picture: can you build
a big fleet that collects a lot of data, and can you integrate that sensor
and that data into a data engine that's able to quickly find the different
parts of the data that then continuously improve whatever model you're
using?
Yeah. Another way to look at it is: vision is necessary, in the sense that
the world is designed for human visual consumption, so you need vision. It's
necessary. And then it is also sufficient, because it has all the
information that you need for driving, and humans obviously use vision to
drive. So it's both necessary and sufficient. So you want to focus your
resources, and you have to be really sure if you're going to bring in other
sensors. You could add sensors to infinity; at some point you need to draw
the line. And I think in this case, you have to really consider the full
cost of any one sensor that you're adopting, and whether you really need it.
And I think the answer in this case is no.
So what do you think about the approach where the other companies are
forming high-resolution maps and heavily constraining the geographic regions
in which they operate? Is that approach, in your view, not going to scale
over time to the entirety of the United States?
I think, as you mentioned, they pre-map all the environments, and they need
to refresh the map, and they have a perfect, centimeter-level-accuracy map
of everywhere they're going to drive. It's crazy. We've been talking about
autonomy actually changing the world, the deployment on a global scale of
autonomous systems for transportation. And if you need to maintain a
centimeter-accurate map for Earth, or for many cities, and keep it updated,
that's a huge dependency that you're taking on. A massive, massive
dependency. And now you need to ask yourself: do you really need it? Humans
don't need it. It's very useful to have a low-level map of, okay, the
connectivity of your roads, that there's a fork coming up. When you drive an
environment, you have that high-level understanding. It's like a small
Google map, and Tesla uses Google-map-like, similar-resolution information
in the system. But it will not pre-map environments to centimeter-level
accuracy. It's a crutch. It's a distraction. It costs entropy, and it
diffuses the team, it dilutes the team, and you're not focusing on what's
actually necessary, which is the computer vision problem.
What did you learn about machine learning, about engineering, about life,
about yourself, as one human being, from working with Elon Musk?
I think the most I've learned is about how to run organizations efficiently,
how to create efficient organizations, and how to fight entropy in an
organization.
So the human engineering in the fight against entropy?
Yeah. I think Elon is a very efficient warrior in the fight against entropy
in organizations.
What does entropy in an organization look like?
It's process. It's process and inefficiencies in the form of meetings and
that kind of stuff.
Yeah, meetings.
He hates meetings. He keeps telling people to skip meetings if they're not
useful. He basically runs the world's biggest startups, I would say. Tesla
and SpaceX are the world's biggest startups. Tesla is actually multiple
startups; I think it's better to look at it that way. And so I think he's
extremely good at that. He has a very good intuition for streamlining
processes, making everything efficient. Best part is no part. Simplifying,
focusing, and just removing barriers, moving very quickly, making big moves.
All of this is very startupy-seeming stuff, but at scale.
So a strong drive to simplify. From your perspective, I mean, that also
probably applies to just designing systems, in machine learning and
otherwise. Simplify, simplify.
Yes.
What do you think is the secret to maintaining the startup culture in a
company that grows? Can you introspect on that?
I do think you need someone in a powerful position with a big hammer, like
Elon, who is the cheerleader for that idea and ruthlessly pursues it. If no
one has a big enough hammer, everything turns into committees, democracy
within the company, process, talking to stakeholders, decision-making, and
everything just crumbles. If you have a big person who's also really smart
and has a big hammer, things move quickly.
So you said your favorite scene in Interstellar is the intense docking scene
with the AI and Cooper talking: "Cooper, what are you doing?" "Docking."
"It's not possible." "No, it's necessary." Such a good line. By the way, so
many questions there. Why, in that scene, is the AI, which presumably can
compute a lot more than the human, the one saying it's not possible? I mean,
it's a movie, but shouldn't the AI know much better than the human? Anyway,
what do you think is the value of setting seemingly impossible goals,
against our initial intuition? It seems like something that you have taken
on, that Elon espouses: the initial intuition of the community might say
this is very difficult, and then you take it on anyway, with a crazy
deadline. Just from a human engineering perspective, have you seen the value
of that?
I wouldn't say that setting impossible goals, exactly, is a good idea, but I
think setting very ambitious goals is a good idea. There's what I call
sub-linear scaling of difficulty, which means that 10x problems are not 10x
hard. Usually a 10x-harder problem is like 2 or 3x harder to execute on. If
you want to improve a system by 10 percent, it costs some amount of work; if
you want to 10x-improve the system, it doesn't cost 100x the amount of work.
That's because you fundamentally change the approach. If you start with that
constraint, then some approaches are obviously dumb and not going to work,
and it forces you to reevaluate. I think it's a very interesting way of
approaching problem-solving.
But it requires a weird kind of thinking. Going back to your PhD days, how
do you think about which ideas in the machine learning community are
solvable?
Yes.
What is that? There's the cliche of first-principles thinking, but it
requires you to basically ignore what the community is saying, because
doesn't a community in science usually draw the lines of what is and isn't
possible? And it's very hard to break out of that without going crazy.
I think a good example here is the deep learning revolution, in some sense,
because you could be in computer vision at that time, during the deep
learning revolution of 2012 and so on, and you could be improving a computer
vision stack by 10 percent, or you could just be saying: actually, all of
this is useless. How do I get 10x better computer vision? Well, it's
probably not by tuning a HOG feature detector. I need a different approach.
I need something that is scalable. Going back to Richard Sutton and
understanding the philosophy of the bitter lesson, and then saying:
actually, I need a much more scalable system, like a neural network, that in
principle works. And then having some deep believers who can actually
execute on that mission and make it work. That's the 10x solution.
What do you think is the timeline to solve the problem of autonomous
driving? That's still, in part, an open question.
Yeah. I think the tough thing with timelines for self-driving, obviously, is
that no one has created self-driving. It's not like: what do you think is
the timeline to build this bridge? Well, we've built a million bridges
before; here's how long that takes. No one has built autonomy. It's not
obvious. Some parts turn out to be much easier than others, so it's really
hard to forecast. You do your best based on trend lines and so on, and based
on intuition, but that's why, fundamentally, it's just really hard to
forecast.
Even being inside of it, it's hard to do.
Yes. Some things turn out to be much harder, and some things turn out to be
much easier.
Do you try to avoid making forecasts? Because Elon doesn't avoid them,
right? Heads of car companies in the past haven't avoided it either. Ford
and other places have made predictions that we're going to solve level-four
driving by 2020, 2021, whatever, and they're all kind of backtracking on
those predictions. As an AI person, do you, for yourself, privately make
predictions, or do they get in the way of your actual ability to think about
the thing?
Yeah, I would say what's easy to say is that this problem is tractable, and
that's an easy prediction to make. It's tractable. It's going to work. It's
just really hard. Some things turn out to be harder, and some things turn
out to be easier. It definitely feels tractable, and it feels like at least
the team at Tesla, which is what I saw internally, is definitely on track to
that.
How do you form a strong representation that allows you to make a prediction
about tractability? You're the leader of a lot of humans; you have to say,
this is actually possible. How do you build up that intuition? It doesn't
have to be driving; it could be other tasks. What difficult tasks did you
work on in your life? Classification, achieving a certain superhuman level
of performance on ImageNet?
Yeah, expert intuition. It's just intuition. It's belief.
So just thinking about it long enough, studying, looking at sample data.
Like you said, driving: my intuition is really flawed on this. I don't have
a good intuition about tractability. It could be anything. It could be
solvable. The driving task could be simplified into something quite trivial;
the solution to the problem could be quite trivial. At scale, more and more
cars driving perfectly might make the problem much easier. The more cars you
have driving, the more people learn how to drive, not correctly, but in a
way that's more optimal for a heterogeneous system of autonomous,
semi-autonomous, and manually driven cars.
That could change stuff. Then again, I've also spent a ridiculous number of
hours just staring at pedestrians crossing streets, thinking about humans.
It feels like the way we use eye contact sends really strong signals, and
there are certain quirks and edge cases of behavior. And of course, a lot of
the fatalities that happen have to do with drunk driving, on both the
pedestrian side and the driver side, and there's the problem of driving at
night, and all that kind of stuff. The space of possible solutions to
autonomous driving includes so many human-factor issues that it's almost
impossible to predict. There could be super clean, nice solutions.
I would say definitely, to use a game analogy, there's some fog of war, but
you definitely also see the frontier of improvement, and you can measure
historically how much progress you've made. I think, for example, in the
roughly five years I was at Tesla: when I joined, it barely kept lane on the
highway. I think going up from Palo Alto to SF took like three or four
interventions. Any time the road would do anything geometrically, or turn
too much, it would just not work. Going from that to a pretty competent
system in five years, and seeing what happens under the hood, the scale at
which the team is operating now with respect to data and compute and
everything else: it's just massive progress.
So you're climbing a mountain, and it's foggy, but you're making a lot of
progress.
It's foggy, you're making progress, and you see what the next directions
are. You're looking at some of the remaining challenges, and they're not
perturbing you, they're not changing your philosophy, and you're not
contorting yourself. You're like: actually, these are the things that we
still need to do.
Yeah, the fundamental components of solving the problem seem to be there,
from the data engine, to the compute on the car, to the compute for the
training, all that kind of stuff.
So over the years you've been at Tesla, you've done a lot of amazing
breakthrough ideas and engineering, all of it, from the data engine to the
human side. Can you speak to why you chose to leave Tesla?
Basically, as I described, over time during those five years I got myself
into a bit of a managerial position. Most of my days were meetings and
growing the organization and making high-level strategic decisions about the
team and what it should be working on, and so on. It's like a corporate
executive role, and I can do it, I think I'm okay at it, but it's not
fundamentally what I enjoy. When I joined, there was no computer vision
team, because Tesla was just going through the transition from using
Mobileye, a third-party vendor, for all of its computer vision, to building
its own computer vision system. So when I showed up, there were two people
training deep neural networks on a computer at their desk, doing some kind
of basic classification task. And I grew that into what I think is a fairly
respectable deep learning team, a massive compute cluster, and a very good
data annotation organization. I was very happy with where that was. It
became quite autonomous, and so I stepped away. I'm very excited to do much
more technical things again, and to kind of refocus on AGI.
No, I'm just kidding. I mean, what was going through your mind? The human
lifetime is finite.
Yeah.
You did a few incredible things there. You're one of the best teachers of AI
in the world. You're one of the best, and I mean that in the best possible
way, one of the best tinkerers in the AI world. Meaning, understanding the
fundamentals of how something works by building it from scratch and playing
with the basic intuitions. It's like Einstein and Feynman were really good
at this kind of stuff: a small example of a thing to play with, to try to
understand it. And obviously now with Tesla, you helped build a team of
machine learning engineers and a system that actually accomplishes something
in the real world. So given all that, what was the soul-searching like?
Well, it was hard, because obviously I love the company a lot, and I love
Elon, I love Tesla. It was hard to leave. I love the team, basically. But
yeah, I think I would actually potentially be interested in revisiting it,
maybe coming back at some point, working on Optimus, working on AGI at
Tesla. I think Tesla is going to do incredible things. It's basically a
massive, large-scale robotics company with a ton of in-house talent for
doing really incredible things. And I think humanoid robots are going to be
amazing, I think autonomous transportation is going to be amazing, and all
of this is happening at Tesla. So I think it's just a really amazing
organization, and being part of it and helping it along, I enjoyed that a
lot. It was basically difficult for those reasons, because I love the
company. But I'm happy to potentially, at some point, come back for Act 2. I
just felt like at this stage, I had built the team, it felt autonomous, I
had become a manager, and I wanted to do a lot more technical stuff. I
wanted to learn stuff, I wanted to teach stuff, and I felt like it was a
good time for a change of pace a little bit.
What do you think is the best movie sequel of all time, speaking of part
two? Because most of them suck.
Movie sequels?
Movie sequels, yeah. And you tweet about movies. So just as a tiny tangent,
what's a favorite movie sequel? Godfather Part II? Are you a fan of The
Godfather? Because you didn't even tweet about or mention The Godfather.
Yeah, I don't love that movie. I know it has a huge following.
We're going to edit that out. We're going to edit out the hate towards The
Godfather. How dare you disrespect...
I think I will make a strong statement. I don't know why, but I basically
don't like any movie before 1995, something like that.
Didn't you mention Terminator 2?
Okay, okay. Terminator 2 was a little bit later, 1990.
No, I think Terminator 2 was in the 80s.
And I like Terminator 1 as well. So, okay, a few exceptions, but by and
large, for some reason, I don't like movies before 1995 or something. They
feel very slow. The camera is zoomed out. It's boring. It's kind of naive,
it's kind of weird.
And also, Terminator was very much ahead of its time.
Yes. And The Godfather, there's no AGI.
I mean, but Good Will Hunting was one of the movies you mentioned, and that
doesn't have any AGI either. I guess it has mathematics.
Yeah, I guess occasionally I do enjoy movies that don't feature...
Or like Anchorman. That's...
Anchorman is so good.
I don't understand...
Speaking of AGI, I don't understand why Will Ferrell is so funny. It doesn't
make sense. It doesn't compute. There's just something about him. And he's a
singular human, because you don't get that many comedies these days, and I
wonder if it has to do with the culture or the machine of Hollywood, or
whether we just got lucky with certain people in comedy. It came together
because he is a singular human.
Yeah, I like his movies.
That was a ridiculous tangent. I apologize. But you mentioned humanoid
robots, so what do you think about Optimus, about Tesla Bot? Do you think
we'll have robots in the factory and in the home in 10, 20, 30, 40, 50
years?
Yeah, I think it's a very hard project. I think it's going to take a while.
But who else is going to build humanoid robots at scale? And I think it's a
very good form factor to go after, because, like I mentioned, the world is
designed for the humanoid form factor. These things would be able to operate
our machines, they would be able to sit down in chairs, potentially even
drive cars. Basically, the world is designed for humans: that's the form
factor you want to invest into and make work over time. I think there's
another school of thought, which is: okay, pick a problem and design a robot
for it. But actually designing a robot, and getting a whole data engine and
everything behind it to work, is an incredibly hard problem. So it makes
sense to go after general interfaces that, okay, are not perfect for any one
given task, but actually have generality: just with an English prompt, they
are able to do something across tasks. So I think it makes a lot of sense to
go after a general interface in the physical world. And I think it's a very
difficult project, it's going to take time, but I see no other company that
can execute on that vision. I think it's going to be amazing. Basically,
physical labor: if you think transportation is a large market, try physical
labor. It's insane.
But it's not just physical labor. To me, the thing that's also exciting is
social robotics, the relationships we'll have, on different levels, with
those robots. That's why I was really excited to see Optimus. People have
criticized me for that excitement, but I've worked with a lot of research
labs that do humanoid, legged robots: Boston Dynamics, Unitree, there are a
lot of companies that do legged robots. But the elegance of the movement is
a tiny, tiny part of the big picture. There are two big exciting things to
me about Tesla doing humanoid, or any legged, robots. The first is
integrating into the data engine: the actual intelligence for the perception
and the control and the planning and all that kind of stuff, and integrating
into the fleet that you mentioned. And then, speaking of fleets, the second
thing is the mass manufacturing: just knowing, culturally, how to drive
towards a simple robot that's cheap to produce at scale, and doing that
well, having the experience to do that well. That changes everything. That's
a very different culture and style than Boston Dynamics, whose robots, by
the way, are remarkable just in the way they move. It'll be a very long time
before Tesla can achieve that smoothness of movement, but that's not what
it's about. It's about the entirety of the system, like we talked about: the
data engine and the fleet. That's super exciting, even with the initial
models. But that, too, was really surprising, that in a few months you can
get a prototype.
The reason that happened very quickly is, as you alluded to, there's a ton of copy-paste from what's happening on Autopilot. A lot. The amount of expertise that came out of the woodwork at Tesla for building the humanoid robot was incredible to see. Basically, Elon said at one point, we're doing this. And then the next day, basically, all these CAD models started to appear. People talk about the supply chain and manufacturing. People showed up with screwdrivers and everything the next day and started to put together the body. I was like, whoa. All these people exist at Tesla. Fundamentally, building a car is actually not that different from building a robot. That is true not just for the hardware pieces. And let's not forget hardware, not just for a demo, but manufacturing of that hardware at scale. That's a whole different thing. But it's true for software as well. Basically, this robot currently thinks it's a car. It's going to have a midlife crisis at some point. It thinks it's a car. Some of the earlier demos, actually, we were talking about potentially doing them outside in the parking lot, because that's where all of the computer vision was working out of the box, instead of inside. All the operating system, everything just copy-pastes. Computer vision mostly copy-pastes. You have to retrain the neural nets, but the approach and everything, the data engine and offline trackers and the way we go about the occupancy tracker and so on, everything copy-pastes. You just need to retrain the neural nets. Then the planning control, of course, has to change quite a bit. But there's a ton of copy-paste from what's happening at Tesla. If you were to go with the goal of, okay, let's build a million humanoid robots, and you're not Tesla, that's a lot to ask. If you're Tesla, it's actually not that crazy. Yes. Then the follow-up question is, just like with driving, how difficult is the manipulation task, such that it can have an impact at scale? I think, depending on the context, the really nice thing about robotics is that, unless you're doing manufacturing and that kind of stuff, there's more room for error. Driving is so safety critical, and also time critical. A robot is allowed to move slower, which is nice. Yes. I think it's going to take a long time, but the way you want to structure the development is, you need to say, okay, it's going to take a long time. How can I set up the product development roadmap so that I'm making revenue along the way? I'm not setting myself up for a zero-one loss function where it doesn't work until it works. You don't want to be in that position. You want to make it useful almost immediately, and then you want to slowly deploy it at scale. And you want to set up your data engine, your improvement loops, the telemetry, the evaluation, the harness, and everything. And you want to improve the product over time incrementally while you're making revenue along the way. That's extremely important, because otherwise these large undertakings just don't make sense economically. And also, from the point of view of the team working on it, they need the dopamine along the way. You can't just make a promise about this being useful, that this is going to change the world in 10 years when it works. This is not where you want to be. You want to be in a place like I think Autopilot is today, where it's offering increased safety and convenience of driving today. People pay for it. People like it.
People will purchase it. And then you also have the greater mission that you're working towards. And you see that. So the dopamine for the team, that was a source of happiness and satisfaction. Yes, 100%. You're deploying this. People like it. People drive it. People pay for it. They care about it. There's all these YouTube videos. Your grandma drives it. She gives you feedback. People like it. People engage with it. You engage with it. Huge. Do people that drive Teslas recognize you and give you love? Like, hey, thanks for this nice feature that it's doing. Yeah, I think the tricky thing is, some people really love you. Some people, unfortunately, you're working on something that you think is extremely valuable, useful, et cetera, and some people do hate you. There's a lot of people who hate me and the team and the whole project, and I think in many cases they're not actually Tesla drivers. Yeah, that actually makes me sad about humans, or the current ways that humans interact. I think that's actually fixable. I think humans want to be good to each other. I think Twitter and social media is part of the mechanism that somehow makes the negativity more viral, that disproportionately adds a viral boost to negativity it doesn't deserve. But I wish people would suppress some of the jealousy, some of the ego, and just get excited for others. And then there's a karma aspect to that. You get excited for others, they'll get excited for you. Same thing in academia. If you're not careful, there is a dynamical system there. If you think in silos and get jealous of somebody else being successful, that actually, perhaps counterintuitively, leads to less productivity for you as a community and for you individually. I feel like if you keep celebrating others, that actually makes you more successful. Yeah. I think people, depending on the industry, haven't quite learned that yet. Some people are also very negative and very vocal, so they're very prominently featured, but actually there's a ton of people who are cheerleaders, but they're silent cheerleaders. And when you talk to people just in the world, they will tell you, it's amazing, it's great. Especially people who understand how difficult it is to get this stuff working. People who have built products, makers, entrepreneurs. Making this work and changing something is incredibly hard. Those people are more likely to cheerlead you. Well, one of the things that makes me sad is some folks in the robotics community don't do the cheerleading, and they should, because they know how difficult it is. Well, they actually sometimes don't know how difficult it is to create a product at scale and actually deploy it in the real world. A lot of the development of robots and AI systems is done on very specific small benchmarks, as opposed to real-world conditions. Yes. Yeah, I think it's really hard to work on robotics in an academic setting. Or AI systems that apply in the real world. You flourished with and loved, for a time, the famed ImageNet data set, and you've recently had some words of criticism that the academic research ML community gives a little too much love still to ImageNet or those kinds of benchmarks. Can you speak to the strengths and weaknesses of data sets used in machine learning research? Actually, I don't know that I recall a specific instance where I was unhappy or criticizing ImageNet.
I think ImageNet has been extremely valuable. It was basically a benchmark that allowed the deep learning community to demonstrate that deep neural networks actually work. There's a massive value in that. I think ImageNet was useful, but basically it's become a bit of an MNIST at this point. MNIST is like little 28 by 28 grayscale digits. It's a joke data set that everyone just crushes. There's still papers written on MNIST, though, right? Maybe they shouldn't. Strong papers. Like papers that focus on how do we learn with a small amount of data, that kind of stuff. Yeah, I could see that being helpful, but not in mainline computer vision research anymore, of course. I think I've heard you say somewhere, maybe I'm just imagining things, but I think you said ImageNet was a huge contribution to the community for a long time, and now it's time to move past those kinds of... Well, ImageNet has been crushed. The error rates are... Yeah, we're getting like 90% accuracy in 1,000-way classification prediction, and I've seen those images, and that's really high. That's really good. If I remember correctly, the top-5 error rate is now like 1% or something. Given your experience with a gigantic real-world data set, would you like to see benchmarks move in certain directions that the research community uses? Unfortunately, I don't think academics currently have the next ImageNet. We've crushed MNIST. We've basically crushed ImageNet, and there's no next big benchmark that the entire community rallies behind and uses for further development of these networks. Yeah. What does it take for a data set to captivate the imagination of everybody, where they all get behind it? It might also need a leader, right? Yeah. Somebody with popularity. Yeah. Why did ImageNet take off? Or is it just the accident of history? It was the right amount of difficult. It was the right amount of difficult and simple and interesting enough. It was just the right time for that kind of a data set. Question from Reddit. What are your thoughts on the role that synthetic data and game engines will play in the future of neural net model development? I think as neural nets converge to humans, the value of simulation to neural nets will be similar to the value of simulation to humans. So people use simulation because they can learn something in that kind of a system without having to actually experience it. But are you referring to the simulation we do in our head? No, sorry, simulation. I mean like video games or other forms of simulation for various professionals. So let me push back on that, because maybe there's simulation that we do in our heads. Like, simulate, if I do this, what do I think will happen? Okay, that's like internal simulation. Yeah, internal. Isn't that what we're doing, as we simulate before we act? Oh yeah, but that's independent from the use of simulation in the sense of computer games, or using simulation for training set creation, or- Is it independent, or is it just loosely correlated? Because isn't it useful to do counterfactual or edge-case simulation? Like, you know, what happens if there's a nuclear war? What happens if there's, you know, those kinds of things? Yeah, that's a different simulation from Unreal Engine. That's how I interpreted the question. Ah, so like simulation of the average case. What's Unreal Engine? What do you mean by Unreal Engine?
So simulating a world, the physics of that world, why is that different? Because you can also add behavior to that world, and you could try all kinds of stuff, right? You could throw all kinds of weird things into it. Unreal Engine is not just about simulating... I mean, I guess it is about simulating the physics of the world. It's also doing something with that. Yeah. The graphics, the physics, and the agents that you put into the environment and stuff like that. Yeah. See, I feel like you said that it's not that important, I guess, for the future of AI development. Is that correct to interpret it that way? I think humans use simulators and they find them useful, and so computers will use simulators and find them useful. Okay. So you're saying it's not that... I don't use simulators very often. I play a video game every once in a while, but I don't think I derive any wisdom about my own existence from those video games. It's a momentary escape from reality, versus a source of wisdom about reality. So I think that's a very polite way of saying simulation is not that useful. Yeah, maybe not. I don't see it as a fundamental, really important part of training neural nets currently. But I think as neural nets become more and more powerful, you will need fewer examples to train additional behaviors. And with simulation, of course, there's a domain gap: a simulation is not the real world, it's slightly something different. But with a powerful enough neural net, the domain gap can be bigger, I think, because the neural net will sort of understand that even though it's not the real world, it still has all this high-level structure that it's supposed to be learning from. So the neural net will actually, yeah, it will be able to leverage the synthetic data better by closing the gap, by understanding in which ways this is not real data. Exactly. Right, I'll do better questions next time. That was a question, but I'm just kidding. All right. So is it possible, do you think, speaking of MNIST, to construct neural nets and training processes that require very little data? So we've been talking about huge data sets like the internet for training. I mean, one way to say that is, like you said, the querying itself is another level of training, I guess, and that requires little data. But do you see any value in doing research, in going down the direction of, can we use very little data to train, to construct a knowledge base? 100%. I just think at some point you need a massive data set. And then when you pre-train your massive neural net and get something that is like a GPT or something, then you're able to be very efficient at training any arbitrary new task. So with a lot of these GPTs, you can do tasks like sentiment analysis or translation or so on just by being prompted with very few examples. Here's the kind of thing I want you to do. Here's an input sentence, here's the translation into German. Input sentence, translation to German. Input sentence, blank, and the neural net will complete the translation to German just by looking at the examples you've provided. And so that's an example of few-shot learning in the activations of the neural net, instead of the weights of the neural net. And so I think basically, just like humans, neural nets will become very data efficient at learning any other new task. But at some point you need a massive data set to pre-train your network.
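To make the few-shot prompting pattern just described concrete, here is a minimal Python sketch. The prompt layout mirrors the translation example from the conversation; the commented-out `complete` call is a hypothetical stand-in for whatever text-completion model you have access to, not a real API.

```python
# Minimal sketch of few-shot prompting for translation, per the discussion above.
# `complete` is a hypothetical placeholder for a text-completion model call.

def build_few_shot_prompt(examples, query):
    # Each (input, output) pair becomes one demonstration; the final
    # input is left blank for the model to fill in-context.
    lines = ["Translate English to German."]
    for en, de in examples:
        lines.append(f"English: {en}\nGerman: {de}")
    lines.append(f"English: {query}\nGerman:")
    return "\n\n".join(lines)

examples = [
    ("The weather is nice today.", "Das Wetter ist heute schön."),
    ("Where is the train station?", "Wo ist der Bahnhof?"),
]
prompt = build_few_shot_prompt(examples, "I would like a coffee.")
print(prompt)
# print(complete(prompt))  # the model continues the pattern, learning in
#                          # activations rather than in weights
```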
To get that, and probably we humans have something like that. Do we have something like that? Do we have a passive, in-the-background, model-constructing thing that just runs all the time in a self-supervised way, that we're not conscious of? I think humans definitely, I mean, obviously we learn a lot during our lifespan, but also we have a ton of hardware that helps us at initialization, coming from evolution. And so I think that's also a really big component. A lot of people in the field, I think, just talk about the amount of seconds that a person has lived, pretending that this is a tabula rasa, sort of like a zero initialization of a neural net. And it's not. You can look at a lot of animals, like for example zebras. Zebras get born, and they see, and they can run. There's zero training data in their lifespan. They can just do that. So somehow, I have no idea how, evolution has found a way to encode these algorithms and these neural net initializations that are extremely good into ATCGs. And I have no idea how this works, but apparently it's possible, because here's a proof by existence. There's something magical about going from a single cell to an organism that is born, to the first few years of life. I kind of like the idea that the reason we don't remember anything about the first few years of our life is that it's a really painful process. Like it's a very difficult, challenging training process, intellectually. And maybe, yeah, I mean, why don't we remember any of that? There might be some crazy training going on, and maybe that's the background model training that is very painful. And so it's best for the system, once it's trained, not to remember how it's constructed. I think it's just that the hardware for long-term memory is not fully developed. I kind of feel like the first few years of infants is not actually learning, it's the brain maturing. We're born premature. There's a theory along those lines, because of the birth canal and the size of the brain. And so we're born premature, and then in the first few years the brain is maturing, and then there's some learning eventually. That's my current view on it. What do you think, do you think neural nets can have long-term memory in a way that approaches something like humans'? Does there need to be another meta-architecture on top of it, to add something like a knowledge base that learns facts about the world and all that kind of stuff? Yes, but I don't know to what extent it will be explicitly constructed. It might take unintuitive forms where you are telling the GPT, hey, you have a declarative memory bank to which you can store and retrieve data from. And whenever you encounter some information that you find useful, just save it to your memory bank. And here's an example of something you have retrieved, and how you save it, and here's how you load from it. You just say load, whatever. You teach it in text, in English, and then it might learn to use a memory bank from that. Oh, so the neural net is the architecture for the background model, the base thing, and then everything else is just on top of it. That's pretty easy to do. It's not just text, right? You're giving it gadgets and gizmos. So you're teaching it some kind of a special language by which it can save arbitrary information and retrieve it at a later time. And you're telling it about these special tokens and how to arrange them to use these interfaces.
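As a toy illustration of the memory-bank idea just described (and of the calculator example that follows), here is a minimal Python sketch. A wrapper intercepts special commands the model has been taught, in plain English, to emit; the `SAVE:`/`LOAD:` command names are invented for illustration and are not any real system's interface.

```python
# Toy sketch of a declarative memory bank driven by special text commands.
# The command vocabulary here is invented purely for illustration.

memory_bank = {}

def handle_model_output(text):
    # Intercept the special commands; pass ordinary generations through.
    if text.startswith("SAVE:"):
        key, _, value = text[len("SAVE:"):].partition("=")
        memory_bank[key.strip()] = value.strip()
        return "OK, saved."
    if text.startswith("LOAD:"):
        return memory_bank.get(text[len("LOAD:"):].strip(), "(nothing stored)")
    return text

# The surrounding loop would feed these results back into the prompt,
# so the model can use what it retrieved on its next step.
print(handle_model_output("SAVE: favorite_editor = VS Code"))
print(handle_model_output("LOAD: favorite_editor"))  # -> VS Code
```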
It's like, hey, you can use a calculator. Here's how you use it. Just do 53 plus 41 equals. And when equals is there, a calculator will actually read out the answer, and you don't have to calculate it yourself. And you just tell it in English. This might actually work. Do you think, in that sense, Gato is interesting, the DeepMind system? That it's not just language, but it actually throws it all into the same pile: images, actions, all that kind of stuff. That's basically what we're moving towards. Yeah, I think so. So Gato is very much a kitchen-sink approach to reinforcement learning in lots of different environments, with a single fixed transformer model, right? I think it's a very early result in that realm, but I think, yeah, it's along the lines of what I think things will eventually look like. So this is the early days of a system that eventually will look like this, from a Rich Sutton perspective. Yeah. I'm not a super huge fan of, I think, all these interfaces that look very different. I would want everything to be normalized into the same API. So for example, screen pixels, the very same API. Instead of having different world environments that have very different physics and joint configurations and appearances and whatever, and having some kind of special tokens for different games that you can plug in, I'd rather just normalize everything to a single interface, so it looks the same to the neural net, if that makes sense. So it's all going to be pixel-based Pong in the end. I think so. Okay. Let me ask you about your own personal life. A lot of people want to know. You're one of the most productive and brilliant people in the history of AI. What does a productive day in the life of Andrej Karpathy look like? What time do you wake up? Because I imagine some kind of dance between the average productive day and a perfect productive day. The perfect productive day is the thing we strive towards, and the average is what it converges to, given all the mistakes and human eventualities and so on. So what time do you wake up? Are you a morning person? I'm not a morning person. I'm a night owl for sure. Is it stable or not? It's semi-stable, like eight or nine or something like that. During my PhD it was even later. I used to go to sleep usually at 3 a.m. I think the a.m. hours are precious and a very interesting time to work, because everyone is asleep. At 8 a.m. or 7 a.m., the East Coast is awake, so there's already activity, there's already some text messages, whatever, there's stuff happening. You can go on some news website and there's stuff happening. It's distracting. At 3 a.m., everything is totally quiet, and so you're not going to be bothered, and you have solid chunks of time to do work. So I like those periods. Night owl by default. And then for productive time, basically, what I like to do is, you need to build some momentum on the problem without too much distraction. And you need to load your RAM, your working memory, with that problem. And then you need to be obsessed with it when you're taking a shower, when you're falling asleep. You need to be obsessed with the problem, so it's fully in your memory, and you're ready to wake up and work on it right there. So is this on a temporal scale of a single day, or a couple of days, a week, a month? So I can't talk about one day, basically, in isolation, because it's a whole process.
When I want to get productive on a problem, I feel like I need a span of a few days where I can really get into that problem, and I don't want to be interrupted, and I'm going to just be completely obsessed with that problem. And that's where I do most of my good work. You've done a bunch of cool little projects in a very short amount of time, very quickly. So that requires you just focusing on it. Yeah, basically, I need to load my working memory with the problem, and I need to be productive, because there's always a huge fixed cost to approaching any problem. I was struggling with this, for example, at Tesla, because I wanted to work on a small side project. But okay, you first need to figure out, okay, I need to SSH into my cluster. I need to bring up a VS Code editor so I can work on this. I run into some stupid error for some reason. You're not at a point where you can just be productive right away. You are facing barriers. And so it's about really removing all of those barriers, so you're able to go into the problem and you have the full problem loaded in your memory. And somehow avoiding distractions of all different forms, like news stories, emails, but also distractions from other interesting projects that you previously worked on or are currently working on, and so on. You just want to really focus your mind. And I mean, I can take some time off for distractions in between, but I think it can't be too much. Most of your day is sort of spent on that problem. And then I drink coffee, I have my morning routine, I look at some news, Twitter, Hacker News, Wall Street Journal, et cetera. It's great. So basically, you wake up, you have some coffee. Are you trying to get to work as quickly as possible? Or are you taking in this diet of what the hell is happening in the world first? I do find it interesting to know about the world. I don't know that it's useful or good, but it is part of my routine right now. So I do read through a bunch of news articles, and I want to be informed. And I'm suspicious of it. I'm suspicious of the practice, but currently that's where I am. Oh, you mean suspicious about the positive effect of that practice on your productivity and your wellbeing? My wellbeing, psychologically, yeah. And also on your ability to deeply understand the world, because there's a bunch of sources of information you're not really focused on deeply integrating. Yeah, it's a little distracting. In terms of a perfectly productive day, for how long a stretch of time in one session do you try to work and focus on a thing? A couple of hours? Is it one hour, is it 30 minutes, is it 10 minutes? I can probably go a small few hours, and then I need some breaks in between, for food and stuff. Yeah, but I think it's still really hard to accumulate hours. I was using a tracker that told me exactly how much time I spent coding on any one day. And even on a very productive day, I still spent only like six or eight hours. And it's just because there's so much padding: commute, talking to people, food, et cetera. There's the cost of life. Just living and sustaining and homeostasis, just maintaining yourself as a human, is very high. And there seems to be a desire within the human mind to participate in society that creates that padding. Because the most productive days I've ever had were, just completely from start to finish, tuning out everything and just sitting there. And then you can do more than six and eight hours.
Is there some wisdom about what gives you strength to do tough days of long focus? Yeah, just whenever I get obsessed about a problem, something just needs to work, something just needs to exist. It needs to exist. So you're able to deal with bugs and programming issues and technical issues and design decisions that turn out to be the wrong ones. You're able to think through all of that, given that you want the thing to exist. Yeah, it needs to exist. And then I think to me also a big factor is, are other humans going to appreciate it? Are they going to like it? That's a big part of my motivation. If I'm helping humans and they seem happy, they say nice things, they tweet about it or whatever, that gives me pleasure, because I'm doing something useful. So you do see yourself sharing it with the world, whether it's on GitHub or through a blog post or through videos. Yeah, I was thinking about it. Suppose I did all these things but did not share them. I don't think I would have the same amount of motivation that I can build up. You enjoy the feeling of other people gaining value and happiness from the stuff you've created. Yeah. What about diet? I saw you played with intermittent fasting. Do you fast? Does that help? I played with everything. Of the things you've played with, what's been most beneficial to your ability to mentally focus on a thing, and just mental productivity and happiness? You still fast? Yeah, I still fast, but I do intermittent fasting. But really what it means at the end of the day is I skip breakfast. So I do 18:6 roughly by default when I'm in my steady state. If I'm traveling or doing something else, I will break the rules. But in my steady state, I do 18:6. So I eat only from 12 to 6. Not a hard rule, and I break it often, but that's my default. And then, yeah, I've done a bunch of random experiments. For the most part right now, where I've been for the last year and a half, I want to say, is I'm plant-based or plant-forward. I heard plant-forward. It sounds better. What does that mean, exactly? I don't actually know what the difference is, but it sounds better in my mind. But it just means I prefer plant-based food. Raw or cooked? I prefer cooked and plant-based. So plant-based, forgive me, I don't actually know how wide the category of plants is. Well, plant-based just means that you're not militant about it, and you can flex. You just prefer to eat plants, and you're not trying to influence other people. And if you come to someone's house party and they serve you a steak that they're really proud of, you will eat it. That's beautiful. I'm on the flip side of that, but I'm very sort of flexible. Have you tried doing one meal a day? I have accidentally, not consistently, but I've accidentally had that. I don't like it. I think it makes me feel not good. It's too much of a hit. Yeah. And so currently I have about two meals a day, 12 and 6. I do that nonstop. I'm doing it now. I do one meal a day. It's interesting. It's an interesting feeling. Have you ever fasted longer than a day? Yeah, I've done a bunch of water fasts, because I was curious what happens. Anything interesting? Yeah, I would say so. I mean, what's interesting is that you're hungry for two days, and then starting day three or so, you're not hungry. It's such a weird feeling, because you haven't eaten in a few days and you're not hungry. Isn't that weird? It's really weird. One of the many weird things about human biology: it figures something out.
It finds another source of energy or something like that, or relaxes the system. I don't know how that works. The body is like, you're hungry, you're hungry. And then it just gives up. It's like, okay, I guess we're fasting now. There's nothing. And then it just focuses on trying to make you not hungry, and not feel the damage of that, and trying to give you some space to figure out the food situation. Are you still, to this day, most productive at night? I would say I am, but it is really hard to maintain my PhD schedule, especially when I was working at Tesla and so on. It's a non-starter. But even now, people want to meet for various events. Society lives in a certain period of time, and you sort of have to work with that. It's hard to do a social thing and then, after that, return and do work. Yeah, it's just really hard. That's why, when I do social things, I try not to do too much drinking, so I can return and continue doing work. But at Tesla, or any company, is there a convergence towards a schedule? Or is there more... Is that how humans behave when they collaborate? I need to learn about this. Do they try to keep a consistent schedule where you're all awake at the same time? I do try to create a routine, and I try to create a steady state in which I'm comfortable. I have a morning routine, I have a day routine, I try to keep things to a steady state, and things are predictable, and then your body just sticks to that. And if you try to stress that a little too much, like when you're traveling and you're dealing with jet lag, you're not able to really ascend to where you need to go. Yeah. That's what you're doing with humans, with the habits and stuff. What are your thoughts on work-life balance throughout a human lifetime? So Tesla in part was known for pushing people to their limits, in terms of what they're able to do, in terms of what they're trying to do, in terms of how much they work, all that kind of stuff. Yeah, I will say Tesla gets a little too much of a bad rap for this, because what's happening is, Tesla is a bursty environment. So I would say the baseline, my only point of reference, is Google, where I've interned three times, and I saw what it's like inside Google and DeepMind. I would say the baseline is higher than that, but then there's a punctuated equilibrium, where once in a while there's a fire and people work really hard. And so it's spiky and bursty, and then all the stories get collected about the bursts, and then it gives the appearance of total insanity. But actually it's just a bit more intense environment, and there are fires and sprints. And so I think definitely, though, I would say it's a more intense environment than something you would get. But in your personal life, forget all of that, just in your own personal life, what do you think about the happiness of a human being? A brilliant person like yourself. About finding a balance between work and life. Or is that not a good thought experiment? Yeah, I think balance is good, but I also love to have sprints that are out of distribution. And that's when, I think, I've been pretty creative as well. Sprints out of distribution means that most of the time you have a, quote unquote, balance. I have balance most of the time. I like being obsessed with something once in a while. Once in a while is what? Once a week, once a month, once a year? Yeah, probably like once a month or something.
And that's when we get a new GitHub repo from Andrej. Yeah, that's when you really care about a problem. It must exist. This will be awesome. You're obsessed with it. But you can't just do it on that day. You need to pay the fixed cost of getting into the groove, and then you need to stay there for a while, and then society will come and they will try to mess with you and they will try to distract you. Yeah. The worst thing is a person who's like, I just need five minutes of your time. Yeah. The cost of that is not five minutes, and society needs to change how it thinks about it. Just five minutes of your time. Right. It's never just one minute. Just 30 seconds. Just a quick thing. What's the big deal? Why are you being so... Yeah, no. What's your computer setup? What's like the perfect... Are you somebody that's flexible no matter what? Laptop, four screens? Or do you prefer a certain setup that you're most productive in? I guess the one that I'm familiar with is one large screen, 27 inch, and my laptop on the side. What operating system? I do Macs. That's my primary. For all tasks? I would say OS X, but when you're working on deep learning, everything is Linux. You're SSH'd into a cluster and you're working remotely. But what about the actual development? Like, are you using an IDE? I think a good way is, you just run VS Code, my favorite editor right now, on your Mac, but you have a remote folder through SSH. The actual files that you're manipulating are on the cluster somewhere else. What's the best IDE? VS Code. What else do people... I use Emacs still. That's cool. It may be cool. I don't know if it's maximum productivity. What do you recommend in terms of editors? You've worked with a lot of software engineers. Editors for Python, C++, machine learning applications. I think the current answer is VS Code. Currently, I believe that's the best IDE. It's got a huge amount of extensions. It has GitHub Copilot integration, which I think is very valuable. What do you think about the Copilot integration? I got to talk a bunch with Guido van Rossum, who's the creator of Python, and he loves Copilot. He programs a lot with it. Do you? Yeah, I use Copilot. I love it. It's free for me, but I would pay for it. Yeah, I think it's very good. The utility that I found with it was, I would say there's a learning curve, and you need to figure out when it's helpful and when to pay attention to its outputs, and when it's not going to be helpful and you should not pay attention to it. Because if you're just reading its suggestions all the time, it's not a good way of interacting with it. But I think I was able to mold myself to it. I find it very helpful, number one, for copy-paste-and-replace kinds of edits. When the pattern is clear, it's really good at completing the pattern. And number two, sometimes it suggests APIs that I'm not aware of. It tells you about something that you didn't know, and that's an opportunity to discover it and use it again. I would never take Copilot code as given. I almost always copy-paste it into a Google search, and you see what this function is doing, and then you're like, oh, it's actually exactly what I need. Thank you, Copilot. So you learn something. It's in part a search engine, part maybe getting the exact syntax correct. It's that NP-hard thing: once you see it, you know it's correct, but you yourself struggle.
You can verify efficiently, but you can't generate efficiently. And Copilot really, I mean, it's Autopilot for programming, right? And currently it's doing the lane following, which is like the simple copy-paste and sometimes suggest. But over time, it's going to become more and more autonomous. And so the same thing will play out, not just in coding, but actually across many, many different things, probably. Coding is an important one, right? Like writing programs. How do you see the future of that developing? The program synthesis, being able to write programs that are more and more complicated. Because right now it's human supervised in interesting ways. It feels like the transition will be very painful. My mental model for it is that the same thing will happen as with Autopilot. So currently it's doing lane following, it's doing some simple stuff, and eventually it'll be doing autonomy, and people will have to intervene less and less. And there could be testing mechanisms. Like, if it writes a function and that function looks pretty damn correct, but how do you know it's correct? Because you're getting lazier and lazier as a programmer. Like your ability to catch little bugs... But I guess it won't make little mistakes. No, it will. Copilot will make subtle off-by-one bugs. It has done that to me. But do you think future systems will? Or is the off-by-one actually a fundamental challenge of programming? In that case, it wasn't fundamental, and I think things can improve. But yeah, I think humans have to supervise. I am nervous about people not supervising what comes out, and what happens to, for example, the proliferation of bugs in all of our systems. I'm nervous about that, but I think there will probably be some other copilots for bug finding and stuff like that at some point. Because there'll be a lot more automation for... It's like a copilot that generates a compiler, one that does a linter, one that does a type checker. It's a committee of GPTs, sort of. And then there'll be a manager for the committee. And then there'll be somebody that says, a new version of this is needed, we need to regenerate it. Yeah. There were 10 GPTs. They did forward passes and gave 50 suggestions. Another one looked at them and picked a few that it liked. A bug one looked at them and was like, these are probably bugs. They got re-ranked by some other thing. And then a final ensemble GPT comes in and is like, okay, given everything you guys have told me, this is probably the next token. The feeling is, the number of programmers in the world has been growing and growing very quickly. Do you think it's possible that it'll actually level out and drop to a very low number with this kind of world? Because then you'll be doing software 2.0 programming, and you'll be doing this kind of generation-with-copilot-type-systems programming, but you won't be doing the old school software 1.0 programming. I don't currently think that they're just going to replace human programmers. I'm so hesitant saying stuff like this, right? Like, this is going to be replaced in five years. I don't know. It's going to show that this is what we thought. Because I agree with you, but I think we might be very surprised. What's your sense of where we stand with language models? Does it feel like the beginning or the middle or the end? The beginning, a hundred percent.
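Here is a toy Python sketch of the "committee of GPTs" idea floated above: generators propose candidates, a bug-finder filters, a ranker orders, and a final pick plays the ensemble. Every model call is stubbed with a trivial placeholder; none of this is a real API, just the shape of the pipeline.

```python
import random

def propose(prompt, n=5):
    # Stand-in for one generator model's n sampled completions.
    return [f"{prompt} candidate {random.randrange(1000)}" for _ in range(n)]

def looks_buggy(candidate):
    # Stand-in for a bug-finding model; here, a biased coin flip.
    return random.random() < 0.3

def rank(candidates):
    # Stand-in for a ranking model; here, plain sorted order.
    return sorted(candidates)

def committee_complete(prompt, num_generators=10):
    candidates = [c for _ in range(num_generators) for c in propose(prompt)]
    survivors = [c for c in candidates if not looks_buggy(c)]
    ranked = rank(survivors)
    return ranked[0] if ranked else None  # the final "ensemble" pick

print(committee_complete("def add(a, b):"))
```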
I think the big question in my mind is, for sure GPT will be able to program quite well, competently and so on. How do you steer the system? You still have to provide some guidance to what you actually are looking for. And so how do you steer it? How do you talk to it? How do you audit it, and verify that what is done is correct? And how do you work with this? It's not just an AI problem, but as much a UI/UX problem. So beautiful fertile ground for so much interesting work for VS Code++, where it's not just human programming anymore. It's amazing. Yeah. So you're interacting with the system. So not just one prompt, but iterative prompting. You're trying to figure it out, having a conversation with the system. Yeah. That actually, I mean, to me, that's super exciting, to have a conversation with the program I'm writing. Yeah. Maybe at some point you're just conversing with it. It's like, okay, here's what I want to do. Actually, this variable... maybe it's not even as low level as a variable, but... You can also imagine, like, can you translate this to C++ and back to Python? Yeah, that already kind of exists in some form. No, but just doing it as part of the programming experience. Like, I think I'd like to write this function in C++. Or you just keep changing between different languages because of the different syntax. Maybe I want to convert this into a functional language. And so you get to become multilingual as a programmer and dance back and forth efficiently. Yeah. I mean, I think the UI/UX of it, though, is still very hard to think through, because it's not just about writing code on a page. You have an entire developer environment. You have a bunch of hardware on it. You have some environment variables. You have some scripts that are running in a cron job. There's a lot going on to working with computers, and how these systems set up environment flags and work across multiple machines and set up screen sessions and automate different processes, how all that works and is auditable by humans and so on, is a massive question. You've built arXiv Sanity. What is arXiv, and what is the future of academic research publishing that you would like to see? So arXiv is this preprint server. If you have a paper, you can submit it for publication to journals or conferences and then wait six months and then maybe get a decision, pass or fail. Or you can just upload it to arXiv, and then people can tweet about it three minutes later, and then everyone sees it, everyone reads it, and everyone can profit from it in their own way. So you can cite it, and it has an official look to it. It feels like a publication process. It feels different than if you just put it in a blog post. Oh yeah. Yeah, I mean, it's a paper, and usually the bar is higher for something that you would expect on arXiv, as opposed to something you would see in a blog post. Well, the culture created the bar, because you could probably post a pretty crappy paper on arXiv. Yes. So what does that make you feel about peer review? So rigorous peer review by two or three experts, versus the peer review of the community, right as it's written. Yeah. Basically, I think the community is very well able to peer review things very quickly on Twitter.
And I think maybe it just has something to do with the AI machine learning field specifically, though. I feel like things are more easily auditable, and the verification is easier, potentially, than the verification somewhere else. So it's kind of like, you can think of these scientific publications as little blockchains, where everyone's building on each other's work and citing each other. And you sort of have AI, which is kind of like this much faster and looser blockchain, where any one individual entry is very cheap to make. And then you have other fields where maybe that model doesn't make as much sense. And so I think in AI, at least, things are pretty easily verifiable. And so that's why, when people upload papers that have a really good idea and so on, people can try them out the next day, and they can be the final arbiter of whether it works or not on their problem. And the whole thing just moves significantly faster. So I kind of feel like academia still has a place. Sorry, this conference and journal process still has a place, but it sort of lags behind, I think. And it's maybe a bit higher quality process, but it's not the place where you will discover cutting-edge work anymore. Yeah. It used to be the case, when I was starting my PhD, that you go to conferences and journals and you discuss all the latest research. Now, when you go to a conference or journal, no one discusses anything that's there, because it's already three generations old and irrelevant. Yeah. Which makes me sad about, like, DeepMind, for example, where they still publish in Nature and these big prestigious venues. I mean, there's still value, I suppose, to the prestige that comes with these big venues. But the result is that they'll announce some breakthrough performance, and it will take like a year to actually publish the details. And those details, if they were published immediately, would inspire the community to move in certain directions. Yeah, it would speed up the rest of the community, but I don't know to what extent that's part of their objective function also. That's true. So it's not just the prestige; a little bit of the delay is part of it. Yeah, certainly. DeepMind specifically has been working in the regime of having a slightly higher quality process, and higher latency, and publishing those papers that way. Another question from Reddit. Do you, or have you, suffered from imposter syndrome? Being the director of AI at Tesla, being this person, when you're at Stanford, where the world looks at you as the expert in AI to teach the world about machine learning. When I was leaving Tesla after five years, I spent a ton of time in meeting rooms. You know, in the beginning, when I joined Tesla, I would read papers and I was writing code, and then I was writing less and less code and I was reading code, and then I was reading less and less code. And so this is just a natural progression that happens, I think. And definitely, I would say, near the tail end, that's when it starts to hit you a bit more: you're supposed to be an expert, but actually the source of truth is the code that people are writing, the GitHub and the actual code itself, and you're not as familiar with that as you used to be. And so I would say maybe there's some insecurity there.
Yeah, that's actually pretty profound, that a lot of the insecurity has to do with not writing the code. In the computer science space, that is the truth right there: the code is the source of truth. The papers and everything else are a high-level summary. Yeah, just a high-level summary. At the end of the day, you have to read code. It's impossible to translate all that code into paper form. So when things come out, especially when they have source code available, that's my favorite place to go. So like I said, you're one of the greatest teachers of machine learning and AI ever, from CS231n to today. What advice would you give to beginners interested in getting into machine learning? Beginners are often focused on what to do, and I think the focus should be more on how much you do. So I am kind of a believer, on a high level, in this 10,000 hours kind of concept, where you just have to pick the things where you can spend time, that you care about, that you're interested in. You literally have to put in 10,000 hours of work. It doesn't even matter as much where you put it. You'll iterate and you'll improve and you'll waste some time. I don't know if there's a better way. You need to put in 10,000 hours. But I think it's actually really nice, because I feel like there's some sense of determinism about being an expert at a thing if you spend 10,000 hours. You can literally pick an arbitrary thing, and I think if you spend 10,000 hours of deliberate effort and work, you actually will become an expert at it. And so I think it's kind of a nice thought. So basically, I would focus more on, are you spending 10,000 hours? That's what I'd focus on. And then thinking about what kind of mechanisms maximize your likelihood of getting to 10,000 hours. Which, for us silly humans, means probably forming a daily habit of, every single day, actually doing the thing. Whatever helps you. So I do think, to a large extent, it's a psychological problem for yourself. One other thing that I think is helpful for the psychology of it: many times people compare themselves to others in the area. I think this is very harmful. Only compare yourself to you from some time ago, say a year ago. Are you better than you were a year ago? This is the only way to think, and then you can see your progress, and it's very motivating. That's so interesting, that focus on the quantity of hours. Because I think a lot of people in the beginner stage, but actually throughout, get paralyzed by the choice. Like, which path do I pick, this one or that one? They'll literally get paralyzed by which IDE to use. Well, they're worried. Yeah, they worry about all these things. But the thing is, you will waste time doing something wrong. You will eventually figure out it's not right. You will accumulate scar tissue, and next time you'll grow stronger, because next time you'll have that scar tissue and you'll learn from it. And the next time you come to a similar situation, you'll be like, oh, I messed up. I've spent a lot of time working on things that never materialized into anything, and I have all that scar tissue, and I have some intuitions about what was useful, what wasn't useful, how things turned out. So all those mistakes were not dead work, you know?
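As a back-of-the-envelope on the 10,000-hour figure above, a few lines of Python make the daily-habit arithmetic concrete (the hours-per-day values are illustrative assumptions, not anything prescribed in the conversation):

```python
# Years needed to reach 10,000 hours at various daily commitments.
for hours_per_day in (1, 2, 4):
    years = 10_000 / (hours_per_day * 365)
    print(f"{hours_per_day} h/day -> about {years:.1f} years")
# 1 h/day -> about 27.4 years; 2 h/day -> about 13.7 years; 4 h/day -> about 6.8 years
```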
So I just think you should just focus on working. What have you done? What have you done last week? That's a good question, actually, to ask for a lot of things, not just machine learning. It's a good way to cut the, I forget what term we used, but the fluff, the blubber, whatever, the inefficiencies in life. What do you love about teaching? You seem to find yourself often drawn to teaching. You're very good at it, but you're also drawn to it. I mean, I don't think I love teaching. I love happy humans, and happy humans like when I teach. I wouldn't say I hate teaching. I tolerate teaching, but it's not the act of teaching that I like. It's that, you know, I have something I'm actually okay at. I'm okay at teaching, and people appreciate it a lot. And so I'm just happy to try to be helpful. And teaching itself is not like the most... I mean, it can be really annoying, frustrating. I was working on a bunch of lectures just now. I was reminded back to my days of CS231n, and just how much work it is to create some of these materials and make them good. The amount of iteration and thought, and you go down blind alleys, and just how much you change it. So creating something good in terms of educational value is really hard, and it's not fun. It's difficult. So people should definitely go watch the new stuff you put out. There are lectures where you're actually building the thing. Like you said, the code is truth. So discussing backpropagation by building it, by looking through it, and just the whole thing. So how difficult is that to prepare for? I think that's a really powerful way to teach. Did you have to prepare for that, or are you just live thinking through it? I will typically do, say, three takes, and then I take the better take. So I do multiple takes, and I take some of the better takes, and then I just build out a lecture that way. Sometimes I have to delete 30 minutes of content because it just went down an alley that I didn't like too much. There's a bunch of iteration, and it probably takes me somewhere around 10 hours to create one hour of content. To get one hour. It's interesting. I mean, is it difficult to go back to the basics? Do you draw a lot of wisdom from going back to the basics? Yeah, going back to backpropagation, loss functions, where they come from. And one thing I like about teaching a lot, honestly, is it definitely strengthens your understanding. So it's not a purely altruistic activity. It's a way to learn. If you have to explain something to someone, you realize you have gaps in knowledge. And so I even surprised myself in those lectures. Like, oh, the result will obviously look like this, and then the result doesn't look like it. And I'm like, okay, I thought I understood this. Yeah, but that's why it's really cool. It's literally code. You run it in the notebook and it gives you a result, and you're like, oh, wow. Yes. And like actual numbers, actual inputs, actual code. Yeah. It's not mathematical symbols, et cetera. The source of truth is the code. It's not slides. It's just like, let's build it. It's beautiful. You're a rare human in that sense. What advice would you give to researchers trying to develop and publish ideas that have a big impact in the world of AI? So maybe undergrads, maybe early graduate students. Yep.
I mean, I would say they definitely have to be a little bit more strategic than I had to be as a PhD student, because of the way AI is evolving. It's going the way of physics, where, you know, in physics you used to be able to do experiments on your benchtop and everything was great and you could make progress. And now you have to work at something like the LHC, at CERN. And so AI is going in that direction as well. So there are certain kinds of things that are just not possible to do on the benchtop anymore. And I think that didn't used to be the case at the time. Do you still think that there are GAN-type papers to be written, where a very simple idea, requiring just one computer, can illustrate a simple example? I mean, one example that's been very influential recently is diffusion models. Diffusion models are amazing. Diffusion models are six years old. For the longest time, people were kind of ignoring them, as far as I can tell. And they're an amazing generative model, especially in images. And so Stable Diffusion and so on, it's all diffusion based. Diffusion is new. It was not there before, and it came from, well, it came from Google, but a researcher could have come up with it. In fact, some of the first papers, actually, no, those came from Google as well. But a researcher could come up with that in an academic institution. Yeah. What do you find most fascinating about diffusion models? So from the societal impact to the technical architecture. What I like about diffusion is it works so well. Was that surprising to you? The amount of the variety, almost the novelty, of the synthetic data it's generating. Yeah. So the Stable Diffusion images are incredible. The speed of improvement in generating images has been insane. We went very quickly from generating tiny digits to tiny faces, and it all looked messed up, and now we have Stable Diffusion, and that happened very quickly. There's a lot that academia can still contribute. You know, for example, flash attention is a very efficient kernel for running the attention operation inside the transformer, and that came from an academic environment. It's a very clever way to structure the kernel so that it doesn't materialize the attention matrix. And so there's, I think, still lots of things to contribute, but you have to be just more strategic. Do you think neural networks can be made to reason? Yes. Do you think they already reason? Yes. What's your definition of reasoning? Information processing. So in the way that humans think through a problem and come up with novel ideas, it feels like reasoning. Yeah. So the novelty, I don't want to say, but out-of-distribution ideas, you think it's possible? Yes. And I think we're seeing that already in the current neural nets. You're able to remix the training set information into true generalization, in some sense, something that doesn't appear in a fundamental way in the training set. You're doing something interesting algorithmically. You're manipulating some symbols, and you're coming up with some correct, unique answer in a new setting. What would illustrate to you, holy shit, this thing is definitely thinking? To me, thinking or reasoning is just information processing and generalization, and I think the neural nets already do that today.
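To give a flavor of the flash-attention point above, here is a minimal numpy sketch of the online-softmax trick that lets attention be computed block by block, for a single query, without ever materializing the full vector of attention weights. This illustrates the idea only; the real thing is a fused GPU kernel, and the function here is written just for this example.

```python
import numpy as np

def online_softmax_attention(q, K, V, block=64):
    # Stream over K/V blocks, keeping a running max m, normalizer l, and
    # unnormalized output acc, so softmax(K q / sqrt(d)) @ V is computed
    # without storing all the attention weights at once.
    d = q.shape[-1]
    m, l = -np.inf, 0.0
    acc = np.zeros(V.shape[-1])
    for i in range(0, K.shape[0], block):
        s = K[i:i + block] @ q / np.sqrt(d)  # scores for this block
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)            # rescale old accumulators
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[i:i + block]
        m = m_new
    return acc / l

# Sanity check against the naive computation.
rng = np.random.default_rng(0)
K, V, q = rng.normal(size=(256, 16)), rng.normal(size=(256, 16)), rng.normal(size=16)
s = K @ q / np.sqrt(16)
w = np.exp(s - s.max()); w /= w.sum()
assert np.allclose(online_softmax_attention(q, K, V), w @ V)
```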
So being able to perceive the world, or perceive whatever the inputs are, and to make predictions based on that, or actions based on that, that's reasoning. Yeah. You're giving correct answers in novel settings by manipulating information. You've learned the correct algorithm. You're not doing just some kind of a lookup table or nearest-neighbor search, something like that. Let me ask you about AGI. What are some moonshot ideas you think might make significant progress towards AGI? Or maybe another way to ask it is, what are the big blockers that we're missing now? So basically, I am fairly bullish on our ability to build AGIs, basically automated systems that we can interact with, that are very human-like, and we can interact with them in the digital realm or physical realm. Currently, it seems most of the models that do these magical tasks are in the text realm. I think, as I mentioned, I'm suspicious that the text realm is not enough to actually build full understanding of the world. I do actually think you need to go into pixels and understand the physical world and how it works. So I do think that we need to extend these models to consume images and videos, and train on a lot more data that is multimodal in that way. Do you think you need to touch the world to understand it also? Well, that's the big open question, I would say, in my mind. If you also require the embodiment and the ability to interact with the world, run experiments, and have data of that form, then you need to go to Optimus or something like that. And so I would say Optimus, in some way, is like a hedge on AGI, because it seems to me that it's possible that just having data from the internet is not enough. If that is the case, then Optimus may lead to AGI, because to me there's nothing beyond Optimus. You have this humanoid form factor that can actually do stuff in the world. You can have millions of them interacting with humans and so on. And if that doesn't give rise to AGI at some point, I'm not sure what will. So from a completeness perspective, I think that's a really good platform, but it's a much harder platform, because you are dealing with atoms, and you need to actually build these things and integrate them into society. So I think that path takes longer, but it's much more certain. And then there's the path of the internet, and just training these compression models effectively, trying to compress all of the internet. And that might also give rise to these agents as well. Compress the internet, but also interact with the internet. So it's not obvious to me. In fact, I suspect you can reach AGI without ever entering the physical world. Which is a little bit more concerning, because that might result in it happening faster. So it just feels like we're in boiling water. We won't know as it's happening. I would like to... I'm not afraid of AGI. I'm excited about it. There's always concerns, but I would like to know when it happens. Yeah. Or to have hints about when it happens, like, a year from now it will happen, that kind of thing. I just feel like in the digital realm, it just might happen. Yeah. I think all we have available to us, because no one has built AGI, again, is: is there enough fertile ground on the periphery? I would say yes.
And we have the progress so far, which has been very rapid, and there are next steps that are available. And so I would say, yeah, it's quite likely that we'll be interacting with digital entities.
How will you know that somebody has built AGI?
I think it's going to be a slow, incremental transition. It's going to be product-based and focused. It's going to be GitHub Copilot getting better, and then GPTs helping you write, and then these oracles that you can go to with mathematical problems. I think we're on the verge of being able to ask very complex questions in chemistry, physics, math of these oracles, and have them complete solutions.
So AGI, to you, is primarily focused on intelligence. So consciousness doesn't enter into it.
So in my mind, consciousness is not a special thing you will figure out and bolt on. I think it's an emergent phenomenon of a large enough and complex enough generative model, sort of. So if you have a complex enough world model that understands the world, then it also understands its predicament in the world as being a language model, which to me is a form of consciousness or self-awareness. And so in order to understand the world deeply, you probably have to integrate yourself into the world. And in order to interact with humans and other living beings, consciousness is a very useful tool.
I think consciousness is like a modeling insight.
A modeling insight?
Yeah. You have a powerful enough model of understanding the world that you actually understand that you are an entity in it.
Yeah. But there's also, perhaps, just the narrative we tell ourselves. It feels like something to experience the world, the hard problem of consciousness. But that could be just a narrative that we tell ourselves.
Yeah, I think it will emerge. I think it's going to be something very boring. Like, we'll be talking to these digital AIs, they will claim they're conscious, they will appear conscious, they will do all the things that you would expect of other humans, and it's going to just be a stalemate.
I think there will be a lot of actually fascinating ethical questions, like Supreme Court-level questions, of whether you're allowed to turn off a conscious AI, whether you're allowed to build a conscious AI. Maybe there would have to be the same kind of debate that you have around, sorry to bring up a political topic, but, you know, abortion. The deeper question with abortion is: what is life? And the deep question with AI is also: what is life, and what is conscious? And I think that will be very fascinating to bring up. It might become illegal to build systems that are capable of such a level of intelligence that consciousness would emerge, and therefore the capacity to suffer would emerge, and a system that says: no, please don't kill me.
Well, that's what the LaMDA chatbot already told this Google engineer, right? It was talking about not wanting to die, and so on.
So that might become illegal to do. Right, because otherwise you might have a lot of creatures that don't want to die, and you can just spawn an infinity of them on a cluster. And then that might lead to horrible consequences, because then there might be a lot of people that secretly love murder, and then we'll start practicing murder on those systems.
I mean, to me, all of this stuff just brings a beautiful mirror to the human condition and human nature, and we'll get to explore it. And that's, like, the best of the Supreme Court: all the different debates we have about ideas of what it means to be human. We get to ask those deep questions that we've been asking throughout human history. There's always been the other in human history: we're the good guys and they're the bad guys, and throughout human history, let's murder the bad guys. And the same will probably happen with robots. It will be the other at first, and then we'll get to ask questions of: what does it mean to be alive? What does it mean to be conscious?
Yep. And I think there's some canary in the coal mine, even with what we have today. For example, there are these waifus that you can work with, and this company is going to shut down, but this person really loves their waifu and is trying to port it somewhere else, and it's not possible. I think definitely people will have feelings towards these systems, because in some sense they are a mirror of humanity: they are sort of a big average of humanity, in the way they're trained.
But that average we can actually watch. It's nice to be able to interact with the big average of humanity and do a search query on it.
Yeah. Yeah, it's very fascinating. And we can, of course, also shape it. It's not just a pure average. We can mess with the training data, we can mess with the objective, we can fine-tune them in various ways, so we have some impact on what those systems look like.
If you were to achieve AGI, and you could have a conversation with her, and ask her, talk about anything, maybe ask her a question: what kind of stuff would you ask?
I would have some practical questions in my mind, like: do I or my loved ones really have to die? What can we do about that?
Do you think it would answer clearly, or would it answer poetically?
I would expect it to give solutions. I would expect it to be like: well, I've read all of these textbooks, and I know all these things that you've produced, and it seems to me like here are the experiments that I think would be useful to run next, and here are some gene therapies that I think would be helpful, and here are the kinds of experiments that you should run.
Okay, let's go with this thought experiment. Imagine that mortality is actually a prerequisite for happiness, so if we become immortal, we'll actually become deeply unhappy, and the model is able to know that. So what is it supposed to tell you, stupid human? Yes, you can become immortal, but you will become deeply unhappy. If the AGI system is trying to empathize with you, human, what is it supposed to tell you? That yes, you don't have to die, but you're really not going to like it? Is it going to be deeply honest? Like that moment in Interstellar, what is it, the AI says humans want only 90% honesty. So you have to pick: how honestly do I want to answer these practical questions?
Yeah. I love the AI in Interstellar, by the way. It's such a sidekick to the entire story, but at the same time it's really interesting.
It's kind of limited in certain ways, right?
Yeah, it's limited, and I think that's totally fine, by the way. I think it's fine and plausible to have a limited and imperfect AGI.
Is that almost a feature? As an example, it has a fixed amount of compute on its physical body, and it might just be that, even though you can have a super amazing mega-brain, super-intelligent AI, you can also have less intelligent AIs that you can deploy in a power-efficient way, and then they're not perfect, they might make mistakes.
No, I meant more like: say you had infinite compute, and it's still good to make mistakes sometimes, to integrate yourself. Like, what is it, going back to Good Will Hunting: Robin Williams' character says the human imperfections, that's the good stuff, right? Isn't that... like, we don't want perfect. We want flaws, in part to form connections with each other, because it feels like something you can attach your feelings to, the flaws. In that same way, you want an AI that's flawed. I don't know. I feel like perfection...
But then you're saying, okay, yeah, but that's not AGI. But see, AGI would need to be intelligent enough to give answers to humans that humans don't understand, and I think perfect isn't something humans can understand, because even science doesn't give perfect answers. There's always gaps and mysteries, and I don't know. I don't know if humans want perfect.
Yeah. I could imagine just having a conversation with this kind of oracle entity, as you'd imagine them, and, yeah, maybe it can tell you: based on my analysis of the human condition, you might not want this, and here are some of the things that might...
But every dumb human will say: yeah, yeah, yeah, trust me, give me the truth, I can handle it.
But that's the beauty: people can choose.
But then it's the old marshmallow test with the kids and so on. I feel like too many people can't handle the truth, probably including myself. Like, the deep truth of the human condition, I don't know if I can handle it. What if there's some dark stuff? What if we are an alien science experiment, and it realizes that? I mean, this is The Matrix all over again. I don't know what I would talk about. Probably I would go with the safer scientific questions at first, that have nothing to do with my own personal life and mortality, just about physics and so on, to build up. Like, let's see where it's at. Or maybe see if it has a sense of humor. That's another question. Presumably, if it understands humans deeply, it would be able to generate humor?
Yeah. I think that's actually a wonderful benchmark, almost. I think that's a really good point: basically, is it able to make you laugh?
Yeah. If it's able to be a very effective standup comedian, that is doing something very interesting computationally. I think being funny is extremely hard.
Yeah, because it's hard in the way the Turing test, the original intent of the Turing test, is hard: you have to convince humans. That's why comedians talk about this. It's deeply honest, because if people can't help but laugh... and if they don't laugh, that means you're not funny. If they laugh, it's funny.
And you're showing you need a lot of knowledge to create humor about the human condition and so on, and then you need to be clever with it.
You mentioned a few movies. You tweeted: movies I've seen five-plus times, but I'm ready and willing to keep watching: Interstellar, Gladiator, Contact, Good Will Hunting, The Matrix, Lord of the Rings (all three), Avatar, The Fifth Element, and so on. It goes on: Terminator 2, Mean Girls. I'm not going to ask about that. I think Mean Girls is great. What are some that jump out to your memory that you love, and why? You mentioned The Matrix. As a computer person, why do you love The Matrix?
There are so many properties that make it beautiful and interesting. There are all these philosophical questions, but then there are also AGIs, and there's a simulation, and it's cool, and there's, you know, the look of it, the feel of it, the action, the bullet time. It was just innovating in so many ways.
And then Good Will Hunting. Why do you like that one?
Yeah, I just really like this tortured-genius sort of character who's grappling with whether or not he has any responsibility, or what to do with this gift that he was given, or how to think about the whole thing. And there's also a dance between the genius and the personal: what it means to love another human being. There are a lot of things there. It's just a beautiful movie. And then the fatherly figure, the mentor, in the psychiatrist. It really messes with you. You know, there are some movies that just really mess with you on a deep level.
Do you relate to that movie at all?
No. It's not your fault.
As I said, Lord of the Rings, that's self-explanatory. Terminator 2, which is interesting, you rewatch that a lot. Is that better than Terminator 1?
I do like Terminator 1 as well. I like Terminator 2 a little bit more, but in terms of its surface properties.
Do you think Skynet is at all a possibility?
Yes.
Like the actual sort of autonomous weapon system kind of thing. Do you worry about that stuff?
I do worry. AI being used for war? I a hundred percent worry about it. I mean, some of these fears of AGI and how this will play out: these will be very powerful entities, probably, at some point. And so for a long time they are going to be tools in the hands of humans. People talk about alignment of AGI and how to make... the problem is, even humans are not aligned. So how this will be used and what this is going to look like is, yeah, it's troubling.
Do you think it will happen slowly enough that we'll be able to, as a human civilization, think through the problems?
Yes. That's my hope: that it happens slowly enough, and in an open enough way, where a lot of people can see and participate in it, and just figure out how to deal with this transition. I think it's just going to be interesting. I draw a lot of inspiration from nuclear weapons, because I sure thought it would be fucked once they developed nuclear weapons. But it's almost like, when the systems are not so dangerous that they destroy human civilization, we deploy them and learn the lessons. And if it's too dangerous, we might still deploy it,
but you very quickly learn not to use them. And so there will be this balance achieved.
Humans are very clever as a species. It's interesting. We exploit the resources as much as we can, but we avoid destroying ourselves, it seems like.
Well, I don't know about that, actually. I hope it continues. I mean, I'm definitely concerned about nuclear weapons and so on, not just as a result of the recent conflict; even before that. That's probably my number one concern for humanity.
So if humanity destroys itself, or destroys 90% of people, that would be because of nukes?
I think so. And it's not even about the full destruction, to me. It's bad enough if we reset society. That would be terrible. It would be really bad. And I can't believe we're so close to it. It's so crazy to me.
It feels like we might be a few tweets away from something like that.
Yep. Basically, it's extremely unnerving, and it has been for me for a long time.
It seems unstable: that world leaders, just having a bad mood, can take one step towards a bad direction, and it escalates.
Yeah. And because of a collection of bad moods, it can escalate without being able to stop.
Yeah. It's just a huge amount of power. And then also, with the proliferation, basically, I don't actually know what the good outcomes are here. So I'm definitely worried about that a lot. And then AGI is not currently there, but at some point it will more and more become something like it. The danger with AGI, even, is that I think it's slightly worse, in the sense that there are good outcomes of AGI, and then the bad outcomes are an epsilon away, a tiny bit away. And so I think capitalism and humanity and so on will drive for the positive ways of using that technology. But then, if bad outcomes are just a tiny flipped minus sign away, that's a really bad position to be in: a tiny perturbation of the system results in the destruction of the human species. It's a fine line to walk. Yeah. I think, in general, what's really weird about the dynamics of humanity, and this explosion we've talked about, is just the insane coupling afforded by technology, and just the instability of the whole dynamical system. I think it just doesn't look good, honestly.
Yes. That explosion could be destructive or constructive, and the probabilities are non-zero in both.
Yeah. I mean, I do feel like I have to try to be optimistic and so on, and I think even in this case I still am predominantly optimistic, but there's definitely...
Me too.
Do you think we'll become a multi-planetary species?
Probably yes, but I don't know if it's a dominant feature of future humanity. There might be some people on some planets and so on, but I'm not sure if it's, yeah, if it's a major player in our culture and so on. We still have to solve the drivers of self-destruction here on Earth, so just having a backup on Mars is not going to solve the problem. By the way, I love the backup on Mars. I think that's amazing. You should absolutely do that.
Yes. And I'm so thankful. Would you go to Mars?
Personally, no. I do like Earth quite a lot.
Okay. I'll go to Mars. I'll go for you. I'll tweet at you from there.
Maybe eventually I would, once it's safe enough, but I don't actually know if it's on my lifetime scale, unless I can extend it by a lot. I do think that, for example, a lot of people might disappear into virtual realities and stuff like that, and I think that could be the major thrust of the cultural development of humanity, if it survives. So it might not be... it's just really hard to work in the physical realm and go out there, and I think ultimately all your experiences are in your brain. And so it's much easier to disappear into the digital realm, and I think people will find it more compelling, easier, safer, more interesting.
So you're a little bit captivated by virtual reality, by the possible worlds, whether it's the metaverse or some other manifestation of that?
Yeah. Yeah, it's really interesting.
I'm interested, just having talked a lot to Carmack: where's the thing that's currently preventing that?
Yeah. I mean, to be clear, I think what's interesting about the future is that the variance in the human condition grows. That's the primary thing that's changing. It's not so much the mean of the distribution; it's the variance of it. So there will probably be people on Mars, and there will be people in VR, and there will be people here on Earth. There will just be so many more ways of being. And so I kind of see it as a spreading out of the human experience.
There's something about the internet that allows you to discover those little groups, and then you gravitate to... something about your biology likes that kind of world, and you find each other.
Yeah. And we'll have transhumanists, and then we'll have the Amish, and everything is just going to coexist. You know, the cool thing about it, because I've interacted with a bunch of internet communities, is they don't know about each other. You can have a very happy existence just having a very close-knit community and not knowing about each other. I mean, you even sense this just having traveled to Ukraine: they don't know so many things about America. When you travel across the world, I think you experience this too. There are certain cultures that have their own thing going on. And so you can see that happening more and more in the future: little communities.
Yeah. Yeah, I think so. That seems to be how it's going right now, and I don't see that trend really reversing. I think people are diverse, and they're able to choose their own path and existence, and I sort of celebrate that.
Will you spend much time in the metaverse, in virtual reality? Which community are you? Are you the physical reality enjoyer, or do you see yourself drawing a lot of pleasure and fulfillment from the digital world?
Yeah, I think, well, currently the virtual reality is not that compelling. I do think it can improve a lot, but I don't really know to what extent. Maybe, you know, there are actually even more exotic things you can think about, with neural links or stuff like that. So currently I kind of see myself as mostly a team human person. I love nature, I love harmony, I love people, I love humanity, I love the emotions of humanity, and I just want to be in this solarpunk little utopia.
That's my happy place.
Yes. My happy place is people I love, thinking about cool problems, surrounded by a lush, beautiful, dynamic nature, and secretly high-tech in places that count. Places that use technology to empower that love for other humans and nature.
Yeah. I think technology used very sparingly. I don't love when it gets in the way of humanity, in many ways. I like just people being humans, in a way we sort of slightly evolved and prefer, I think, just by default.
People kept asking me, because they know you love reading: are there particular books that you enjoyed, that had an impact on you, for silly or for profound reasons, that you would recommend? You mentioned The Vital Question.
Many, of course. I think in biology, as an example, The Vital Question is a good one. Anything by Nick Lane, really. Life Ascending, I would say, is potentially a bit more representative, as a summary of a lot of the things he's been writing about. I was very impacted by The Selfish Gene. I thought that was a really good book that helped me understand altruism, as an example, and where it comes from. And just realizing that the selection is on the level of genes was a huge insight for me at the time, and it sort of cleared up a lot of things for me.
What do you think about the idea that the ideas are the organisms, the memes?
Yes, love it. A hundred percent.
Are you able to walk around with that notion for a while, that there is an evolutionary kind of process with ideas as well?
There absolutely is. There are memes, just like genes, and they compete, and they live in our brains. It's beautiful.
Are we silly humans thinking that we're the organisms? Is it possible that the primary organisms are the ideas?
Yeah, I would say the ideas kind of live in the software of our civilization, in the minds, and so on.
We think, as humans, that the hardware is the fundamental thing. A human is a hardware entity. But it could be the software, right?
Yeah. Yeah, I would say there needs to be some grounding at some point to a physical reality.
Yeah. But if we clone an Andrej, the software is the thing. Isn't that the thing that makes that thing special, right?
Yeah, I guess you're right. But then cloning might be exceptionally difficult. There might be a deep integration between the software and the hardware, in ways we don't quite understand.
Well, from the ultimate point of view, what makes me special is more like the gang of genes that are riding in my chromosomes, I suppose, right? They're the replicating unit, I suppose.
No, but that's just the thing that makes you special.
Sure.
Well, the reality is, what makes you special is your ability to survive based on the software that runs on the hardware that was built by the genes. So the software is the thing that makes you survive, not the hardware.
All right. It's a little bit of both. I mean, you know, it's just like a second layer. It's a new second layer that hasn't been there before the brain. They both coexist.
But there are also layers of the software. I mean, it's an abstraction on top of abstractions.
But okay. So Selfish Gene, and Nick Lane. I would say sometimes books are not sufficient. I like to reach for textbooks sometimes. I kind of feel like books are for too general a consumption sometimes.
They're just too high up in the level of abstraction, and it's not good enough. So I like textbooks. I like The Cell. I think The Cell was pretty cool. That's also why I like the writing of Nick Lane: he's pretty willing to step one level down, and yeah, he's willing to go there. But he's also willing to be throughout the stack. So he'll go down to a lot of detail, but then he will come back up. Basically, I really appreciate that.
That's why I love college, early college, even high school: just textbooks on the basics of computer science, of mathematics, of biology, of chemistry. They condense it down. It's sufficiently general that you can understand both the philosophy and the details, but also you get homework problems, and you get to play with it as much as you would if you were programming stuff.
Yeah. And then I'm also suspicious of textbooks, honestly, because, as an example, in deep learning there's no amazing textbook, and the field is changing very quickly. I imagine the same is true in, say, synthetic biology and so on. These books like The Cell are kind of outdated. They're still high level. What is the actual real source of truth? It's people in wet labs working with cells, sequencing genomes, and, yeah, actually working with it. And I don't have that much exposure to that, or what that looks like. So I'm reading through The Cell, and it's kind of interesting, and I'm learning, but it's still not sufficient, I would say, in terms of understanding.
Well, it's a clean summarization of the mainstream narrative, but you have to learn that before you can break out towards the cutting edge.
Yeah. But what is the actual process of working with these cells, and growing them, and incubating them? It's kind of like a massive collection of cooking recipes: making sure your cells live and proliferate, and then you're sequencing them, running experiments. Just how that works, I think, is kind of the source of truth of, at the end of the day, what's really useful in terms of creating therapies and so on.
Yeah. I wonder what the AI textbooks of the future will be, because there's Artificial Intelligence: A Modern Approach. I actually haven't read the most recent version; there's been a recent edition. I also saw there's a Science of Deep Learning book. I'm waiting for textbooks that are worth recommending, worth reading. It's tricky, because it's like papers, and code, code, code.
Honestly, I find papers are quite good. I especially like the appendix of any paper as well. It's the most detail you can have. It doesn't have to be cohesive, connected to anything else. You just describe, in a very specific way, how you did the particular thing.
Yeah. Many times papers can be actually quite readable. Not always, but sometimes the introduction and the abstract are readable, even for someone outside of the field. This is not always true. Sometimes I think, unfortunately, scientists use complex terms even when it's not necessary. I think that's harmful. I think there's no reason for that. Papers sometimes are longer than they need to be, in the parts that don't matter. The appendix should be long, but the paper itself, look at Einstein, make it simple.
Yeah. But certainly I've come across papers, I would say in synthetic biology or something, that I thought were quite readable for the abstract and the introduction.
Then you're reading the rest of it and you don't fully understand, but you are getting a gist, and I think it's cool.
You give advice to folks interested in machine learning and research, but what general life advice would you give to a young person in high school or early college about how to have a career they can be proud of, or a life they can be proud of?
Yeah, I think I'm very hesitant to give general advice. I think it's really hard. Some of the stuff I've mentioned is fairly general, I think: focus on just the amount of work you're spending on a thing; compare yourself only to yourself, not to others.
That's good. I think those are fairly general. How do you pick the thing? You just have a deep interest in something, or try to find the argmax over the things that you're interested in?
Argmax at that moment, and stick with it. How do you not get distracted and switch to another thing? You can, if you like.
But if you do an argmax repeatedly every week, every month, it's a problem.
Yeah. You can low-pass filter yourself, in terms of what has consistently been true for you. I definitely see how it can be hard, but I would say you're going to work the hardest on the thing that you care about the most. Low-pass filter yourself, and really introspect: in your past, what are the things that gave you energy, and what are the things that took energy away from you? Concrete examples. Usually, from those concrete examples, sometimes patterns can emerge: I like it when things look like this, when I'm in these positions.
That's not necessarily the field, but the kind of stuff you're doing in a particular field. For you, it seems like you were energized by implementing stuff, building actual things.
Yeah. Being low level, learning, and then also communicating, so that others can go through the same realizations, and shortening that gap. Because I usually have to do way too much work to understand a thing, and then I'm like: okay, I actually think I get it. Why was it so much work? It should have been much less work. That gives me a lot of frustration, and that's why I sometimes go teach.
Aside from the teaching you're doing now, putting out videos, and aside from a potential Godfather Part II with the AGI at Tesla and beyond, what does the future of Andrej Karpathy hold? Have you figured that out yet, or no? As you see through the fog of war that is all of our future, do you start seeing silhouettes of what that possible future could look like?
The consistent thing I've always been interested in, for me at least, is AI. That's probably what I'm spending the rest of my life on, because I just care about it a lot. I actually care about many other problems as well, like, say, aging, which I basically view as a disease. I care about that as well, but I don't think it's a good idea to go after it specifically. I don't actually think that humans will be able to come up with the answer. I think the correct thing to do is to ignore those problems: you solve AI, and then use that to solve everything else. I think there's a chance that this will work. I think it's a very high chance. That's the way I'm betting, at least.
When you think about AI, are you interested in all kinds of applications, all kinds of domains, where any domain you focus on will allow you to get insights into the big problem of AGI?
Yeah. For me, it's the ultimate meta problem. I don't want to work on any one specific problem. There are too many problems.
How can you work on all problems simultaneously? You solve the meta problem, which to me is just intelligence, and how do you automate it?
Are there cool small projects, like Arxiv Sanity and so on, that you're thinking about, that the world, the ML world, can anticipate?
There are always some fun side projects. Arxiv Sanity is one. Basically, there are way too many arXiv papers; how can I organize them, and recommend papers, and so on?
I transcribed all of your podcasts.
What did you learn from that experience, from transcribing? The process of... you like consuming audiobooks and podcasts and so on; here's a process that achieves something closer to human-level performance in annotation.
Yeah. Well, I definitely was surprised that transcription with OpenAI's Whisper was working so well, compared to what I'm familiar with from Siri and a few other systems, I guess. It works so well. That's what gave me some energy to try it out, and I thought it could be fun to run it on podcasts. It's not obvious to me why Whisper is so much better compared to anything else, because I feel like there should be a lot of incentive for a lot of companies to produce transcription systems, and they've done so over a long time. Whisper is not a super exotic model. It's a transformer. It takes mel spectrograms and just outputs tokens of text. It's not crazy. The model and everything has been around for a long time. I'm not actually one hundred percent sure why...
Yeah, it's not obvious to me either. It makes me feel like I'm missing something.
I'm missing something.
Yeah, because there is a huge incentive; even Google, and so on, YouTube transcription...
Yeah. Yeah, it's unclear. But some of it is also integrating it into a bigger system: the user interface, how it's deployed, and all that kind of stuff. Maybe running it as an independent thing is much easier, like an order of magnitude easier, than deploying it into a large integrated system like YouTube transcription, or anything like meetings. Zoom has transcription that's kind of crappy. But creating an interface where it detects the different individual speakers, is able to display it in compelling ways, runs in real time, all that kind of stuff, maybe that's difficult. That's the only explanation I have, because I'm currently paying quite a bit for human transcription and human caption annotation, and it seems like there's a huge incentive to automate that.
Yeah. It's very confusing.
I mean, I don't know if you looked at some of the Whisper transcripts, but they're quite good.
They're good. Especially in tricky cases. I've seen Whisper's performance on super tricky cases, and it does incredibly well. I don't know. A podcast is pretty simple. It's high-quality audio, and you're speaking usually pretty clearly. So I don't know. I don't know what OpenAI's plans are either.
Yeah, there are always fun projects, basically. Stable Diffusion also is opening up a huge amount of experimentation, I would say, in the visual realm: generating images and videos and movies.
Videos now. That's going to be pretty crazy.
That's going to almost certainly work, and it's going to be really interesting when the cost of content creation falls to zero. You used to need a painter for a few months to paint a thing, and now it's going to be: speak to your phone to get your video.
Hollywood will start using that to generate scenes, which completely opens up... Yeah. So you can make a movie like Avatar eventually for under a million dollars?
Much less.
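For reference, the simplicity being described here is visible in the open-source whisper package itself (pip install openai-whisper). A hedged sketch; the checkpoint name and file path below are placeholders:

    import whisper

    # Load one of the released checkpoints: "tiny", "base", "small", "medium", "large".
    model = whisper.load_model("base")

    # transcribe() loads the audio, converts it to log-mel spectrograms,
    # and decodes text tokens, roughly the pipeline described above.
    result = model.transcribe("episode.mp3")  # placeholder file path
    print(result["text"])

Similarly, the Stable Diffusion experimentation mentioned here is a few lines through the Hugging Face diffusers library; a minimal sketch, assuming a CUDA GPU, with the model ID and prompt chosen only as examples:

    import torch
    from diffusers import StableDiffusionPipeline

    # Download pretrained Stable Diffusion weights and move them to the GPU.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    # One text prompt in, one image out.
    image = pipe("a solarpunk village, lush nature, golden hour").images[0]
    image.save("generated.png")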
Maybe just by talking to your phone. I mean, I know it sounds kind of crazy.
And then there would be some voting mechanism. Would there be a show on Netflix that's generated completely automatically?
Yeah, potentially. And what does it look like also when you can just generate it on demand, and there's an infinity of it?
Yeah. Oh man. All the synthetic art. I mean, it's humbling, because we treat ourselves as special for being able to generate art and ideas and all that kind of stuff. If that can be done in an automated way by AI...
Yeah. I think it's fascinating to me how these predictions of AI, and what it's going to look like and what it's going to be capable of, are completely inverted and wrong. The sci-fi of the '50s and '60s was just totally not right. They imagined AI as super calculating theorem provers, and we're getting things that can talk to you about emotions. They can do art. It's just weird.
Are you excited about that future? Just AIs, hybrid systems, heterogeneous systems of humans and AIs talking about emotions, Netflix and chill with an AI system, where the Netflix thing you watch is also generated by AI?
I think it's going to be interesting, for sure, and I think I'm cautiously optimistic, but it's not obvious.
Well, the sad thing is, your brain and mine developed in a time before Twitter, before the internet. So I wonder about people that are born inside of it; they might have a different experience. Maybe you and I will still resist it, and the people born now will not.
Well, I do feel like humans are extremely malleable.
Yeah. And you're probably right.
What is the meaning of life, Andrej? We talked about sort of the universe having a conversation with us humans, or with the systems we create, to try to answer... for the universe, for the creator of the universe, to notice us. We're trying to create systems that are loud enough to answer back.
I don't know if that's the meaning of life. That's, like, the meaning of life for some people. The first-level answer, I would say, is: anyone can choose their own meaning of life, because we are a conscious entity, and it's beautiful. Number one. But I do think that a deeper meaning of life, if someone is interested, is along the lines of: what the hell is all this, and why? And if you look into fundamental physics, and the quantum field theory, and the Standard Model, they're very complicated. And there are these 19 free parameters of our universe, and what's going on with all this stuff, and why is it here? And can I hack it? Can I work with it? Is there a message for me? Am I supposed to create a message? And so I think there are some fundamental answers there, but I think you can't actually really make a dent in those without more time. And so, to me, there's also a big question around just getting more time, honestly. Yeah, that's kind of what I think about quite a bit as well.
So kind of the ultimate, or at least the first, way to sneak up on the why question is to try to escape the system, the universe. And then, for that, you sort of backtrack and say: okay, that's going to take a very long time. So the why question boils down, from an engineering perspective, to: how do we extend?
Yeah, I think that's question number one, practically speaking, because you're not going to calculate the answer to the deeper questions in the time you have.
And that could be extending your own lifetime, or extending the lifetime of human civilization, for whoever wants that; many people might not want that. But for the people who do want it, I think it's probably possible. And I don't know that people fully realize this. I kind of feel like people think of death as an inevitability, but at the end of the day, this is a physical system. Some things go wrong. It makes sense why things like this happen, evolutionarily speaking. And there are most certainly interventions that mitigate it.
That would be interesting, if death is eventually looked at as a fascinating thing that used to happen to humans.
I don't think it's unlikely. I think it's likely.
And it's up to our imagination to try to predict what the world without death looks like.
Yeah, it's hard to... I think the values will completely change.
Could be. I don't really buy all these ideas that, oh, without death there's no meaning, there's nothing... I don't intuitively buy all those arguments. I think there's plenty of meaning, plenty of things to learn. They're interesting, exciting. I want to know, I want to calculate, I want to improve the condition of all the humans and organisms that are alive.
Yeah, the way we find meaning might change. There are a lot of humans, probably including myself, that find meaning in the finiteness of things. But that doesn't mean that's the only source of meaning.
Yeah. I do think many people will go with that, which I think is great. I love the idea that people can just choose their own adventure. Like, you are born as a conscious, free entity, by default, I'd like to think, and you have your unalienable rights for life.
And the pursuit of happiness.
I don't know if you have that in nature, the landscape of happiness. You can choose your own adventure, mostly.
And that's not fully true, but I still am pretty sure I'm an NPC. But an NPC can't know it's an NPC.
Hmm. There could be different degrees and levels of consciousness.
I don't think there's a more beautiful way to end it. Andrej, you're an incredible person. I'm really honored you would talk with me. Everything you've done for the machine learning world, for the AI world, to just inspire people, to educate millions of people: it's been great, and I can't wait to see what you do next. It's been an honor, man. Thank you so much for talking today.
Awesome. Thank you.
Thanks for listening to this conversation with Andrej Karpathy. To support this podcast, please check out our sponsors in the description. And now, let me leave you with some words from Samuel Karlin: the purpose of models is not to fit the data, but to sharpen the questions. Thanks for listening, and hope to see you next time.