diff --git "a/data/yt_podcast_transcript.csv" "b/data/yt_podcast_transcript.csv" new file mode 100644--- /dev/null +++ "b/data/yt_podcast_transcript.csv" @@ -0,0 +1,6 @@ +title,url,length,publish_date,transcript,total_words +Sarah Catanzaro — Remembering the Lessons of the Last AI Renaissance,https://www.youtube.com/watch?v=v3O20NMdOuA,4584,2023-02-02,"Sarah: I think people see the output of models like DALL·E, GPT-3, et cetera, and they're amazed by what AI can do. And so the conversation doesn't even hinge on, ""We have access to this data set,"" or ""We have access to this talent pool."" It's more, ""AI is magical. What can we do with it? Come in and talk to us about this."" And again, I think that that is somewhat dangerous. Lukas: You're listening to Gradient Dissent, a show about machine learning in the real world. I'm your host, Lukas Biewald. Sarah Catanzaro was a practicing data scientist and then went into venture. She's currently a General Partner at Amplify Partners, and one of the leading investors in AI and ML. Her investments include a whole bunch of companies I admire, like RunwayML, OctoML, Gantry, and others. It's really interesting to talk to an investor who's also technical. She has insights both on how the technology is built and how it's being adopted by the market at large. This is a really fun conversation and I hope you enjoy it. Sarah, thanks so much for doing this. I've been looking forward to this one. I had a bunch of questions prepped and then I was looking at your Twitter and I was like, ""Oh, there's like a whole bunch of stuff that we should..."" Sarah: Yeah. I feel like I've been doing a lot of thinking out loud recently. Including in response to a lot of the hype around Stable Diffusion, LLMs, et cetera. I appreciate the fact that both of us were there in the 2013, 2014 phase where every company was claiming to be an AI company. It feels like we're kind of heading down that road again, which scares me a little bit. I hope at least there are enough companies — people — who remember the lessons of the last AI renaissance. But we'll see. Lukas: Well, let's get right into it then, because from my perspective, I totally remember at least one other AI bubble. Maybe more, depending on how you count it. I guess from where I sit, it feels like this one might be different in the sense that I feel like these challenges that were always...seemed super, super hard, seem like they're really working. And I feel like I see applications happening unbelievably fast after the paper comes out. Actually even maybe before there's time to even publish any paper on the topic. I think I might be more bullish about large language models and Stable Diffusion than you, which is great because we can actually have an interesting conversation here. But I thought it's interesting. You've invested in Runway, and just the other day Cris was showing me a natural language input into Runway where you could basically type what you want, and it would sort of set up the video editing to work that way. I thought, ""Oh my gosh,"" this might be a totally new kind of interface that lots of software might quickly adopt, I guess. But it sounds like — looking at your Twitter — it sounds like you were playing with large language models and finding it super frustrating and broken. Tell me about that. Sarah: Yeah, so I think my concern is less about the capabilities of large language models specifically, and more about some of the lessons that we learned during the last AI renaissance. 
Which I think was roughly like 2014 to maybe 2017, around the time that AlphaGo came out. People were really excited about the capabilities of GANs and RL. At the time, I remember companies like Airbnb, Uber, Lyft building these big research teams, but not really having a clear agenda for those research teams, or understanding how the objectives of their research teams might align with the objectives of the broader organization. And then similarly, you saw all of these startup founders emerge that were talking about changing healthcare with GANs or changing finance with RL, but didn't really have insights into the nuances of those industries. My feeling of why ML didn't work the last time around — or rather, why ML adoption didn't occur at the pace that we anticipated — was that it was not really a technical problem, but rather a product, go-to-market problem. I am hoping that this time around, we've both learned from our mistakes but also — in the intervening time period — created enough enabling technologies, such that two things can occur. One is that companies can fail fast. Frankly, one of the things that scares me is that back then I remember a bunch of companies reaching out and basically saying things like, ""Hey, we've got a bunch of data. We'd love for you to come in and talk to us about our AI strategy,"" and thinking, ""I don't care if you have a bunch of data. Let's talk about a bunch of problems that you have, and how ML can solve those problems."" I've come to believe that you can't fight that urge. Founders will always be enticed by the promise of AI. But if they're able to experiment with it quickly, then I think they can start to learn more about the infrastructure, and data, and other investments that they may need to make in order for their AI initiatives to be successful. At the same time, I think by creating these higher-level interfaces that make ML more accessible to potentially the domain expert, it allows people with a more thorough understanding of business problems to at least prototype AI solutions. I'm somewhat skeptical that these very high-level interfaces will allow them to build production ML at scale, but at least they can see, ""Does it work? Do I need to now hire a data/ML team to realize this initiative further?"" Lukas: Do you have companies in mind that you like, that are creating these higher-level interfaces off of ML technology, that makes them usable for real world applications? Sarah: Yeah. I think Runway is actually a perfect example of the phenomena that I see playing out. Some people may not know, but Runway actually started off more as a model marketplace. Their goal had been to make GANs and other types of models accessible to creative professionals, but they weren't really focused on building out the video editing tools, at least initially. They created these higher-level interfaces, such that various creative professionals — whether it was artists, or directors, or photographers — could start to experiment with ML models. What they saw was that some of the most popular models were models that automated routine tasks associated with video editing. Based on that user behavior, they decided to double down on video editing. In fact, a lot of the model architectures that they've since created — including Stable Diffusion — were really purpose-built to support the workflows of video editors. 
I like that sort of workflow, where you use a prototype, or you use these higher-level interfaces to get insight into what users need — as well as potentially the limitations of the underlying technology — and then you iterate from there. Lukas: I totally remember a time, I think, of the era you're talking about — 2014 to 2017 — when every company was like, ""Oh, we have this data. it must be valuable because we can build a model on top of it."" Do you see some analogy today to that? What's the common request of an ML team that's misguided, or should be thinking more about problems? Because I feel like data maybe isn't seeming quite as valuable, in the world of LLMs and big models. Sarah: I think that what we're seeing today is arguably more nefarious than what we saw back then, because at least at that point in time, companies had invested in collecting data. They had thought about possibly what data to collect. And so there was some understanding of how to work with data. I think people see the output of models like DALL·E, GPT-3, et cetera, and they're amazed by what AI can do. And so the conversation doesn't even hinge on, ""We have access to this data set,"" or ""We have access to this talent pool,"" or ""We have this type of workflow that could benefit from these generative capabilities."" It's more, ""AI is magical. What can we do with it? Come in and talk to us about this."" And again, I think that that is somewhat dangerous. I was at a conference just last week. There was a presentation on ML infrastructure at a music company, and somebody in the audience asked, ""Does the AI listen to songs?"" It's a perfectly reasonable question. But I think it does kind of belie some of the misunderstanding of AI and how it works. Lukas: In what sense? Sarah: I think people think about AI as artificial agents. They think of AI as something that could listen to a song, not just something that could represent a song and make predictions based upon the content of that song. Again, I think better understanding of what LLMs are and what they can do will be really necessary to identify when they can be useful. Lukas: This might sound...this is a little bit of a soft ball — or might sound like a soft ball — but I was really genuinely interested in this. I feel like one of the things that you do really well, at least in my conversations with you, is maintain a pretty deep technical and current knowledge of what's going on in data stacks, basically. Or, data infrastructure and ML infrastructure. But yet you're not maintaining data infrastructure — as far as I know — so I'm kind of curious how you stay on top of a field that seems like it requires such hands-on engagement to understand it well. Or at least I feel like it does for me. Yeah, just curious what your process is. Sarah: Yeah. It's interesting because I'd say that, in some ways, that is one of my biggest concerns. I've been in venture now for about seven years, and so I can still say that I've spent most of my career in data. But it won't be long before that is no longer true. And certainly I have found that my practical, technical skills have gotten rustier. One comment on that is that I do think that losing my Python, SQL skills, etc. has actually enabled me to look at some of the tools and platforms that are available to users today, with a fresh set of eyes. I'm not as entrenched in the same patterns of behavior and workflows as I was when I was a practitioner. So it's been helpful to shed some of my biases. 
But I think what I've discovered is that you can understand how something works without using it. And therefore there are two things that are kind of critical to building technical understanding for me. One is just spending a lot of time with practitioners, and hearing about their experiences. How they're using various tools, how they're thinking about various sets of technologies. Frankly, just learning from them almost feels like a shortcut. Instead of trying to figure out what the difference is between automated prompting and prefix-tuning, just going to ask somebody and have a conversation with them. Which is kind of coincidental, and perhaps even ironic. Like, accelerate my learning by just learning from people with expertise in those areas. There's a lot that I just learned through conversation with practitioners. But I think going one level deeper — either reading white papers or reading research papers that give you kind of a high-level overview of an architecture, or how something works without getting into the nitty gritty of the underlying code or math — allows me to reason about these components at a practical level of abstraction. I can see how things fit together. I understand how they work. That doesn't necessarily mean that I'd be able to implement them. Definitely doesn't mean that I'd be able to iterate on them. But it's enough depth to reason about a component, and it's placed in a broader technical stack. Lukas: It's funny though, sometimes I feel like investors...I mean all investors do that to some extent, and I totally get why. But I think that I often feel also paranoid about losing my technical skills, because I feel like if all you can do is sort of figure out what box something belongs to, it's really hard for you to evaluate the things that don't fit into boxes. And I feel like almost all the interesting advances — actually, all the products that we want to come out with at Weights & Biases — generally is stuff where it doesn't fit neatly into one of those ML workflow diagrams that people make. Because if it was one of those boxes, then of course people are doing it, because it makes logical sense, but it's sort of when that stuff gets reshuffled...it does seem like you're able to maintain a much greater level of technical depth than the average investor, even in the data space. Which is why I wanted to have you on this podcast. I hope I'm not offending any of my current investors. Just a caveat there. You all are wonderful. I really do feel like you somehow maintained a much greater technical depth than most of your colleagues. Sarah: In many ways I'm amazed by my colleagues and what they do, because I think there are many investors that can reason about the growth of companies, and reason about sets of boxes and the relationships between those boxes without understanding what those boxes do. I don't think I could do that, but I've always also just been the type of person who needs to go a little bit deeper. As an example, I started my career in data science, but at Amplify I also invest in databases. And at some point — writing SQL queries, working with dataframes — I just wanted to better understand what was happening. When I write a SQL query and data shows up in my SQL workbench, what is happening on my computer? I think a lot of people take that stuff for granted. And they can. That is the beauty of abstractions. That is the beauty of technology. 
We are able to have this video conference — we are able to connect over the Internet — without understanding how the Internet works. My personality is such that I want to understand how the Internet works. I want to understand why I have service in some places and why I don't have service, and why my dataframe is slower than my SQL query. I do think that that makes me think about technical systems in different ways. Lukas: It’s funny, my co-founder Shawn is obsessed with — in technical interviews — assessing if someone understanding how a computer works, in his words. Which I think is really interesting, because I feel like I'm actually not... That's kind of a weakness of mine, I always wonder about a lot of the details there, but it is sort of an interesting perspective. I love working with all of my colleagues who have that same drive to understand how everything works. Okay, here's another question that I was wondering, I was thinking about. If I were to come to you, and I had a company in the data/ML space, and I had a bunch of customers that were really who we think of as tech-forward — like Airbnb, and Google, and that genre — would that be more impressive? Or would you be more thinking I'm likely to succeed if I came to you with a set of customers who we don't normally think of as tech-forward? Like an insurance company — a large insurance company — and a large pharma company. Which would you look at and say, ""Oh, that seems like that company is going to succeed""? Because part of me watches technology flow from the more tech-forward companies everywhere. But another part of me is like, ""Wow, these kind of less tech-forward companies have a whole set of different needs and often a different tech stack. And certainly there's more of them and they have more budget for this stuff."" So which would be the more impressive pitch for you? Sarah: Yeah, it's funny because I think in many ways the way that VCs make decisions — the way that we think about deals — is actually super similar to some of the patterns that we observe with neural networks. And that of course means that we have bias. It also means that we learn from patterns that we've observed. So, I can give you the honest answer, and then I can also give you the rational answer. The honest answer is that I would be more impressed by a company that has engaged with tech-forward customers. For the reasons that you described. In the past, we have generally seen that tech will spread from the Airbnbs and Ubers and FAANGs of the world into the enterprise, and not the other way around. We also have a bias that these more traditional enterprises tend to move slower. There tends to be a lot of bureaucratic red tape that you need to navigate. And as such, those markets tend to be less attractive. So, on its face, if you just said...you don't have any additional information about the velocity of sales, about the quality of the tech or team, etc. But like you're- Lukas: -holding them equal, I guess. Equivalent. Sarah: Yeah. That said, I think that is one of the biases that can cause us to make poor decisions. What really matters are some of the things that I just alluded to. If you're able to sell into insurance companies repeatedly — and with high velocity — that is arguably a better business than a company that spends 6 to 12 months trying to sell into tech companies. So it's less about ""To whom do you sell?"" and more about, ""Is that a big market? Are you able to sell efficiently? 
Are you able to sell scalably?"" I think sometimes we need to be aware of our biases and the impact that marquee logos can have on our decision-making. Lukas: Well, I can't tell if you think it's a rational bias or not. I mean, in some sense, you could call all pattern-matching biases. Do you really think it would be rational to sort of be less enamored with tech-forward customers than you actually are? Sarah: I think we need to ask ourselves and probe on, ""Under what circumstances might enterprises move quickly?"" A great example of this is a company called Afresh, which was one of the companies that did use RL to disrupt an industry. At that time that so many companies were trying to do the same thing, but didn't have as much insight into what was happening within an industry. They offer tech solutions — including things like inventory management and forecasting — to companies in the grocery space. Now, you might think that grocery is going to be a super outdated, slow-moving industry. And therefore that selling into grocery chains would be long and tedious. And perhaps not very scalable. But, at the time, a lot of grocery stores were responding to — and/or otherwise just terrified by — the acquisition of Whole Foods by Amazon. This was then [followed] by the pandemic, which certainly put a lot of stress on their online and multi channel-delivery and e-commerce capabilities. So there were these exogenous shocks which made what might have been slow-moving market participants move a lot faster. Those are the phenomena that we're sometimes blind to, because we just hear ""grocery"" or ""healthcare"" or ""manufacturing"" and think ""slow"", rather than thinking, ""What would it take for the participants in that sector to move fast?"" Lukas: That makes sense. Here's another point that you made on Twitter, that I was contemplating. I actually don't think I have a strong point of view on this, although I really should — given the company that I'm running — but you mentioned a lot of VCs have been saying that you expect the point solution MLOps space to consolidate. One thing that's interesting about that, is that I think you've invested in some MLOps tools. Do you sort of expect them to expand in scope and eat the other companies? Is that something that you need to bet on when you invest in them? Or would you be happy to see them get bought by other tools? How do you think about investment then, in MLOps tools companies, with that worldview? That's my practical question. And then the other thing that I observe, is that it doesn't necessarily seem like developer tools in general is consolidating. So I think I might even agree with you, but I wonder how you sort of pattern match that against developer tools. Or even maybe the data stack... I don't know. Do you think that the data stack is also consolidating? Or what's going on there? Sorry, I just dumped a whole bunch of different questions on you, but... Sarah: Those are great questions. So, I do think that in general most technical tools and platforms will go through phases of consolidation and decoupling. Or, as people love to say today, bundling and unbundling. I think it's just the nature of point solutions versus end-to-end platforms. You have a bunch of point solutions, they're difficult to maintain, they may be challenging to integrate. You then kind of bias towards end-to-end platforms, you adopt an end-to-end platform. 
It doesn't address a certain edge case or use case that you're experiencing, so you buy a new tool for that edge case, and unbundling happens. I think the pendulum will always swing back and forth between bundling and unbundling, for that reason. Or coupling and decoupling, for that reason. To be clear, as a former buyer, I don't think that point solutions or end-to-end platforms are the best solutions for a company. I think there's space in the middle, where you have a product that can solve a few adjacent problems. That's typically what I look for when I invest. I want to make sure that the company in which I'm investing is solving an urgent — and often point — problem. They're solving an urgent and specific problem. However, I typically also want to see that the founder has a hypothesis about how they would expand into adjacent problem areas. It's not that I think solving point problems is bad, but I do think given the pendulum of coupling and decoupling, having some hypotheses about the areas that you can expand into becomes critical. It's interesting to consider why this may or may not happen in the world of developer tools. I'd argue that you still see consolidation. However, the consolidation tends to happen across layers of the stack, versus across the workflow. Lukas: Interesting. What are you...tell me...what are you thinking of there? Sarah: Things like serverless, where you're no longer reasoning about resources and config. That might not be impacting other parts of your developer workflow. That might not be eating into your git-based development workflows, or your testing processes, and things like that. But it is eating into how you think about managing VMs or containers. It is possibly eating into how you think about working with cloud vendors, and deciding upon underlying hardware, and things like that. So it might be the case, that it's like in software development, we've seen companies — or we've seen vendors — solve specific problems, but solve those all the way down the stack. I haven't really thought about that as deeply. But I think it's a worthwhile question to ask. I would say that one of the big differences, though, that I see — and that we of course need to be mindful of — is that there are far more developers than there are data practitioners. And so, when you're trying to answer the question, ""How does this thing get big?"", those building developer tools can arguably solve a specific problem for a larger number of people versus data teams when you're trying to answer this question of, ""How does this get big?"", you could potentially get stumped just by the number of people for whom a tool is actually applicable. Lukas: Is that what gives the intuition that we're in a moment of bundling? That there's just all these point solutions that you feel kind of can't survive on their own, just given the size of the market that they're in? Sarah: I think it's a combination of things. On one hand, I see a lot of...the slivers are getting tinier. You start to see things like ""model deployment solutions for computer vision,"" and perhaps some subset of computer vision architectures. Where, you might think to yourself, ""Okay, I understand why the existing tools are maybe not optimal for that specific use case, but that's really narrow."" To my point about thinking about these orthogonal problems, it's unclear how you go from that to something meatier. That's one phenomena that I observed. 
I think the other is just that practitioners are really, really struggling to stitch things together. The way a friend put it to me about a year ago, he basically said he feels like vendors are handing him a steering wheel, and an engine, and a dashboard, and a chassis, and saying ""Build a fast, safe car."" Those components might not even fit together, and there's no instruction manual. It's easy to cast shade on the startups that are building these tools and platforms, but I think one of the things that is more challenging in the ML and AI space than even like data and analytics, is that a lot of the ML engineering and ML development workflows are really heterogeneous now. If you're a vendor and you're trying to think about, ""With whom should I partner? With whom should I integrate? Do I spend time on supporting this integration?"", it's tougher to make those decisions when practices and workflows are so fragmented and heterogeneous. I do think that creating more of a cohesive ecosystem has been difficult not because vendors are dumb, but because there's just a lot going on. Lukas: Well, I think the other challenge maybe is that when there's so many different technologies that people want to integrate into what they're doing — because there's so much exciting research and things that come along, based on different frameworks and so on — it's hard to imagine an end-to-end system that would actually be able to absorb every possible model architecture immediately, as fast as companies want to actually use it. Sarah: Yeah, yeah 100%. I have been thinking about this in the context of LLMs. We don't yet know how the consumers or users of pre-trained models are going to interact with those who create the pre-trained models. Will they be doing their own fine-tuning? Will they be doing their own prompt engineering? Will they just be interacting with the LLM via API? Without insight into those interaction models, it's really hard to think about building the right set of tools. It's also unclear to me that the adoption of LLMs would actually imply that we need a new set of tools, both for model development and deployment, and management in production. I have a lot of empathy for people who are building ML tools and platforms because it's a constantly moving target. Yet, there's the expectation that you're able to support heterogeneity in all regards. In all regards, whether it's the model architecture, or the data type, or the hardware backend, or the team structure, or the user skill sets. There's so much that is different from org to org. I think building great tools is really challenging right now. Lukas: I guess that's a good segue to a question I was going to ask you. When you look at LLMs, do you have an intuition on if a new set of tools are needed to make these functional? Sarah: I think one of the bigger questions that I have is, again, on how the consumers of LLMs — or how the users of LLMs — will actually interact with those LLMs. And more specifically, who will own fine-tuning. I imagine that there are certain challenges that will need to be addressed, both with regards to how we collaborate on the development of the LLMs, but also how we think about the impact of iterations on LLMs. If OpenAI wants to retrain one of their models — or otherwise tweak the architecture — how do they evaluate the impact of that change on all of the people who are interfacing with the GPT-3 API, or with any of their other products? 
I think a lot of the tools that were built for model development and deployment today kind of assumed that the people who were developing models would be the same set of people — or at least within the same corporate umbrella — as those who are deploying and managing models in production. And if LLMs drive a shift — wherein those who are developing models and those who are deploying and building applications around models are two completely separate parties — then some of the tools that we have today might be ill-suited for that context. Lukas: Do you think we're headed towards a world like that, where there's a small number of companies generating foundational models? And then mostly what other companies are doing is fine-tuning them or doing some kind of prompt engineering to get good results out of them? Sarah: Here we're getting a little bit into the technical nitty gritty, but my impression from tracking the research community so far has been not all...though LLMs are great for what we typically think of as unstructured data — primarily images, text, video, et cetera, audio too — they have not outperformed gradient boosting or more traditional methods on structured data sets, including tabular and time series data. Although there's some work on time series that I think is pretty compelling. This is one of those areas where I feel like the research community just completely underestimates how many businesses operate on structured data. While it's possible that adoption of LLMs will drive this new interaction model or new market model — wherein some companies built these large foundation models and others interact with those — I don't see gradient boosting or more classical approaches going anywhere. Because I don't see structured data going anywhere. Arguably, structured data powers many of the most critical use cases within organizations, ranging from search and recommendation engines to fraud detection. I think it would be a tragedy to neglect the needs of those who are using...I don't want to say simpler approaches, but certainly simpler approaches and more complex approaches, by using architectures that are not perhaps attention-based, when working with these specific data sets. Lukas: Interesting. Do you have an opinion on...how to say this? I feel like many investors especially, but I think many smart people looking at the space of ML and data, they think, ""Wow, this is gonna commoditize. This is going to get...tools are gonna make this easier. Less companies are going to want to do this internally and spend money on expensive resources."" But I guess when I look at what companies actually do, it seems like they spend more and more, and even kind of push up the salaries. And they have this fight for scarce, specific talent. Which way do you sort of predict things are going? Do you think like 10 years down the road, ML salaries go up or do they go down? Maybe it's a more concrete way of putting it. Sarah: Yeah, that's a great question. I probably expect that the variance would increase. My guess is that there are certain applications that may be commoditized — or at least that may be commoditized for some subset of the market — while others continue to be pursued in-house. Search is perhaps a very interesting example. For some businesses, they may be more than happy to rely upon a vendor to provide those semantic or vector-based search capabilities. 
While search may have an impact on their bottom line, perhaps it's not the most critical or most impactful thing to their business, but rather just a capability that they have. This is not to say that Slack actually uses a vendor or should use a vendor, but as far as I can tell, Slack doesn't really monetize on search. You'd contrast that, however, with an e-commerce business or something like Google, where their ability to deliver the highest quality search results and their ability to improve search — just marginally — could be a huge impact on revenue. Those companies are probably likely to develop their own models. I think we'll see that some companies do their own model development. Some use cases are not commoditized, and those companies for those use cases you see very high ML salaries. But then, perhaps for others, you're really just a software engineer who knows a little bit about ML, and can interface with some of these models through APIs, and can reason about the output of experiments and behavior that you might see in production. Lukas: I guess in that vein — and you sort of alluded to this earlier a little bit — what do you think about all these sort of low-code and no-code interfaces into exploring data, building ML models? You mentioned earlier that you think that's generally a really exciting trend. Sarah: My opinions on this category are pretty nuanced, so I was thinking about where to start. Generally speaking, I'm very skeptical of no-code, low-code solutions. I find that many of these tools — no matter what the sector or what the use case — they end up shifting the burden of work. Not necessarily removing that burden, or even lightening that burden. A great example is self-service analytics. My own belief is that in general, most self-service analytics tools don't actually reduce the burden that the data team or analytics team bears, but rather shifts the work of the data team from building analytics products to debugging, explaining, or fixing analytics products. And I think the same can be true in the ML space. Why I'm excited about some of these tools in the ML space is that I actually think that in ML, failing fast is really critical. Some of these tools that enable users to prototype ML-driven solutions might help them better understand, ""Is this going to work? What additional investments do I need? What do my users expect from the system before they make a decision to invest further?"" It enables that kind of quick prototyping, learning, and failing fast. The other thing that I feel quite strongly about, is that we need to explore ways to decouple model development and ML-driven app development. Whenever I talk to companies about their ML architectures or their ML stack, it becomes so obvious that ML is just this one tiny component in a much larger app architecture. The prediction service might be connecting with other databases, or stream processing systems, or other microservices, tools for authorization, and so on and so forth. I think it's really important to be able to build applications around a prediction service while independently iterating on the model that powers that prediction service. So, I am somewhat long on tools that enable engineers to prototype ML-driven systems, so that they can build those application architectures. 
And then, once they have a better understanding of the full system requirements — including some of the latency associated with things like moving data around — they can kind of pass off a fuller spec to a data scientist who will iterate on the model and model architecture, armed with the knowledge that these are the attributes that we need in order to make this project successful. Lukas: That makes sense. Okay, another question. When you invest in a company that is providing some kind of ML or data service, does it cross your mind, ""What if AWS does that?"" Or GCP or Azure. Is that an important thing to consider, do you think, or is that irrelevant? Sarah: Yeah, yeah. I smile because I feel like this question, it comes up somewhere between like one to five times a week. Given the areas that Amplify invests in — we're primarily focused on data, ML tools and platforms, enterprise infrastructure, and developer tools — we're constantly fielding this question of, ""What if AWS or GCP or Azure does this? Won't that company — won't that market, et cetera — get crushed?"" In the past, what I've told people is that I have found that startups tend to be better at building developer experiences. Anecdotally, this is just something that we observe. People complain a lot about the experience of using AWS tools, the experience of using things like SageMaker. I've thought a little bit more about why that's the case. I think, generally speaking, the cloud vendors need to develop for their most spendy customers, their highest-paying customers. And their highest-paying customers tend to be enterprises, shockingly. As such, they're developing for an enterprise user who probably has fairly strict privacy/security requirements, who may have a very distinct way of organizing their teams, who may be bringing in a persona with a specific skill set into data science or ML roles. If I had to present a hypothesis about why they haven't been able to compete on developer experiences, I think it's because often they are creating tools and platforms for a developer who is not as representative of the rest of the market. But, to be honest, with the passage of time, I've just seen enough examples of companies that have been able to out-compete the cloud vendors where I just don't worry about it that much anymore. Lukas: Have you ever seen anyone get crushed? Sarah: Crushed? Lukas: Has that happened in your career? Sarah: No. I mean, I'm sure it has. But it's hard for me to think of an example, whereas it's easy to think of many, many examples of companies that were not crushed by the cloud vendors. If anything, I think sometimes we see that start-ups get...they sell too soon. The way in which the cloud vendors out-compete them is putting some juicy acquisition offer in front of them and then they don't have to compete. That's the only example that I could see or think of, off the top of my head, of the cloud vendors crushing a potential competitor. They crush it with their dollars. Suffocate companies with their acquisition offers. Lukas: R&D through M&A, yeah. I saw an interview or a conversation that you had with Andrew Ng. I thought you had an interesting point that academic benchmarks...they often don't really reflect industry use cases. But you were kind of pointing out that industry has some share of the blame for this. Can you say more on that topic? Sarah: Oh, absolutely. I am really grateful to Andrew for actually drawing my attention to this issue. 
We often think about the gap between research and industry, but we don't as often think about the gap between industry and research. Andrew and I had been talking about this challenge of structured data versus unstructured data. I think I said to him, ""What I see in industry is that most ML teams are working with tabular and time series data. What I see in the research community is that most researchers are building new model architectures for unstructured data."" There's a big mismatch between what model architectures people in industry need — given the data that is available to them, as well as given the types of problems that they're trying to solve — and the research that's becoming available. Now he pointed out to me — and this is something that I hadn't really thought about before — researchers have access to unstructured data. They have access to things like ImageNet. They don't have access to high volumes of data on user sessions, or logs, metrics, and events. The data sets that tend to be the lifeblood of most companies. It is very difficult to innovate on AI techniques for data sets to which you have zero access. I think it's easy to point to that research and be like, ""Oh, there's such a big gap between what they're building and what we need."" I think we also need to be mindful of what the research community can do, given the resources that they have available to them. I've seen a couple of efforts by a few organizations to open source their data sets, but it's tough because oftentimes the most valuable data sets are the most sensitive ones. What company wants to share their click-through data that probably reveals the state of their business, some of the experiments that they're running, and so on so forth. Lukas: Well, there's also not a lot of upside. I remember the Netflix contest was such a popular, awesome thing. Got so many people involved, so much attention to research to Netflix — still a seminal data set — but they didn't do a second one because they felt like...there are user privacy issues, that they couldn't get around to release it. I don't know if you remember when AOL released a subset of their query logs. It was so exciting to actually have that. I was in research at the time and I was like, ""This data set is like gold."" And then like the next day, they fired the person that released it. And their boss — I think their boss' boss, right? — because there was some personal identifying information in that. It's hard to see a lot of upside for corporations, even if they were sort of neutral on the impact of...on the company secrets, IP issue. Sarah: Yeah. One of the things that I have seen — that has been very encouraging — is more and more interview studies or meta analyses coming out of the research community. Where it's clear that the researchers are interested in better understanding the problems that practitioners face in industry. One critique that I've had of those studies in the past, is that the authors tend to interview people to whom they have immediate access, which means that they often interview practitioners at some of their funding organizations. The organizations that are sponsoring their labs, which means that they tend to bias more towards larger enterprises or big FAANG companies. They're interviewing people at Facebook, Apple, Tesla on their data and ML tools, platforms, practices, and then drawing conclusions about all of industry. 
But I think that recently I've seen a couple of studies come out where there's been a more focused effort to get a more random — or at least more diverse — sample of practitioners from both smaller startups, more traditional companies, bigger tech companies, et cetera, to really better understand both the similarities and differences between how they approach model development and deployment. I hope that continues. Lukas: Do you have a study that's top of mind, that you could point us to? Sarah: So, Shreya Shankar, who had actually been a university associate. Lukas: Yeah, I saw that. Totally. Nice. Sarah: I was really thrilled because Shreya actually reached out to us and said, ""Hey, can you connect us to people at different types of companies? I've got connections to people at Instagram, Facebook, Apple, et cetera et cetera, but I want to talk to people at mid-market companies, or early-stage startups, and B2B companies, and better understand some of the nuances of their workflows."" Lukas: What was the name of the paper? I think I just saw it. Sarah: ""Operationalizing Machine Learning: An Interview Study"". Lukas: Thank you. Yeah, I agree. That was an excellent paper. Sarah: Yeah, yeah. The other thing that I had said...I sent Shreya a text message after reading through it. The other thing that I really appreciated about the interview study was that she didn't cherry pick the insights that were most likely to drive interesting research questions or solutions. I think she took a really genuine and unbiased approach to thinking about, ""What are the problems that people are talking about? What are the ways in which they're there solving them? Let's highlight that there are a bunch of problems that people are just solving in practical — albeit hacky — ways, but ways that they're content with."" I thought it was a very honest study. Lukas: Totally. I totally agree. Well, I guess if we are possibly headed towards another bubble in machine learning — or machine intelligence, as you sometimes call it — do you have any advice for a startup founder like me? Or maybe an ML practitioner, which is most of our audience. Having gone through another bubble, how would you think about it? What would you do if you started to...I think we're already seeing bubble-esque behavior. What are the lessons? Sarah: I think the most critical lesson that I saw/learned the last time around was, ""Focus on your users,"" or ""Focus on the strategic problems that you're trying to solve."" And ""Really, really understand if and why ML is the best tool to solve that problem."" I think it's critical to think about machine learning as a very important tool in our toolkit. But one of several tools. I was catching up with a friend a couple of weeks ago, and she had mentioned to me that the way in which she prioritizes ML projects is through regular conversations with their product leadership, and engineering leadership — and her representing ML leadership — about the product roadmap, about the user behaviors that they're trying to unlock. And then thinking about whether ML or traditional software development approaches are a better tool for achieving those things. I think as long as we continue to think about ML as a tool to solve problems — and as long as we have the tools that enable us to better understand if ML is solving those problems, and how to improve upon its ability to solve those problems — then ML can be a super powerful tool. And one that we learn to wield in more powerful ways too. 
But — I feel almost like a broken record saying this, given the lessons learned in the past — if we treat ML like a silver bullet, if we treat it like a hammer looking for a nail...that was the pattern that I think led to failure. Don't think about ""What ML can do for you"", think about ""What you can do for your country,"" and if ML is the right way to do that, I guess. That's the lesson that we learned and I hope it's the lesson that we will carry forth. Lukas: Love it. We always end with two open-ended questions. The first of the two is, if you had extra time, what's something that you'd like to spend more time researching? Or, put another way, what's an underrated topic in data or machine learning? Sarah: Oh man, that one is very easy for me: programming languages. I would love to spend more time learning about programming languages. I am definitely not convinced that Python is the right interface for data science, or that SQL is the right interface for analytics work. I would really love to learn more about programming language design, so that I could better diagnose if and why Python and SQL are the wrong tools, and how one might go about building a better PL interface for data scientists, ML engineers, and analysts. Lukas: Okay, a question that I didn't ask — because I thought it was a little weird or maybe nosy — is why you're asking on Twitter if anyone knew any female Rust developers. Because I will say Rust comes up just a shocking amount on this podcast, and I was wondering what's driving the interest in Rust, and then if there was some reason behind looking for a female Rust developer, and if you actually found one. Sarah: Yeah, yeah. So, full transparency — and I think I maybe put some of this on Twitter too — quick background is that certainly earlier in my career, I felt like oftentimes I wasn't getting invited to the same set of events, et cetera, as some of my male peers, and therefore I wasn't getting exposure to the same set of conversations — maybe even the same opportunities — to potentially see deals, and things like that. I feel pretty strongly that we need to have women in the room when we host events, to ensure that they're getting exposed to the same set of opportunities. That we're not doing things to hamper their progress in the industries in which they operate. We were hosting a Rust developer dinner, and looked at the guest list, and there weren't that many women, and it felt like we could do better. Thus the origins of my question. Lukas: I see. Sarah: Why Rust? See, I wish I spent more time studying programming languages, so I could better understand why people are shifting from C++ to Rust. Luca Palmieri — who I believe is now at AWS, actually — has a great blog post on why Rust might be a more appropriate backend for Python libraries that often have C++ backends. Things like pandas, where we experience it as Python but in fact it has a C++ backend. I've heard that Rust is more accessible than C++ and therefore could perhaps invite more data practitioners to actually contribute to some of those projects. But I don't know enough to really say why Rust is so magical, other than a lot of smart people — apparently, like Linus Torvalds too — believe it is. If it's good enough for him, it's good enough for us. I don't know. Lukas: Fair enough. My final question for you is, when you look at the ML workflow today going from research into deployment into production, where do you see the biggest bottlenecks?
Or maybe where do you see the most surprising bottlenecks for your portfolio companies? Sarah: I generally think that...there are two bottlenecks that I would call attention to. Actually three, sorry, I'm being kind of indecisive here. One pattern that I've observed with ML is that we often iterate on ML-driven applications — or ML-driven features — more frequently than we iterate on more traditional software features. To give an example, we may iterate on a pricing algorithm far more frequently than we would iterate on a navigation panel, or an onboarding flow, or something like that. Earlier I was talking about understanding how ML can solve user and company problems. I don't really think we have enough insight into the way in which model performance correlates with behavioral data — or the product engagement — to iterate super effectively on models. I think that has been a limitation, and one that could have nefarious effects in the future. Another big challenge that I see — and I alluded to this before — is the challenge of building software applications around a prediction service, or around a model. In the past, people might have talked about this as a model deployment problem. The problem isn't containerizing your model and implementing a prediction service in production. I think that has gotten significantly easier. The problem is connecting to five different databases, each which have different sets of ACID guarantees, latency profiles...also connecting to a UI service, potentially connecting to other application services. The problem is the software development. What you've got is a trained model, but now you actually have to build a software application. I don't think we have great tools to facilitate that process, either for ML engineers or for software engineers. And then around the same space, I also think that the transition from research to production — and back — can still be challenging. Perhaps what a company wants to do — upon seeing an issue associated with the model in production — is actually see the experiment runs associated with that model, so that they might get more insight into what is now happening in that production environment. That shouldn't be difficult to do. But, in the past I think we really developed tools either for model development or for MLOps, and we're starting to see some of the pain points that arise when those sets of tools are not coupled together. Lukas: Cool. Yeah, that all definitely resonates with me. Sarah: Lest I sound too cynical, I am really optimistic about the future of ML. I think we just need to do it in a sane and rational way and be mindful of what we're trying to accomplish here, instead of just focusing on flashy press releases and cool demos. Lukas: I was thinking as you were talking about the hype cycle, and large language models, and stuff. I was thinking VCs probably feel the hype cycle the fastest. I'm like, ""Man, we've basically solved the Turing test and, like, no one cares. My parents are like, ""What even is this,"" you know. It's like, ""Come on, this is awesome, look at it."" But I think every investor knows about Stable Diffusion but I don't think...I even come across Chief Data Officers at Fortune 500 companies who are like, ""What's Stable Diffusion?"" It's like, ""Come on, you should know about this."" Anyway... Sarah: Yeah, yeah. But I think there's this awareness, though, of ""This is where the hard work starts."" Lukas: Yeah, totally. 
Sarah: ""Great, we're able to generate beautiful artistic renderings based on textual prompts. Okay, how do we generate photos that are equivalent to that which a professional photographer would produce?"" Because that's what it's going to take to get a Getty Images or Flickr to adopt something like Stable Diffusion. How do we make automated rotoscoping so good that a video editor doesn't need to correct the mask at all? Because that's what it's going to take for Runway to compete with some of the more traditional video editors. I saw, through Runway, that the research is not good enough. They've had to do a lot of engineering, as well as their own research, in order to operationalize some of these things. I am so optimistic about the potential of the technologies, but I also am realistic that reining them in, and actually leveraging these technologies to do good in the world — or to build great products — is hard. Short anecdote, but I've been talking to a founder who was working on brain-computer interfaces and actually developed this technology where, effectively, it's able to read minds. You had to put on some big helmet thing, but once the helmet was on, it could kind of transcribe thoughts. And they were able to get it to work. Now, the founder subsequently shifted focus to the gaming space, doing more work with haptic interfaces. I was asking him like, ""Why didn't you pursue the mind reading tech further?"" And he said to me, ""We couldn't find any great use cases."" Isn't that crazy? But I think, this is tech. Sometimes you can do absolutely remarkable things with technology. But it doesn't matter. It doesn't matter unless you figure out how to appeal to people, and get them to use it, and how to align that technology with an important set of problems. I think that is the thing — as VCs — we need to continue to remind ourselves. Tech is not easy. Tech is not easy, but people are not easy either. Both are really hard. Unlocking new sets of technologies often means that we are granted the opportunity to solve really hard human problems. I guess...TL;DR if GPT-3 starts reading minds. Maybe we'll be able to find some applications for it. But, we'll see. Lukas: Thanks so much, Sarah. That was super fun. Sarah: Yeah, for sure. Bye! Lukas: If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned, supplemental material and a transcription that we work really hard to produce. So, check it out.",9519 +Cristóbal Valenzuela — The Next Generation of Content Creation and AI,https://www.youtube.com/watch?v=wbonGgk-_Gk,2426,2023-01-19,"Cris: I think a big mistake of research — specifically in the area of computer creativity — is this idea that you're going to automate it entirely. You see one-click off solutions to do X, Y, or Z. I think that's missing the bigger picture of how most creative workflows actually work. That probably means that you've never actually worked with an agency where the client was asking you to change things every single hour, or make it bigger, make it smaller, right? Lukas: You're listening to Gradient Dissent, a show about machine learning in the real world. I'm your host, Lukas Biewald. Cris Valenzuela is an artist, and technologist, and entrepreneur, and CEO and founder of a company called Runway, which is a maker of ML-powered video editing software. 
But I feel that description doesn't even do justice to how incredible and innovative his product is. This interview actually starts off with a live demo of his product. I really recommend switching to video if you're listening to this on audio only, because his demo is absolutely incredible. Well, all right, Cris, we don't normally do this, but I thought it would be fun to start with a product demo if you're down for it. You have such a cool, compelling product. Would you be up for that? Cris: Sure. What do you want me to demo? There's a lot I can do. I want to make sure I can focus on what you want to see. Lukas: Well, this is an ML podcast. So I think people would probably be interested in the most flashy ML features. How about that? Cris: In short, Runway is a full video creation suite. It allows you to do things that you might be able to do in more traditional video editing software. The main difference is that everything that runs behind the scenes...so, most of the core components of Runway are ML-driven. The reason for that, it has two main kind of modes or uniqueness about making everything ML-based. One is, it helps editors, and content creators, and video makers automate and simplify really time-consuming and expensive processes when making video or content. There are a lot of stuff that you're doing in traditional software that are very repetitive in nature, that are very time-consuming or expensive. Runway aims basically to simplify and reduce the time of doing this stuff. If you have a video you want to edit, an idea you want to execute, spending the time, and the minutes, and the hours, and sometimes days on this very boring stuff is not the thing that you really want to do. So we build algorithms and systems that help you just do that in a very easy way. And then there's another aspect of Runway, that it's not only about automation, but it's about generation. We build models, and algorithms, and systems that allow our users and customers to create content on demand. And everything...baseline for us is that everything happens on the browser. It's web-based and cloud native, which means that you don't rely any more on native computers, or native applications, or desktop compute. You have access to our GPU cluster on-demand, and you can render videos on 4k, 6k pretty much in real time. Plus you can do all of this AI stuff also in real time as well. A lot of the folks are using Runway now — CBS, The Late Night Show with Colbert, or the folks who edit Top Gear, or sometimes creators who do stuff for Alicia Keys or for just TikTok or movies — they're all leveraging these AI-things via this web-based cloud based editor. So that's a short, five-minute intro, what the product does and how ML or AI plays a role in the product itself. But I'm happy to now show you how everything goes together and the experience of using the editor, if that makes sense. Lukas: Please, yeah. Cris: Cool. Any questions before we do that? I can double down, or if you want me to clarify? Lukas: Well, I actually didn't realize that professional video teams like The Colbert Show use Runway. Do they use it for all of their video processing or is there a certain part where they use it? How does that work? Cris: It depends. Some editors and some folks are using it as an end-to-end tool to create videos. Some other folks use a combination of different softwares to make something. The folks who use it for movies sometimes add in Nuke or Flame.
We have a big Flame community, so Runway becomes a part of that workflow. It's replacing either something you do on a very manual basis. It's sometimes replacing a contractor you hired to make that work for you, or it's sometimes replacing your own work of trying to do it yourself in this old software. But you still use other aspects of it, or other software to combine [with] it. It really depends on the type of content you have and the level of outcomes that you that you need. But we do have folks that use it as an end-to-end content creation and editing tool. Lukas: Cool. Well, I mean the extent of my video editing is basically modifying videos of my daughter to take out the boring parts and send them to my parents. That's as far as I go. Maybe you could sort of give me a little bit of an overview of the cool stuff you can do with Runway. Cris: Totally. You can do all of that in Runway on the browser which is...you might be...you might start using Runway for that. The one thing I would emphasize is, everything is running on the cloud, on the web. You can just open any project with a URL. You can also create teams, and you have this baseline collaboration aspect that just runs out-of-the-box. Cool. Anything else? No, just go demo? Lukas: Yeah, let's see a demo. Totally, yeah. Show me the cool stuff. Cris: Perfect. So, this is what Runway looks like. If you're ever edited video before, it's a very common interface. We have tracks on the bottom. We have a multi-editing system with audio tracks, and keyframe animations, and text layers, and image support. You can preview your assets on the main window and have a bunch of effects and filters on the right. Again, everything running pretty much on the cloud in real time. The idea here is that there are a lot of things that you can do that are very similar to stuff that you can do in other applications, plus there are things that you can't do anywhere else. Let me give you an example of something that a lot of folks are using Runway for. I'm going to start with a fresh composition here. I'm going to click one of the demo assets that I have here. I'm going to click this. I have a surfer, right? On that shot, let's say I want to apply some sort of effect or transformation to the background of this shot. Or I want to maybe replace the person here and take it somewhere else. The way we do that today would be a combination of frame-by-frame editing, where you're basically segmenting and creating an outline of your subject, and every single frame you move you have to do it one more time. For that, we built our video object segmentation model — which we actually published a blog post and a paper around it — that allows you to do real-time video segmentation. In film, this is actually called rotoscoping. You can just literally go here, guide the model with some sort of input reference. I tell the model this is what I want to rotoscope, and it can go as deep as I need. I can select the whole surf layer here at deeper...more control over it. Once the model has a good understanding of what you want to do, it would propagate that single keyframe or single layer to all the frames of video in real time. You get a pretty smooth, consistent segmentation mask that you can either export as a single layer, or export as a PNG layer, or you can use...go back to your editing timeline and start modifying. You said you want to cut it, you want to compose it, you want to do some sort of transformation...from here, you can do that directly from here. 
Let's say I have my baseline — or my base video — here, I have my mask on top of that, and now I can just literally move it around like this. I have two layers, right, with a surfer. So, something that looks very simple and in traditional software may take you a couple of hours of work, here you can do pretty much in real time. Again, it's something that most editors know how to do, but it just takes them a lot of time to actually do. Lukas: And did you just run that in the browser? Cris: Yeah. Lukas: That segmentation mask, it figured out in the browser and it's calculating all...it doesn't go to the server? Cris: No, it goes to the server. Yeah, there's an inference pipeline that we built that processes real-time videos and allows you to do those things. The compute part is everything running on the cloud. You just see the previews and sometimes — depending on your connection — you can see a downsampled version of it, so it runs really smoothly and plays really nicely. Also, for every single video there's a few layers that we run, that help either guide something like a segmentation mask. For instance, we get depth maps and we estimate depth maps for every single video layer. You can also export these depth maps as independent layers and use them for specific workflows. That's also something very useful for folks to leverage. So you have this and you can export this. Behind the scenes, we're using this for a bunch of things. Lukas: Cool. Cris: Those are one of the things that you can do. You can go very complex on stuff. Let's say, instead of the surfer, I just want the — let me refresh this — I just want the background. I don't want the surfer. I can inpaint or remove that surfer from the shot. So I'm just gonna paint over it. Again, I'm giving model one single keyframe layer, and the model is able to propagate those consistently for the entirety of the video. That's also something we — as a product philosophy — really want to think about. Which is, you need to have some layer of control of input. The hard part of that should just be handled by the model itself, but there's always some level of human-in-the-loop process, where you're guiding the model. You're telling it, ""Hey, this is what I want to remove. Just go ahead and do the hard work of actually doing that for the whole video sequence."" Lukas: Wow, that's really amazing. That's like magic, right there. The surfer’s really just gone. Cris: Yeah. That's something we see a lot, when people find out about it, or when they start using it. ""Magic"" is a word we hear a lot. It's something that...again, if you're editing or you've worked in film or content before, you know how hard, and time-consuming, just painful it is. Just seeing it work so instantaneously really triggers that idea of magic in everyone's minds. Which is something for...that's great, because we've really thought of the product as something very magical to use. So, there's stuff like that. There are a few things like green screen and inpainting — which I'm showing you now — plus motion tracking, that we consider as baseline models in a Runway. Those are just...you can use them as unique tools, as I'm showing you right now. You can also combine them to create all sorts of interesting workflows and dynamics. 
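The snippet below is not Runway's pipeline (which pairs a purpose-built video object segmentation model with keyframe propagation on the server); it is just a rough sketch of the frame-by-frame baseline that automated rotoscoping replaces, using an off-the-shelf torchvision segmentation model. The filename 'surfer.mp4' and the frame limit are placeholder assumptions.

```python
# Rough sketch (not Runway's actual system): pull a per-frame 'person' mask
# from a short clip with a pretrained torchvision segmentation model, i.e. the
# brute-force version of the rotoscoping step described above.
import torch
import torchvision
from torchvision.models.segmentation import deeplabv3_resnet50
from torchvision.transforms.functional import normalize

frames, _, _ = torchvision.io.read_video('surfer.mp4', pts_unit='sec')  # (T, H, W, C) uint8

model = deeplabv3_resnet50(pretrained=True).eval()
PERSON = 15  # 'person' index in the VOC-style label set these weights use

masks = []
with torch.no_grad():
    for frame in frames[:60]:                       # limit frames to keep the toy example quick
        x = frame.permute(2, 0, 1).float() / 255.0  # HWC uint8 -> CHW float in [0, 1]
        x = normalize(x, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        logits = model(x.unsqueeze(0))['out']       # (1, 21, H, W) per-class scores
        masks.append(logits.argmax(dim=1).squeeze(0) == PERSON)

# Each entry in `masks` is a boolean (H, W) matte that could be composited
# over a new background, frame by frame.
```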
There's the idea of, ""You want to transform or generate this video, and take this surfer into another location,"" you can actually generate the background, and have the camera track the position of the object in real time, and then apply the background that you just generated in a consistent manner, so everything looks really smooth. The way you do that is by combining all of these models in real time, behind the scenes. You might have seen some of those demos on Twitter, which we've been announcing and releasing. This is a demo of running a few of those underlying models, combined. There's a segmentation model that's rotoscoping the tennis player in real time. There's a motion-tracking model that's tracking the camera movement, and then there's an image-generation model behind the scenes that is generating the image in real time. Those are all composed at the same time. Does that make sense? Lukas: Yeah, yeah. Totally. Cris: Those are, I would say, underlying baseline models and then you can combine them in all sorts of interesting and different ways. Lukas: Totally. Alright, well, thanks for the demo. That was so cool. We'll switch to the interview format. Although now I really want to modify this video in all kinds of crazy ways. Cris: We should replace the background with some stuff while we're talking Lukas: Totally. Get this microphone out. One question I really wanted to ask you is, I think your background is actually not in machine learning originally, right? I always think it's really interesting how people enter the machine learning space. I'd just love to hear your story, a little bit, of how you ended up running this super cool machine learning company. It seems you're very technically deep, also. And so how you managed to get that depth mid-career. Cris: Totally. Long story short, I'm originally from Chile. I studied econ in Chile and I was working on something completely unrelated. But it was 2016 or 2017, I think, and I just randomly fell into a rabbit hole of ML- and AI-generated art. It was very early days of Deep Dream and ConvNets and AlexNet, and people were trying to make sense of how to use this new stuff in the context of art making. There were some people like Mike Tyka, and Mario Klingemann, and Gene Kogan who were posting these very mind-blowing demos. That now feel things that you can run on your iPhone on real time. But around that time it was someone...I remember Kyle McDonald — which is an artist — who was walking around with his laptop, just showing people a livestream of a camera. You had basically...I think with an ImageNet model running in real time, and just describing what it saw. And it just blew my mind. Again, it's 2016. Now it's pretty obvious, but around that time it was pretty special. I just went into a rabbit hole of that for too long. It was too much, I was just fascinated by it. I actually decided to quit my job, I decided to leave everything I had. I got a scholarship to study at NYU and just spent two years just really going very deep into this. Specifically in the context of, I would say, creativity. My area of interest was the idea of computational creativity. How do you use technology? How do you use deep learning or ML for really creative tool-making and art-making? That two-year-long research process and exploration ended up with Runway. Runway was my thesis at school. It was a very different version of what you see now. But the main idea was very much pretty much the same. 
It's like, ""Hey, ML and AI are basically a new compute platform. They offer new ways of either manipulating or creating content. And so there needs to be some sort of new tool-making suite that leverages all of this, and allows people to tap into those kinds of systems in a very accessible and easy way."" The first version of Runway was a layer of abstraction on top of Docker, where you could run different algorithms and different models in real time on this Electron app. You could click and run models in real time and connect those models via either sockets, or UDP, or a web server to Unity or Photoshop. We started building all these plugins where you can do the stuff that you are able to see now on Twitter. Like, ""Here, I built a Photoshop or Figma plugin that does image generation."" We were building all that stuff running Docker models in your computer locally, and you can stream those. It was 2018, 2019. Lukas: Interesting. It must have been a much more technical audience at the time then, right? If you have to run Docker on your local machine. That's not something everyone can do, right? Cris: Totally, totally. I think that that also tells a lot about how much progress the field has made, and how mainstream and how more accessible things have become. Trying to put this set of new platforms and compute ideas for creators, and video makers, and filmmakers required you to know how to install CUDA and manage cuDNN. I don't know if it's just too much. But people were still wanting to do it. There were some folks who were like, ""Hey, this is really unique. I want to understand how to use this."" But then we realized it wasn't enough. You need to go [to] higher layers of abstraction on top of that to really enable creative folks to play with this, without having to spend months trying to set up their GPU machines. Runway has really evolved, and we have a really experiment-driven thesis and way of working on the product. But it's all about trying ideas and testing them out with people really fast. We're building something that hasn't been done before. And so it's really easy to get sidetracked into things that you think are going to work, or ideas that you think are going to be impactful. But since you're working with new stuff all the time, being close to your user base for us has been kind of really, really important. Every time we iterate on the product, I think one consistent line of evolution has been this idea of simplifying...making higher abstraction layers on top of it. The first versions of rotoscoping or inpainting required you to select the underlying model architecture, and understanding what a mask was, and [how] propagation works. If you're really a filmmaker, you don't care about any of the stuff. You just want to kick once, and you want to get a really good result. For us, it's ""How do you build from there, using what we're building behind the scenes?"" Lukas: Were you surprised how well these approaches have worked to generate images? It sounds you started your work in 2017, 2018. The space has changed so much. Do you feel you saw it coming, or have things unfolded differently than you thought? Cris: I mean, things have definitely accelerated. But I think our thesis — when we started Runway three and a half years ago — was pretty much the same. It was, we're entering literally a new paradigm of computation and content. We're not going to be...we're soon going to be able to generate every single piece of content and multimedia content that we see online. 
I've been demo-ing generating models for creative use cases for the last three years. What I was showing three years ago, people were like...it was like, ""Hey, this is how it works. This is how you train a model. This is what the outcome of the model is."" Of course, at that time, it was a blurry 100x100 pixels image. Some sort of representation of what you were describing. Most people took it as a joke, like, ""Oh yeah, cool. Very cool. Cool thing."" Or as a toy, like, ""That's a fun thing, right? You kind of use it once. But of course, I will never use this in production."" I remember speaking with this huge...one of the biggest ad agencies in the world, and I was presenting to other executives. Here's the future of content, type anything you want. And something blurry came out and they're like, ""Cool, not for now."" And they reached three weeks ago being like, ""Hey, how many licenses can we get for this, tomorrow?"" Because the models are going just so much better, that it's obvious. It's transforming their industries and a lot other things. I think what has changed for us is pretty much the speed. Now we're entering a really nice moment where things are converging, and there's a good understanding of what's going to be possible, and where things are going. Scaling laws are getting to a good point. And so continuing the same, but the thesis of the company was always built on that this will happen, and it's happening sooner rather than later. Lukas: Do you have a perspective on if this acceleration will continue, or if we just are seeing a breakthrough, and then we're going to need new breakthroughs to get to the next level of quality? Cris: Sure. I think there's definitely more compute that needs to be added to this, more data sets. I think we're still scratching the surface of what it will become. There's still this...I was discussing this with a friend the other day, this idea of a curiosity phase where people are entering the realm of what's possible and coming up with all these solutions and ideas, but there's still a difference between those concepts, and explorations, and ideas and meaningful products that are long-term built upon those. What I'm interested in seeing is how much of those ideas will actually convert over time, over meaningful products. I think that conversion of products is not just pure research or pure new models, there needs to be a layer of infrastructure to support those things. It's great that you can run 1 single model to 1 single thing on X percent. But if you're trying to do that scale on a real-time basis for 10 people, that then use it on a team and depend on it for their work, then there's a slightly different thing. But I think we're about to see way more stuff around video, specifically. I think image might be solved in a couple of more months and video is starting to now catch up with that. It's a really exciting time for that. Lukas: What does something being solved mean to you? Like, you could just get any image that you would ever want or imagine? Cris: Yeah, that's a good one. That's a good question. I would say that I would consider being solved [as] being able to translate something like words or a description into a meaningful image or content that pretty much matches where you're trying to...what you're imagining. And if it doesn't, you're able to control really quickly and easily to get to the point where you can arrive at your final idea. That's why the combination of models really makes sense. 
It's going to be hard to have a full model that does exactly what you want. For instance, for image generation. I think it's a combination of, you have a model that does the first model, which is you generate something. There's no pixels, you generate the pixels. Second step is, you're able to quickly modify it, or inpainting, or grade it in some way, and start it in some other way. But that whole thing just happens in a few seconds or a few minutes, right? If you speak with anyone in the industry, VFX, or ad agencies or content creation, post-production companies, these are stuff these guys do all the time. This is what they do for a living, right? They're able to create content out of nothing. The thing is just it's really expensive. It's really, really expensive. And it involves a lot of time and rendering and skilled people to get to that point. I think for me, ""solved"" is, anyone can have access to that professional-level grade VFX-type of content from their computers and from a browser. Lukas: Do you ever think about making a version of Photoshop, instead of a video editing software? If you think images are closer to being solved. Certainly I can't go into Photoshop and get exactly the image I want. I love to play with all the image generation tools out there. But I do think they're amazing at first, but then you kind of hit this point where if you really want the image to look like you want, it gets kind of frustrating. It seems there's also room for an image version of what you're doing. Is that something you'd consider doing? Or, why not make that? Cris: Totally. Yeah. The answer is absolutely. I think, a few things. One, I think we're converging more to this idea of multi-modal systems where you can transfer between images, and videos, and audio. I think the idea that we've been...we built software to deal with each media independently. There's audio editing software, and video editing software, and image editing software, and text-based...you have models that can quickly translate between all of those. Content — let's say video — it's a combination of different things. You have images, you have videos, you have audio, you have voice. All of those things are now possible. I think for us, when I think about the product philosophy of Runway, it's less about, ""How do you build a better Photoshop or a better Premiere?"" Fundamentally, these models are just allowing you to do the things that none of those others can do. If you think about marginal integrations of those things...yeah, you build a better Photoshop that has a better paintbrush, or a better contact server tool. But ultimately, when you combine them in new ways, you create a new thing. It's completely new. It's not Photoshop, it's just a new way of making videos, and editing images, and editing audio. All in one, single component or tool. For me, what's really interesting is the multi-modal aspect of things, and translating also into those. And 3D, for instance, it's one of the filters...you're going to start to see a lot of translation between images and videos on 3D. Lukas: Totally. So, I have to ask you your thoughts on deep fakes and things like that. I'm sure everyone asks you that, but I'm really curious what you think about that. Do you think that you would want to put in limitations into your software to not allow certain things? Do you think this is about to change the way we view videos, as this technology gets more standardized and available to everyone? Cris: For sure. 
As with every major technology breakthrough, there are always social concerns about how it might be misused or used in ways other than intended. It's a good exercise to look at history to see what has happened before. There's this really good YouTube video about Photoshop when it was first released, I think from the early '90s. They were like...it's kind of a late night show, and they're discussing the ethical implications of manipulating images in magazines. And they're like, should we allow manipulating images and putting them in magazines? Half of the panel was like, ""No, we shouldn't."" It breaks the essence of what photography is, right? 20 years after that, it makes no sense to think about not doing something like that, right? There's always an adaptation process, I would say, where people need to...we need to collectively ask, ""Hey, how is it going to be used?"" But I think ultimately, you understand what the limitations are, and you also fine-tune your eyes and your understanding of the world to make sense of that thing. Now everyone knows that ""Photoshop"" is a verb that you can use to describe something that's manipulated. You do that same exercise, and you go back in time, and you see the same. When film just started to appear, there was this interesting story about...one of the first films that was made is of a train arriving at a station. They were projecting that in a room. When people saw the train coming into the station, everyone ran away because they thought a train was literally coming at them. But then you make sense of it, and you're like, ""Yeah, this is not true. I understand that this is an actual representation of something."" Ultimately, I think with AI and with generated content, we'll enter a similar phase, where it's going to become commonplace and something people are familiar with. Of course, there are going to be misuses and bad uses. Of course, people can use Photoshop in all sorts of evil ways. But 99% of people are just like, their lives have been changed forever in a positive way because of this. Lukas: Interesting. Well, look, I'd love to hear more about your tech stack. This is a show for ML nerds of all types. I think you're doing pretty hardcore ML at scale. What have been the challenges of making this work, making the interface as responsive as it was? What were the key things to scale up your models? Cris: Sure. There are a lot of things that we had to kind of come up with creatively to make this work in real time. On the one hand — on the ML side — we mostly use PyTorch for all of our models. We have a cluster — basically, an AWS cluster — that scales based on compute and demand, where we're running all those models for training. We sometimes use Lightning and, of course, Weights & Biases to follow up and understand better what's working in our model training. For serving, we optimize for different GPU levels or compute platforms, depending on availability. We've made some systems to scale up depending on demand. On the frontend side of things, everything's TypeScript- and React-based. There's some WebGL acceleration stuff we're doing to make things really smooth. And then there's the inference pipeline, where we're writing everything in C++ to make it super, super efficient and fast, specifically since you're decoding and encoding videos in real time. We also built this streaming system that passes frames or video frames through different models to do the things that I just showed you.
And so we also had to come up creatively with that. That's kind of a big picture of our tech stack. Lukas: One challenge that I'm seeing some of our customers run into — as these models kind of get bigger and more important — is that the actual serving cost of the application increases. Is that an issue for you? Do you do things like quantization? Is lowering your inference costs an important project for you all? Cris: For sure. Yeah, for sure. I mean, we're running...our biggest cost right now is AWS, GPU costs, and inference costs, and serving these models. There are two main areas for sure. We have an HPC, we're doing large-scale training of language models and video models. That takes a lot of resources and time. But just serving on...I would say the tradeoff between precision and speed really matters. Quantizing models is great. But also you need to make sure that you're not affecting the quality of the model because if you're affecting something on a pixel level, it might change the result from being okay to bad. And that might mean user churning. And so, if you're going to spend a few more seconds rendering, that might actually be better. There's always a tradeoff of how much. But yeah, we always try to figure out what's the right balance there. We're still exploring some stuff on the browser. I think the browser is becoming really powerful. The only constraint about the browser is just memory and RAM. And you get...it's a sandbox, so you can't really do a lot of things specifically with video. But you can run some stuff on the browser. And so we would send some things specifically, and convert some things, and make them smooth enough. But I think we're not 100% there yet. Lukas: But you're also training your own large language models and large image models. That sounds like training would be a major cost for you as well. Cris: Yeah, for sure. Retraining some stuff to make sure it works in the domain of what we have is one of our core competences. Now we're training...starting a huge job on our HPC. That's going to take a big percentage of our costs for the next few months. Lukas: Wow. I have to ask. That language interface that you showed me was so compelling and cool. But I have been seeing language interfaces for the past 20 years, and the challenge with these language interfaces is when they don't work, they're just enraging. Actually, you sort of addressed that. Showing how it creates these things, and you can undo them, and you can kind of modify them. Do you feel that that kind of conversational interface is at the point where, for you, it's an interface that you really want to use? Cris: I like to think [of] it as a tool. It's not the sole answer to everything you need. This is not going to be a replacement for all of the workflows in making content, video, images, or sound, or whatever it is. It's just a speed up in the way you can do those kind of things. I think the sweet spot is a combination of both. Being able to have that constant feedback loop with the system, where you're stating something out [and] the system is reacting in some way that matches your idea. And then you have that level of control so you're going the direction you want and doing what you want. Or, if it's not working, you just do it yourself, right? I think a big mistake of research — specifically in the area of computer creativity — is this idea that you're going to automate it entirely. You see one-click off solutions to do X, Y, or Z. 
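For readers who want to see the precision-versus-speed check Cris describes in code, a minimal sketch follows: quantize a toy model, then measure how far its outputs drift from the full-precision original before trusting the speedup. Dynamic quantization of linear layers is used purely for illustration since it is the simplest PyTorch path; production image models usually need static quantization or quantization-aware training, and the toy network is an assumption, not Runway's.

```python
# Minimal sketch of the quantize-then-verify tradeoff: quantize a model, then
# measure output drift against the float32 original before trusting the speedup.
import torch
import torch.nn as nn

fp32_model = nn.Sequential(                     # toy stand-in, not a real Runway model
    nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 784)
).eval()

int8_model = torch.quantization.quantize_dynamic(
    fp32_model, {nn.Linear}, dtype=torch.qint8  # dynamically quantize the linear layers
)

x = torch.rand(16, 1, 28, 28)                   # a fake batch of 28x28 frames
with torch.no_grad():
    drift = (fp32_model(x) - int8_model(x)).abs().max().item()
print(f'max per-element drift after quantization: {drift:.5f}')
```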
I think that's missing the bigger picture of how most creative workflows actually work. That probably means that you've never actually worked with an agency where the client was asking you to change things every single hour, or make it bigger, make it smaller, right? Lukas: Right. Cris: It's hard for me to imagine a world where you have a one-click off solution for everything. That feels boring, to be honest. You want to have that control. I think language interfaces are a huge step towards accelerating the speed at which you can execute. Are they the final answer for everything? I'm not sure, but they do make you move faster on your ideas. Lukas: Did I understand you right that you want to build your own large language model? I would assume you would take one of the many off-the-shelf language models today. Are you actually training your own? Cris: Yeah, I think it's...we are, but it's also the fact that ML...the infra for models and models themselves are becoming commodities. It's great for companies like us, because some stuff we kind of need to build on our own. There are a lot of things in Runway that you won't find anywhere else. But there's a lot of stuff, like large language models, that you can just use off the shelf. You have all these companies offering similar services. It's a great...as a consumer of those, if we want to use those, it's just a cost situation where whoever offers the best model, we'll use. And to a point, it might make sense to do our own. So yeah, sometimes we don't have to do everything ourselves. You can just buy it off the shelf. But some other times, you just need to do it because it doesn't exist. Lukas: Sorry, large language models, you think you might do those yourself, even? Cris: We're doing a combination of both. We're using APIs but also re-training some of our own. Lukas: I see, I see. Have you experimented with all the large models out there? Do you have a favorite of the existing offerings? Cris: I think GPT-3 works. I think, actually, the model is Davinci. It's probably GPT-4 by now. I think OpenAI has been making- Lukas: -right, right. Cris: -that silently behind the scenes, it works really well. That's the one I'd say we're experimenting with the most, and we get the best results. Lukas: Cool. Well, look, we always end with two questions. I want to make sure I get them in. The second-to-last question is, what is a topic that you don't get to work on, that you wish you had more time to work on? Or, what's something that's sort of underrated for you in machine learning right now? I realize it's a funny question to ask an obsessed ML founder. But I’ll ask it anyway. Cris: I think, audio generation. I think it's catching up now, but it's not...no one really has been paying a lot of attention. There are some really interesting open source models, from Tacotron to a few things out there. I think that's going to be really, really transformative for a bunch of applications. We're already kind of stepping into some stuff there. But, it's hard to focus as an industry — or as a research community — on a lot of things at the same time. And now that image understanding has kind of been solved away, people are moving to other specific fields. I think one of the things we're going to start seeing very soon is audio generation. So yeah, excited for that for sure. Lukas: Yeah, I totally agree. Do you have a favorite model out there? We just recently talked to the Dance Diffusion folks at HarmonAI, who were doing some cool audio generation stuff.
Cris: Yeah, there's one — let me search for it — that just blew my mind. tortoise-tts, I don't know if you've seen that one. Lukas: No. Cris: Yeah. tortoise-tts is, I think, the work of just one single folk, James Betker. It works really well and he's been...someone used it to create the Lex Fridman...generative podcast. I'll share with you the audio. It's a whole podcast series that goes every week, where everything is generated. The script is generated by GPT-3 and the audio is generated by tortoise. And you can hear it's like, it's a podcast. You can't really tell. Yeah, really excited for stuff like that. Lukas: Cool. The final question is for you, what's been the hardest part about getting the actual ML to work in the real world? Going from these ideas of models or research to deployed and working for users. Cris: I think these models — and things like image generation and video generation — require a different mental model of how you can leverage this in creative ways. I think a big mistake has been to try to use existing principles of image or video generation and patch them with this stuff. I think, ultimately, you need to think about it in very different ways. Navigating a latent space is not the same as editing an image, right? What are the metaphors and the abstractions they need to have? We've come up with those before, in the software pipeline that we have right now. You have a brush, and a paint bucket, and a context or world tool, and you're editing stuff. But when you have large language models that are able to translate ideas into content, and you navigate and move across specific space or vector direction in ways you want, you need new metaphors and you need new abstractions. What's been really interesting and challenging is, what are those metaphors? What are those interfaces? How do you make sure the systems you're building are really expressive? I think two things that drive a lot of what we do are control and expressiveness. ""Control"" as in you, as a creator, want to have full control over your making. That's really important. How do you make it, so you also are expressive? You can move in specific ways as you are intending to do. So yeah, that's also really...it's really exciting and passionate for us to invent some of those stuff. Lukas: Well, it’s really impressive what you did. Thanks so much for the interview. Cris: Of course, thanks so much for hosting me. Lukas: If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned, supplemental material and a transcription that we work really hard to produce. So check it out.",6898 +Jeremy Howard — The Simple but Profound Insight Behind Diffusion,https://www.youtube.com/watch?v=HhGOGuJY1Wk,4377,2023-01-05,"Jeremy: I’ve been telling everybody who will listen that I feel like we’re in the middle of a significant spike in technological capability right now. And so if you’re not doing that, you’re missing out on being at the forefront of something that’s substantially changing what humans are able to do. Lukas: You're listening to Gradient Dissent, a show about machine learning in the real world. I'm your host, Lukas Biewald. Jeremy Howard is the founding researcher at fast.ai, which is a research institute dedicated to making deep learning more accessible. They make an incredible Python repository that people use for lots and lots of deep learning projects. 
And they make an incredible set of classes that many people I know have taken, and that are almost universally loved. He was also the CEO and founder of Enlitic, the president of Kaggle, and has done a whole bunch of diverse, amazing things in his career. It's always super inspiring to talk to Jeremy and this interview is no different. I really hope you enjoy it. Lukas: You are the first person to be on this podcast two times. And I think you are the most popular guest that we've had, based on our YouTube metrics. So it's great to have you. I wanted to start with, actually...the most memorable part of our interview — for me personally — was the amount of time that you set aside every day to work on just learning. Undirected, sort of learning new things, which I really thought was an amazing thing that I always aspire to do more of. I was curious. Lately, what have you been learning? Jeremy: I'm spending all my spare time at the moment on generative modeling, around the Stable Diffusion or diffusion modeling space. Lukas: Hence the new course, I guess. Is that part of the learning process? Jeremy: Yeah. It’s a chicken and the egg thing. It's partly ""the new course is because of the learning"", and partly ""the learning is because of the new course"". I've been telling everybody who will listen that I feel like we're in the middle of a significant spike in technological capability right now. And so if you're not doing that, you're missing out on being at the forefront of something that's substantially changing what humans are able to do. When there's such a technological shift, it creates all kinds of opportunities for startups, and for scientific progress, and also opportunities to screw up society. Which hopefully you can figure out how to avoid, and stuff like that. I'm very keen to do what I can to be on the forefront of that, and to help others who are interested in doing the same thing. Lukas: When you say ""spike"", do you mean diffusion models specifically or do you mean machine learning more broadly? Do you mean like- Jeremy: -I mean diffusion models, specifically. Lukas: Interesting, interesting. Jeremy: Yeah. It's a simple but profound insight. Which is that it's very difficult for a model to generate something creative, and aesthetic, and correct from nothing...or from nothing but a prompt to a question, or whatever. The profound insight is to say, ""Well, given that that's hard, why don't we not ask a model to do that directly? Why don't we train a model to do something a little bit better than nothing? And then make a model that — if we run it multiple times — takes a thing that's a little bit better than nothing, and makes that a little bit better still, and a little bit better still."" If you run the model multiple times, as long as it's capable of improving the previous output each time, then it's just a case of running it lots of times. And that's the insight behind diffusion models. As you'd be well aware, Lukas, it's not a new insight. It's the same basic insight that belongs to this class of models called ""boosted models"". Boosted models are when you train a model to fix a previous model, to find its errors and reduce them. We use lots of boosted models. Gradient boosting machines in particular are very popular, but any model can be turned into a boosted model by training it to fix the previous model's errors. But yeah, we haven't really done that in generative models before. And we now have a whole infrastructure for how to do it well.
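As a concrete reminder of the boosting idea Jeremy references, here is a tiny, self-contained sketch of gradient boosting for regression: each weak learner is fit to the residual errors of the ensemble so far, so every extra model nudges the prediction a bit closer. It is plain tabular boosting, not a generative model; the data and hyperparameters are arbitrary.

```python
# Toy gradient boosting: each new tree is trained to fix the previous ensemble's errors.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)    # noisy target to fit

prediction = np.zeros_like(y)
learning_rate = 0.3
for step in range(50):
    residual = y - prediction                       # what the ensemble still gets wrong
    stump = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * stump.predict(X)  # each round improves the previous output

print('training MSE after boosting:', np.mean((y - prediction) ** 2))
```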
The interesting thing is that — having started to get deep into the area — I've realized we're not close at all to doing that in an optimal way. The fantastic results you're seeing at the moment are based on what, in a year's time, will be considered extremely primitive approaches. Lukas: Could you say a little more about that? Jeremy: Sure. Broadly speaking, we're looking to create a function that, if we apply it to an input, it returns a better version of that input. For example, if we try to create a picture that represents ""a cute photo of a teddy bear"", then we want a function that takes anything that's not yet ""a really great, cute photo of a teddy bear"" and makes it something a little bit more like ""a cute photo of a teddy bear"" than what it started with. And furthermore, that can take the output of a previous version of running this model and run it again to create something that's even more like ""a cute version of a teddy bear"". It's a little harder than it first sounds, because of this problem of out-of-distribution inputs. The thing is if the result of running the model once is something that does look a little bit more like a teddy bear, that output needs to be valid as input to running the model again. If it's not something the model's been trained to recognize, it's not going to do a good job. The tricky way that current approaches generally do that, is that they basically do the same thing that we taught in our 2018-2019 course, which is what we call ""crap-ification"". Which is, to take a perfectly good image and make it crappy. In the course, what we did was we added JPEG noise to it, and reduced its resolution, and scrolled[?] text over the top of it. The approach that's used today is actually much more rigorous, but in some ways less flexible. It's to sprinkle Gaussian noise all over it. Basically, add or subtract random numbers from every pixel. The key thing is then that one step of inference — making it slightly more like a cute teddy bear — is basically to ""Do your best to create a cute teddy bear, and then sprinkle a whole bunch of noise back onto the pixels, but a bit less noise than you had before."" That's, by definition, at least going to be pretty close to being in distribution, in the sense that you train a model that learns to take pictures which have varying amounts of noise sprinkled over them and to remove that noise. So you could just add a bit less noise, and then you run the model again, and add a bit of noise back — but a bit less noise — and then run the model again, and add a bit noise back — but a bit less noise — and so forth. It's really neat. But it's like...a lot of it's done this way because of theoretical convenience, I guess. It's worked really well because we can use that theoretical convenience to figure out what good hyperparameters are, and get a lot of the details working pretty well. But there's totally different ways you can do things. And you can see even in the last week there's been two very significant papers that have dramatically improved the state of the art. Both of which don't run the same model each time during this boosting phase, during this diffusion phase. They have different models for different amounts of noise, or there are some which will have super resolution stages. You're basically creating something small than making it bigger, and you have different models for those. 
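The loop Jeremy sketches above (make your best guess at the clean image, then sprinkle back slightly less noise, repeat) can be written in a few lines. The following is a schematic of the idea only, with a linear noise schedule and a hypothetical denoiser callable; it is not a faithful DDPM or Stable Diffusion sampler.

```python
# Schematic sampler: start from pure noise and repeatedly (1) ask the model for
# its best guess at the clean image, then (2) re-add a bit less noise than before,
# so each input stays close to the distribution the model was trained on.
import torch

def sample(denoiser, shape, steps=50):
    # `denoiser(noisy, level)` is a hypothetical model returning its best guess
    # at the clean image, given an image corrupted at noise level `level` in [0, 1].
    x = torch.randn(shape)                                   # pure Gaussian noise
    levels = torch.linspace(1.0, 0.0, steps + 1)             # simple linear noise schedule
    for t in range(steps):
        pred_clean = denoiser(x, levels[t])                  # 'do your best' step
        x = pred_clean + levels[t + 1] * torch.randn(shape)  # sprinkle back less noise
    return x

# Usage with a dummy stand-in denoiser (it just shrinks its input toward zero):
samples = sample(lambda noisy, level: noisy * (1 - level), (4, 1, 28, 28))
```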
Basically, what we're starting to see is that gradual move away from the stuff that's theoretically convenient to stuff that is more flexible, has more fiddly hyperparameters to tune. But then people are spending more time tuning those hyperparameters, creating a more complex mixture of experts or ensembles. I think there's going to be a lot more of that happening. And also, the biggest piece I think will be this whole question of, ""Well, how do we use them with humans in the loop most effectively?"" Because the purpose of these is to create stuff, and currently it's almost an accident that we can ask for a photo of a particular kind of thing, like a cute teddy bear. The models are trained with what's called ""conditioning"", where they're conditioned on these captions. But the captions are known to be wrong, because they come from the alt tags in HTML web pages, and those alt tags are very rarely accurate descriptions of pictures. So the whole thing...and then the way the conditioning is done has really got nothing to do with actually trying to create something that will respond to prompts. The prompts themselves are a bit of an accident, and the conditioning is kind of a bit of an accident. The fact that we can use prompts at all, it's a bit of an accident. As a result, it's a huge art right now to figure out like, ""trending on art station, 8k ultra realistic, portrait of Lukas Biewald looking thoughtful,"" or whatever. There's whole books of, ""Here's lots of prompts we tried, and here's what the outputs look like"". How do you customize that? Because, actually, you're trying to create a story book about Lukas Biewald's progress in creating a new startup, and you want to fit into this particular box here, and you want a picture of a robot in the background there. How do you get the same style, the same character content, the particular composition? It's all about this interaction between human and machine. There's so many things which we're just starting to understand how to do. And so, in the coming years I think it will turn into a powerful tool for computer-assisted human creativity, rather than what it is now, which is more of a, ""Hand something off to the machine and hope that it's useful."" Lukas: Do you think the same approach applies across domains? Or is there something about images — the way it's sort of obvious how to add noise — and maybe the data set that we have? I mean, certainly the way you described diffusion, there's a natural application to that to almost any domain, but- Jeremy: Correct. Lukas: -I guess Gaussian noise on text, it's a little unclear to me what that really means. Maybe it’s like... Jeremy: So, last week a paper showing diffusion for text came out. There's already diffusion models for proteins. There's already diffusion models for audio. The audio ones use — or some of them — use a fairly hacky obvious but neat approach of using diffusion to generate spectrograms — which are images — and then having something like a super resolution model. But it's not doing super resolution, it's doing spectrogram to sound. So yeah, these things are already starting to exist. They haven't had as much resources put into them yet, so they're still not that great. But yeah, that's the thing, Lukas, this is not just images at all. It'll be used in medicine, it'll be used in copywriting. The way we currently do generative text models, again, it's kind of a happy accident. 
When I did ULMFiT, the whole reason I created a language model was for the purpose of fine-tuning it to create a classifier. GPT then took that idea and scaled it up with Transformers. What Alec Radford was trying to do there was not ""generate text"", but try to solve other problems by fine-tuning it. There was this kind of discovery, almost, with GPT-3 that when you take this and you scale it far enough, it actually starts generating reasonable-sounding text. But the text is not necessarily correct. In fact, it's very often wildly incorrect. It'll...intentionally working on text generation approaches which are specifically designed for generating text is something that there's a lot of room to improve. Generally speaking, the way I see it is this. You've got a generative model that's trying to do something difficult and it's pretty good at it, or at least better than nothing. It'll be better at it if you can do it in a way that it runs multiple times during inference, because you're giving it more opportunities to do its thing. I think that means that these multi-step inference models — which may or may not be diffusion models, but kind of boosted generative models — are here to stay. Because no matter how good your generative model is, you can always make it better if you can find a way to run it multiple times. Lukas: I guess that is a good segue to another question I had, which is I think one of the really fun things about deep learning in the early days was it was so tangible. You have this fantastic class, where you can just kind of build these models and see how they work and play with them. I think we both have a very similar learning approach. But, one thing I've personally been struggling with, honestly, with these bigger models is just actually engaging with them in a meaningful way. It's fun to run the various image-generating models, but it feels kind of daunting. I'm not sure I have the money myself to buy the compute to make one that really works. We actually had one person on this podcast who did it for fun — Boris — which is a super fun episode, and I felt really jealous of how much fun he had building it. I'm curious how you turn that problem into something tractable, that you can actually engage with. Jeremy: Yeah. Well, Boris is one of our alumni. He's part of our fastai community, and he showed what is possible for a single, tenacious person to do. Lukas: Although I think Google donated like a hundred thousand dollars of compute to him. So it wasn't totally... Jeremy: Yeah, absolutely. If you can show that you're doing useful work, then there's plenty of compute out there which you can get donated to. But having said that, what he was largely trying to do — at least at the outset — was to replicate what OpenAI had done. I take a very different approach, which is I always assume that the best thing out there right now is far short of what the best thing could be. That in five to ten years time, there'll be something better, and I always look for improving that. So yeah, you should take our new course, Lukas- Lukas: I would love to. Jeremy: -which we're in the middle of, because what I've been working on is exactly what you describe. Which is, how to train and play with a state-of-the-art image-generative model in a notebook on a single GPU. As with all of these things, the trick is to start with an easier but equivalent problem. I'm doing all my work — just about — on the Fashion-MNIST dataset. 
Which, rather than being 512x512 pixel images of literally anything in the world, including artworks, in three channels, Fashion-MNIST is 28x28, single-channel images of 1 of 10 types of clothing. I always tell people — whether you're doing a Kaggle competition, or a project at work, or whatever — the most important two steps are to ""Create a rapid feedback loop where you can iterate and test fast"", and to ""Have a test which is highly correlated with the final thing you're going to be doing."" If you have those two things, you can quickly try lots of ideas, and see if they're probably going to work on the bigger dataset, or the harder problem, or whatever. It turns out Fashion-MNIST basically...I've kind of replicated a bunch of different approaches in the literature on Fashion-MNIST. The relative effectiveness of those different approaches on Fashion-MNIST mirrors basically exactly their relative effectiveness on COCO, or ImageNet, or LAION, or whatever. Lukas: Cool. Jeremy: But I can train a model on a single GPU to a point where I can see relative differences in about two minutes. Lukas: Wow. Jeremy: And that means I can very rapidly try things. I've started building notebooks where I show every single little step. And also, it helps a lot to use notebooks, which almost nobody working in the generative modeling field seems to be doing at the moment. What they do, is they have...the normal approach is to do ImageNet 64-pixel or CIFAR 32-pixel — which is still better than doing 512x512 LAION — but it still takes...ImageNet 64-pixel takes many hours on an 8-GPU machine. You can't do a fast iteration loop. In a notebook, I can run a single iteration of diffusion. I can see what the outputs look like because the pictures are all there in front of me. If you're not using this kind of approach, instead you're switching back and forth between a terminal, and then you need some way of actually viewing the images. And given that you're probably not sitting directly on that 8-GPU box, you're probably SSH-ing into it. So, now you've got to find a way to show those pictures. There are ways, by the way, of showing pictures in the terminal. For example, if you use iTerm2 there's something called imgcat. If you use other terminals, they probably support something called sixel, sixel graphics. But there's...they're not going to be as a good exploration environment for the kind of stuff than a notebook is. I think there's lots of opportunities for people like you and me to play in this field. I mean, I know there is because I've started spending time talking to some of the folks who were the primary researchers responsible for the key components of Stable Diffusion. And I'm already telling them things that they hadn't thought of before, by virtue of weird little experiments I've done with Fashion-MNIST on my single-GPU Jupyter Notebook. Lukas: Yeah, that makes sense. A fast feedback loop is so important. That's very cool. I was curious, broadly, if you have though on Stable Diffusion in general. We're sitting here in November 2022, and I think they've done an amazing job of bringing awareness to generative models. What do you think about Stable Diffusion? Jeremy: It's been great for progress in the field, clearly. Generally speaking, I'm all about democratization and accessibility, as you know. I don't love the fact that before Stable Diffusion was released, a small number of people in the world had access to the full generative models. 
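To illustrate the kind of fast feedback loop described here (this is not fast.ai's actual course notebook): Fashion-MNIST is small enough that a toy denoising network trains in a minute or two on one GPU, or tolerably on CPU, so competing ideas can be compared almost interactively. The architecture, noise level, and epoch count below are arbitrary choices for the sketch.

```python
# Toy fast-iteration setup: train a tiny denoiser on Fashion-MNIST in a couple of
# minutes, as a small stand-in for experiments you would later scale up.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = 'cuda' if torch.cuda.is_available() else 'cpu'

train_ds = datasets.FashionMNIST('data', train=True, download=True,
                                 transform=transforms.ToTensor())
loader = DataLoader(train_ds, batch_size=256, shuffle=True)

model = nn.Sequential(                          # deliberately tiny image-to-image network
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(2):                          # a couple of epochs is enough to compare ideas
    for x, _ in loader:
        x = x.to(device)
        noisy = x + 0.5 * torch.randn_like(x)   # 'crap-ify' the images with Gaussian noise
        loss = nn.functional.mse_loss(model(noisy), x)
        opt.zero_grad(); loss.backward(); opt.step()
    print(f'epoch {epoch}: last batch loss {loss.item():.4f}')
```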
And then other people could pay for cut-down versions of them, use them in small quantities. The thing is, accessing these things through a web-based API is extremely limiting. When you've actually got the weights, you can really play with both the engineering and the artistic side of doing things that no one's done before. So yeah, I think that's great. I think it's important. I think — as with any of these things — you release a new, powerful technology out there and a whole bunch of people are going to be using it for, you know, not necessarily the things that you would have chosen to use it for. For example, for Stable Diffusion, it seems like a very large percentage of people who are using it to generate lots and lots of images are doing it to generate anime and specifically nearly entirely...very young women with very few clothes on, anime pictures. I'm sure there are people out there who are taking the clothes off entirely. That happens, I guess, with any technology. I don't necessarily have...I mean, I guess you can't stop that happening. But we certainly need appropriate laws around at least making illegal things...make sure the things that we don't want to be legal, are in fact illegal. But yeah, there are obviously huge benefits. And you're not going to get stuff like protein diffusion models, or pharmaceutical diffusion models...none of those are going to develop if the technologies are in the hands of two or three big organizations. So it's certainly a very valuable step on the whole for society to have this stuff as open as possible. And to be clear, it was all trained at universities. The main one, most of the stuff we're using now for Stable Diffusion was trained in Germany, at German academic institutions, using donated hardware. Lukas: I guess it's interesting though that it was, I think, primarily ethics and AI considerations that made folks like OpenAI restrict access to their models. Or at least that's what they said. Do you think that you would know a priori that that was the wrong thing to do? Would you have pushed against that at the time? Jeremy: I actually wrote a blog post about that back when GPT-3 was just announced, and not released. Nearly universally, the feedback — at least from the AI community — was, ""Oh, this is lame. They're just doing it for profits."" In my blog post, I said, ""Well, not necessarily. There are genuine things to be thinking about here."" Which is not to say that that means that the motivation wasn't at least partially profit-driven. It might well have been. It's certainly convenient that the ethical considerations read in this way entirely align with profit-driven motives as well. But, like I say, it doesn't necessarily mean they're not true. And I'm pretty sure it's for both reasons. If you look at the way OpenAI has behaved since then, they've behaved in a way that is very increasingly apparently profit-driven. So, I'm less generous in my interpretation now than I was then, based on their continuing patterns of behavior. I think also with the benefit of hindsight, it feels a lot more like, in the last couple of years, companies keeping models to themselves, the main impact that ends up being is to create a bigger bifurcation between haves and have-nots in terms of capability. Requiring more researchers to pay for API access to do things, a decreased amount of openness, and in fact even what could be argued as being kind of deceitful behavior. 
For example, we now know that the OpenAI models that you can pay to access are actually not the same as what's been described in their research papers. We've now had dozens of people write research papers comparing various work to the OpenAI models, and now we've learned that actually we're not comparing to what we thought we were comparing at all. You know, thousands of hours of researcher time being wasted and papers being published with what turns out now to actually be totally wrong information. I'm definitely more enthusiastic about the idea of being open than perhaps...more confident about that than I was a couple of years ago. Lukas: Do you have thoughts on the language side of things, like large language models? Do you think that...for example, do you think that prompt engineering is headed to be an important way of doing machine learning? You do see these models doing incredibly well in a wide variety of NLP tasks. Better than models trained specifically on these specific tasks, sometimes. Jeremy: Yeah. I think generative text models have both more opportunities and more threats than generative image models, for sure. Like I say, they're kind of...the fact that they work at all is in some ways a bit of an accident. They're far, far, far from being optimized for purpose at the moment. But they're already amazingly good, particularly if you do this kind of stuff where literally there are now dozens of papers. ""Just look at what kind of prompts happened to work on these models that we kind of accidentally made generative models,"" ""let's think step-by-step"", and whatever else. We're starting to find ways to actually get them to do a little bit more of what we actually want them to do. But so far we're using really, really basic things. You know, all this ""instruction tuning"". So, rather than just feeding it the entire internet, let's actually fine-tune it with some examples of things that are actually correct info, that actually represent outputs that we would want for these inputs, rather than just whatever somebody rando wrote on the internet 25 years ago. My worry is...I'm much more worried about misuse of text models and image models, because it wouldn't be at all hard to create a million Twitter or Facebook or whatever accounts, and program them to work together to impact the world's discourse in very substantial ways over time. And nobody would know. We could have...on Twitter, for example, some fairly small number of accounts — often where nobody actually knows the human who's behind it — can have very substantive effects on what people are talking about, and how people talk about that thing. Imagine a million of those accounts, which were actually bots that had been trained to be more compelling than humans — which already for years, we've had bots which humans rank as more compelling than actual humans — and that they've been trained to work together. You know, ""Take alternate points of view in exactly the right way,"" and this bot gradually gets convinced by that bot, and whatever else. It could cause a very small number of people in the world to programmably decide how they want humanity to think about a topic, and pay to make that happen. Lukas: Although if I remember right, it seemed like all of fast.ai's sort of broad mandate was to basically make a no-code interface into machine learning, so anyone could access it. And it does sort of seem like prompt engineering — to the extent that it works — is like a huge step in that direction. Isn’t it? Jeremy: Right. 
Yeah, that's what I'm saying. That's why I said it's both got more opportunities and more threats. The opportunities are vast. Take, for example, the recent thing that was released last week or so, explainpaper.com. Where our students are already...so, with our course we look at a paper or two each week. Last week I had told the class, as homework to re-implement the diff edit paper. Students were saying like, ""Oh, I didn't understand this paragraph. So I highlighted it in explainpaper.com, and here's a summary it gave, and that's a lot more clear now. And then I tried to understand that bit, so I asked for more information."" This is very, very valuable. I saw somebody on Twitter a couple of days ago saying they don't really use Stack Overflow anymore, because they created this tiny little, simple little script called ""ask"" where they type ""ask"" and then something as a prompt — sorry, in the bash shell repl — and it would feed that off to OpenAI GPT-3, and return the result, and they basically use that instead of searching the internet nowadays. Lukas: Wow. Jeremy: Yeah. People are definitely using this stuff and it's going to get much, much better. Lukas: Do you have a clever way — like with Fashion-MNIST and image generation — to play with large language models on kind of a bite-sized scale? Jeremy: Not yet, no. I'll get to that, maybe, in another part of the course, I guess. It's definitely a great question and something to think about. Lukas: Interesting. Okay, a question that I need to revisit — because this is unexpectedly, I think, one of the reasons that so many people listened to my interview with you last time — you sort of made an interesting comment that you felt like Python wasn't the future of ML. You sort of said maybe Julia is the future of ML, and that really seemed to strike a chord with the internet everywhere. I think it's kind of the most-discussed part of Gradient Dissent of all time. So, I'm just curious. Do you have any more thoughts on that? Do you still believe that Julia is the future? You were sort of on the fence about that. Jeremy: I was on the fence about that last time we spoke and- Lukas: Totally. Jeremy: -I would say I'm a little less bullish than I was then. I feel like the Julia ecosystem and culture, it's so focused on these HPC, huge compute, running things on national lab machines. It's all stuff that's very appealing to engineers. It feels good, but it's such a tiny audience. I don't care about whether I can run something on 5,000 nodes. I just want to run it on my laptop. And it's still not great for running on my laptop, really. And it's not great for creating software that I can send you. I can't...if I created a little CLI tool or whatever, well, it's not great for creating little CLI tools cause it's so slow to start up. And then how the hell am I going to send it to you to try out? It'd be like, ""Okay, Lukas. Well, install the entirety of Julia, and then run the REPL, and then type this to go into package management mode."" And then, ""Okay, now you've got this thing and now you can run it."" It's like, okay, that's not going to happen. Or even just deploying a website, it's a lot of fuss and bother, and uses more resources than it should. It's still got that potential. 
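For anyone curious what the little "ask" helper Jeremy describes above might look like, here is a minimal sketch: type a question in the shell and get a completion back instead of searching the web. It assumes the OpenAI text-completions HTTP endpoint, an OPENAI_API_KEY environment variable, and the requests library; the model name and parameters are illustrative assumptions, not what the person on Twitter actually used.

```python
#!/usr/bin/env python3
"""Rough sketch of an 'ask' shell helper: send a prompt to a completions API
and print the answer. Model name and parameters below are illustrative."""
import os
import sys

import requests


def ask(prompt: str) -> str:
    # POST to the OpenAI completions endpoint with the key taken from the environment.
    resp = requests.post(
        "https://api.openai.com/v1/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "text-davinci-003",  # assumption: any text-completion model works here
            "prompt": prompt,
            "max_tokens": 256,
            "temperature": 0.2,  # keep answers fairly deterministic
        },
        timeout=60,
    )
    resp.raise_for_status()
    # The completions response carries the generated text under choices[0].text.
    return resp.json()["choices"][0]["text"].strip()


if __name__ == "__main__":
    # Everything after the command name becomes the prompt.
    print(ask(" ".join(sys.argv[1:])))
```

Saved as an executable file named ask somewhere on your PATH, running ask "how do I list files by size in bash" prints the model's answer straight to the terminal.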
But...I guess the other thing that's become more clear, though, in the last couple of years is their grand experiment on type dispatch...it is more challenging to get that all working properly than perhaps I had realized, because it's still not really working quite properly. Good on them for trying to make it work properly. It's a vast research project. But there are a lot of weird little edge cases, and trying to make that all run smoothly is incredibly challenging. I suspect...something needs to replace Python, but maybe it's something that doesn't exist yet. Partly though...what we're seeing instead...everybody knows we have to replace Python. So, what instead's been happening is we're using Python to create non-Python artifacts. Most obviously JAX. JAX uses Python — or a subset of Python — with a kind of embedded DSL written as a library, which only lets you create things that are expressible as XLA programs, and then XLA compiles that to run fast on a TPU. That works pretty well. It's very challenging, though, for research, or hacking, or learning, or whatever, because it's actually not Python that's running at all. So it's extremely difficult to profile — and debug, and so forth — that code. Very hard to run it really nicely in notebooks. In our little team working on diffusion models, we kind of all want to use JAX. But every time we try, it's always...because like everything I write is always wrong the first 14 times. And with Python, you know, I have 14 goes at making it better by finding all the stupid things I did. By running one line at a time, and checking things, and looking at pictures. With JAX, I wouldn't know how to fix my broken code, really. It's difficult. Lukas: But you don't think that that flexibility is fundamentally in conflict with making a language performant? I think we covered this last time. Jeremy: It is for Python. It is for Python, I think. For Python, that flexibility comes from being able to actually run it as Python code. If you look at where PyTorch is going now, they've got this TorchDynamo stuff where they're working...they basically can interface with nvFuser, and you can interface with Triton, the OpenAI compiler-ish thing. I'm not exactly sure what you'd call it. Clearly PyTorch is heading the same direction as JAX. Which is, if you want it to run fast, you'll use TorchDynamo, or whatever it ends up being called. That's actually now integrated into the PyTorch tree. That's clearly where we're heading. And again, you end up with...probably you'll be using Triton. So you end up...Triton's amazing. Super cool, super fantastic. But you still end up with this thing that's running compiled code. It's not the same code you wrote, but a version of it. More difficult to hack on. If you look at how this works, there's a whole world of software that's written in languages which were explicitly designed to work this way. They're compiled languages. Languages like C++, and Swift, and Rust. They have something very nice, which is they have flags you can pass the compiler. You can pass the -g flag to build a version you can step through in the debugger, or you can pass the -O flag to build the optimized version. Basically, you get to choose how close the code that's actually running is to the actual lines of code that you wrote. So that for debugging, you can actually...it'll run slower, but it's actually running the lines of code that you wrote. And I think we want something like that, something that, ""Yeah, it looks like Python. It's pretty compatible with Python. 
You can still run it as Python, but you can also run it in an optimized way."" Maybe something that actually takes better advantage of these kinds of type hints that we can provide. That's my guess. What's going to happen is we'll see Python-esque languages...we'll continue to see these Python-esque languages appear, that may begin to look less and less like pure Python, and are designed to work better and better with these backend linear algebra accelerators and compilers. Lukas: Is there some language out there right now that has that feel for you? Jeremy: No, they're all basically these embedded DSLs. Like TVM or like Halide. We have the MLIR project, which is kind of providing the backend needed for these kinds of things. Chris Lattner has a new company, which presumably is going to be placed better than any other to create what we need for this kind of thing. He's the guy behind MLIR. It feels like a big open area to me, at the moment. Lukas: Interesting. Okay, on a totally different topic — that I kind of can't believe we didn't cover last time, I feel like we must have been right in the middle of it — I think I, along with many other people in the world, watched you advocate for wearing masks in the early days of COVID. I think you had some of the most high-profile articles on this — like the second-most popular on Preprints — and I was just kind of curious if you could sort of tell that story from your perspective. And maybe what you were seeing that other people were missing, and how you were kind of approaching that problem differently. Jeremy: It's hard for me, Lukas, because I don't understand why — and I still don't understand why — it's not reasonably obvious to everybody. Like, what's everybody else missing and why? Because from my point of view...well, okay, let me go back. So, February 2020 — mid-ish February 2020, late February 2020 — I had a course coming up at the University of San Francisco that I was going to be teaching. I had heard increasing chatter about this Chinese virus thing. What then happened was it hit Italy, and there was a lot more information in English about what was happening in Italy than there was about what was happening in China. So it suddenly was much more accessible to see what was going on, particularly because a lot of the Italian doctors were actually on Twitter and stuff, so you could read what was happening. A whole bunch of people were saying like, ""This is a disaster"", ""The president of the Italian medical body just died of COVID,"" and, ""There's not enough hospital beds."" I knew it had kind of just started to get detected in New York. I thought, ""Oh, well, it seems like it might be quite likely to come here. What does that mean for our course?"" Not at all altruistic. Just, like, are we still going to do our course? My wife and I kind of started reading about it to try to figure out what should happen with the course. And as we did, we were...yeah, it was very obvious that it was going to be a global pandemic and it was going to sweep through San Francisco within weeks. And so like within two days, I wrote an email to everybody who had registered for the course, and put out a blog post, and said we're not doing the course live. We're going to do it virtually. This is well before our university — or I think any university — had decided to do that. Which again, I already thought was weird. Like I thought, ""Okay, it's not yet here, but obviously it's going to be. 
So why are people acting as if it's not going to be?"" Rachel and I ended up writing a long blog post. We were kind of like, ""Okay, it's not just our course."" We've got all these friends in San Francisco who are doing things that we're pretty sure they're going to look back on in hindsight and think, ""That's a terrible idea, because I put myself and my community at risk."" So we said...we didn't know much about it, so we just said, ""Look, as data scientists, here's what we can see so far in the data. It does seem to grow exponentially, at least at first. And, you know, this is the impact it's been having in Lombardy. Here's the early impact in New York. Here's how the math of these kinds of things works. Here's not just a prediction, but an almost certainty as to what's going to happen here."" That got a lot of attention. We had no idea how to avoid it ourselves. We were worried that...historically, when there are global pandemics, they can lead to violence. It can lead to societal disharmony, or whatever. We decided to get out of San Francisco for a while. We also...it was clear that there was going to be a lockdown at some point because, I mean, why wouldn't there be? Again, none of our friends seemed to believe any of this was going to happen. It's really...I thought it was weird, it just seemed very obvious. And then yeah, there was a lockdown like a week or two later. We had told our daughter's school, ""Oh, there's probably going to be a lockdown."" They sent back this rather annoyed email about interrupting learning or something. The schools were closed for a year in the end, in San Francisco. Then we were like, ""How do we not get COVID?"" Because we probably don't want to get COVID, because it seems like getting COVID can be bad. We started to hear from people saying maybe there could be longer-term implications of some of these kinds of SARS viruses. So I started looking into how it spread, and I discovered that there were all these countries around China that had avoided getting hit by COVID. Particularly Hong Kong, that's literally a train line away from Wuhan. And that just seemed amazing, you know. That's when I discovered that Mongolia, Taiwan, and Hong Kong all had this either universal mask policy or universal mask usage, kind of culturally. And I thought, ""Oh, that's weird."" Because I thought masks were this kind of weird thing. For some reason, you go to Chinatown, you see people wearing masks and that's how it is, and that's weird. I didn't take much notice of it. But then I started learning it was this respiratory infection, and it kind of started to make sense. I wrote something in the Washington Post talking about how in the Czech Republic, particularly, the populace had independently decided to wear masks, heavily driven by a popular science YouTuber. Basically, within like three or four days, the whole country had made enough masks for everybody, and their president was talking about how proud he was. Again, their infection rate was going in the opposite direction to other countries', which I thought was interesting. So yeah, I kind of wrote an article about that. I talked to a guy who used to be very high up in the government on the science policy side, and I asked him what's going on with masks. He said like, ""Well, nobody thinks there's very convincing science about it."" He said if you want to convince people to wear masks, then you need to find some better science. 
So I contacted basically the 18 smartest researchers I knew, everybody from Lex Fridman to Zeynep Tufekci — not just scientific researchers; in Zeynep's case, a sociological researcher — and said, ""Do you want to help me put together the evidence?"" That's where our paper came from. Basically, everybody said yes, they all agreed. Suddenly we had this huge author group, so we kind of set up a Slack channel. None of us had a really strong opinion going in. We had one of the world's best aerosol scientists; he probably had the strongest opinion going in because this is his job. He was like, ""Well, let me explain aerosols to you."" Then what happened was there was this amazing couple of papers that actually used this laser-scattering light chamber thing to actually literally take videos of respiratory particles suspended in the air. Not suspended, but they just float in the air. It showed that they float in the air for up to an hour. And it showed that when somebody wears a mask, they don't appear. That was the point where I went from ""curious and interested"" to ""100% convinced"". Because it'd be like if somebody said, ""I promise you, Lukas, if you throw this ball at that wall, it won't bounce off. It will go through."" You'd be like, ""Well, Jeremy, I'm not sure. But I'll give it a go."" And you throw the ball at the wall, and it bounces off, and you go like, ""Jeremy, I am very sure you're wrong about your theory."" And that's how it was with masks. There were people who said masks don't provide respiratory protection from these airborne particles, and then here's a video of them not going through the mask. I was like, ""Okay, that's...I don't need any RCTs. There's a video. There's a picture of it working."" I kind of went all in on just trying to say to people, ""No, there's actually a thing that stops the thing that infects us. So we should wear them."" I found it extraordinarily bizarre that everybody didn't just go, ""Oh, look at that video of it working. Therefore, it works."" It was a super frustrating experience. I don't...there's nothing I enjoy about researching masks and there's nothing I enjoy about political advocacy. The former is boring and the latter is stressful. But when there's something that so obviously can save millions of lives — and also can avoid who knows what long-term harm — it just seems absolutely ethically required to act on that. I spoke with all kinds of world leaders, and politicians, and celebrities, and whatever. In every jurisdiction, it was like a whole new conversation. It was like, ""Talk to people in South Africa; 'Oh, we don't believe in masks.'"" It was like, ""Talk to people in London; 'we don't believe in masks'. Talk to people in Australia; 'we don't believe in masks'. Talk to people in Florida; 'we don't believe in masks.'"" Each one, I discovered this horrible thing. Which is everybody decided they didn't believe in masks until their personal jurisdiction got hit hard by COVID. Until the hospitals started filling up. And then they would get back to me and say like, ""Oh, tell me more about this mask thing, Jeremy."" That was infuriating because of course the answer is, ""Well, if you had put in mask mandates two months ago, then this wouldn't have happened. Now it's too late because masks can reduce R by a bit, but not enough to reverse a full-on pandemic, once it's there."" Honestly, it...I got really burned out by the process. In some ways it was successful, but in the end, the pandemic still happened. 
And in the end, I'm still flabbergasted, particularly now that high-quality medical masks are widely available. Demand is so low that factories have been shutting down. I've never had COVID. Literally nobody I know who has worn a high-quality mask at all times indoors has got COVID. And everybody I know who doesn't has had COVID. There's a point at which you kind of say, ""Okay, I've done what I can. You do you."" Lukas: So you continue to wear a mask indoors, at all times? Jeremy: Of course. Yeah. Lukas: What would change...when would you stop wearing a mask indoors? Jeremy: I suspect it's the same as the answer to the question, ""When would I stop drinking clean water?"" I'd rather keep drinking clean water. We decided...I mean, remember, it took decades — even after the John Snow experiment — for big cities to decide to invest in clean water infrastructure. Presumably after some number of years, we will invest in clean air infrastructure. China's already done it. They now have, I believe, HEPA filters in pretty much all their public buildings, and they're putting in UV sterilization in pretty much all their public buildings. Hopefully, at some point, the West will do the same thing and then it'll be like, ""Okay, I'm in an environment with clean air,"" so I don't have to self-clean the air. That'd be one option. Another would be...again, China's ahead of us on this. They have nasal vaccines, which are probably much more effective. If we eventually get those, I think they can actually make a significant dent in transmission. The injected vaccines don't make much of an impact on transmission. So yeah, there are technologies that should allow us to be able to be pretty safe in indoor spaces. Lukas: But you don't wear masks in an outdoor space? Is that the... Jeremy: No, I mean, it's not exactly a hard and fast rule. We went to a birthday party recently, for example, where it was a karaoke thing. It was outdoors, but all the kids were singing, and they were tightly packed, and whatever. So, our family wore masks because there was a high amount of aerosolizing activity going on with a high density of people. But yeah, broadly speaking, I'm not too concerned about outdoors because the airborne particles disperse much more quickly. Lukas: I see. I guess the interesting thing about that story maybe is that there maybe was a fairly broad scientific consensus, but no one was really ready to advocate for it. Is that a better summary of what was happening? If you got all these scientists together and they actually all agreed with what you were saying... Jeremy: They didn't, unfortunately. What happened was it was highly polarized by areas. The people that actually understood this are the aerosol scientists. And the aerosol science community was basically 100% all on the same page. Like, ""Talking, breathing, these are aerosolizing activities. We have loads of evidence that this is transmitted through aerosols. We have loads of evidence that masks block the droplet nuclei — that are suspended in the air — from getting to your lungs."" All those were pretty much understood in that community. But then the challenge is, Lukas, that we haven't had a major respiratory pandemic in the West, really, since the Spanish flu. So, none of our infectious disease community has any background in that. 
I spent a lot of time advocating — including speaking directly to the WHO's infection control groups, the folks who kind of ran the response at the WHO — and they were overwhelmingly people who had a background in infectious diseases that are spread through contact. The kind of stuff that hand washing helps with. So they were just coming from a totally different direction, and had decades of experience treating different kinds of diseases in a different way. They were doing their best to learn and understand. But for some, that was a very difficult experience. One in particular, John Conly, had a very high financial stake in fomite transfer, the idea that transmission is not through the air but by contact, because he has financial interests in that being the case. So, very difficult for him to come to terms with the idea that this is a respiratory infection, through respiratory particles, requiring respiratory protection. That was a big challenge, this worldview difference between different scientific groups. The aerosol scientists, there were actually none of them on the WHO's infection protection committee...infection control, whatever it was. I noticed — when I was talking to the WHO — that there was a total lack of diversity. Every single one had the same kind of academic background, and the same way of thinking about things, and they all knew each other very well. They were also...being involved in the WHO is a very strong status signal in their career, so everybody wants to be invited to those kinds of things. And so you really want to have all the other people on the committee think you're a good, nice person. It creates this real monoculture. So that was another big part of the problem. It was all...it definitely made me a lot more cynical than I was before it, to see how the WHO works. And even our big paper, how to get it published. It took a year from being written to being published. By the time it was published, it was basically too late. The process of getting it published was much more about politics than about science, you know. It was disappointing for me to discover that systems that I had thought of as being very much focused on rationality and data and correctness and rigor...so much of it turned out to be about politics, and networks, and stuff. I guess I was probably pretty naive before all that happened. Lukas: My sense is that people broadly believe that masks reduce the spread of COVID at this point. I'm not sure that I know exactly to what degree...it sounds like you're saying to a really massive degree. But I think you had a part in that. Or maybe just...I just follow you on Twitter and we were just watching you talk about it. But I don't know. It does seem like it’s the mainstream... Jeremy: Yeah, I mean, I was leading the Masks4All group globally. We were the most substantive group doing that. Absolutely. Lukas: It feels like it was successful, though. I mean, I just...do you not... Jeremy: It was successful-ish. If you're in San Francisco, it'll look more successful than if you're in Australia, for example. In Australia...from time to time, we've had mask mandates and everybody wears them when they're told to. The rest of the time, it's strongly recommended, but nobody does. But in San Francisco, I'm told maybe 30% of kids at schools — or some schools — are wearing them. It's definitely...it's disappearing. 
And also a lot of people — maybe most people — I see wearing masks, at least in Australia, are wearing masks that don't work very well, even though the good masks are really easy to get. And a lot of people don't realize that if you get a high-quality N95 respirator, you can wear it as many times as you like, until the straps wear out. A lot of people think, ""Oh, you can only wear it once."" A lot of people think it has to be fit-tested. A lot of people think donning and doffing is some complicated thing. There's all this wrong information out there. And so the number of people actually wearing high-quality masks is...to me, it's surprisingly low. If everybody wore one whenever they were indoors, I think we might...particularly if we also had HEPA filters in indoor spaces, I suspect we would be done with the virus, that it would go away. Because how would a respiratory virus continue to transmit when you break the flow of respiratory particles? Yeah. I mean, even in China. All the pictures I see, everybody's wearing surgical masks. It's, like, weird to me. Lukas: Interesting. Well, look, we're almost out of time and we always end with two questions. But you're a little bit of an unusual guest, I don't know exactly how these will fit your worldview. We like to...I like to ask people, if you had some extra time to research something completely different, what might it be? I feel like you are just an unending font of this stuff. What are some things that you're interested in that you haven't had time to look into? Jeremy: Well, I'll answer a slightly different question because any time I'm interested in researching something, I just do. Lukas: Fair enough. Jeremy: The most recent thing I spent a lot of time researching is children's education. Our daughter missed the first year of school. Because of COVID, in San Francisco they were closed. That would have been her kind of transitional kindergarten year, as they call it in California. Then we came to Australia, and so she went to school — regular school — for the first year here. She was straight into grade one. She enjoyed it. She was always happy to go, and happy to stay there. But it felt like she had blossomed a lot more during her previous year when she was doing stuff over Zoom, and on apps, and stuff than the year that she was in-person in the classroom, which really surprised me. Instead, she had become much more of a perfectionist and was becoming much less resilient after her year at physical school. That all seemed really weird to me, because I thought that environment would be much more healthy than the previous one. I started investigating it really carefully and studying a lot of academic papers about education. I was stunned to discover that there's pretty broad consensus in parts of the academic community — or some very strong data — that suggests schools are not a particularly great place for most kids to really blossom, or at least entirely focus on school learning. In fact, tutoring...kids who get tutoring are in the very top, highest academic performers regardless of their previous background. It seems like all kids can be really successful given the right tutoring. Our daughter was doing all this stuff with apps, and on Zoom, and stuff during her first year. None of that is limited by the speed at which a teacher thinks a kid should go, but instead the computer is dynamically adjusting difficulty over time. 
So, weirdly enough, our daughter was basically at Grade 4 or Grade 5 of math after a few months of doing these apps. They're so much more effective than normal teaching. We were also trying to figure out, ""Well, how do you avoid her getting really bored and stuff?"" So I did this really deep dive into education and discovered there's all these fascinating, different ways of teaching and learning which are entirely different to what's done at normal schools. Eventually, we decided to take her out of school and instead switch to using these kind of more academically driven approaches in a homeschooling environment. Which also seemed to generally lead to better social outcomes, better mental outcomes — better mental health outcomes — and better learning outcomes. That's kind of been interesting to me, to discover this whole world of research that seems really important, for humanity. How kids should learn. It feels like, again, it's being largely ignored by the institutions that we send our kids to. Lukas: Let me just see if I got the summary of this: basically that tutors are much more effective than schools at actually teaching kids things. Is that what you’re saying? Jeremy: That would be part of it. But there's lots of...that's kind of one starting point. Yes, even kids that would otherwise have been doing pretty badly at school can be in the very top performers. That kind of is an existence proof, that pretty much all kids can be extremely successful. But then there's also this kind of interesting data point for us, which is when we gave our daughter an iPad, and some math and reading apps, and somebody on the other end of a Zoom to supervise them, she had a huge amount of fun and learned dramatically more quickly than I thought was possible. And then when she actually went to school, she basically learned nothing for the whole year and ended up becoming much less resilient. There are specific ways of learning that are not particularly compatible with the normal ways we teach at school. For example, we might have talked before about Anki and repetitive spaced learning. My daughter does Anki every day. Literally everything she learns, she will remember forever if she creates a card for it, or she decides she wants to know it. That's kind of quite difficult to do at a normal school because you'd need all of your grade levels to be doing Anki. So that in Grade 5, you've still got cards from Grade 1 or Grade 2 coming back. But what happens at school is each year...for example in Australia, the Year 7 and Year 8 math curriculums are nearly entirely a refresh of the primary school curriculum, because they kind of assume the kids are going to need to see it again, because they've probably forgotten a lot of it. Things like, ""How would you incorporate spaced repetitive learning?"" Some schools in England have tried to do something like that using something they call ""retrieval practice"". I know there's a school called the Michaela school, which I believe had the highest results academically in the whole country. They do something like this. There's a few...there's a handful of schools here and there which are trying to use these kind of research results. But they're kind of the odd ones out. Lukas: All right. Finally...I don't know if this one really applies to you. 
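Jeremy mentions Anki and spaced repetition above. Anki's scheduling descends from the SM-2 family of algorithms; the sketch below is an SM-2-style illustration of why review intervals stretch from days to years, so a card made in Grade 1 can still come due in Grade 5. The constants and the review loop are assumptions for illustration, not Anki's actual implementation.

```python
# A minimal SM-2-style spaced-repetition scheduler, sketched for illustration.
# Anki's real scheduler differs in details; treat these constants as assumptions.

def sm2_update(reps: int, interval: int, ease: float, quality: int):
    """Return updated (reps, interval_days, ease) after one review.

    quality: 0-5 self-rating; 3 or above counts as a successful recall.
    """
    if quality >= 3:
        if reps == 0:
            interval = 1
        elif reps == 1:
            interval = 6
        else:
            interval = round(interval * ease)
        reps += 1
    else:
        # Failed recall: start the card over with a short interval.
        reps, interval = 0, 1
    # Ease drifts up after easy reviews and down after hard ones, floored at 1.3.
    ease = max(1.3, ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)))
    return reps, interval, ease


# Ten successful reviews of one card: intervals stretch from days to years.
reps, interval, ease = 0, 0, 2.5
total_days = 0
for review in range(10):
    reps, interval, ease = sm2_update(reps, interval, ease, quality=4)
    total_days += interval
    print(f"review {review + 1}: next due in {interval} days (~{total_days / 365:.1f} years from the start)")
```

With mostly successful reviews the interval grows geometrically by the ease factor, which is what makes remembering something "forever" cheap: once a card matures, it only comes back a handful of times a year.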
We usually ask — because my company, and this interview, is all about making machine learning really work in the real world — we usually ask like what's a hard part that you've encountered in taking something from research to actually working for some purpose? That may not exactly apply to you, but you seem very good at sort of interpreting my questions in a useful way. So I pose it in its most abstract form. Jeremy: I mean, I've had lots of projects that I've tried to bring into the real world. Lukas: Of course, that's right. Yeah. Jeremy: It's difficult. I've been doing machine learning projects for over 25 years now, believe it or not. In the early days, it was such a challenge because managers didn't believe in the power of data at all. When I would try to tell them that it could be really valuable, they would always say like, ""Can you point to a role model of a company that's been successful because of their use of data?"" And there were none. That was tough. Lukas: Yeah. Jeremy: Then Google came along, which was great, because then I could point at this one company that was really working hard to use data and they've become very valuable because of it. Nowadays that bit's a lot easier. But actually, unfortunately, my answer is going to be that I've kind of — for a lot of companies — I've given up on even trying. Because I tried to get...particularly when I was at Singularity University, where all of our students were basically execs from giant companies. We were trying to convince them to be more data-focused and some of them really took that on board. And then they would invite me to come and talk to their VP groups and exec groups. I saw lots of big companies try to get more data-driven, try to use machine learning. I didn't see any being successful. The issue seemed to be that their entire management teams were people who...that was not their area of expertise. They were not promoted because they were good at that. They would have very smart, data-driven people down in their kind of business analyst levels, that they would have no idea which ones knew what they were talking about, and have no way to kind of curate what they were being told. All of the promotion systems were based on experience, and credentialing, and things other than analytical capabilities. So, in those kinds of companies, I eventually decided, ""Okay, maybe it's not possible for a legacy company to become a data-driven company."" And so nowadays I focus all of my attention on startups created by founders that are already data-driven and have a good understanding of analysis. What we're seeing is, increasingly, the most valuable companies — or particularly the most valuable companies in America — they're basically all now ""tech startups"". I mean, they're not startups anymore, but they're all companies that are created by engineers and data-driven people. I think for data scientists interested in making an impact, the best thing to do would be to try and make sure you're at a company where that kind of work is appreciated and understood by the executive team. Lukas: Interesting. Well, great to talk to you. That was super fun. Thanks for- Jeremy: You too, Lukas. Lukas: -answering my wide range of questions. It's always so inspiring to talk to you. I really appreciate it. 
If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned, supplemental material and a transcription that we work really hard to produce. So check it out. Jeremy: And how is everything going at Weights & Biases? I always hear nothing but good things about it. Everybody loves it. I've got to admit, actually, the other day I was talking to my friend — I think it was Tanishq — about like, ""Oh, what's going on with this learning rate here? I wonder if it's working properly."" And then he's like, ""Well, here's a graph of the learning rate."" I was like, ""Oh, that was quick and great. Where did that come from?"" He's like, ""Weights & Biases, it logs it."" Lukas: Yes! Oh, man. Are we still recording? Put that on the... Jeremy: I probably should have looked at the Weights & Biases team. Here I was with like ""plot.plot(x = ...)"", and he's already got it pasted into the Discord chat. Lukas: All right. Well, that made my day. Thanks. Jeremy: Cheers, mate.",10657 +"Jerome Pesenti — Large Language Models, PyTorch, and Meta",https://www.youtube.com/watch?v=zvwkVeSbiRo,3155,2022-12-22,"Jerome: When people overbuzz AI, I ask them, ""What did AI change in your life?"" What did AI change? Really, truly. Don't tell me you set a timer on Alexa or Google. That's not life-changing. What was life-changing that came from AI? Lukas: You're listening to Gradient Dissent, a show about machine learning in the real world. I'm your host, Lukas Biewald. Jerome Pesenti was VP of AI at Meta, which is one of the most exciting places where AI research is happening. Before that he was CEO of BenevolentAI, and before that he was VP of Machine Learning at IBM Watson. So, he's had a long career and has seen a ton of different applications and lots of change in the state of the art in machine learning. This is a super fun conversation, and I hope you enjoy it. Lukas: The first question that's top-of-mind is just, with all the advances in large language models that we keep seeing — I know Meta had Blenderbot — I was kind of wondering if you have a point of view — or Meta had a point of view — on building a large language model differently than a DeepMind or an OpenAI, and how you think about that? Jerome: Oh, wow. You go right deep into the challenge there. I would say the large Transformer models...I think at this point, it's not just a language model, right? The Transformer and large models are starting to really be able to be used in multiple tasks. I think this is a trend that everybody is following: size, multimodality, more data, more self supervision actually and less classical supervision, rather than trying to do multiple tasks at the same time. I think this is working really well. It's why people call them ""foundational models"". I'm not sure I agree with that term. So, I do think everybody's going in that direction and that's paying out handsomely. Where I would say I'm a little bit more cautious is, I think these models have lots of problems. And solving these problems is not trivial, not easy. I would say there's two abuse class of problems I've seen, and so the people who will be able to solve that really will be onto something that's interesting. One is control. When you have these language models...I don't know how much you've played with Stable Diffusion or GPT-3. 
It's really, really surprising in the things it gives you, but sometimes it really doesn't give you what you want at all. It's not necessarily what you asked for. Sometimes it has big artifacts that show that it's not humanly generated. And it's not quite clear how you get rid of all this. There's this whole thing around prompt crafting. I think it's interesting, okay, but I don't think you can...I mean, it's kind of scary to say you're going to do like...that there's going to be a new type of software engineering that's going to be for... Because it's so unreliable, you know. And so that's the first piece, which is, ""How do you make all these models more controllable?"", which is like you have a higher guarantee of what the outcome is going to be. The second is bias. Obviously intelligence is about bias, but if you type something...I mean, the easiest way to do it is on these new image generation models. If you type ""CEO"", guess what you get. If you type ""assistant"", guess what you get. If you type ""fast food worker"", or if you type ""banker"". It's striking. I mean, it works. Like 100% of the time, you get extreme bias. And it means you can't really just use this in production. I think it would be terrible. So, very exciting. I think everybody's seeing the trend there. It's working: scale, multi-modality, multi-task, self-supervision. But, you know, they are not very controllable and they have huge bias issues. Lukas: Do you feel like there are still intrinsic cognitive limitations, like a Gary Marcus might say on Twitter? Where do you sort of stand on the promise of this technique with Transformers? Jerome: I'm definitely...you have the spectrum of Gary Marcus on the left and you have people who are extremely enthusiastic talking about AGI on the right. I'm squarely in the middle. Lukas: Oh no, this is going to be a boring interview. Jerome: Yes, yes. I mean, I can tell you some things that are very, you know, controversial. I think Gary really over-does it, because the progress is undeniable. I mean, everybody seeing these systems is surprised. I've been in the space for more than 20 years and I look at the stuff and I'm blown away. If you had asked me a year ago, ""Would we have made this progress?"", I wouldn't have guessed it. I thought these tasks were harder. But I think what happened is that the closer you get to human-level intelligence, the more you realize that the task is much harder. Some people are like, ""Oh my god, we're going to lose our job as developers, as creators."" No way that's going to happen. We're still millions of miles away, because as soon as you make some progress, you realize, as some people have said, that the goalpost actually looks further away, because you realize intelligence is actually a much wider space. It's much more complicated. You realize that the system still makes very, very silly mistakes that humans wouldn't make, but it does things that you didn't think would be possible. I am squarely in the middle, which is I don't think we are anywhere close to human intelligence. I also think that ""AGI"" is a bullshit term. It doesn't mean anything because intelligence is, by definition, never general. And then I don't buy Gary, because you can't deny the progress. You look a bit like a fool if you deny that. But, it's such a much bigger problem than people imagine. As we said at Meta/Facebook, we're 1% done. And I really believe it, we are 1% done. We did go 1% of the way, and that's a huge accomplishment. Lukas: 1% what? 
Jerome: 1% to human intelligence. We've made progress. We've made real progress, right? But it's such...intelligence is so amazing, that you still have a long way to go. Lukas: But don't you feel like the stuff that we're building is starting to help build the next generation of that stuff? I kind of can't believe how well the code generation works. I've been using it in my VSCode. Jerome: That one is also super overstated. Lukas: You think so? Jerome: Absolutely. You are in software, right? I give you a piece of code, okay, and I tell you it's 99% accurate. How good does it give you...the problem is that generating code that's not accurate...I mean, sometimes finding a bug is way harder than writing the code from scratch, right? Lukas: That's fair. Jerome: I think the way to think of Codex and stuff like that is like an auto-complete. It's a very smart auto-complete, the same way when you write your email right now, Gmail does auto-complete. It can complete sentences, and it's quite smart, and it's quite impressive. And if you cherry-pick the results, it looks amazing and it's very surprising what it can do. But, you know, it writes something, and then you have to say, ""Well, is that actually accurate?"" You don't have guarantees, and not having guarantees in code is a huge, huge problem, right? Really bug-free code is worth a million times [of just] code. It's not the size of the code that matters. So, I'm really cautious on this one. I do think it's a useful developer tool. People will use it like they use auto-complete to write email. But it's not going to write...it's not going to put developers out of a job. No way. And especially...it's tricky when you write code, because you need to have guarantees. Lukas: Well, I certainly feel like it helps me write code faster. I imagine better versions of it could...it seems very far from putting someone out of a job, but it seems like it could make you work faster. Jerome: It may make you faster, but is it better or is it worse? You can write worse code faster, I'll give you that. That's for sure. Is it really allowing you to write...I think it will — I also believe it, right? — it will make people faster. But how much will depend on the validity of the code? If you had a system that could guarantee you that the code is accurate, that would be a complete revolution. This is not what it is, right? Again, having guarantees and having control over the outputs is something that's really one of the big challenges of these models. Making sure that what it says is accurate, that's another thing. These language models, they hallucinate. Avoiding that is really, really, really tricky. Lukas: Going back to my earlier question, now we're seeing a whole bunch of different big models coming out that all seem functionally like Transformers. You know, trained on a huge corpus at...basically all text that anyone can find, as far as I can tell, and high volume. Do you feel like the research is sort of converging on this one technique? Or do you feel like DeepMind and Meta have different strategies and points of view there? Jerome: Well, actually, you should have seen Yann's tweet a few days back. It's like, ""Hey, it's weird. Nobody talks about reinforcement learning anymore."" Which is...Yann had said — I don't know if you remember — ""That means we don't really need the cherry anymore."" I don't know if you remember this metaphor of the cake. 
The cherry is the reinforcement learning and supervised learning is the icing, and the body of the cake — the genoise — is unsupervised and is self-supervised. He really, I think, predicted that it would happen. And it is happening. From an information theory perspective, it makes sense. When you do reinforcement learning, you get very little information whether you're right or wrong. It's kind of binary: ""Yes"", ""No"", you are going in the right direction. With supervision, you just use a label. And with self-supervision, it's where you use the whole data, so maximizing the information you get out of the data is definitely the trend. I think that's where we're going. And, you know, you see self-supervision happening in every other field. The flip side also is, Transformers are just working amazingly well, and scale is working amazingly well, and the combination of all these right now is a trend. I don't think we have a secret sauce that would be...or we ""had"", as you know I'm no longer there. Lukas: Right, interesting. Do you feel this concern that very few people will be able to do this training at large-scale? What do, actually, academic institutions do in a world where the most exciting results are coming from very, very high-volume training? Jerome: Yeah, it is concerning. I can tell you that the costs of the system and these models...I mean, just before I left, we put online one of the biggest superclusters out there. It's just extremely expensive. I can't tell you the cost, but it's staggeringly expensive. So yes, it is worrisome and it does work. But, I do believe that we are kind of wasteful in the way we do things today. We are not really optimizing. It was very interesting to see Stable Diffusion come out really quickly after DALL-E. I'm a huge proponent of open sourcing, of open models. I'm actually...Meta had done it with OPT-175B, but it was cool to see Stable Diffusion come out after DALL-E. Not only releasing open source, but also shrink-wrapping it. Now that I'm by myself, actually I've been running it on my own computer or on a Colab. It's pretty cheap and that's kind of cool. I haven't been able to train my own version yet, but at least it's a bit more manageable. But overall, I am a little worried. I'm not seeing how we can avoid this, given how well it works. But we also have efficiency gains we can make. Lukas: We always talk about sort of the practical applications here, and how they're different than research. Can you talk a little bit about at Meta? What were the applications that really mattered to Meta that they were using, and how that kind of differed from the research interests? Jerome: Let me ask you a question because that's something I feel like— Lukas: Please. Jerome: -when people overbuzz AI, I ask them, ""What did AI change in your life?"" Lukas: In my life? Jerome: Yes, in your life. What did AI change? Really, truly. Don't tell me you set a timer on Alexa or Google. That's not life-changing. What was life-changing that came from AI? Lukas: That's interesting. I feel like my life is not that different than someone in the 80s, but by that sense...I actually love listening to music with an agent where I could just request it by saying it. It's delightful, but I wouldn't say it's life-changing. I mean, I assume that all the recommendation systems that I interact with probably guide me...I feel mostly happy about that. I remember when Amazon kind of first came out with a recommendation system, it just felt so great. 
It was like, there's a whole world of books that I want to read that I didn't know about. That might be the most...I don't know. What do you think? You've probably thought about this more than me. Jerome: It's a good point. Actually, it's interesting what you say. I will challenge that the first one...I don't think for many people, ""life-changing"" is that I can ask something for music and it plays it. Lukas: Yeah, ""life-changing"" is way too strong. Yeah, sure. Jerome: But it is true. To answer your question, you guessed right. Which is, at a place like Meta, recommender systems are just hugely impactful. And in two areas. One is advertisements and the other is organic recommendation. Just that...by the time I left, my team was a few thousand people and [it] justified the entirety of the budget by far, you know, multiple [times]. The ROI of investing in this system with larger-scale — especially, you can imagine, in advertisement — is really staggering. If you ask me, that's actually kind of disappointing, if you think about it. The most successful application of AI so far has been advertisements. And I would say maybe the second-most successful has been recommender systems in apps like TikTok, for example. But it's kind of behind-the-scenes. Lukas: Well, wait, wait, wait. Actually, you're a search guy. Don't you think maybe...I should have said ""search""? I feel like web search is incredible. Jerome: No, because web search came up without AI, right? Lukas: That's true. Jerome: The whole history of AI at Google, I would have liked to be a fly on the wall there. Actually, there was a...Sundar got interviewed by Kara Swisher just recently. He was talking about how much reluctance there was at Google to use AI in search. It's a fairly recent story, actually. And today, even some people...I mean, I do think actually AI is very useful in search, but I would put that in the category of ""behind-the-scenes"", you don't really understand what it's doing. But it's also a late story. Whereas in recommender systems and ads, it came much earlier as a fundamental block. Whereas I think Google worked pretty well early on with traditional information retrieval techniques. So, you're right. I mean, if you ask me to answer the question, recommenders are the big thing. The second big thing — which is especially when I was there — was moderation. Moderation at scale can only be done with AI. Moderation at scale is done. I think you can look at the stats as a report that are done every three months, but now we are up to like high 90s, and most of the things...even though there are 30,000 people doing manual moderation — that pair with AI — the amount of data to process is so great that the majority of the first action is done by AI, in the 95% plus, for things like hate speech, or bullying, or a lot of complex problems. Doesn't mean it works perfectly, but it creates enough friction that I think it does make the system overall much better. Lukas: When you scale up to that massive volume, to the massive volume of inference, what changes about how you approach a problem like that? Say, moderation at scale and trying to moderate everything that's coming into to Facebook. Jerome: I don't know if you're asking in terms of the actual application or the support of that application. Support of the application is very, very hard. I mean, the whole MLOps aspect is just...you know, and we could discuss that. It's really, really hard. I don't think in my tenure at Facebook/Meta, we solved it. 
We solved some part of it, especially with PyTorch — I think it was a great success — but after all it's hard. All these systems that evolved quickly at scale: very, very hard. On the other side, from a user perspective, scale is tricky because you can have the impression it works well. All our stats show, ""Hey, we made a lot of progress. If you look at since we introduced AI on hate speech, the amount of hate speech in the platform went down 3x."" Unfortunately, that doesn't mean that's the experience of people, and it doesn't mean it's true for anybody, anywhere in the world. Very, very interesting problem. The experience, for example, is very interesting. It doesn't matter if you match your policies and you remove hate speech; what matters, actually is how people experience your product. And that's a very different story. And the experience of people depends a lot on where they are in the world. The language aspect, the cultural aspects are very, very important there. Lukas: It's interesting that you say...actually, I was kind of curious about both sort of the technical and non-technical challenges, but since you bring up PyTorch, I would not have thought that PyTorch was something that you think of as sort of helping with the operations. I feel like when it came out, it seemed oriented more towards research, but I guess maybe I'm wrong there. Jerome: Oh, yeah. That's a long story. I can tell you a little bit of the story, how it happened. Lukas: Tell me the story, please. Yeah. Jerome: Yeah. So when I joined Facebook at the time — right in 2018 — the company had decided to go on a dual path with PyTorch, Caffe2, and ONNX in the middle. I thought, ""That's just such a hack. That's a non-decision."" I think the decision was made two months before I arrived. It's the one thing...usually when you join a company like this, you do not want to make decisions early. This is one decision where I told the team....actually, I didn't say, ""Hey, we should do PyTorch."" I told the team, ""No way, we're going to do this."" We needed...from experience, I knew that we needed to be on a platform that had community support. So I told the team, ""Okay, you're going to have to pick one framework that we know will have traction in the community."" They were honest, and they knew that that could not be Caffe2 at the time. The community support there really dropped. PyTorch was a rising star, but not production-ready. And really, the only one that had all these aspects was TensorFlow at the time. But the team was convinced that the model of PyTorch was better, and allowed more dynamic graphs. So they came back and said, ""Hey, we think we can make it happen. We can make PyTorch a contender, both on the research front and the production front."" And that's where the company bet. For the past four years after the decision, we've been moving almost everything at Meta from Caffe2 to PyTorch. People love PyTorch. So it's not actually a hard thing to convince people. It's just amazing. It's a better tool to do exploration. But it didn't mean we had all the MLOps around it. And to this day, we still are trying to really figure it out. It's not easy, but it was the right choice. PyTorch definitely, as you surely have seen, it's just a product that people love. And you want to start from that. That gave us a lot of traction that was the right direction. But it still lacks a lot of the infrastructure around it. And there are a lot of reasons for that that we could discuss at the end. 
Lukas: Do you have a theory of why it's so loved? Because we watched this firsthand. When we started Weights & Biases, TensorFlow had a clear lead. And we watched PyTorch overtake it just during our own logs. It was a really dramatic shift. It's funny because from my perspective — and I've dabbled with both — they seem pretty feature-comparable to me. I mean, in the early days, there was obviously PyTorch had the just-in-time generation of the graph. Do you have a theory about why PyTorch seems like it was so much better loved? Jerome: Yeah, I'll give you another little anecdote. I remember the reason actually I felt strongly about this when I joined Meta is before I joined, in my team I remember we had also this problem...at the time you had Theano, you had other systems. We were a small team — I was in a startup and we were in a small team — and we already had a few frameworks. I said, ""We can't do this. We got to agree on one."" And so I think we agreed on one, I think it was TensorFlow. And six months later, they're like, ""No, no no, we got to use PyTorch. No way we can..."" And I'm like, ""We made a decision!"" We went to PyTorch, and I'm like, ""Okay, there is something there."" I actually think that the reason is simple. The people who developed PyTorch — Soumith in particular — had a design mindset. If I were...the mantra, it was actually a user-centric design. It's funny because I think the people who did it didn't necessarily know they were demonstrating they knew the terminology [?], but it really definitely had the research in mind and what they wanted to do. And you can feel it. The problem with TensorFlow is that it was retrofitted. So even if now, because of influence it's there — it has been plugged on top — it still feels like it's crumbled up together. It's hard to acquire the love, you know. You can lose it; it's hard to gain. So it's really about user-friendliness...researcher-friendliness, actually. I think also the fact that research is driving the narrative in AI today. It's not a stable field, right? That really put PyTorch at the center of that universe. Lukas: What were the important pieces that you had to put around it to make it really work for you in a production environment? Jerome: The challenge with PyTorch...actually, the really complex stuff is that it's almost like an anti-pattern. Let me try to explain that. I think there's this saying that ""Early optimization is the root of all evil."" But the challenge with something like PyTorch is that you need to do early optimization. You don't have a way around it. Why? Because you need to create a system that gives a lot of flexibility to users to do a lot of things, yet is optimized. Because scale matters, efficiency and speed matter. So you have this constant challenge of — especially in the interest of the operator internally — to have things that really follow...like, if you couldn't do Transformers today in PyTorch, but it would be awesome in everything else: forget it. Nobody will use it, right? So you need to...very quickly, when you see where the trend is going, you have to go and put very good operators, and you need to optimize it. It is constant progress, they are doing this. That's one challenge. The other challenge is we had to give that team...I'm really a big believer in focus, and in this case, it was a constant balance. I said, ""Hey, look, you have two focuses. I cannot make it simpler for you, and you cannot screw it up."" One is you cannot screw the external community. 
You have to create something that people will continue loving. You cannot make it bloated, right? The problem when you start creating enterprise software or production software, it becomes bloated, it becomes difficult to use. You can't do this. At the same time, you have to make it work for us internally. It has to have all the production aspects. It has to be deployable, it has to be production-ready, which most people in the research community don't see, don't understand. We had to have these two objectives. And that's hard. The team suffered through, but I think they did actually quite an amazing job at keeping it, because ultimately Meta is going there. It will be 100% PyTorch in a very soon future. And I think the community still loves and adopts it. Lukas: Was there some experience that you were talking about that made you understand the value of community support? Were you using something at a different company, where it didn't have the community support? You just mentioned that a couple times, that it's so essential to use technology that the community believes in. Jerome: Yeah, because I've seen companies be stuck in a dead end. Actually, you could almost argue — maybe they're going to hate me for this — but PHP and Hack at Facebook is a really tricky one. They kind of own it. Facebook is so big that I guess — Meta is so big — they can own it. But I really think this is not very good. I think you see it dying on the vine and you are adopting a technology that just doesn't progress anymore. I've seen it for many systems. I would say all the big data systems, the containerization systems. You can see there's always one winner and if you make the wrong choice, you're stuck at some point moving off from it. Lukas: Right, right. I thought you were going to maybe mention IBM Watson. I'm kind of curious what that experience was like. Jerome: That is a very different story. I can tell you more about this. I think what...I mean, the good thing for me is that I went there through an acquisition. I had created an AI company and IBM acquired it. It was great for everybody. I was very happy. Actually, I think when IBM created the Watson units, that was a bold move. It was really about saying, ""Hey, we believe there is a commercial potential in AI."" That was 2013. At the time, actually, not many people were talking about AI. The deep learning revolution came around in 2011, '12. People were saying it's coming. Actually, Jeopardy! — the challenge when they did it with Watson — did not use deep learning, which is kind of interesting. It's a bit of a dirty secret. It used very little machine learning. It used traditional NLP and managed to get something very good. They made this big bet on it. I think it was really — obviously — the right bet. It was early and it was good. But there were challenges, right? The challenge is that you had to be patient. I tend to say, ""You need to be impatient for profit and patient for revenue."" And IBM did the opposite. They were impatient for revenue and patient for profit. They did a lot of this very large engagement, promising the moon, that you may spend $10 billion to make $1 billion. That's not a very good business. What I was focused on when I was there was to really try to shrink wrap AI and put it as cloud services. At the time, we came up with this idea of putting AI in the cloud as services to do speech, to do conversation. To this day, I think that's still the majority of what Watson is doing. I think it was very ahead of the game. 
But, the only problem is IBM didn't have much of a cloud. I felt a little bit stuck when I was there because I think it's the right strategy, I think we're getting traction, but I'm building on infrastructure that's not as robust as if you are on Amazon or Microsoft. Lukas: And then you went into drug discovery, didn't you? It's super hot now, I feel like. Is that right? Jerome: Yeah, yeah. I got recruited to be the co-CEO of a company called BenevolentAI. I think it's a fascinating field. I'm a huge believer that it will happen. You can see there's a lot of promising things happening in AI. Even at Meta — in the research team FAIR — we were doing things around understanding the function of proteins, looking at making predictions around free energy on small molecules and catalysis. Very interesting stuff you can do with AI today. Now, that said, it hasn't really completely changed the field. I actually think that drug discovery needs a bit of what I would call a ""Tesla revolution"", which is you need a tech company to take it head on. But it has such a huge amount of domain knowledge that it's a very hard problem. It's similar in some way to what Elon did with Tesla. It takes 15 years to understand what it takes to build a car. And I think drug discovery is even bigger than that. It's even more complicated. But the decision process of these companies — when they approach technology — they're saying ""There's no good model out there, but some models are more useful than others"". Okay, that's what they say out there. The reason is the models are more useful because they just use them to justify the decision they had made before. That's the way drugs are made these days. A lot of decisions made, not a lot of data to support it. A lot of influence, you have a concept called a ""key opinion leader"". That's how decisions are made there. I'm not a big fan of influence authority. That's not, I think, how a business should be run. But that's how it is right now. I'm really looking forward to a big disruption and maybe I'll get involved in this again. Lukas: That would be cool. When we started Weights & Biases, we didn't think that we'd have many pharma customers. And now, we work with most of them. So it does seem like at least the pharma companies believe pretty strongly that there's something there for deep learning to help with drug discovery. Do you have a sense for what the breakthroughs have been that have made things like AlphaFold work well? Jerome: Well, there are different challenges. What I find remarkable is that — and I still don't quite understand it — it does seem that deep learning and especially even the Transformer architecture, for example, are kind of able to understand the grammar of things. Of images, of text, but also of proteins, for example. At Facebook, we had a project — at Meta — where you just feed hundreds of millions of proteins to a language model, and the system from there is able to predict function pretty well. Without having seen anything, with very little supervised data. It's something that I'm just not sure I understand, because it's not like a brain understand molecules, right? That means there's this generic computation that works well in so many areas. And it just still blows my mind. I understand that it can do it for language and for images, because humans can do that. But humans can't understand...can't fold molecules or understand their functions. So, why is it working? Why can you predict...why can you do quantum calculations better with...? 
I don't know. It's really, really interesting. It seems to me like this thing that's generic even more than human intelligence. Lukas: Yeah, it does seem like an opportunity to do something that humans really can't do. Jerome: That's the case, yes. But there are lots...back to your question, there are actually lots...you have the chemistry, you have the biology, you have the clinical trials, you have patient data. There are actually many, many stages. There is the target identification. For BenevolentAI, one of the big things we were doing is trying to mine the literature to come up with new graphs, find new relationships, new targets. It's very, very early in the game. Then you have companies that try to figure out, ""Okay, give it a target. What are the right molecules that can affect that target?"" Can we do some AI-assisted chemistry there? And then there are people who try to understand better the biological aspects, like how docking actually works. And then you have the patient data and you have the imagery of the patient data. How can you understand it? Can you deduct from there? Can you combine that with genetic information? Actually, there's really literally like dozens of places where it can affect. I was talking to a friend of mine who just started a company to think of how to design...I think he called it ""promoters"". So, not the piece that's active, but the thing that like first [?] in an RNA-based [?], but the thing that's going to say how much is going to be...how potent it's going to be. The little code that you don't pay attention to in DNA that usually tells you how much is used and how much the cells can be affected. I had no idea this thing existed, but you need a code for there, and it's a few hundred amino acids there. Using AI for that might be very good. The advice I gave him was like, ""Hey, go use Transformers. I bet you they're going to...train them on DNA. They'll figure out..."" But I don't know about it. Anyway, there are a lot of aspects of the process where it can help. I would say dozens. Lukas: It sounds like something that you're excited about right now and looking into? Jerome: Yes, it is. Yeah. But I really...what excites me is, ""How do you get..."" I'm convinced that you're going to see a lot of what we call ""business processes"" be improved throughout the industry. I think you're going to see...it's slow, by the way. You're going to see companies adopt [AI] for part of the processes, like insurance companies and banking and healthcare. They're going to take little blocks. They're going to work with these B2B companies and they're going to adopt it. What I'm more excited with is, how do you change entirely a field? You have transportation. Obviously a lot of people are trying that, with self-driving cars or other kinds of self-driving. Maybe that's going to come first. You have healthcare and you have drug discovery, paired. I think you have education as well, that could be completely transformed. But I'd love to do something that not just takes the current companies and just incrementally improves them — which I think is what's going to happen naturally — but changes the game. I think in drug discovery, you can change the game. You can change the decision process. You can change...the attrition that you have right now that makes a drug cost $1 billion dollars will be diminished by 10x. Lukas: I totally agree with you on drug discovery and autonomous vehicles. You'd be blind not to see the opportunity there and the success that folks are having. 
But I don't actually know that I've seen a ton of success in education. It seems like a surprising...it seems like education actually has the least amount of technology inserted into it. Jerome: Yeah, I agree with you. It's a field I'm very interested in, I've been looking into it. The way I put it...I actually just wrote a little position document for this pretty recently. The way I put it is that I think education is completely...in the war for attention, education is completely outgunned today. If you are a teenager, do you want to go to a boring lecture or do you want to go on TikTok and see stuff by millions of creators that really is adapted to your interests, and understands what you like, what makes you...a system that gets you versus a system that's static. You know, the same way of educating as 500 years ago. It doesn't mean there's no opportunity there. I think there are. But culturally, it's also a difficult field. I think of it...the way I put it is that, look what's happening on TikTok. Kids go on TikTok...my daughters, they send me stuff like, ""Oh, look at this guy, he teaches me math on TikTok."" I'm like, ""Come on"". That's entertaining. I'm not sure that's the way to do it, but it shows you the potential to make it a lot more engaging. You have to engage the user. You have to make it compelling to them. I think there's techniques and there's AI to do that. I think we understand that pretty well, actually. That, I think, is an opportunity. Lukas: Interesting. Jerome: More to come. Lukas: Excited to learn more about this. As someone who likes to learn...I actually think YouTube has become such an incredible educational resource. Even on deep technical topics. And I think the voting is surprisingly effective too. I would have thought that it would be hard for really good educators to sort of bubble up to the surface on very advanced topics, but it seems like it's a pretty good...I don't know. Jerome: I agree. Lukas: The algorithm, I guess, on YouTube is working well for me. I've been learning more math. Jerome: I agree. And you know, when you look at...I think that's the thing that...I'm not sure it works for younger students, but I think for adult education, I think for high school education, a lot of them start bypassing the traditional way, and go into YouTube. But YouTube is also not an educational platform, right? There are other ways to learn. Personally, I love learning through practice and through exercise. Lukas: Totally. Jerome: I think people have different styles. I have a hard time staying in front of a lecture. I love practice and I love something that...the frustration I have with all the education systems today is that they don't start by constantly evaluating you. What are my gaps, what do I need to practice next? What's the optimal thing that I can do next? A lot of systems today really tell you, ""What is the best next thing I can show you?"" That's how TikTok works. So, what is the thing that's going to make you really, really want to come back on TikTok? I don't think education works like this today. What is the thing that's going to make me more informed and want to stay and continue that course? Lukas: Well, I hope you work on this. I'm... Jerome: We'll see. You think drug discovery is complicated? Oh my god, education is also complicated. That's the problem, you know. Healthcare, education, drug discovery, all these complex fields that are hard to disrupt. Lukas: Right, right, right. 
Some other questions I had, I was wondering...Meta has made this huge bet on augmented reality, as far as I understand. Do you think that machine learning has a role to play there, or has caused some of the interest in AR or VR? It's not a space that I understand super well, but... Jerome: Yeah, and it has a...let me give you a framing for it. The challenge with this new kind of interface...let's assume — which is not a guarantee — that it's going to be a set of glasses that you put on your head. And let's say it's going to be the next platform. Because let's be honest, I think phones are an amazing invention, but they're kind of a frustrating invention. You have a little screen like this. You see yourself always on that little screen. My prediction to you is that in 30 years, people are going to look back and say, ""My god, this is like the Stone Age of interfaces."" So, something is going to change it. The challenge with glasses is that it's not an imperative interface. I'm not typing. In some ways, a phone is a little bit less imperative than a computer or a keyboard. You're clearly telling the computer what you want. When you type, you type the key. There's no ambiguity there. I think the touch screen was a little bit more of an implicit interface. It's not exactly sure what you're saying...it's actually using a little bit of machine learning underneath there to figure out what you're talking about. But it's not groundbreaking machine learning to figure out the exact word. And it's actually using some of these language models when you type on your keyboard. But imagine now you have glasses, right? There's no input. So, what is it? One of the obvious ones is voice, but it's very likely that it's not going to be just voice, for sure. It's not going to be just voice. It's going to be gestures. It's going to be motion. One thing that Meta is working on is a little bracelet; they acquired a company that did this. I think it's very, very interesting. You can maybe type in the air or move your finger silently. There's going to be motion. There's going to be trying to understand your intent. The problem with glasses is you don't have a keyboard. You can't enter information. You can't tell the glasses what you want, but you'll need to have a rich interface that understands you. And so AI has to play a role there. It's a very challenging role. It's creating a contextual interface that understands all the context around you and lets you really direct the system you have on your face. Lukas: This is probably a speech interface, I'm guessing. Jerome: Speech, the problem is that...speech is part of it. But our guess is — our guess was — that speech may not play as big a role as you think it will. I mean, when can you really speak to your phone, right? Take Siri. How often do you use it? I never use it. So I don't... Lukas: Yeah, I never use it also. Jerome: Yeah, I never use it either. Because it's awkward, right? I'm in the middle here, I'm going to talk to my phone like this? Actually talking to the glasses, while it's possible — I don't know if you saw, Meta came up with the Ray-Ban. My team actually did the speech for it. It's nice, it works well — but actually, there are not many places where you want to do this. Maybe you want to do more motion. Your gestures, other things, a combination of all these things. Tap, you know. The interface will be a lot more complex, multi-modal than we assume. It's not going to be just speech. Lukas: Interesting. Okay. 
Another totally different question that I had — that I was wondering if you had a thought on — is, one thing that's been really striking is NVIDIA's total stranglehold on the training market. I mean, there's some stuff coming out of Google, but it doesn't seem like it has tons of traction, at least in training. Do you have a sense for why that might be? It's lasted a lot longer than I would have thought. There's lots of startups that compete and people working on chips, but somehow it just doesn't seem to move. Jerome: Oh, I know. I would say I know all about it. Remember what I told you earlier, which is that these things are very expensive, right? And when you have a sole provider, it's very complicated and it's very expensive. Thankfully, now the crypto market went down, so I think it's going to be a little nicer for GPUs. But it did feel at that time like a racket, what we were paying for these GPUs. But, the flip side of that is NVIDIA is very good. And they're very good not just because of the GPUs. I think the GPU — especially when you come from more of a PyTorch exploration mode — it works well. It's very multipurpose. I think it's very flexible. That worked really well for us. But the thing also is, NVIDIA got the software really, really well. They really got it right. They work with us amazingly well. They're very competent people to create that. That's a coup de [?], and it's hard to replace. I'll tell you at Meta how I wanted...and I threw some money at other people to say like, ""Go do it, or we'll do it for you."" You got to be able to compete, you know. But software is hard, and they are very talented and they do a great job. And that's what got them there. They just have the best software...they have great hardware and have the best software stack on top of it. If you're serious, it's still the best in town. Even if you compare to the TPU, the benchmarks are comparable, yet the GPU is way more flexible. So unless you have some workloads — I think it works well for ads for Google — the TPU can be competitive, but for the rest, actually, GPU is still the best game in town and they have a great software stack on top. Lukas: You would think more specialized systems would work in more specialized cases, wouldn't you? It's kind of amazing that the flexible system also seems to function the best for almost all these cases. Jerome: Yeah, but think of this thing, the challenge we had, right? Imagine you try to design a chip, and you design it when the big game in town are CNNs, and LSTMs, and a lot of...in recommendation, it's a lot of sparse networks. And then you wake up three years later and everything has changed in the game. The game has changed, and now it's Transformers and actually dense networks start to be really relevant to do also recommendation. You design your chip and it takes five years to get it out. So by the time you get it out, you know it's already over. Which many people are doing and have done as well. It's very hard. It's the problem I told you, this early optimization. Which is if you don't keep your options open — while still optimizing what you have — you may be in a dead end. Lukas: Interesting. Well, cool. We always end with two questions, but I guess before that I'm just kind of channeling all the students that we always get in comments, wherever we post these. You've had this very enviable career in machine learning and we have so many students that use our software and watch these interviews. Do you have any advice for students coming out? 
What would you work on if you were just sort of entering the field out of undergrad or grad school? How would you think about that? Jerome: Well, I would not...I'm not going to give you specifics, but I'll give you a little story that I got from a guy who used to study ants. He just died recently, E.O. Wilson. He invented a really interesting concept around evolution and he wrote a little book, ""Letters to a Young Scientist"". He says, ""When I was young, I came out and..."" He was in his PhD, and he decided to focus on ants. The amazing thing is, at the time, that sounded like a very crazy idea. Obviously, ants we know as a society are very important now. And he became the world's specialist in it, world-renowned. What I tell people — especially in science — who come out is, ""Don't be afraid of going for something that you own, that's your own thing. Go for it. And be bold about it. And actually, don't assume that everybody has done everything. There's a lot of opportunity for you to own it, and go for it, and be focused on it."" That's what I would advise. I think this is a very wide space. There's a lot of space for everybody. Be bold, be ambitious. Lukas: Fair enough, all right. The last two questions are...one is — and it seems like you're kind of doing this — but if you had extra time to work on something, what would it be? Jerome: It's what I do now, I told you I'd do kite-surfing. Lukas: Yeah, totally. But if you weren't kite-surfing all day long, what would you be looking into? Jerome: Well for me, I'd do two things, because...one is, I was a goddamn manager for the past, like, 10 years. I think the last time I coded was before my company got acquired, and I love coding. So I'm going back to coding, I'm going back to getting my hands dirty, really understanding... As much as my team developed PyTorch, do I really understand it? Do I understand how it works? I'd spend more time doing this, and that's a lot of fun. Lukas: I love it. Jerome: I think Karpathy, just coming out of Tesla, he said the same. My skin is cleaner, I sleep better. Dealing with technical problems rather than people problems is always a big boost. That's what I'm doing: really, really staying up to date. I feel it's really critical to understand. My next stage is, ""Okay, I want to write a Transformer from scratch, what is that? What is in it?"" Lukas: Nice. Jerome: The second one I'm trying to do is really try to evaluate where the big opportunity is. For me, I feel like, ""Okay, I've done the B2B startup, I don't want to do another one like this."" I want to try to see, ""What's going to be the big revolution here? Is that going to be drug discovery? Is it going to be transportation? Is it going to be education?"" I'm going to pick one, I'm going to make a bet, I'm going to go for it. Maybe I'll fail, maybe there's 1% chance I'll succeed. But at this, it'll be worth it to. Lukas: Nice, I love it. Final question is, when you think about taking a model from research to deployed in production and useful, where do you see the major pitfalls? Where are the pitfalls that might be surprising to someone that is just a researcher? Jerome: Oh my god, it's so complicated. It's actually really...it's something I feel like we haven't figured out. I mean, I'll reverse the question, which is, ""What makes DevOps good?"", right? Do you want something that's reliable, that scales, that you can test? Testing in AI, it's hard, actually. How do you test? 
You can test like...tests that are very close to the model, or you have downstream tests. Imagine you change the speech recognition and you have 20 systems with 20 layers on top of that. How do you test the last system and what depends on that? ""Reliable"", well, these systems...we claim they are deterministic, but they are not, actually. A lot of behaviors are really weird, that you cannot actually completely reproduce, right? And then scale. These things keep scaling. Every year at Meta, we were like 10x bigger, and it wreaks havoc on all your assumptions. It's really, really hard. It really breaks...the assumptions you want to have to create this, they're just not there. I don't think we have figured it out. I think it's still a work in progress. Lukas: Awesome, well, thanks so much. This was super fun. I really appreciate your time. Thanks, Jerome. Jerome: Thank you so much, Lukas. Lukas: That was great. Thank you. If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description, where you can find links to all the papers that are mentioned, supplemental material, and a transcription that we work really hard to produce. So check it out.",8875 +"D. Sculley — Technical Debt, Trade-offs, and Kaggle",https://www.youtube.com/watch?v=1aajTQvZJ94,3626,2022-12-01,D: There's plenty of physics that you can do in the world, as far as I understand, that doesn't involve having access to a supercollider or things like that. And similarly, I believe that there is, and will continue to be, a lot of machine learning that doesn't rely on having access to collider-scale resources. Lukas: You're listening to Gradient Dissent, a show about machine learning in the real world. I'm your host, Lukas Biewald. I was recently introduced to D. Sculley as the new CEO of Kaggle, which is obviously an amazing site that we all love. But I later learned that he was the author of ""Machine Learning: The High-Interest Credit Card of Technical Debt"", a paper that inspired so many people, including myself, to go out and start machine learning tools companies. I could not be more excited to talk to him today. A note to our listeners: this conversation took place in August 2022. Since then, Kaggle has only continued to grow. All right, well, it's great to talk to you. I think the impetus for talking was you taking over Kaggle, which is a really important website in the machine learning community, and important to a lot of our listeners and users at Weights & Biases. But I realized in researching you, which I should have realized, that you are the author of the machine learning ""high-interest credit card of technical debt"" paper, which I think inspired a lot of people and really resonated with me when it came out. So I thought maybe you could start, for people who haven't read the paper, by kind of summarizing it. And I'm also curious if anything has changed since that paper was written. I'm trying to remember now, this must be like 2016 or 2017 that... D: I think 2015. Lukas: 2015, 
if I remember right. Yeah, it feels like a million years ago. But maybe before we get into it...I think a lot of people have read the paper, but for those who haven't, if you could kind of summarize it, that would be a great place to start. D: Yeah, sure. First of all, hi, thanks for having me. I appreciate being here. My journey in machine learning has been a couple of decades at this point. I spent a long time at Google working in production systems, some of Google's most production-critical ML systems, for many years. I led some of Google's ads click-through rate systems for a while. During that time I gained a really clear appreciation for the importance of machine learning as a critical part of larger, important systems, and got to experience firsthand all the different ways that things can go in unexpected directions. These were systems that had obviously been around for a long time. At the time we're talking about, I guess 2015 or so, they had already been in use in production in one form or fashion for more than a decade. So at that time, I feel like my team and I had some insights into how things work in machine learning systems over the long term that not too many other people were in a position to reflect on, just because it was a relatively new field at that point. So I thought it was useful to write down some of the things that we were seeing, and using the metaphor of technical debt was a useful way to frame some of them. When we think about technical debt from a software engineering perspective, we think about the kinds of costs that you incur when you're moving fast. You probably know something about moving fast in startup land, and maybe having to make some tough calls between getting something out the door now versus adding in another six layers of integration testing, or whatever the trade-off might be. There are really good reasons to move fast, and it's sometimes unavoidable, but in doing so we create some costs for ourselves over time that need to be paid down. It's not that we can never take those costs on, but we'd better be honest with ourselves about what those costs are. At the time, I think it was underappreciated how much technical debt can be incurred through the use of machine learning. It's kind of obvious to see that a machine learning stack is built on code, and so it has all of the technical debt opportunities that normal code has. But it also has these system-level behaviors that emerge over time, which have nothing to do with code-level checks but do in fact create cost that needs to be paid down. Even the simplest things you can think of. When you're first building a model, oftentimes if you're in a hurry you rush and put a whole bunch of features in the model, everything you can think of. Accuracy is 0.9, and you're like, ""Okay, that's pretty good, but I can think of another 20 features."" You put all those 20 new features in and now it's 0.92. And then you're like, ""Well, that's pretty good, but if I put another 20 features in, then I get 0.93."" So we're sort of in this regime with diminishing 
returns, to some degree. It's not necessarily clear, when we're throwing all these features into a model, what the value of each one is. It's possible that we're putting in a lot of features that aren't particularly informative, or where the information is already being usefully conveyed by some other feature, or things like that. It's sort of a bundled approach, and it's typical of early development of a machine learning pipeline. We've made accuracy go up, so what could be the problem, right? But as I'm sure you've seen, every time you add a feature to a model, you create a dependency. You now have a dependency on some behavior or observation in the outside world, and that means you have a vulnerability if that behavior in the outside world changes. It could change because people in the outside world change. It could change because the upstream producer of that signal changes. Maybe they create an upgrade, which sounds to them like a really great thing, but your model wasn't trained on the upgraded signal; it learned all the weird errors and quirks of the old one, so you can get some weird behaviors at upgrade time. Maybe they get sick of creating a nice feature and turn it off. That's not going to be a good day in your production system. So it's really important that when we're thinking about model development, we're thinking about the long-term costs of adding system complexity, model complexity, and data complexity at the same time as we're thinking about improving accuracy. Lukas: And I guess you've really experienced this firsthand? Were there any specific things that happened where you really thought, ""Oh, that drives this point home""? D: Well, I'm not going to tell any tales out of school, of course, but I will use the phrase ""you can imagine"" a lot, and you can imagine why. You can imagine that you had a model that was using, let's say, a topic model from some upstream producer. Maybe that topic model takes text and returns a low-dimensional representation, the topicality of that piece of text. Maybe in the early days of its development it didn't have great coverage of non-English languages. So if you're training a model that takes that topic model as an input feature, it might learn that the topics reported for certain low-coverage languages aren't particularly reliable. Maybe it assigns them a slight negative weight or something like that, and it's not too important, because they just don't fire very often, so it doesn't show up in aggregate metrics. Then you can imagine that, if you were a nascent machine learning engineer and didn't know any better, you learned that there was an upgraded version of this model that dramatically increased coverage in some of those low-resource languages, so that now those topics fire with much greater frequency. If you don't retrain your model, you can imagine that those topic-level features inside your model are now firing much, much more often, and maybe sending a lot of content to lower scores than you might have expected. That's the sort of thing that can happen. You can also imagine things like an upstream producer of a given signal suddenly going offline without warning.
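One rough way to quantify the trade-off D. Sculley is describing, the marginal accuracy a feature buys versus the dependency it adds, is a leave-one-feature-out ablation. This is only an illustrative sketch with synthetic data and made-up feature names, not a recipe from his paper:

```python
# Sketch of a leave-one-feature-out ablation: measure what each feature is
# actually worth before accepting the upstream dependency it creates.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))                     # 5 candidate features (synthetic)
y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=2000) > 0).astype(int)
feature_names = ["query_len", "topic_score", "lang_conf", "hour", "noise"]

def cv_accuracy(X_sub: np.ndarray) -> float:
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X_sub, y, cv=5).mean()

baseline = cv_accuracy(X)
print(f"all features: {baseline:.3f}")
for i, name in enumerate(feature_names):
    acc = cv_accuracy(np.delete(X, i, axis=1))
    # A tiny drop may not justify a fragile upstream dependency.
    print(f"without {name:12s}: {acc:.3f} (delta {baseline - acc:+.3f})")
```

Features whose removal barely moves the cross-validated score are candidates for leaving out entirely, which is one way to avoid the dependency debt described above.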
And data is transitive, so it might be that the upstream producer of a signal you're consuming also has an upstream producer of a signal it's consuming, and that chain might hop several links. So it could be that your system is being impacted by some other upstream signal several hops up the chain, and if you're not really careful about making sure that alerting and things like that are also being propagated transitively, you're not going to know until it's hitting your production data. So these sorts of things can happen, and you want to be as defensive as possible: working on your early-warning alerting and all of these things, to make sure that if something's coming down the pike, you get notified in advance. You also want to think about...we talk about coding defensively in regular engineering. Coding defensively on data often looks like monitoring your input data distributions, checking for things like sudden changes in input data skews or streams. One thing you could imagine is, let's say you have a model that is consuming data globally, but for whatever reason a data center in a given part of the world goes down that day. It can happen. Suddenly your input data is likely to be highly skewed from what it normally looks like, because you're missing a giant chunk of data, especially if there are large local time-of-day effects. You could have very different behavior for a given day, or period of days, through an upstream outage that, if you don't have the proper input-stream alerting, you might not know what to make of.
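A minimal sketch of the kind of defensive input monitoring described here: compare a live window of one input feature against a reference histogram captured at training time and alert on a sudden shift. The feature, window sizes, and alert threshold are illustrative placeholders, not a recommendation; a production system would do this per feature and per region, wired into real alerting.

```python
# Compare today's traffic for one feature against a training-time reference
# histogram using the population stability index (PSI).
import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # cover out-of-range values
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    live_frac = np.histogram(live, edges)[0] / len(live)
    ref_frac = np.clip(ref_frac, 1e-6, None)       # avoid log(0)
    live_frac = np.clip(live_frac, 1e-6, None)
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))

rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, 100_000)          # feature values at training time
normal_day = rng.normal(0.0, 1.0, 10_000)
outage_day = rng.normal(0.8, 1.3, 10_000)          # e.g., one region's traffic missing

for name, window in [("normal_day", normal_day), ("outage_day", outage_day)]:
    score = psi(reference, window)
    status = "ALERT" if score > 0.2 else "ok"      # 0.2 is a common rule of thumb
    print(f"{name}: PSI={score:.3f} [{status}]")
```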
Lukas: Do you feel like these problems are getting better or getting worse? And how do you feel the change to more complicated, bigger, more black-box models affects this calculus? D: In 2015, when we first wrote these papers, we got basically two reactions. One was the very nice, affirming reaction of, ""Oh my gosh, this stuff is so important, thanks for writing this down, we wouldn't have thought of any of these things,"" or, more often, ""Yeah, we've encountered some of these things, but we didn't know that other people did too."" Those kinds of reactions. The second major reaction we got was from large parts of the ML research community, and it was basically, ""What are you people talking about?"" That first NeurIPS paper got a full poker-hand straight of review scores, everything from the highest possible to the lowest possible, with a couple in the middle; they had no idea what to do with it. Eventually they let us in, mostly on the grounds of, ""Well, you seem to be passionate about what you're talking about, and people disagree with you, so why don't you come and hash it out."" Which was a very reasonable statement, and we were happy to do it. But I think the world here in 2022 understands that these issues are real, that they aren't just an accident, or what happens when you hire the wrong ML engineer or something like that. They're systemic, and so we need to approach them systemically. Now there's this whole field of MLOps, and when you say ""MLOps"", people nod sagely and say, ""Yes, yes, we need to invest in MLOps."" It's a totally different world from that perspective, in that you don't have to convince people that these problems are problems. That message, I think, has gotten through, and I'm happy about that. In terms of whether these problems get worse when you have much larger models: they certainly get more acute. I'm not going to say that we're in a worse spot, because having a whole field of really smart people working on these problems, and creating infrastructure that can help address them, is a better spot to be in than having people think about these problems for the first time or roll their own. But from a reliability standpoint, as our models get larger and larger...why are we making models larger and larger? Because we want to learn usefully from more and more data. Why are we throwing more and more data at a problem? If you were thinking of the problem of, say, estimating the probability that a coin comes up heads, you don't necessarily need to go from a billion to 10 billion examples, right? Basic statistics says that after a couple hundred flips you're going to get a pretty good estimate, and you can stop. But we don't do that with machine learning. We keep going, because we need our models to exhibit ever more fine-grained behaviors and to respond usefully to a wider variety of input environments and scenarios. We have larger and larger datasets because we need more and more behaviors that our models can adapt to and exhibit. Now, if you were to tell a typical software engineer, ""Hey, the system that we're building used to need a thousand behaviors and now it's got a million,"" that person would probably say, ""Well, our testing is probably also going to be a priority here. We used to have maybe 2,000 unit tests, two for each of these behaviors; now you're telling me we've got a million. We're going to have to hire a couple more test engineers, and maybe many more."" When our models are being relied on to produce many, many more behaviors in a useful way, I think this really ups the stakes on our overall processes of vetting, quality assurance, sanity checking, and validation of our models. Twenty years ago, machine learning was basically, ""Look, you've got your test set and your training set, and so long as they're from the same distribution, we're just going to assume that your test data has all the behaviors that you're going to need to worry about. So no problem, just make sure you've got good accuracy on your held-out test set."" That's not a silly place to start, but it's probably not a great place to end. Why do we use IID datasets from the same distribution for test and training? Everybody knows that this is what you quote-unquote should do, but let's remember why we're doing it. We're doing it because there are clever statisticians who for many decades have said important things like ""correlation is not causation."" And the machine learning people are like, ""Well, we're just going to learn from correlations. We're learning from observational data, we've got giant amounts of observational data, so we're just going to learn from that."" And the statisticians are like, ""What are you going to do about the whole 'correlation is not causation' thing?"" And the machine learning people's response is, ""Well, if we guarantee that the test data is from the same distribution, then in terms of outcomes we can 
ignore this inconvenient fact that correlation is not causation."" And the statisticians are like, ""Well, that's not awesome, but I guess you're right, and so long as you promise that your testing will always be from the same distribution, we can't really argue with that."" Obviously that's a caricature, and I hope not to offend any statisticians or machine learning people with it. But we do this IID test/train split not because we think this is how the world works, but because if we don't do it, then we expose ourselves to a whole set of much more difficult problems in terms of the learning settings that we're in. To some degree, all of the theoretical guarantees of supervised machine learning rely on this assumption that we're going to stay in this IID test/train split world. And this is all fine, with the one small problem that the world almost never actually works this way. We can, offline, do our little research version of this and say, ""Okay, I've got my dataset, I'm going to split it carefully, and so these are therefore from the same distribution."" But when we go and deploy a model in the real world, it's pretty unlikely that the data that model encounters is going to be from exactly the same distribution that happened to be in the limited historical snapshot of data we collected previously, because the world tends not to be that kind to us. Our models are going to encounter data from different distributions. They're going to encounter worlds in which correlations that existed spuriously in our training data do not hold, or maybe are explicitly broken in our production environment. This means that we have to really up our game on evaluation. It means we can't just rely on test-set accuracy or things like that as our final validation. We need to be much more rigorous about cataloging for ourselves, and talking to our clever domain experts, to figure out: what are the places where our correlations are going to break down, where might our blind spots be, and how can we create specific stress tests to analyze our performance in those areas?
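One concrete version of ""upping our game on evaluation"" beyond a single aggregate number is reporting metrics per slice, so that a regression hidden by the average becomes visible when comparing two model versions. The slices, labels, and predictions below are synthetic stand-ins, sketched only to show the shape of such a report:

```python
# Sketch of slice-level evaluation: report accuracy per slice (here, language)
# for two model versions instead of one overall number.
import numpy as np

rng = np.random.default_rng(2)
languages = np.array(["en"] * 8000 + ["de"] * 1500 + ["sw"] * 500)
y_true = rng.integers(0, 2, size=len(languages))

def simulate_predictions(per_slice_acc: dict) -> np.ndarray:
    preds = y_true.copy()
    for lang, acc in per_slice_acc.items():
        idx = np.where(languages == lang)[0]
        flip = rng.random(len(idx)) > acc          # flip labels to hit target accuracy
        preds[idx[flip]] = 1 - preds[idx[flip]]
    return preds

model_a = simulate_predictions({"en": 0.90, "de": 0.88, "sw": 0.85})
model_b = simulate_predictions({"en": 0.93, "de": 0.89, "sw": 0.70})  # better overall, worse on "sw"

for lang in ["en", "de", "sw", "ALL"]:
    mask = np.ones(len(languages), bool) if lang == "ALL" else (languages == lang)
    acc_a = (model_a[mask] == y_true[mask]).mean()
    acc_b = (model_b[mask] == y_true[mask]).mean()
    flag = "  <-- regression" if acc_b < acc_a - 0.02 else ""
    print(f"{lang:>3s}: A={acc_a:.3f}  B={acc_b:.3f}{flag}")
```

The aggregate row would favor model B; the per-slice rows surface exactly the kind of ""better in some areas, worse in others"" trade-off discussed next.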
decisions about you know version a versus version B and you know where are the improvements and where are the detriments to for a given level uh Improvement or update um so some of these are going to be judgment calls um uh I I think that to do this well um it's it's really helpful to have some standardized practices uh so once digitized practice that I think is underutilized in the field is to have really detailed write-ups um on every single model change you know uh that is being proposed for a new production launch uh yeah almost like a paper or a mini paper just about that one change analyzing it in depth um so that yeah we can have uh some usefully distilled knowledge about what that change is um uh I think that you know machine learning people often play a little bit fast to loose with their experimentation um and uh you know I mean the fact that it's useful to have infrastructure the two supported notebook of experiments and like this is an improvement like it's a really great thing to have but it also says something you know to some degree about the uh the state of the world where where something like this is is seen as a a really useful Innovation which of course it is um but um you know so number one making sure that every single change no matter how small is is carefully um uh analyzed and written down I I really do feel that writing things down is important you know as much as I love having an automated system that that collects all of your past experiments and sort of gives you the numbers I think that that human step of reading through the numbers and you know um drawing a conclusion and and writing that conclusion down in human language so that it can be uh discussed and poked as is a really important step you know to First approximation I think it's science is what happens when you write things down and it's important for us to be scientists um so then you know what's what's standard practice uh everybody brings their write-ups um into a a meeting and um people will talk about them and there have to be you know a couple people who make the call in the end but uh but these things should be discussed they should be debated they should be um you know uh looked at from every lens and uh with you know really carefully with as much data and insight as we can bring in these problems and then and then you know use Lane farmer votes are going to have to make a call but uh but they we should be giving those decision makers as much of context and insight as they possibly can yeah that makes sense I I guess another change big change that's happened since 2015 is many many of the new applications and models operate on unstructured data and I think there's sort of an implicit assumption even in talking about features that were operating on tabular data which I think was the vast majority of use cases in in 2015. 
do you think there's anything that kind of changes about what you're talking about when um the inputs are you know images or movies or audio files where you probably can't worry about the distribution of like the third you know pixel in every image like it's hard to say what that means even no so it's a great Point um I think that the basic ideas still hold and I'm enough of a dinosaur that I I say features um uh you know sort of as my go-to but I think that the same ideas hold directly even in unstructured data like images like video like audio like you know relatively unstructured text um you know I think the uh uh the first line paper had this really nice example of Huskies on snow backgrounds versus non-snow backgrounds um and I don't think that we have to have extracted a feature you know is snowy background um to to see the point here right um the questions are you know what are the qualities of the data what's the information that's being contained in the data we can often talk about that using the language of features um but it's it's I think it holds generally for any sort of correlation um that's going to exist in our input data and so you know that could be the moral equivalent of smelly backgrounds or um uh you know backgrounds in an image or facial characteristics uh in certain populations or uh any number of of uh characteristics that can come through on video uh or image you know um there's there's some pretty interesting um uh stories of um you know cancer detection uh on images that might have had uh Sharpie circles written around uh some of the images when they were annotated by the original doctors or things like that you know like do those corresponds to literally literal features no but they're that they're certainly uh qualities of the data we need to be aware of in the same way that uh for audio input data um you know speaker characteristics uh and being you know inclusive of a wide range of speaker categories is really really important so I guess I I do want to talk something something about kaggle because that's that's your new Java I'm I'm curious how it's going but I'm also um curious to know what got you excited about about joining kygo in the first place like it's kind of an interesting choice because you know so many I mean I love kaggle I think it's it's played a bigger role in the ml field that people even maybe realized like it was the first place I think a lot of people saw deep learning and it really working for example um but the the criticism kaggle and I think there's some truth to it has always been that you know kind of making a high performing model on a specific data set is sort of the least of the problems of getting you know machine learning to to work in the world and I feel like you're like this real expert on getting you know machine learning models to work in the real world um so how does that connect with you um joining kaggle yeah so um great set of questions so first of all I'm really excited about being part of chemical I um uh have had touch points with kaggle at a couple different points um I I ran uh you know one of the uh uh early competitions and then we we ran another uh competition called inclusive images a couple years ago as well so I've known the team for a long time and I've been a a big fan of the platform um I don't know if you've ever seen any of the papers that I've written around uh you know the sort of state of the machine learning field in general but I I feel that we are at a bit of a tricky spot in um the life cycle of 
the field of machine learning research we're at a place where there are incredibly strong incentives for people to be publishing papers um I don't think I need to oversell that now but it's it's true that that you know publishing papers is a big deal um you know when you add it all up there's something like 10 000 papers a year you know give or take publish to top conferences each year um but there's a sort of interesting thing uh each of those papers is claiming uh you know 0.5 percent or one percent Improvement on some important problem but happily really improved the field by five thousand or ten thousand percent per year like I don't think so uh so something interesting is happening there if you've been involved with um conferences either as a submitter or a viewer or an area chair um you'll notice that uh our reviewer pools are getting freezy tapped out and they have been for some time you know in today's conference reviewing world it is often the case that uh uh reviewers may be first-year graduate students um which is you know like obviously wonderful that they're performing the service but it's quite a different thing to be getting um a you know high stakes review on the quality of piece of of you know research from someone just entering the field versus somebody's been in the field for many years and and this is just a function of the growth of the field the growth the field has been uh you know pretty astronomical you know uh the number of papers uh you know sort of appearing per year I believe is growing exponentially it certainly was the last time I checked um and the number of qualified reviewers is not growing exponentially so this is interesting um as a field it's easy to see that we're sort of fragmenting um drastically across you know many many benchmarks as a field we're really pushing this idea of novelty it's it's quite difficult to get a paper published without a a novel algorithm um and you know in terms of science uh I think that this is leading to a world where we don't necessarily have the best understanding of um the algorithms that we think are the are the best or they go to because we're so busy inventing new ones um and just as a comparison point I I I no one would confuse me with a physician uh but my understanding is that in the medical world doctors uh often publish papers that are you know case studies about um uh you know diseases or treatments or stuff like this uh I would certainly hope that there is not a strong impetus that every single paper that is published in the medical field has a new treatment you know like if novelty is like the number one thing in every single you know uh medical thing has to be testing something new I'd be worried as someone who likes to go to the doctor to get healthy now in the medical field we often see meta-analyzes we often see replication results we often see case studies that sort of you know say reporting the experience of a a given trial um or a given treatment or things like this and those kinds of papers are largely missing from the field of machine learning research right now and I think it's a problem when I look at kaggle I see a world where we're able to promote much of this kind of missing work when kagglers approach a problem there are often you know thousands of teams competing um to solve a given problem this means that the the level of empirical rigor is you know to my mind simply unmatched by any other process um uh and they're you know compared you know side by side but yeah so we get this nice 
leaderboard effect and things like this but they the community is also like folks are committed to doing their best but they're also committed to sharing and to communicating their ideas and so you know through uh the notebooks platforms and other things like this that we have in the discussion forums uh there is a tremendous amount of knowledge um being shared captured disseminated that is it's just this incredible resource for the field and it's the kind of knowledge that isn't about novelty it's about Effectiveness and it's about rigorous understanding and so to me that's that's deeply compelling and something that I'm really excited to be a part of now I believe that we can do more to to help distill and share the knowledge that the the community is is generating um but it's it's there in you know implicitly in all of the discussion posts and all of the notebooks and all of the competition results and things like this um so I I find that really exciting and really about compelling and I asked about ml Ops and things like this you know I obviously that's that is part of my background and you know for me to go and say look we've we need really rigorous in-depth analyzes of all our models and then for me to you know then notice that on kaggle you know almost all of our competitions have like a single number summary metric is the the output like yeah I notice a tension there um a I think that over time we'll be pushing to help create more uh competition environments and other environments that allow people to uh experience more of a production environment to be evaluated more on their ability to to do things that are you know make sense in a production environment uh but we just had a competition close that measured efficiency as as one of the evaluation metrics I think things like that are really important uh we can do a lot more in that area so we're gonna you know push to make sure that the community is continuing to go in the most interesting and most important directions I think that's good for everybody uh but overall I view you know kaggle as one of the great uh uh resources in the ml world right now uh I think it's been significantly underappreciated relative to the contributions it's already made as a as a community but I think that with the little bit of help and guidance we can do even more yeah I mean I feel like kygo also does kind of an amazing thing of giving lots of people access to machine learning like you know it's a super friendly community and there's a lot of learning resources um and I do know a lot of people that kind of got their start in machine learning in kaggle and if they'd had to go you know back to school to get a PhD to engage in machine learning they they wouldn't have done it for sure so I think that's an amazing uh thing I I wonder though it's funny you know it's funny because it you know you just said you just talked about you know kind of papers where they're trying to you know eke out the last like you know 0.1 percent of performance and and that does seem like something that kaggle um you know really celebrates and there's there's part of me that like loves that like I think getting you know the last bit of performance out of a model is actually a pretty fun um experience absolutely right you know I I'm not going to argue against really accurate models right you know um I I think that the thing that's most interesting though is you know a finding out what the header is is really important for any given problem and you know from a machine learning 
And from a machine learning perspective, we're often saying things like, well, the model is the most important thing. But all of these competitions are in application areas where there are people who really care about solving their problem, whether that's helping to save the Great Barrier Reef, or identifying whales, or helping to detect credit card fraud, or anything in between. Those folks really care about solving important problems for the problem's sake, not from a machine learning standpoint, so making contributions on that side is also really important. But what I find is that when folks are motivated to squeeze every last percent out of a machine learning problem as a challenge, it leads to an incredible diversity of approaches, and that's the thing I find most interesting. It's not necessarily that there was one winning solution at the end and we all celebrate that winner as an awesome person, although they are awesome people and we should celebrate them. It's that we also get a huge amount of information about other things that were tried, that seemed like good ideas, but didn't work as well for whatever reason. You can think of it as ablation studies at scale. So it's not just the position at the top of the leaderboard that's interesting information. We do have thousands of teams participating, and we need the competition structure to make sure that folks are properly aligned, but the results that come out are interesting to distill up and down the leaderboard.

Lukas: Although it's funny, even without the competition structure, there's a lot more on Kaggle these days than the competitions.

Absolutely.

Lukas: I think when Anthony was talking to me on this podcast a while back, he was saying that the datasets were maybe even more popular than the competitions, which I was surprised to learn.

Right, Kaggle has become a really interesting set of resources for the world. Competitions are definitely one of them, but you're absolutely right: we have more usage of Kaggle from people looking to access datasets for their own machine learning needs than from people who come to us for competitions. That was something I didn't know before I joined Kaggle, but it's something I've come to appreciate very deeply. We have, I think, 160,000 publicly shared datasets on Kaggle. It's an enormous trove of information. And what's great about datasets on Kaggle is that they're not static things. There are opportunities for the community to post little discussions and notes, and to post example notebooks, so that it's not just about getting a CSV file with a lot of numbers in it. It's about understanding what's in the dataset, where the weaknesses might be, where the strengths might be, and having a really rich layer of annotation that evolves from the community's involvement in these datasets. Now, I think there's even more we can do there, and I'm excited to do it, but the datasets are a fantastic resource. The notebooks are an incredible resource too. There's an enormous number of publicly shared notebooks, hundreds and hundreds of thousands of them, with example code and really carefully written explanatory text.
So if you're looking to really learn how to do something and you want some great examples, coming to Kaggle and browsing through the publicly shared example notebooks is a fantastically valuable place to start. We also have a wide variety of learning courses for folks who are just ramping up and getting their feet wet. I think it's important that we provide those on-ramps so that we can share machine learning knowledge as widely as we possibly can.

Lukas: So how do you think about the success of Kaggle? Do you look at it like a consumer website, where you're trying to increase weekly active users or something like that? Are you trying to make money with it, or something else? How do you think about it?

Yeah, so I think that Kaggle is basically the rainforest of machine learning. It's this incredibly rich, incredibly valuable ecosystem that the world absolutely needs and that we probably can't get by without. There's not a direct revenue model, and I'm not super worried about that, in the same way that I'm not super worried when companies have a very large research wing that might not be directly revenue-generating. The knowledge that Kaggle generates for the world, the value that Kaggle creates for the world, is so valuable that we can make a very strong case that this just needs to exist. And as a team, we're pretty scrappy. It's amazing that we've crossed the 10 million user threshold with a team of 50. It's not a huge operation, and the work that folks do, from the notebooks team, to the datasets team, to the folks creating learning content, to our competitions team... these folks all work really hard. They're amazing people, and they have an incredibly large influence across the world for what they're doing.

So in terms of how I think about Kaggle: I think about Kaggle as an ecosystem. This ecosystem has a bunch of different parts that interact with each other. We have folks who come to us as novice learners. We have folks who come to us as practitioners; maybe they're already doing machine learning on a daily basis as part of their job, or maybe they're quite advanced in their studies and hoping to be doing machine learning on a daily basis very soon. We have cutting-edge researchers; Geoff Hinton was a famous early winner of one of our competitions, and we have a large engagement from cutting-edge researchers. They all bring different things to our community, and they enrich the community for each other. Without the novice learners, I think we would lose a ton of enthusiastic energy and a sort of constant stress testing. Without the practitioners, I think we'd lose a lot of real practical know-how and knowledge that gets shared really wonderfully within the community. Without the cutting-edge researchers, we probably wouldn't have anywhere near as interesting a variety of competitions being hosted, or the real next-generation solutions coming down the pike. And of course, as you say, competitions aren't all we're about. If we don't have the notebooks, then I think we lose a lot. If we don't have the datasets, I think we lose a lot. So these things play together
in a sort of interconnected web of machine learning, in a really interesting way. And I think that thinking about Kaggle as a valuable ecosystem, and taking that ecosystem viewpoint when evaluating whether we're doing a good job, is the right thing.

Lukas: But then how do you measure the ecosystem? Is it by usage?

Yeah, so what is our one magic metric?

Lukas: Yeah, how do you measure an ecosystem's health, I guess.

Yep, absolutely. That is something I typed into Google on week two of the job: how do people who study ecosystems measure health? And it is absolutely a thing that requires careful analysis. When you talk to an ecologist about how they measure ecosystems, they'll tell you, look, we can't just measure whether the butterflies are happy, and we can't just measure whether the birds are happy. We actually have to have useful metrics on each of the different segments. So we've got a usefully defined grid of metrics, which I'm not going to go into here, that helps us look at each of the different segments that we care a lot about and think need to be healthy. But what we're really looking for, in the end, is not being great in one area and then terrible in a bunch of other areas, but to have what we call a green flush: being very good across all the different important areas of our ecosystem.

Lukas: So these are, like, watching people do behaviors that make you think they're happy and successful in what they're trying to do?

Yeah, well, watching people's behavior sounds creepy, and we don't do that. But it's things like looking at how many notebooks are being created on a daily basis, our competition participation, survey responses to make sure that our folks are happy, the bug reports that are coming in, and long-term metrics like the number of papers citing Kaggle in one form or another. Last I checked, there were almost 50,000 of them. So there's a wide range of ways we can assess whether we're doing a good job.

Lukas: Do you have new things that you want to try, or things that you want to change? Are there new people that you'd like to introduce Kaggle to, or new ways that you'd like Kaggle to support existing people?

Yeah, so you asked about this a little bit tangentially earlier. Given my background, I think it would be pretty surprising if we didn't push towards some more production-grade, MLOps-y style pieces in Kaggle over time, and some of those will certainly be competitions. Judging a model only on the basis of its accuracy by itself is probably not sufficient for everybody's needs in 2022, so we need to provide ways to help folks evaluate models on other dimensions, including efficiency, and then also create useful and compelling and interesting challenges. I also think there's a lot we can do in the world of benchmarking. Right now our main benchmarks are really competitions, but given that we have datasets and we have notebooks, I think we can move towards much more long-running benchmarks, and be more of a repository in service to the
community in that way.

In terms of our user groups and populations, we have a really strong emphasis right now on outreach to underrepresented populations in machine learning, and that's going to continue for sure. And when I look at levels of expertise in our community, I think we're doing a pretty good job right now of serving novice learners. As you say, almost everybody who learns machine learning comes to Kaggle at some point in their journey, so we want to make sure that we keep serving those folks really well, providing as many on-ramps as we can, and making that experience a really good and beneficial one. I think we're doing well there, and we can really improve on how we're serving the practitioners and engaging the more cutting-edge research parts of the world.

Lukas: Do you think there's any downside to the competition framing of Kaggle for someone getting started? It's funny how friendly the community is, given that what people are supposedly doing is competing with each other. Do you ever think about that? Some people might not want to compete with other people for the most accurate model or something.

Yeah, absolutely. I've got two responses to that. One is that we've got our featured competitions, where people might be aiming to win a prize of a lot of money or something like that, and there, many of the competitors are trying to win, whether it's winning the prize, or winning a gold medal in our progression system, or becoming a Kaggle Master or Grandmaster. Those are really great and important things to be pushing forward. We have other competitions, called playground competitions, that are designed much more as an on-ramp, less about winning a prize and more about testing your skills. But even for the featured competitions... one of my hobbies is that I'm an amateur marathoner. I like to run marathons; it's a wonderful, fun thing to do. You get out there and people are cheering and clapping, and that's true pretty much no matter where you are in the race, and spoiler alert, I'm not at the front. So I think there is something about having an environment that is framed around a competition but that can still be about participation and self-growth. That's really important and really inspiring to a lot of people, and it's something we can make sure to keep emphasizing as part of the Kaggle experience. We hear our users telling us this: lots of people come not necessarily to see if they're going to be first or second, but to improve their skills, to share knowledge and ideas, and to learn.

Lukas: You were most recently at Google Brain, and I think about the work that's coming out of OpenAI, famously, and other places, where you get these huge models that on certain axes seem to really outperform other models. I wonder, if you roll that trend forward 10 years, does Kaggle stay relevant? Is there still a role to play for someone
who doesn't have access to a massive amount of compute resources, to solve problems in a useful way?

Yeah, this is a great question. Obviously what's happened in the last couple of years, in terms of truly large-scale language models and other multimodal models, has changed the world in a couple of ways. One is that it's changed how some research is conducted, and I think the world of high-energy physics is a useful parallel here. There are some kinds of physics (I'm not a physicist, so I'm just going to say some kinds of physics) that can only be done with something that looks like a linear accelerator, where you need to get a couple of billion dollars from a government and build a several-kilometer-long concrete tunnel under some hopefully stable part of the world, so that you can run incredibly expensive experiments to gain certain kinds of knowledge. This has definitely changed the way some parts of the field of physics work, there's no question about it. Among other things, the world of physics had to get good at doing this kind of research, and to have, in some places, a bit more of a hierarchy around how experiments get proposed, how they get evaluated (not on their results, but on whether they should be run at all), what gets into the pipeline, and who makes those calls. I think we're seeing very similar developments for some kinds of machine learning research.

But there's plenty of physics you can do in the world, as far as I understand it, that doesn't involve having access to a supercollider. And similarly, I believe there is, and will continue to be, a lot of machine learning that doesn't rely on having access to collider-scale resources. That can look like a lot of things. It can look like, what do we do for resource-constrained environments, so models that need to run in the browser, on web devices, or on distributed, edge-based devices? My guess is that we probably don't need collider-scale resources to train tiny models. What do we do for models that need to be fine-tuned in one form or another, or even things like prompt tuning, where we might have a very large-scale model at our disposal but need to figure out how to use that model as effectively as possible for a given use case? That's something I think will be reasonable for lots of people in specialized domains to attempt for a very long period of time, at least as far as I can see forward.

The last thing I'll say here is that it's also useful to think about standards of evidence and verification for these very large-scale models. If I'm trying to think about how we would go about verifying a given model, and we talked earlier about the kinds of verification, the moral equivalent of unit tests, that might need to be put into place, I can't think of too many better resources than a community like Kaggle's to attack the problem of how you verify a model that is very, very large scale, one that might have billions of behaviors, or more than millions of behaviors, that need to be exhibited in different kinds of circumstances, and how you build the stress tests
to validate models. Can those be framed in terms of competitions and other resources like that? Absolutely. So I think the Kaggle community will be increasingly relevant over time for these reasons. Now, that doesn't mean that every Kaggler is going to train a model with X million compute hours or things like that. That's probably not realistic, and it probably wouldn't be good for the world if it were. But I think there's a lot we can do that will still add value.

Lukas: I guess along those lines, do you feel like AutoML techniques could displace the value of actual competitions? I feel like in the past, the winning Kaggle strategy was typically to do the best feature engineering, though I wonder if that's still the case. In a world where you have these gigantic models that are sort of doing their own feature engineering (that's one way to look at it), and then AutoML on top of that, what is a Kaggler to do in 10 years?

Yeah, exactly. So look, AutoML is a really important tool, in the same way that hyperparameter sweeps, to take an example at random, are a really important tool. I believe that AutoML, and useful hyperparameter tuning engines and things like that, do a great job of automating the kinds of work that isn't particularly interesting in machine learning. In the early days, I spent a lot of time being a manual hyperparameter tuner, and it wasn't that rewarding. But the more fundamental questions remain: what data should be going into a model to train it for a given task? How should we be thinking about data distributions and structures? What are the right structures for a model to capture useful causal concepts, in addition to just learning from whatever correlations are available? Even deeper questions, like, if we're doing fine-tuning of a large pre-trained model, what is the right way to set that up? How do we create the right sets of targets? How do we choose the right pre-training base to begin with? All of those are interesting questions that I don't think an AutoML pipeline is likely to solve exhaustively, in place of human judgment, in the foreseeable future. So I'm very happy for humans to focus on the problems where human judgment and insight are going to be most valuable, and where there's drudgery, let's automate it. No problem with that.

Lukas: Well, thank you so much. We always end with two questions, and I want to make sure that I get them in. The second-to-last question is pretty open-ended, but I'm curious what you think is an underrated aspect of machine learning, or something that, if you had more time, you'd like to spend some time looking into.

Yeah, so I think the thing that is most interesting in machine learning right now is making machine learning robust to shifting data distributions. This is where a lot of my work was in my last couple of years at Google Brain. As we talked about at the beginning, when you break the IID assumption between test and train data, many of the theoretical guarantees that underpin supervised machine learning go away, but we still need things to work. So I think this is absolutely the most interesting area for current work right now: figuring out
ways to be robust to shifting data distributions. And this isn't some weird, abstract problem; it's something that happens for every deployed system I've ever seen. It also happens for things like machine learning for scientific discovery. If you're going to use machine learning to guide, say, protein design, or drug discovery, or any other sort of generative process, then by definition you're going to be moving out from your world of known things, because that's the point. So how do we make sure that our models are going to hold up well in those unknown areas that are so important for advancing key problem areas like drug discovery? That's really one of the most important areas, as far as I can tell.

Lukas: Do you have a favorite paper on the topic that we could point folks to, or resources to learn more about that?

Yeah, we just put a paper out, the last paper I was involved in at Brain, called Plex, that looks at a unified view of robustness to dataset shift, starting with pre-training and then augmenting with a bunch of other Bayesian methods, with many, many excellent co-authors, including Jasper Snoek, Dustin Tran, and Balaji Lakshminarayanan.

Lukas: Awesome. And I guess the final question is, when you think about actually making machine learning models really work in the real world today, in 2022, where do you see the biggest gap, or the hardest part, of going from a Kaggle-winning model to something deployed and useful for someone in the world?

Yeah, so I think what's interesting is that people like you have put a lot of infrastructure in place that makes things that used to be quite difficult pretty straightforward now. For the challenge of how do I get a model into production, there are plenty of packages, systems, platforms, cloud-based solutions, you name it, that can help people do that. I think the pieces that are more difficult to solve are really about how you make sure that the model is going to be a model you're proud of over a period of time. That most obviously comes up in terms of robustness, which might be robustness to dataset shift, might be in terms of fairness, might be in terms of inclusivity, or things of those forms. Making sure that our models act the way we want them to in a wide variety of deployment situations is currently, I think, much more difficult than the mechanics of getting a model into production, because of the work that's been done on infrastructure in so many different areas.

Lukas: Thank you so much. This was a really fun interview. I really appreciate it.

I really enjoyed it. Thanks so much.

Lukas: If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description, where you can find links to all the papers that are mentioned, supplemental material, and a transcription that we work really hard to produce. So check it out,10917