Unnamed: 0,title,url,length,publish_date,transcript,total_words,summary,questions
0,Sarah Catanzaro - Remembering the Lessons of the Last AI Renaissance,https://www.youtube.com/watch?v=v3O20NMdOuA,4584,2023-02-02,"Sarah: I think people see the output of models like DALL·E, GPT-3, et cetera, and they're amazed by what AI can do. And so the conversation doesn't even hinge on, ""We have access to this data set,"" or ""We have access to this talent pool."" It's more, ""AI is magical. What can we do with it? Come in and talk to us about this."" And again, I think that that is somewhat dangerous. Lukas: You're listening to Gradient Dissent, a show about machine learning in the real world. I'm your host, Lukas Biewald. Sarah Catanzaro was a practicing data scientist and then went into venture. She's currently a General Partner at Amplify Partners, and one of the leading investors in AI and ML. Her investments include a whole bunch of companies I admire, like RunwayML, OctoML, Gantry, and others. It's really interesting to talk to an investor who's also technical. She has insights both on how the technology is built and how it's being adopted by the market at large. This is a really fun conversation and I hope you enjoy it. Sarah, thanks so much for doing this. I've been looking forward to this one. I had a bunch of questions prepped and then I was looking at your Twitter and I was like, ""Oh, there's like a whole bunch of stuff that we should..."" Sarah: Yeah. I feel like I've been doing a lot of thinking out loud recently. Including in response to a lot of the hype around Stable Diffusion, LLMs, et cetera. I appreciate the fact that both of us were there in the 2013, 2014 phase where every company was claiming to be an AI company. It feels like we're kind of heading down that road again, which scares me a little bit. I hope at least there are enough companies - people - who remember the lessons of the last AI renaissance. But we'll see. Lukas: Well, let's get right into it then, because from my perspective, I totally remember at least one other AI bubble. Maybe more, depending on how you count it. I guess from where I sit, it feels like this one might be different in the sense that I feel like these challenges that were always...seemed super, super hard, seem like they're really working. And I feel like I see applications happening unbelievably fast after the paper comes out. Actually even maybe before there's time to even publish any paper on the topic. I think I might be more bullish about large language models and Stable Diffusion than you, which is great because we can actually have an interesting conversation here. But I thought it's interesting. You've invested in Runway, and just the other day Cris was showing me a natural language input into Runway where you could basically type what you want, and it would sort of set up the video editing to work that way. I thought, ""Oh my gosh,"" this might be a totally new kind of interface that lots of software might quickly adopt, I guess. But it sounds like - looking at your Twitter - it sounds like you were playing with large language models and finding it super frustrating and broken. Tell me about that. Sarah: Yeah, so I think my concern is less about the capabilities of large language models specifically, and more about some of the lessons that we learned during the last AI renaissance. Which I think was roughly like 2014 to maybe 2017, around the time that AlphaGo came out. People were really excited about the capabilities of GANs and RL. 
At the time, I remember companies like Airbnb, Uber, Lyft building these big research teams, but not really having a clear agenda for those research teams, or understanding how the objectives of their research teams might align with the objectives of the broader organization. And then similarly, you saw all of these startup founders emerge that were talking about changing healthcare with GANs or changing finance with RL, but didn't really have insights into the nuances of those industries. My feeling of why ML didn't work the last time around β€” or rather, why ML adoption didn't occur at the pace that we anticipated β€” was that it was not really a technical problem, but rather a product, go-to-market problem. I am hoping that this time around, we've both learned from our mistakes but also β€” in the intervening time period β€” created enough enabling technologies, such that two things can occur. One is that companies can fail fast. Frankly, one of the things that scares me is that back then I remember a bunch of companies reaching out and basically saying things like, ""Hey, we've got a bunch of data. We'd love for you to come in and talk to us about our AI strategy,"" and thinking, ""I don't care if you have a bunch of data. Let's talk about a bunch of problems that you have, and how ML can solve those problems."" I've come to believe that you can't fight that urge. Founders will always be enticed by the promise of AI. But if they're able to experiment with it quickly, then I think they can start to learn more about the infrastructure, and data, and other investments that they may need to make in order for their AI initiatives to be successful. At the same time, I think by creating these higher-level interfaces that make ML more accessible to potentially the domain expert, it allows people with a more thorough understanding of business problems to at least prototype AI solutions. I'm somewhat skeptical that these very high-level interfaces will allow them to build production ML at scale, but at least they can see, ""Does it work? Do I need to now hire a data/ML team to realize this initiative further?"" Lukas: Do you have companies in mind that you like, that are creating these higher-level interfaces off of ML technology, that makes them usable for real world applications? Sarah: Yeah. I think Runway is actually a perfect example of the phenomena that I see playing out. Some people may not know, but Runway actually started off more as a model marketplace. Their goal had been to make GANs and other types of models accessible to creative professionals, but they weren't really focused on building out the video editing tools, at least initially. They created these higher-level interfaces, such that various creative professionals β€” whether it was artists, or directors, or photographers β€” could start to experiment with ML models. What they saw was that some of the most popular models were models that automated routine tasks associated with video editing. Based on that user behavior, they decided to double down on video editing. In fact, a lot of the model architectures that they've since created β€” including Stable Diffusion β€” were really purpose-built to support the workflows of video editors. I like that sort of workflow, where you use a prototype, or you use these higher-level interfaces to get insight into what users need β€” as well as potentially the limitations of the underlying technology β€” and then you iterate from there. 
Lukas: I totally remember a time, I think, of the era you're talking about β€” 2014 to 2017 β€” when every company was like, ""Oh, we have this data. it must be valuable because we can build a model on top of it."" Do you see some analogy today to that? What's the common request of an ML team that's misguided, or should be thinking more about problems? Because I feel like data maybe isn't seeming quite as valuable, in the world of LLMs and big models. Sarah: I think that what we're seeing today is arguably more nefarious than what we saw back then, because at least at that point in time, companies had invested in collecting data. They had thought about possibly what data to collect. And so there was some understanding of how to work with data. I think people see the output of models like DALLΒ·E, GPT-3, et cetera, and they're amazed by what AI can do. And so the conversation doesn't even hinge on, ""We have access to this data set,"" or ""We have access to this talent pool,"" or ""We have this type of workflow that could benefit from these generative capabilities."" It's more, ""AI is magical. What can we do with it? Come in and talk to us about this."" And again, I think that that is somewhat dangerous. I was at a conference just last week. There was a presentation on ML infrastructure at a music company, and somebody in the audience asked, ""Does the AI listen to songs?"" It's a perfectly reasonable question. But I think it does kind of belie some of the misunderstanding of AI and how it works. Lukas: In what sense? Sarah: I think people think about AI as artificial agents. They think of AI as something that could listen to a song, not just something that could represent a song and make predictions based upon the content of that song. Again, I think better understanding of what LLMs are and what they can do will be really necessary to identify when they can be useful. Lukas: This might sound...this is a little bit of a soft ball β€” or might sound like a soft ball β€” but I was really genuinely interested in this. I feel like one of the things that you do really well, at least in my conversations with you, is maintain a pretty deep technical and current knowledge of what's going on in data stacks, basically. Or, data infrastructure and ML infrastructure. But yet you're not maintaining data infrastructure β€” as far as I know β€” so I'm kind of curious how you stay on top of a field that seems like it requires such hands-on engagement to understand it well. Or at least I feel like it does for me. Yeah, just curious what your process is. Sarah: Yeah. It's interesting because I'd say that, in some ways, that is one of my biggest concerns. I've been in venture now for about seven years, and so I can still say that I've spent most of my career in data. But it won't be long before that is no longer true. And certainly I have found that my practical, technical skills have gotten rustier. One comment on that is that I do think that losing my Python, SQL skills, etc. has actually enabled me to look at some of the tools and platforms that are available to users today, with a fresh set of eyes. I'm not as entrenched in the same patterns of behavior and workflows as I was when I was a practitioner. So it's been helpful to shed some of my biases. But I think what I've discovered is that you can understand how something works without using it. And therefore there are two things that are kind of critical to building technical understanding for me. 
One is just spending a lot of time with practitioners, and hearing about their experiences. How they're using various tools, how they're thinking about various sets of technologies. Frankly, just learning from them almost feels like a shortcut. Instead of trying to figure out what the difference is between automated prompting and prefix-tuning, just going to ask somebody and have a conversation with them. Which is kind of coincidental, and perhaps even ironic. Like, accelerate my learning by just learning from people with expertise in those areas. There's a lot that I just learned through conversation with practitioners. But I think going one level deeper β€” either reading white papers or reading research papers that give you kind of a high-level overview of an architecture, or how something works without getting into the nitty gritty of the underlying code or math β€” allows me to reason about these components at a practical level of abstraction. I can see how things fit together. I understand how they work. That doesn't necessarily mean that I'd be able to implement them. Definitely doesn't mean that I'd be able to iterate on them. But it's enough depth to reason about a component, and it's placed in a broader technical stack. Lukas: It's funny though, sometimes I feel like investors...I mean all investors do that to some extent, and I totally get why. But I think that I often feel also paranoid about losing my technical skills, because I feel like if all you can do is sort of figure out what box something belongs to, it's really hard for you to evaluate the things that don't fit into boxes. And I feel like almost all the interesting advances β€” actually, all the products that we want to come out with at Weights & Biases β€” generally is stuff where it doesn't fit neatly into one of those ML workflow diagrams that people make. Because if it was one of those boxes, then of course people are doing it, because it makes logical sense, but it's sort of when that stuff gets reshuffled...it does seem like you're able to maintain a much greater level of technical depth than the average investor, even in the data space. Which is why I wanted to have you on this podcast. I hope I'm not offending any of my current investors. Just a caveat there. You all are wonderful. I really do feel like you somehow maintained a much greater technical depth than most of your colleagues. Sarah: In many ways I'm amazed by my colleagues and what they do, because I think there are many investors that can reason about the growth of companies, and reason about sets of boxes and the relationships between those boxes without understanding what those boxes do. I don't think I could do that, but I've always also just been the type of person who needs to go a little bit deeper. As an example, I started my career in data science, but at Amplify I also invest in databases. And at some point β€” writing SQL queries, working with dataframes β€” I just wanted to better understand what was happening. When I write a SQL query and data shows up in my SQL workbench, what is happening on my computer? I think a lot of people take that stuff for granted. And they can. That is the beauty of abstractions. That is the beauty of technology. We are able to have this video conference β€” we are able to connect over the Internet β€” without understanding how the Internet works. My personality is such that I want to understand how the Internet works. 
I want to understand why I have service in some places and why I don't have service, and why my dataframe is slower than my SQL query. I do think that that makes me think about technical systems in different ways. Lukas: It’s funny, my co-founder Shawn is obsessed with β€” in technical interviews β€” assessing if someone understanding how a computer works, in his words. Which I think is really interesting, because I feel like I'm actually not... That's kind of a weakness of mine, I always wonder about a lot of the details there, but it is sort of an interesting perspective. I love working with all of my colleagues who have that same drive to understand how everything works. Okay, here's another question that I was wondering, I was thinking about. If I were to come to you, and I had a company in the data/ML space, and I had a bunch of customers that were really who we think of as tech-forward β€” like Airbnb, and Google, and that genre β€” would that be more impressive? Or would you be more thinking I'm likely to succeed if I came to you with a set of customers who we don't normally think of as tech-forward? Like an insurance company β€” a large insurance company β€” and a large pharma company. Which would you look at and say, ""Oh, that seems like that company is going to succeed""? Because part of me watches technology flow from the more tech-forward companies everywhere. But another part of me is like, ""Wow, these kind of less tech-forward companies have a whole set of different needs and often a different tech stack. And certainly there's more of them and they have more budget for this stuff."" So which would be the more impressive pitch for you? Sarah: Yeah, it's funny because I think in many ways the way that VCs make decisions β€” the way that we think about deals β€” is actually super similar to some of the patterns that we observe with neural networks. And that of course means that we have bias. It also means that we learn from patterns that we've observed. So, I can give you the honest answer, and then I can also give you the rational answer. The honest answer is that I would be more impressed by a company that has engaged with tech-forward customers. For the reasons that you described. In the past, we have generally seen that tech will spread from the Airbnbs and Ubers and FAANGs of the world into the enterprise, and not the other way around. We also have a bias that these more traditional enterprises tend to move slower. There tends to be a lot of bureaucratic red tape that you need to navigate. And as such, those markets tend to be less attractive. So, on its face, if you just said...you don't have any additional information about the velocity of sales, about the quality of the tech or team, etc. But like you're- Lukas: -holding them equal, I guess. Equivalent. Sarah: Yeah. That said, I think that is one of the biases that can cause us to make poor decisions. What really matters are some of the things that I just alluded to. If you're able to sell into insurance companies repeatedly β€” and with high velocity β€” that is arguably a better business than a company that spends 6 to 12 months trying to sell into tech companies. So it's less about ""To whom do you sell?"" and more about, ""Is that a big market? Are you able to sell efficiently? Are you able to sell scalably?"" I think sometimes we need to be aware of our biases and the impact that marquee logos can have on our decision-making. Lukas: Well, I can't tell if you think it's a rational bias or not. 
I mean, in some sense, you could call all pattern-matching biases. Do you really think it would be rational to sort of be less enamored with tech-forward customers than you actually are? Sarah: I think we need to ask ourselves and probe on, ""Under what circumstances might enterprises move quickly?"" A great example of this is a company called Afresh, which was one of the companies that did use RL to disrupt an industry. At that time that so many companies were trying to do the same thing, but didn't have as much insight into what was happening within an industry. They offer tech solutions β€” including things like inventory management and forecasting β€” to companies in the grocery space. Now, you might think that grocery is going to be a super outdated, slow-moving industry. And therefore that selling into grocery chains would be long and tedious. And perhaps not very scalable. But, at the time, a lot of grocery stores were responding to β€” and/or otherwise just terrified by β€” the acquisition of Whole Foods by Amazon. This was then [followed] by the pandemic, which certainly put a lot of stress on their online and multi channel-delivery and e-commerce capabilities. So there were these exogenous shocks which made what might have been slow-moving market participants move a lot faster. Those are the phenomena that we're sometimes blind to, because we just hear ""grocery"" or ""healthcare"" or ""manufacturing"" and think ""slow"", rather than thinking, ""What would it take for the participants in that sector to move fast?"" Lukas: That makes sense. Here's another point that you made on Twitter, that I was contemplating. I actually don't think I have a strong point of view on this, although I really should β€” given the company that I'm running β€” but you mentioned a lot of VCs have been saying that you expect the point solution MLOps space to consolidate. One thing that's interesting about that, is that I think you've invested in some MLOps tools. Do you sort of expect them to expand in scope and eat the other companies? Is that something that you need to bet on when you invest in them? Or would you be happy to see them get bought by other tools? How do you think about investment then, in MLOps tools companies, with that worldview? That's my practical question. And then the other thing that I observe, is that it doesn't necessarily seem like developer tools in general is consolidating. So I think I might even agree with you, but I wonder how you sort of pattern match that against developer tools. Or even maybe the data stack... I don't know. Do you think that the data stack is also consolidating? Or what's going on there? Sorry, I just dumped a whole bunch of different questions on you, but... Sarah: Those are great questions. So, I do think that in general most technical tools and platforms will go through phases of consolidation and decoupling. Or, as people love to say today, bundling and unbundling. I think it's just the nature of point solutions versus end-to-end platforms. You have a bunch of point solutions, they're difficult to maintain, they may be challenging to integrate. You then kind of bias towards end-to-end platforms, you adopt an end-to-end platform. It doesn't address a certain edge case or use case that you're experiencing, so you buy a new tool for that edge case, and unbundling happens. I think the pendulum will always swing back and forth between bundling and unbundling, for that reason. Or coupling and decoupling, for that reason. 
To be clear, as a former buyer, I don't think that point solutions or end-to-end platforms are the best solutions for a company. I think there's space in the middle, where you have a product that can solve a few adjacent problems. That's typically what I look for when I invest. I want to make sure that the company in which I'm investing is solving an urgent β€” and often point β€” problem. They're solving an urgent and specific problem. However, I typically also want to see that the founder has a hypothesis about how they would expand into adjacent problem areas. It's not that I think solving point problems is bad, but I do think given the pendulum of coupling and decoupling, having some hypotheses about the areas that you can expand into becomes critical. It's interesting to consider why this may or may not happen in the world of developer tools. I'd argue that you still see consolidation. However, the consolidation tends to happen across layers of the stack, versus across the workflow. Lukas: Interesting. What are you...tell me...what are you thinking of there? Sarah: Things like serverless, where you're no longer reasoning about resources and config. That might not be impacting other parts of your developer workflow. That might not be eating into your git-based development workflows, or your testing processes, and things like that. But it is eating into how you think about managing VMs or containers. It is possibly eating into how you think about working with cloud vendors, and deciding upon underlying hardware, and things like that. So it might be the case, that it's like in software development, we've seen companies β€” or we've seen vendors β€” solve specific problems, but solve those all the way down the stack. I haven't really thought about that as deeply. But I think it's a worthwhile question to ask. I would say that one of the big differences, though, that I see β€” and that we of course need to be mindful of β€” is that there are far more developers than there are data practitioners. And so, when you're trying to answer the question, ""How does this thing get big?"", those building developer tools can arguably solve a specific problem for a larger number of people versus data teams when you're trying to answer this question of, ""How does this get big?"", you could potentially get stumped just by the number of people for whom a tool is actually applicable. Lukas: Is that what gives the intuition that we're in a moment of bundling? That there's just all these point solutions that you feel kind of can't survive on their own, just given the size of the market that they're in? Sarah: I think it's a combination of things. On one hand, I see a lot of...the slivers are getting tinier. You start to see things like ""model deployment solutions for computer vision,"" and perhaps some subset of computer vision architectures. Where, you might think to yourself, ""Okay, I understand why the existing tools are maybe not optimal for that specific use case, but that's really narrow."" To my point about thinking about these orthogonal problems, it's unclear how you go from that to something meatier. That's one phenomena that I observed. I think the other is just that practitioners are really, really struggling to stitch things together. 
The way a friend put it to me about a year ago, he basically said he feels like vendors are handing him a steering wheel, and an engine, and a dashboard, and a chassis, and saying ""Build a fast, safe car."" Those components might not even fit together, and there's no instruction manual. It's easy to cast shade on the startups that are building these tools and platforms, but I think one of the things that is more challenging in the ML and AI space than even like data and analytics, is that a lot of the ML engineering and ML development workflows are really heterogeneous now. If you're a vendor and you're trying to think about, ""With whom should I partner? With whom should I integrate? Do I spend time on supporting this integration?"", it's tougher to make those decisions when practices and workflows are so fragmented and heterogeneous. I do think that creating more of a cohesive ecosystem has been difficult not because vendors are dumb, but because there's just a lot going on. Lukas: Well, I think the other challenge maybe is that when there's so many different technologies that people want to integrate into what they're doing β€” because there's so much exciting research and things that come along, based on different frameworks and so on β€” it's hard to imagine an end-to-end system that would actually be able to absorb every possible model architecture immediately, as fast as companies want to actually use it. Sarah: Yeah, yeah 100%. I have been thinking about this in the context of LLMs. We don't yet know how the consumers or users of pre-trained models are going to interact with those who create the pre-trained models. Will they be doing their own fine-tuning? Will they be doing their own prompt engineering? Will they just be interacting with the LLM via API? Without insight into those interaction models, it's really hard to think about building the right set of tools. It's also unclear to me that the adoption of LLMs would actually imply that we need a new set of tools, both for model development and deployment, and management in production. I have a lot of empathy for people who are building ML tools and platforms because it's a constantly moving target. Yet, there's the expectation that you're able to support heterogeneity in all regards. In all regards, whether it's the model architecture, or the data type, or the hardware backend, or the team structure, or the user skill sets. There's so much that is different from org to org. I think building great tools is really challenging right now. Lukas: I guess that's a good segue to a question I was going to ask you. When you look at LLMs, do you have an intuition on if a new set of tools are needed to make these functional? Sarah: I think one of the bigger questions that I have is, again, on how the consumers of LLMs β€” or how the users of LLMs β€” will actually interact with those LLMs. And more specifically, who will own fine-tuning. I imagine that there are certain challenges that will need to be addressed, both with regards to how we collaborate on the development of the LLMs, but also how we think about the impact of iterations on LLMs. If OpenAI wants to retrain one of their models β€” or otherwise tweak the architecture β€” how do they evaluate the impact of that change on all of the people who are interfacing with the GPT-3 API, or with any of their other products? 
I think a lot of the tools that were built for model development and deployment today kind of assumed that the people who were developing models would be the same set of people β€” or at least within the same corporate umbrella β€” as those who are deploying and managing models in production. And if LLMs drive a shift β€” wherein those who are developing models and those who are deploying and building applications around models are two completely separate parties β€” then some of the tools that we have today might be ill-suited for that context. Lukas: Do you think we're headed towards a world like that, where there's a small number of companies generating foundational models? And then mostly what other companies are doing is fine-tuning them or doing some kind of prompt engineering to get good results out of them? Sarah: Here we're getting a little bit into the technical nitty gritty, but my impression from tracking the research community so far has been not all...though LLMs are great for what we typically think of as unstructured data β€” primarily images, text, video, et cetera, audio too β€” they have not outperformed gradient boosting or more traditional methods on structured data sets, including tabular and time series data. Although there's some work on time series that I think is pretty compelling. This is one of those areas where I feel like the research community just completely underestimates how many businesses operate on structured data. While it's possible that adoption of LLMs will drive this new interaction model or new market model β€” wherein some companies built these large foundation models and others interact with those β€” I don't see gradient boosting or more classical approaches going anywhere. Because I don't see structured data going anywhere. Arguably, structured data powers many of the most critical use cases within organizations, ranging from search and recommendation engines to fraud detection. I think it would be a tragedy to neglect the needs of those who are using...I don't want to say simpler approaches, but certainly simpler approaches and more complex approaches, by using architectures that are not perhaps attention-based, when working with these specific data sets. Lukas: Interesting. Do you have an opinion on...how to say this? I feel like many investors especially, but I think many smart people looking at the space of ML and data, they think, ""Wow, this is gonna commoditize. This is going to get...tools are gonna make this easier. Less companies are going to want to do this internally and spend money on expensive resources."" But I guess when I look at what companies actually do, it seems like they spend more and more, and even kind of push up the salaries. And they have this fight for scarce, specific talent. Which way do you sort of predict things are going? Do you think like 10 years down the road, ML salaries go up or do they go down? Maybe it's a more concrete way of putting it. Sarah: Yeah, that's a great question. I probably expect that the variance would increase. My guess is that there are certain applications that may be commoditized β€” or at least that may be commoditized for some subset of the market β€” while others continue to be pursued in-house. Search is perhaps a very interesting example. For some businesses, they may be more than happy to rely upon a vendor to provide those semantic or vector-based search capabilities. 
While search may have an impact on their bottom line, perhaps it's not the most critical or most impactful thing to their business, but rather just a capability that they have. This is not to say that Slack actually uses a vendor or should use a vendor, but as far as I can tell, Slack doesn't really monetize on search. You'd contrast that, however, with an e-commerce business or something like Google, where their ability to deliver the highest quality search results and their ability to improve search β€” just marginally β€” could be a huge impact on revenue. Those companies are probably likely to develop their own models. I think we'll see that some companies do their own model development. Some use cases are not commoditized, and those companies for those use cases you see very high ML salaries. But then, perhaps for others, you're really just a software engineer who knows a little bit about ML, and can interface with some of these models through APIs, and can reason about the output of experiments and behavior that you might see in production. Lukas: I guess in that vein β€” and you sort of alluded to this earlier a little bit β€” what do you think about all these sort of low-code and no-code interfaces into exploring data, building ML models? You mentioned earlier that you think that's generally a really exciting trend. Sarah: My opinions on this category are pretty nuanced, so I was thinking about where to start. Generally speaking, I'm very skeptical of no-code, low-code solutions. I find that many of these tools β€” no matter what the sector or what the use case β€” they end up shifting the burden of work. Not necessarily removing that burden, or even lightening that burden. A great example is self-service analytics. My own belief is that in general, most self-service analytics tools don't actually reduce the burden that the data team or analytics team bears, but rather shifts the work of the data team from building analytics products to debugging, explaining, or fixing analytics products. And I think the same can be true in the ML space. Why I'm excited about some of these tools in the ML space is that I actually think that in ML, failing fast is really critical. Some of these tools that enable users to prototype ML-driven solutions might help them better understand, ""Is this going to work? What additional investments do I need? What do my users expect from the system before they make a decision to invest further?"" It enables that kind of quick prototyping, learning, and failing fast. The other thing that I feel quite strongly about, is that we need to explore ways to decouple model development and ML-driven app development. Whenever I talk to companies about their ML architectures or their ML stack, it becomes so obvious that ML is just this one tiny component in a much larger app architecture. The prediction service might be connecting with other databases, or stream processing systems, or other microservices, tools for authorization, and so on and so forth. I think it's really important to be able to build applications around a prediction service while independently iterating on the model that powers that prediction service. So, I am somewhat long on tools that enable engineers to prototype ML-driven systems, so that they can build those application architectures. 
And then, once they have a better understanding of the full system requirements β€” including some of the latency associated with things like moving data around β€” they can kind of pass off a fuller spec to a data scientist who will iterate on the model and model architecture, armed with the knowledge that these are the attributes that we need in order to make this project successful. Lukas: That makes sense. Okay, another question. When you invest in a company that is providing some kind of ML or data service, does it cross your mind, ""What if AWS does that?"" Or GCP or Azure. Is that an important thing to consider, do you think, or is that irrelevant? Sarah: Yeah, yeah. I smile because I feel like this question, it comes up somewhere between like one to five times a week. Given the areas that Amplify invests in β€” we're primarily focused on data, ML tools and platforms, enterprise infrastructure, and developer tools β€” we're constantly fielding this question of, ""What if AWS or GCP or Azure does this? Won't that company β€” won't that market, et cetera β€” get crushed?"" In the past, what I've told people is that I have found that startups tend to be better at building developer experiences. Anecdotally, this is just something that we observe. People complain a lot about the experience of using AWS tools, the experience of using things like SageMaker. I've thought a little bit more about why that's the case. I think, generally speaking, the cloud vendors need to develop for their most spendy customers, their highest-paying customers. And their highest-paying customers tend to be enterprises, shockingly. As such, they're developing for an enterprise user who probably has fairly strict privacy/security requirements, who may have a very distinct way of organizing their teams, who may be bringing in a persona with a specific skill set into data science or ML roles. If I had to present a hypothesis about why they haven't been able to compete on developer experiences, I think it's because often they are creating tools and platforms for a developer who is not as representative of the rest of the market. But, to be honest, with the passage of time, I've just seen enough examples of companies that have been able to out-compete the cloud vendors where I just don't worry about it that much anymore. Lukas: Have you ever seen anyone get crushed? Sarah: Crushed? Lukas: Has that happened in your career? Sarah: No. I mean, I'm sure it has. But it's hard for me to think of an example, whereas it's easy to think of many, many examples of companies that were not crushed by the cloud vendors. If anything, I think sometimes we see that start-ups get...they sell too soon. The way in which the cloud vendors out-compete them is putting some juicy acquisition offer in front of them and then they don't have to compete. That's the only example that I could see or think of, off the top of my head, of the cloud vendors crushing a potential competitor. They crush it with their dollars. Suffocate companies with their acquisition offers. Lukas: R&D through M&A, yeah. I saw an interview or a conversation that you had with Andrew Ng. I thought you had an interesting point that academic benchmarks...they often don't really reflect industry use cases. But you were kind of pointing out that industry has some share of the blame for this. Can you say more on that topic? Sarah: Oh, absolutely. I am really grateful to Andrew for actually drawing my attention to this issue. 
We often think about the gap between research and industry, but we don't as often think about the gap between industry and research. Andrew and I had been talking about this challenge of structured data versus unstructured data. I think I said to him, ""What I see in industry is that most ML teams are working with tabular and time series data. What I see in the research community is that most researchers are building new model architectures for unstructured data."" There's a big mismatch between what model architectures people in industry need β€” given the data that is available to them, as well as given the types of problems that they're trying to solve β€” and the research that's becoming available. Now he pointed out to me β€” and this is something that I hadn't really thought about before β€” researchers have access to unstructured data. They have access to things like ImageNet. They don't have access to high volumes of data on user sessions, or logs, metrics, and events. The data sets that tend to be the lifeblood of most companies. It is very difficult to innovate on AI techniques for data sets to which you have zero access. I think it's easy to point to that research and be like, ""Oh, there's such a big gap between what they're building and what we need."" I think we also need to be mindful of what the research community can do, given the resources that they have available to them. I've seen a couple of efforts by a few organizations to open source their data sets, but it's tough because oftentimes the most valuable data sets are the most sensitive ones. What company wants to share their click-through data that probably reveals the state of their business, some of the experiments that they're running, and so on so forth. Lukas: Well, there's also not a lot of upside. I remember the Netflix contest was such a popular, awesome thing. Got so many people involved, so much attention to research to Netflix β€” still a seminal data set β€” but they didn't do a second one because they felt like...there are user privacy issues, that they couldn't get around to release it. I don't know if you remember when AOL released a subset of their query logs. It was so exciting to actually have that. I was in research at the time and I was like, ""This data set is like gold."" And then like the next day, they fired the person that released it. And their boss β€” I think their boss' boss, right? β€” because there was some personal identifying information in that. It's hard to see a lot of upside for corporations, even if they were sort of neutral on the impact of...on the company secrets, IP issue. Sarah: Yeah. One of the things that I have seen β€” that has been very encouraging β€” is more and more interview studies or meta analyses coming out of the research community. Where it's clear that the researchers are interested in better understanding the problems that practitioners face in industry. One critique that I've had of those studies in the past, is that the authors tend to interview people to whom they have immediate access, which means that they often interview practitioners at some of their funding organizations. The organizations that are sponsoring their labs, which means that they tend to bias more towards larger enterprises or big FAANG companies. They're interviewing people at Facebook, Apple, Tesla on their data and ML tools, platforms, practices, and then drawing conclusions about all of industry. 
But I think that recently I've seen a couple of studies come out where there's been a more focused effort to get a more random β€” or at least more diverse β€” sample of practitioners from both smaller startups, more traditional companies, bigger tech companies, et cetera, to really better understand both the similarities and differences between how they approach model development and deployment. I hope that continues. Lukas: Do you have a study that's top of mind, that you could point us to? Sarah: So, Shreya Shankar, who had actually been a university associate. Lukas: Yeah, I saw that. Totally. Nice. Sarah: I was really thrilled because Shreya actually reached out to us and said, ""Hey, can you connect us to people at different types of companies? I've got connections to people at Instagram, Facebook, Apple, et cetera et cetera, but I want to talk to people at mid-market companies, or early-stage startups, and B2B companies, and better understand some of the nuances of their workflows."" Lukas: What was the name of the paper? I think I just saw it. Sarah: ""Operationalizing Machine Learning: An Interview Study"". Lukas: Thank you. Yeah, I agree. That was an excellent paper. Sarah: Yeah, yeah. The other thing that I had said...I sent Shreya a text message after reading through it. The other thing that I really appreciated about the interview study was that she didn't cherry pick the insights that were most likely to drive interesting research questions or solutions. I think she took a really genuine and unbiased approach to thinking about, ""What are the problems that people are talking about? What are the ways in which they're there solving them? Let's highlight that there are a bunch of problems that people are just solving in practical β€” albeit hacky β€” ways, but ways that they're content with."" I thought it was a very honest study. Lukas: Totally. I totally agree. Well, I guess if we are possibly headed towards another bubble in machine learning β€” or machine intelligence, as you sometimes call it β€” do you have any advice for a startup founder like me? Or maybe an ML practitioner, which is most of our audience. Having gone through another bubble, how would you think about it? What would you do if you started to...I think we're already seeing bubble-esque behavior. What are the lessons? Sarah: I think the most critical lesson that I saw/learned the last time around was, ""Focus on your users,"" or ""Focus on the strategic problems that you're trying to solve."" And ""Really, really understand if and why ML is the best tool to solve that problem."" I think it's critical to think about machine learning as a very important tool in our toolkit. But one of several tools. I was catching up with a friend a couple of weeks ago, and she had mentioned to me that the way in which she prioritizes ML projects is through regular conversations with their product leadership, and engineering leadership β€” and her representing ML leadership β€” about the product roadmap, about the user behaviors that they're trying to unlock. And then thinking about whether ML or traditional software development approaches are a better tool for achieving those things. I think as long as we continue to think about ML as a tool to solve problems β€” and as long as we have the tools that enable us to better understand if ML is solving those problems, and how to improve upon its ability to solve those problems β€” then ML can be a super powerful tool. And one that we learn to wield in more powerful ways too. 
But - I feel almost like a broken record saying this, given the lessons learned in the past - if we treat ML like a silver bullet, if we treat it like a hammer looking for a nail...that was the pattern that I think led to failure. Don't think about ""What ML can do for you"", think about ""What you can do for your country,"" and if ML is the right way to do that, I guess. That's the lesson that we learned and I hope it's the lesson that we will carry forth. Lukas: Love it. We always end with two open-ended questions. The first of the two is, if you had extra time, what's something that you'd like to spend more time researching? Or, put another way, what's an underrated topic in data or machine learning? Sarah: Oh man, that one is very easy for me: programming languages. I would love to spend more time learning about programming languages. I am definitely not convinced that Python is the right interface for data science, or that SQL is the right interface for analytics work. I would really love to learn more about programming language design, so that I could better diagnose if and why Python and SQL are the wrong tools, and how one might go about building a better PL interface for data scientists, ML engineers, and analysts. Lukas: Okay, a question that I didn't ask - because I thought it was a little weird or maybe nosy - is why you're asking on Twitter if anyone knew any female Rust developers. Because I will say Rust comes up just a shocking amount on this podcast, and I was wondering what's driving the interest in Rust, and then if there was some reason behind looking for a female Rust developer, and if you actually found one. Sarah: Yeah, yeah. So, full transparency - and I think I maybe put some of this on Twitter too - quick background is that certainly earlier in my career, I felt like oftentimes I wasn't getting invited to the same set of events, et cetera, as some of my male peers, and therefore I wasn't getting exposure to the same set of conversations - maybe even the same opportunities - to potentially see deals, and things like that. I feel pretty strongly that we need to have women in the room when we host events, to ensure that they're getting exposed to the same set of opportunities. That we're not doing things to hamper their progress in the industries in which they operate. We were hosting a Rust developer dinner, and looked at the guest list, and there weren't that many women, and it felt like we could do better. Thus the origins of my question. Lukas: I see. Sarah: Why Rust? See, I wish I spent more time studying programming languages, so I could better understand why people are shifting from C++ to Rust. Luca Palmieri - who I believe is now at AWS, actually - has a great blog post on why Rust might be a more appropriate backend for Python libraries that often have C++ backends. Things like pandas, where we experience it as Python but in fact it has a C++ backend. I've heard that Rust is more accessible than C++ and therefore could perhaps invite more data practitioners to actually contribute to some of those projects. But I don't know enough to really say why Rust is so magical, other than a lot of smart people - apparently, like Linus Torvalds too - believe it is. If it's good enough for him, it's good enough for us. I don't know. Lukas: Fair enough. My final question for you is, when you look at the ML workflow today going from research into deployment into production, where do you see the biggest bottlenecks? 
Or maybe where do you see the most surprising bottlenecks for your portfolio companies? Sarah: I generally think that...there are two bottlenecks that I would call attention to. Actually three, sorry, I'm being kind of indecisive here. One pattern that I've observed with ML is that we often iterate on ML-driven applications β€” or ML-driven features β€” more frequently than we iterate on more traditional software features. To give an example, we may iterate on a pricing algorithm far more frequently than we would iterate on a navigation panel, or an onboarding flow, or something like that. Earlier I was talking about understanding how ML can solve user and company problems. I don't really think we have enough insight into the way in which model performance correlates with behavioral data β€” or the product engagement β€” to iterate super effectively on models. I think that has been a limitation, and one that could have nefarious effects in the future. Another big challenge that I see β€” and I alluded to this before β€” is the challenge of building software applications around a prediction service, or around a model. In the past, people might have talked about this as a model deployment problem. The problem isn't containerizing your model and implementing a prediction service in production. I think that has gotten significantly easier. The problem is connecting to five different databases, each which have different sets of ACID guarantees, latency profiles...also connecting to a UI service, potentially connecting to other application services. The problem is the software development. What you've got is a trained model, but now you actually have to build a software application. I don't think we have great tools to facilitate that process, either for ML engineers or for software engineers. And then around the same space, I also think that the transition from research to production β€” and back β€” can still be challenging. Perhaps what a company wants to do β€” upon seeing an issue associated with the model in production β€” is actually see the experiment runs associated with that model, so that they might get more insight into what is now happening in that production environment. That shouldn't be difficult to do. But, in the past I think we really developed tools either for model development or for MLOps, and we're starting to see some of the pain points that arise when those sets of tools are not coupled together. Lukas: Cool. Yeah, that all definitely resonates with me. Sarah: Lest I sound too cynical, I am really optimistic about the future of ML. I think we just need to do it in a sane and rational way and be mindful of what we're trying to accomplish here, instead of just focusing on flashy press releases and cool demos. Lukas: I was thinking as you were talking about the hype cycle, and large language models, and stuff. I was thinking VCs probably feel the hype cycle the fastest. I'm like, ""Man, we've basically solved the Turing test and, like, no one cares. My parents are like, ""What even is this,"" you know. It's like, ""Come on, this is awesome, look at it."" But I think every investor knows about Stable Diffusion but I don't think...I even come across Chief Data Officers at Fortune 500 companies who are like, ""What's Stable Diffusion?"" It's like, ""Come on, you should know about this."" Anyway... Sarah: Yeah, yeah. But I think there's this awareness, though, of ""This is where the hard work starts."" Lukas: Yeah, totally. 
Sarah: ""Great, we're able to generate beautiful artistic renderings based on textual prompts. Okay, how do we generate photos that are equivalent to that which a professional photographer would produce?"" Because that's what it's going to take to get a Getty Images or Flickr to adopt something like Stable Diffusion. How do we make automated rotoscoping so good that a video editor doesn't need to correct the mask at all? Because that's what it's going to take for Runway to compete with some of the more traditional video editors. I saw, through Runway, that the research is not good enough. They've had to do a lot of engineering, as well as their own research, in order to operationalize some of these things. I am so optimistic about the potential of the technologies, but I also am realistic that reining them in, and actually leveraging these technologies to do good in the world β€” or to build great products β€” is hard. Short anecdote, but I've been talking to a founder who was working on brain-computer interfaces and actually developed this technology where, effectively, it's able to read minds. You had to put on some big helmet thing, but once the helmet was on, it could kind of transcribe thoughts. And they were able to get it to work. Now, the founder subsequently shifted focus to the gaming space, doing more work with haptic interfaces. I was asking him like, ""Why didn't you pursue the mind reading tech further?"" And he said to me, ""We couldn't find any great use cases."" Isn't that crazy? But I think, this is tech. Sometimes you can do absolutely remarkable things with technology. But it doesn't matter. It doesn't matter unless you figure out how to appeal to people, and get them to use it, and how to align that technology with an important set of problems. I think that is the thing β€” as VCs β€” we need to continue to remind ourselves. Tech is not easy. Tech is not easy, but people are not easy either. Both are really hard. Unlocking new sets of technologies often means that we are granted the opportunity to solve really hard human problems. I guess...TL;DR if GPT-3 starts reading minds. Maybe we'll be able to find some applications for it. But, we'll see. Lukas: Thanks so much, Sarah. That was super fun. Sarah: Yeah, for sure. Bye! Lukas: If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned, supplemental material and a transcription that we work really hard to produce. So, check it out.",9519,"In a podcast, Sarah Catanzaro, a General Partner at Amplify Partners, expresses concern about the hype surrounding AI and ML, stating that people are often amazed by what AI can do without considering the data set or talent pool required. She also discusses the importance of experimenting with AI quickly to learn about the necessary infrastructure and data investments. Sarah emphasizes the need for a better understanding of LLMs and their capabilities and warns against the dangerous misconception that AI is magical. The podcast also discusses biases in venture capital decision-making and the challenges of creating a cohesive ecosystem in the ML and AI space due to fragmentation and heterogeneity.
Sarah discusses the importance of failing fast and decoupling model development and ML-driven app development. She also addresses concerns about cloud vendors potentially crushing startups in the ML and data service space. The podcast also discusses the challenge of structured data versus unstructured data in machine learning and the potential benefits of using Rust as a backend for Python libraries with C++ backends. The speakers emphasize the importance of finding practical applications for new technologies and aligning them with important problems.","['What are the dangers of the hype surrounding AI and ML?', 'How can companies learn about the necessary infrastructure and data investments for AI quickly?', 'What are the challenges of creating a cohesive ecosystem in the ML and AI space?', 'How can founders experiment with AI quickly to learn about necessary infrastructure and data investments?', 'What are some examples of companies creating higher-level interfaces off of ML technology for real-world applications?', 'What is the dangerous misconception about AI, and why is it important to have a better understanding of LLMs and their capabilities?', 'How does Sarah build technical understanding?', 'Why does Lukas feel paranoid about losing his technical skills?', 'How does Sarah think about technical systems differently?', 'Would a company with tech-forward customers be more impressive to a VC than a company with less tech-forward customers?', 'What biases do VCs have when making decisions about which companies to invest in?', 'What factors are more important than the type of customer a company has when considering investment potential?', 'Do you expect point solution MLOps space to consolidate and how do you think about investment in MLOps tools companies?', 'Do you think that the data stack is consolidating and how does it compare to developer tools in terms of consolidation?', 'Why do you think there is a moment of bundling in the developer tools space and how does it differ from the data stack?', 'How are practitioners struggling to stitch ML tools together?', 'What challenges arise with the adoption of LLMs and who will own fine-tuning?', 'Are we headed towards a world where a small number of companies generate foundational metadata?', 'How do LLMs perform on structured data sets compared to more traditional methods?', 'Will ML salaries increase or decrease in the future?', ""What is Sarah's opinion on low-code and no-code interfaces for exploring data and building ML models?"", 'How important is failing fast in ML-driven solutions and what tools can enable this?', 'How can model development and ML-driven app development be decoupled?', 'Is the potential threat of cloud vendors crushing startups in the ML and data service space a valid concern?', 'What is the challenge of structured data versus unstructured data in machine learning?', 'What are the difficulties in innovating on AI techniques for data sets to which you have zero access?', 'How can the research community better understand the problems that practitioners face in industry?', 'What advice does Sarah have for startup founders and ML practitioners in regards to the potential for another bubble in machine learning?', 'What is the critical lesson that Sarah learned from the last AI renaissance?', 'What is an underrated topic in data or machine learning that Sarah would like to spend more time researching?', 'Why might Rust be a more appropriate backend for Python libraries with C++ backends?', 'Where do the biggest 
bottlenecks lie in the ML workflow today?', 'What are the challenges of building software applications around a prediction service or model?', 'What is the importance of finding practical applications for new technologies?', 'How can technology be aligned with important problems?', 'What are the challenges of leveraging new technologies to do good in the world?']"
1,Cristóbal Valenzuela — The Next Generation of Content Creation and AI,https://www.youtube.com/watch?v=wbonGgk-_Gk,2426,2023-01-19,"Cris: I think a big mistake of research — specifically in the area of computer creativity — is this idea that you're going to automate it entirely. You see one-click off solutions to do X, Y, or Z. I think that's missing the bigger picture of how most creative workflows actually work. That probably means that you've never actually worked with an agency where the client was asking you to change things every single hour, or make it bigger, make it smaller, right? Lukas: You're listening to Gradient Dissent, a show about machine learning in the real world. I'm your host, Lukas Biewald. Cris Valenzuela is an artist, and technologist, and entrepreneur, and CEO and founder of a company called Runway, which is a maker of ML-powered video editing software. But I feel that description doesn't even do justice to how incredible and innovative his product is. This interview actually starts off with a live demo of his product. I really recommend switching to video if you're listening to this on audio only, because his demo is absolutely incredible. Well, all right, Cris, we don't normally do this, but I thought it would be fun to start with a product demo if you're down for it. You have such a cool, compelling product. Would you be up for that? Cris: Sure. What do you want me to demo? There's a lot I can do. I want to make sure I can focus on what you want to see. Lukas: Well, this is an ML podcast. So I think people would probably be interested in the most flashy ML features. How about that? Cris: In short, Runway is a full video creation suite. It allows you to do things that you might be able to do in more traditional video editing software. The main difference is that everything that runs behind the scenes...so, most of the core components of Runway are ML-driven. The reason for that is that there are two main modes, or unique aspects, to making everything ML-based. One is, it helps editors, and content creators, and video makers automate and simplify really time-consuming and expensive processes when making video or content. There's a lot of stuff that you're doing in traditional software that's very repetitive in nature, that's very time-consuming or expensive. Runway aims basically to simplify and reduce the time of doing this stuff. If you have a video you want to edit, an idea you want to execute, spending the time, and the minutes, and the hours, and sometimes days on this very boring stuff is not the thing that you really want to do. So we build algorithms and systems that help you just do that in a very easy way. And then there's another aspect of Runway, that it's not only about automation, but it's about generation. We build models, and algorithms, and systems that allow our users and customers to create content on demand. And everything...baseline for us is that everything happens on the browser. It's web-based and cloud native, which means that you don't rely any more on native computers, or native applications, or desktop compute. You have access to our GPU cluster on-demand, and you can render videos in 4k, 6k pretty much in real time. Plus you can do all of this AI stuff also in real time as well. 
A lot of the folks are using Runway now — CBS, The Late Night Show with Colbert, or the folks who edit Top Gear, or sometimes creators who do stuff for Alicia Keys or for just TikTok or movies — they're all leveraging these AI-things via this web-based, cloud-based editor. So that's a short, five-minute intro, what the product does and how ML or AI plays a role in the product itself. But I'm happy to now show you how everything goes together and the experience of using the editor, if that makes sense. Lukas: Please, yeah. Cris: Cool. Any questions before we do that? I can double down, or if you want me to clarify? Lukas: Well, I actually didn't realize that professional video teams like The Colbert Show use Runway. Do they use it for all of their video processing or is there a certain part where they use it? How does that work? Cris: It depends. Some editors and some folks are using it as an end-to-end tool to create videos. Some other folks use a combination of different softwares to make something. The folks who use it for movies sometimes add in Nuke or Flame. We have a big Flame community, so Runway becomes a part of that workflow. It's replacing either something you do on a very manual basis. It's sometimes replacing a contractor you hired to make that work for you, or it's sometimes replacing your own work of trying to do it yourself in this old software. But you still use other aspects of it, or other software to combine [with] it. It really depends on the type of content you have and the level of outcomes that you need. But we do have folks that use it as an end-to-end content creation and editing tool. Lukas: Cool. Well, I mean the extent of my video editing is basically modifying videos of my daughter to take out the boring parts and send them to my parents. That's as far as I go. Maybe you could sort of give me a little bit of an overview of the cool stuff you can do with Runway. Cris: Totally. You can do all of that in Runway on the browser which is...you might be...you might start using Runway for that. The one thing I would emphasize is, everything is running on the cloud, on the web. You can just open any project with a URL. You can also create teams, and you have this baseline collaboration aspect that just runs out-of-the-box. Cool. Anything else? No, just go demo? Lukas: Yeah, let's see a demo. Totally, yeah. Show me the cool stuff. Cris: Perfect. So, this is what Runway looks like. If you've ever edited video before, it's a very common interface. We have tracks on the bottom. We have a multi-editing system with audio tracks, and keyframe animations, and text layers, and image support. You can preview your assets on the main window and have a bunch of effects and filters on the right. Again, everything running pretty much on the cloud in real time. The idea here is that there are a lot of things that you can do that are very similar to stuff that you can do in other applications, plus there are things that you can't do anywhere else. Let me give you an example of something that a lot of folks are using Runway for. I'm going to start with a fresh composition here. I'm going to click one of the demo assets that I have here. I'm going to click this. I have a surfer, right? On that shot, let's say I want to apply some sort of effect or transformation to the background of this shot. Or I want to maybe replace the person here and take it somewhere else. 
The way we do that today would be a combination of frame-by-frame editing, where you're basically segmenting and creating an outline of your subject, and every single frame you move you have to do it one more time. For that, we built our video object segmentation model β€” which we actually published a blog post and a paper around it β€” that allows you to do real-time video segmentation. In film, this is actually called rotoscoping. You can just literally go here, guide the model with some sort of input reference. I tell the model this is what I want to rotoscope, and it can go as deep as I need. I can select the whole surf layer here at deeper...more control over it. Once the model has a good understanding of what you want to do, it would propagate that single keyframe or single layer to all the frames of video in real time. You get a pretty smooth, consistent segmentation mask that you can either export as a single layer, or export as a PNG layer, or you can use...go back to your editing timeline and start modifying. You said you want to cut it, you want to compose it, you want to do some sort of transformation...from here, you can do that directly from here. Let's say I have my baseline β€” or my base video β€” here, I have my mask on top of that, and now I can just literally move it around like this. I have two layers, right, with a surfer. So, something that looks very simple and in traditional software may take you a couple of hours of work, here you can do pretty much in real time. Again, it's something that most editors know how to do, but it just takes them a lot of time to actually do. Lukas: And did you just run that in the browser? Cris: Yeah. Lukas: That segmentation mask, it figured out in the browser and it's calculating all...it doesn't go to the server? Cris: No, it goes to the server. Yeah, there's an inference pipeline that we built that processes real-time videos and allows you to do those things. The compute part is everything running on the cloud. You just see the previews and sometimes β€” depending on your connection β€” you can see a downsampled version of it, so it runs really smoothly and plays really nicely. Also, for every single video there's a few layers that we run, that help either guide something like a segmentation mask. For instance, we get depth maps and we estimate depth maps for every single video layer. You can also export these depth maps as independent layers and use them for specific workflows. That's also something very useful for folks to leverage. So you have this and you can export this. Behind the scenes, we're using this for a bunch of things. Lukas: Cool. Cris: Those are one of the things that you can do. You can go very complex on stuff. Let's say, instead of the surfer, I just want the β€” let me refresh this β€” I just want the background. I don't want the surfer. I can inpaint or remove that surfer from the shot. So I'm just gonna paint over it. Again, I'm giving model one single keyframe layer, and the model is able to propagate those consistently for the entirety of the video. That's also something we β€” as a product philosophy β€” really want to think about. Which is, you need to have some layer of control of input. The hard part of that should just be handled by the model itself, but there's always some level of human-in-the-loop process, where you're guiding the model. You're telling it, ""Hey, this is what I want to remove. 
Just go ahead and do the hard work of actually doing that for the whole video sequence."" Lukas: Wow, that's really amazing. That's like magic, right there. The surfer’s really just gone. Cris: Yeah. That's something we see a lot, when people find out about it, or when they start using it. ""Magic"" is a word we hear a lot. It's something that...again, if you're editing or you've worked in film or content before, you know how hard, and time-consuming, just painful it is. Just seeing it work so instantaneously really triggers that idea of magic in everyone's minds. Which is something for...that's great, because we've really thought of the product as something very magical to use. So, there's stuff like that. There are a few things like green screen and inpainting β€” which I'm showing you now β€” plus motion tracking, that we consider as baseline models in a Runway. Those are just...you can use them as unique tools, as I'm showing you right now. You can also combine them to create all sorts of interesting workflows and dynamics. There's the idea of, ""You want to transform or generate this video, and take this surfer into another location,"" you can actually generate the background, and have the camera track the position of the object in real time, and then apply the background that you just generated in a consistent manner, so everything looks really smooth. The way you do that is by combining all of these models in real time, behind the scenes. You might have seen some of those demos on Twitter, which we've been announcing and releasing. This is a demo of running a few of those underlying models, combined. There's a segmentation model that's rotoscoping the tennis player in real time. There's a motion-tracking model that's tracking the camera movement, and then there's an image-generation model behind the scenes that is generating the image in real time. Those are all composed at the same time. Does that make sense? Lukas: Yeah, yeah. Totally. Cris: Those are, I would say, underlying baseline models and then you can combine them in all sorts of interesting and different ways. Lukas: Totally. Alright, well, thanks for the demo. That was so cool. We'll switch to the interview format. Although now I really want to modify this video in all kinds of crazy ways. Cris: We should replace the background with some stuff while we're talking Lukas: Totally. Get this microphone out. One question I really wanted to ask you is, I think your background is actually not in machine learning originally, right? I always think it's really interesting how people enter the machine learning space. I'd just love to hear your story, a little bit, of how you ended up running this super cool machine learning company. It seems you're very technically deep, also. And so how you managed to get that depth mid-career. Cris: Totally. Long story short, I'm originally from Chile. I studied econ in Chile and I was working on something completely unrelated. But it was 2016 or 2017, I think, and I just randomly fell into a rabbit hole of ML- and AI-generated art. It was very early days of Deep Dream and ConvNets and AlexNet, and people were trying to make sense of how to use this new stuff in the context of art making. There were some people like Mike Tyka, and Mario Klingemann, and Gene Kogan who were posting these very mind-blowing demos. That now feel things that you can run on your iPhone on real time. 
But around that time it was someone...I remember Kyle McDonald β€” which is an artist β€” who was walking around with his laptop, just showing people a livestream of a camera. You had basically...I think with an ImageNet model running in real time, and just describing what it saw. And it just blew my mind. Again, it's 2016. Now it's pretty obvious, but around that time it was pretty special. I just went into a rabbit hole of that for too long. It was too much, I was just fascinated by it. I actually decided to quit my job, I decided to leave everything I had. I got a scholarship to study at NYU and just spent two years just really going very deep into this. Specifically in the context of, I would say, creativity. My area of interest was the idea of computational creativity. How do you use technology? How do you use deep learning or ML for really creative tool-making and art-making? That two-year-long research process and exploration ended up with Runway. Runway was my thesis at school. It was a very different version of what you see now. But the main idea was very much pretty much the same. It's like, ""Hey, ML and AI are basically a new compute platform. They offer new ways of either manipulating or creating content. And so there needs to be some sort of new tool-making suite that leverages all of this, and allows people to tap into those kinds of systems in a very accessible and easy way."" The first version of Runway was a layer of abstraction on top of Docker, where you could run different algorithms and different models in real time on this Electron app. You could click and run models in real time and connect those models via either sockets, or UDP, or a web server to Unity or Photoshop. We started building all these plugins where you can do the stuff that you are able to see now on Twitter. Like, ""Here, I built a Photoshop or Figma plugin that does image generation."" We were building all that stuff running Docker models in your computer locally, and you can stream those. It was 2018, 2019. Lukas: Interesting. It must have been a much more technical audience at the time then, right? If you have to run Docker on your local machine. That's not something everyone can do, right? Cris: Totally, totally. I think that that also tells a lot about how much progress the field has made, and how mainstream and how more accessible things have become. Trying to put this set of new platforms and compute ideas for creators, and video makers, and filmmakers required you to know how to install CUDA and manage cuDNN. I don't know if it's just too much. But people were still wanting to do it. There were some folks who were like, ""Hey, this is really unique. I want to understand how to use this."" But then we realized it wasn't enough. You need to go [to] higher layers of abstraction on top of that to really enable creative folks to play with this, without having to spend months trying to set up their GPU machines. Runway has really evolved, and we have a really experiment-driven thesis and way of working on the product. But it's all about trying ideas and testing them out with people really fast. We're building something that hasn't been done before. And so it's really easy to get sidetracked into things that you think are going to work, or ideas that you think are going to be impactful. But since you're working with new stuff all the time, being close to your user base for us has been kind of really, really important. 
Every time we iterate on the product, I think one consistent line of evolution has been this idea of simplifying...making higher abstraction layers on top of it. The first versions of rotoscoping or inpainting required you to select the underlying model architecture, and understanding what a mask was, and [how] propagation works. If you're really a filmmaker, you don't care about any of the stuff. You just want to kick once, and you want to get a really good result. For us, it's ""How do you build from there, using what we're building behind the scenes?"" Lukas: Were you surprised how well these approaches have worked to generate images? It sounds you started your work in 2017, 2018. The space has changed so much. Do you feel you saw it coming, or have things unfolded differently than you thought? Cris: I mean, things have definitely accelerated. But I think our thesis β€” when we started Runway three and a half years ago β€” was pretty much the same. It was, we're entering literally a new paradigm of computation and content. We're not going to be...we're soon going to be able to generate every single piece of content and multimedia content that we see online. I've been demo-ing generating models for creative use cases for the last three years. What I was showing three years ago, people were like...it was like, ""Hey, this is how it works. This is how you train a model. This is what the outcome of the model is."" Of course, at that time, it was a blurry 100x100 pixels image. Some sort of representation of what you were describing. Most people took it as a joke, like, ""Oh yeah, cool. Very cool. Cool thing."" Or as a toy, like, ""That's a fun thing, right? You kind of use it once. But of course, I will never use this in production."" I remember speaking with this huge...one of the biggest ad agencies in the world, and I was presenting to other executives. Here's the future of content, type anything you want. And something blurry came out and they're like, ""Cool, not for now."" And they reached three weeks ago being like, ""Hey, how many licenses can we get for this, tomorrow?"" Because the models are going just so much better, that it's obvious. It's transforming their industries and a lot other things. I think what has changed for us is pretty much the speed. Now we're entering a really nice moment where things are converging, and there's a good understanding of what's going to be possible, and where things are going. Scaling laws are getting to a good point. And so continuing the same, but the thesis of the company was always built on that this will happen, and it's happening sooner rather than later. Lukas: Do you have a perspective on if this acceleration will continue, or if we just are seeing a breakthrough, and then we're going to need new breakthroughs to get to the next level of quality? Cris: Sure. I think there's definitely more compute that needs to be added to this, more data sets. I think we're still scratching the surface of what it will become. There's still this...I was discussing this with a friend the other day, this idea of a curiosity phase where people are entering the realm of what's possible and coming up with all these solutions and ideas, but there's still a difference between those concepts, and explorations, and ideas and meaningful products that are long-term built upon those. What I'm interested in seeing is how much of those ideas will actually convert over time, over meaningful products. 
I think that conversion of products is not just pure research or pure new models, there needs to be a layer of infrastructure to support those things. It's great that you can run 1 single model to 1 single thing on X percent. But if you're trying to do that scale on a real-time basis for 10 people, that then use it on a team and depend on it for their work, then there's a slightly different thing. But I think we're about to see way more stuff around video, specifically. I think image might be solved in a couple of more months and video is starting to now catch up with that. It's a really exciting time for that. Lukas: What does something being solved mean to you? Like, you could just get any image that you would ever want or imagine? Cris: Yeah, that's a good one. That's a good question. I would say that I would consider being solved [as] being able to translate something like words or a description into a meaningful image or content that pretty much matches where you're trying to...what you're imagining. And if it doesn't, you're able to control really quickly and easily to get to the point where you can arrive at your final idea. That's why the combination of models really makes sense. It's going to be hard to have a full model that does exactly what you want. For instance, for image generation. I think it's a combination of, you have a model that does the first model, which is you generate something. There's no pixels, you generate the pixels. Second step is, you're able to quickly modify it, or inpainting, or grade it in some way, and start it in some other way. But that whole thing just happens in a few seconds or a few minutes, right? If you speak with anyone in the industry, VFX, or ad agencies or content creation, post-production companies, these are stuff these guys do all the time. This is what they do for a living, right? They're able to create content out of nothing. The thing is just it's really expensive. It's really, really expensive. And it involves a lot of time and rendering and skilled people to get to that point. I think for me, ""solved"" is, anyone can have access to that professional-level grade VFX-type of content from their computers and from a browser. Lukas: Do you ever think about making a version of Photoshop, instead of a video editing software? If you think images are closer to being solved. Certainly I can't go into Photoshop and get exactly the image I want. I love to play with all the image generation tools out there. But I do think they're amazing at first, but then you kind of hit this point where if you really want the image to look like you want, it gets kind of frustrating. It seems there's also room for an image version of what you're doing. Is that something you'd consider doing? Or, why not make that? Cris: Totally. Yeah. The answer is absolutely. I think, a few things. One, I think we're converging more to this idea of multi-modal systems where you can transfer between images, and videos, and audio. I think the idea that we've been...we built software to deal with each media independently. There's audio editing software, and video editing software, and image editing software, and text-based...you have models that can quickly translate between all of those. Content β€” let's say video β€” it's a combination of different things. You have images, you have videos, you have audio, you have voice. All of those things are now possible. 
I think for us, when I think about the product philosophy of Runway, it's less about, ""How do you build a better Photoshop or a better Premiere?"" Fundamentally, these models are just allowing you to do the things that none of those others can do. If you think about marginal integrations of those things...yeah, you build a better Photoshop that has a better paintbrush, or a better contact server tool. But ultimately, when you combine them in new ways, you create a new thing. It's completely new. It's not Photoshop, it's just a new way of making videos, and editing images, and editing audio. All in one, single component or tool. For me, what's really interesting is the multi-modal aspect of things, and translating also into those. And 3D, for instance, it's one of the filters...you're going to start to see a lot of translation between images and videos on 3D. Lukas: Totally. So, I have to ask you your thoughts on deep fakes and things like that. I'm sure everyone asks you that, but I'm really curious what you think about that. Do you think that you would want to put in limitations into your software to not allow certain things? Do you think this is about to change the way we view videos, as this technology gets more standardized and available to everyone? Cris: For sure. As [with] every major technology breakthrough, there's always social concerns about how it might be misused or used in not the right, intended ways. It's a good exercise to look at history to see what has happened before. There's this really good YouTube video about Photoshop when it was first released, I would think about the early 90s. They were like...it's kind of a late night show, and they're discussing the ethical implications of manipulating images in magazines. And they're like, should we allow to manipulate images and put them in magazines? Half of the panel was like, ""No, we shouldn't."" It breaks the essence of what photography is, right? 20 years after that, it makes no sense to think about not doing something like that, right? There's always an adaptation process, I would say, where people need to...we need to collectively ask, ""Hey, how is it going to be used?"" But I think ultimately, you understand what the limitations are, and you also fine-tune your eyes and your understanding of the world to make sense of that thing. Now everyone knows that ""Photoshop"" is a verb that you can use to describe something that's manipulated. You do that same exercise, and you go back in time, and you see the same. When film just started to appear, there was this story, interesting story about...one of the first films that were made is a train arriving to a station. They were like, projecting that on a room. When people saw the train coming to a station, everyone ran away because they thought a train was coming to a station, literally. But then you make sense of it, and you're like, ""Yeah, this is not true. I understand that this is an actual representation of something."" Ultimately, I think with AI and with generated content, we'll enter a similar phase, where it's going to become commonplace and something people are familiar with. Of course, there's going to be misuses and bad uses. Of course, people can use Photoshop for all sort of evil ways. But the 99% of people are just like, their lives have been changed forever in a positive way because of this. Lukas: Interesting. Well, look, I'd love to hear more about your tech stack. This is a show for ML nerds of all types. I think you're doing pretty hardcore ML at scale. 
What have been the challenges of making this work, making the interface as responsive as it was? What were the key things to scale up your models? Cris: Sure. There's a lot of things that we had to kind of come up [with] creatively, to make this work in real time. On the one hand — on the ML side — we mostly use PyTorch for all of our models. We have a cluster — basically, an AWS cluster — that scales based on compute and demand, where we're running all those models for training. We sometimes use Lightning and, of course, Weights & Biases to follow up and understand better what's working in our model training. Serving, we optimize for different GPU levels or compute platforms, depending on availability. We've made some systems to scale up depending on demand. On the frontend side of things, everything's TypeScript and React-based. There's some WebGL acceleration stuff we're doing to make things really smooth. And then the inference pipeline, where we're writing everything in C++ to make it super, super efficient and fast, specifically since you're decoding and encoding videos in real time. We also built this streaming system that passes frames or video frames through different models to do the things that I just showed you. And so we also had to come up creatively with that. That's kind of a big picture of our tech stack. Lukas: One challenge that I'm seeing some of our customers run into — as these models kind of get bigger and more important — is that the actual serving cost of the application increases. Is that an issue for you? Do you do things like quantization? Is lowering your inference costs an important project for you all? Cris: For sure. Yeah, for sure. I mean, we're running...our biggest cost right now is AWS, GPU costs, and inference costs, and serving these models. There are two main areas for sure. We have an HPC, we're doing large-scale training of language models and video models. That takes a lot of resources and time. But just serving on...I would say the tradeoff between precision and speed really matters. Quantizing models is great. But also you need to make sure that you're not affecting the quality of the model because if you're affecting something on a pixel level, it might change the result from being okay to bad. And that might mean user churning. And so, if you're going to spend a few more seconds rendering, that might actually be better. There's always a tradeoff of how much. But yeah, we always try to figure out what's the right balance there. We're still exploring some stuff on the browser. I think the browser is becoming really powerful. The only constraint about the browser is just memory and RAM. And you get...it's a sandbox, so you can't really do a lot of things specifically with video. But you can run some stuff on the browser. And so we would send some things specifically, and convert some things, and make them smooth enough. But I think we're not 100% there yet. Lukas: But you're also training your own large language models and large image models. That sounds like training would be a major cost for you as well. Cris: Yeah, for sure. Retraining some stuff to make sure it works in the domain of what we have is one of our core competences. Now we're training...starting a huge job on our HPC. That's going to take a big percentage of our costs for the next few months. Lukas: Wow. I have to ask. That language interface that you showed me was so compelling and cool. 
But I have been seeing language interfaces for the past 20 years, and the challenge with these language interfaces is when they don't work, they're just enraging. Actually, you sort of addressed that. Showing how it creates these things, and you can undo them, and you can kind of modify them. Do you feel that that kind of conversational interface is at the point where, for you, it's an interface that you really want to use? Cris: I like to think [of] it as a tool. It's not the sole answer to everything you need. This is not going to be a replacement for all of the workflows in making content, video, images, or sound, or whatever it is. It's just a speed up in the way you can do those kind of things. I think the sweet spot is a combination of both. Being able to have that constant feedback loop with the system, where you're stating something out [and] the system is reacting in some way that matches your idea. And then you have that level of control so you're going the direction you want and doing what you want. Or, if it's not working, you just do it yourself, right? I think a big mistake of research — specifically in the area of computer creativity — is this idea that you're going to automate it entirely. You see one-click off solutions to do X, Y, or Z. I think that's missing the bigger picture of how most creative workflows actually work. That probably means that you've never actually worked with an agency where the client was asking you to change things every single hour, or make it bigger, make it smaller, right? Lukas: Right. Cris: It's hard for me to imagine a world where you have a one-click off solution for everything. That feels boring, to be honest. You want to have that control. I think language interfaces are a huge step towards accelerating the speed at which you can execute. Are they the final answer for everything? I'm not sure, but they do make you move faster on your ideas. Lukas: Did I understand you right that you want to build your own large language model? I would assume you would take one of the many off-the-shelf language models today. Are you actually training your own? Cris: Yeah, I think it's...we are, but it's also the fact that ML...the infra for models and models themselves are becoming commodities. It's great for companies like us, because some stuff we kind of need to build on our own. There's a lot of things in Runway that you won't find anywhere else. But there's a lot of stuff, large language models that you can just use off the shelf. You have all these companies offering similar services. It's a great...as a consumer of those, if we want to use those, it's just a cost situation where whoever offers the best model, we'll use. And to a point, it might make sense to do our own. So yeah, sometimes we don't have to do everything ourselves. You can just buy it off the shelf. But some other times, you just need to do it because it doesn't exist. Lukas: Sorry, large language models you think you might do it yourself, even? Cris: We're doing a combination of both. We're using APIs but also re-training some of our own. Lukas: I see, I see. Have you experimented with all the large models out there? Do you have a favorite of the existing offerings? Cris: I think GPT-3 works. I think, actually, the model is Davinci. It's probably GPT-4 by now. I think OpenAI has been making- Lukas: -right, right. Cris: -that silently behind the scenes, it works really well. That's the one I'd say we're experimenting with the most, and we get the best results. Lukas: Cool. 
Well, look, we always end with two questions. I want to make sure I get them in. The second-to-last question is, what is a topic that you don't get to work on, that you wish you had more time to work on? Or, what's something that's sort of underrated for you in machine learning right now? I realize it's a funny question to ask an obsessed ML founder. But I’ll ask it anyway. Cris: I think, audio generation. I think it's catching up now, but it's not...no one really has been paying a lot of attention. There's some really interesting open source models from Tacotron to a few things out there. I think that's going to be really, really transformative for a bunch of applications. We're already kind of stepping into some stuff there. But, it's hard to focus as an industry β€” or as a research community β€” in a lot of things at the same time. And now that image understanding has kind of been solved away, people are moving to other specific fields. I think one of the ones that are going to start seeing very soon is audio generation. So yeah, excited for that for sure. Lukas: Yeah, I totally agree. Do you have a favorite model out there? We just recently talked to Dance Diffusion, or HarmonAI, that was doing some cool audio generation stuff. Cris: Yeah, there's one β€” let me search for it β€” that just blew my mind. tortoise-tts, I don't know if you've seen that one. Lukas: No. Cris: Yeah. tortoise-tts is, I think, the work of just one single folk, James Betker. It works really well and he's been...someone used it to create the Lex Fridman...generative podcast. I'll share with you the audio. It's a whole podcast series that goes every week, where everything is generated. The script is generated by GPT-3 and the audio is generated by tortoise. And you can hear it's like, it's a podcast. You can't really tell. Yeah, really excited for stuff like that. Lukas: Cool. The final question is for you, what's been the hardest part about getting the actual ML to work in the real world? Going from these ideas of models or research to deployed and working for users. Cris: I think these models β€” and things like image generation and video generation β€” require a different mental model of how you can leverage this in creative ways. I think a big mistake has been to try to use existing principles of image or video generation and patch them with this stuff. I think, ultimately, you need to think about it in very different ways. Navigating a latent space is not the same as editing an image, right? What are the metaphors and the abstractions they need to have? We've come up with those before, in the software pipeline that we have right now. You have a brush, and a paint bucket, and a context or world tool, and you're editing stuff. But when you have large language models that are able to translate ideas into content, and you navigate and move across specific space or vector direction in ways you want, you need new metaphors and you need new abstractions. What's been really interesting and challenging is, what are those metaphors? What are those interfaces? How do you make sure the systems you're building are really expressive? I think two things that drive a lot of what we do are control and expressiveness. ""Control"" as in you, as a creator, want to have full control over your making. That's really important. How do you make it, so you also are expressive? You can move in specific ways as you are intending to do. 
So yeah, that's also really...it's really exciting and passionate for us to invent some of those stuff. Lukas: Well, it’s really impressive what you did. Thanks so much for the interview. Cris: Of course, thanks so much for hosting me. Lukas: If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned, supplemental material and a transcription that we work really hard to produce. So check it out.",6898,"In this podcast, Cris Valenzuela, CEO and founder of Runway, discusses the features of their cloud-based video editing tool that uses machine learning algorithms and systems. The tool simplifies and reduces the time-consuming and expensive processes of video editing, and allows users to create content on demand. Valenzuela also discusses the challenges of using machine learning at scale for real-time image and video editing, and the potential for multi-modal systems that can transfer between images, videos, and audio. The podcast also touches on the ethical implications of deep fakes and other manipulated content, and the importance of having control and constant feedback in creative workflows.
The podcast highlights the acceleration of technology in the field of machine learning and artificial intelligence for creative purposes. The speaker believes that image generation might be solved in a couple of months, and video is catching up. However, the industry is currently facing challenges with serving costs and balancing precision and speed. The podcast also discusses the combination of human creativity and machine learning, and the potential for audio generation to be a transformative application in the future. Overall, the podcast provides insights into the capabilities and challenges of using machine learning for video editing and creative workflows.","['What is the main difference between Runway and traditional video editing software?', 'How do professional video teams like The Colbert Show use Runway?', 'What are the challenges of using machine learning at scale for real-time image and video editing?', 'What are some of the cool things you can do with Runway?', 'Can you give an example of how Runway simplifies video editing?', 'How does Runway use machine learning algorithms and systems to simplify video editing?', 'What are some of the baseline models in Runway for video editing?', 'How can these models be combined to create interesting workflows and dynamics?', 'What is the background of Cris Valenzuela, the CEO and founder of Runway, and how did he get into machine learning?', 'How has Runway evolved over time to become more accessible to creative individuals?', 'Were the creators of Runway surprised by the success of their approach to generating images?', 'What are the challenges facing the industry in terms of serving costs and balancing precision and speed?', 'What is the potential for machine learning in creative workflows?', 'Will the acceleration of technology in machine learning and AI continue, or will new breakthroughs be needed?', 'What does it mean for image generation to be ""solved""?', 'Would Runway consider creating an image version of their video editing software?', 'What are the potential ethical implications of deep fakes and manipulated content, and how can they be addressed?', 'How does Runway approach the convergence of different media types in their product philosophy?', 'What have been the challenges of making the interface as responsive as it was?', 'Is lowering inference costs an important project for the company?', 'Do you feel that conversational interfaces are at the point where they are an interface that you really want to use?', ""What is the speaker's opinion on the combination of human creativity and machine learning?"", 'What are the challenges faced by the industry in using machine learning for video editing and creative workflows?', 'What is the potential for audio generation to be a transformative application in the future?', 'What is the work of tortoise-tts and who created it?', 'What are the challenges of using machine learning for real-time image and video editing?', 'What are the potential applications of audio generation in the future?']"
2,Jeremy Howard β€” The Simple but Profound Insight Behind Diffusion,https://www.youtube.com/watch?v=HhGOGuJY1Wk,4377,2023-01-05,"Jeremy: I’ve been telling everybody who will listen that I feel like we’re in the middle of a significant spike in technological capability right now. And so if you’re not doing that, you’re missing out on being at the forefront of something that’s substantially changing what humans are able to do. Lukas: You're listening to Gradient Dissent, a show about machine learning in the real world. I'm your host, Lukas Biewald. Jeremy Howard is the founding researcher at fast.ai, which is a research institute dedicated to making deep learning more accessible. They make an incredible Python repository that people use for lots and lots of deep learning projects. And they make an incredible set of classes that many people I know have taken, and is almost universally loved. He was also the CEO and founder of Enlitic, the president of Kaggle, and has done a whole bunch of diverse, amazing things in his career. It's always super inspiring to talk to Jeremy and this interview is no different. I really hope you enjoy it. Lukas: You are the first person to be on this podcast two times. And I think you are the most popular guest that we've had, based on our YouTube metrics. So it's great to have you. I wanted to start with, actually...the most memorable part of our interview β€” for me personally β€” was the amount of time that you set aside every day to work on just learning. Undirected, sort of learning new things, which I really thought was an amazing thing that I always aspire to do more of. I was curious. Lately, what have you been learning? Jeremy: I'm spending all my spare time at the moment on generative modeling, around the Stable Diffusion or diffusion modeling space. Lukas: Hence the new course, I guess. Is that part of the learning process? Jeremy: Yeah. It’s a chicken and the egg thing. It's partly ""the new course is because of the learning"", and partly ""the learning is because of the new course"". I've been telling everybody who will listen that I like feel we're in the middle of a significant spike in technological capability right now. And so if you're not doing that, you're missing out on being at the forefront of something that's substantially changing what humans are able to do. When there's such a technological shift, it creates all kinds of opportunities for startups, and for scientific progress, and also opportunities to screw up society. Which hopefully you can figure out how to avoid, and stuff like that. I'm very keen to do what I can to be on the forefront of that, and to help others who are interested in doing the same thing. Lukas: When you say ""spike"", do you mean diffusion models specifically or do you mean machine learning more broadly? Do you mean like- Jeremy: -I mean diffusion models, specifically. Lukas: Interesting, interesting. Jeremy: Yeah. It's a simple but profound insight. Which is that it's very difficult for a model to generate something creative, and aesthetic, and correct from nothing...or from nothing but a prompt to a question, or whatever. The profound insight is to say, ""Well, given that that's hard, why don't we not ask a model to do that directly? Why don't we train a model to do something a little bit better than nothing? 
And then make a model that β€” if we run it multiple times β€” takes a thing that's a little bit better than nothing, and makes that a little bit better still, and a little bit better still."" If you run the model multiple times, as long as it's capable of improving the previous output each time, then it's just a case of running it lots of times. And that's the insight behind diffusion models. As you'd be well aware, Lukas, it's not a new insight. It's the same basic insight that belongs to this class of models called ""boosted models"". Boosted models are when you train a model to fix a previous model, to find its errors and reduce them. We use lots of boosted models. Gradient boosting machines in particular are particularly popular, but any model can be turned into a boosted model by training it to fix the previous model's errors. But yeah, we haven't really done that in generative models before. And we now have a whole infrastructure for how to do it well. The interesting thing is that β€” having started to get deep into the area β€” I've realized we're not close at all to doing that in an optimal way. The fantastic results you're seeing at the moment are based on what, in a year's time, will be considered extremely primitive approaches. Lukas: Could you say a little more about that? Jeremy: Sure. Broadly speaking, we're looking to create a function that, if we apply it to an input, it returns a better version of that input. For example, if we try to create a picture that represents ""a cute photo of a teddy bear"", then we want a function that takes anything that's not yet ""a really great, cute photo of a teddy bear"" and makes it something a little bit more like ""a cute photo of a teddy bear"" than what it started with. And furthermore, that can take the output of a previous version of running this model and run it again to create something that's even more like ""a cute version of a teddy bear"". It's a little harder than it first sounds, because of this problem of out-of-distribution inputs. The thing is if the result of running the model once is something that does look a little bit more like a teddy bear, that output needs to be valid as input to running the model again. If it's not something the model's been trained to recognize, it's not going to do a good job. The tricky way that current approaches generally do that, is that they basically do the same thing that we taught in our 2018-2019 course, which is what we call ""crap-ification"". Which is, to take a perfectly good image and make it crappy. In the course, what we did was we added JPEG noise to it, and reduced its resolution, and scrolled[?] text over the top of it. The approach that's used today is actually much more rigorous, but in some ways less flexible. It's to sprinkle Gaussian noise all over it. Basically, add or subtract random numbers from every pixel. The key thing is then that one step of inference β€” making it slightly more like a cute teddy bear β€” is basically to ""Do your best to create a cute teddy bear, and then sprinkle a whole bunch of noise back onto the pixels, but a bit less noise than you had before."" That's, by definition, at least going to be pretty close to being in distribution, in the sense that you train a model that learns to take pictures which have varying amounts of noise sprinkled over them and to remove that noise. 
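[For readers following along, here is a minimal sketch of the noising-and-denoising training setup Jeremy describes, in PyTorch-style Python. The linear noise-mixing schedule, the add_noise helper, and the model(noisy, noise_level) signature are illustrative assumptions for clarity, not the exact formulation used by Stable Diffusion or the fast.ai course.]

```python
# Illustrative only: a simplified noising + denoising training step in the
# spirit of the "crap-ification" idea described above. The linear mixing
# schedule and the model(noisy, noise_level) signature are assumptions.
import torch
import torch.nn.functional as F

def add_noise(clean, noise_level):
    """Corrupt clean images by mixing in Gaussian noise.

    noise_level is a per-example value in [0, 1]: 0 leaves the image
    untouched, 1 replaces it with pure noise.
    """
    noise = torch.randn_like(clean)
    lvl = noise_level.view(-1, 1, 1, 1)           # broadcast over channels and pixels
    return (1 - lvl) * clean + lvl * noise, noise

def training_step(model, clean, optimizer):
    """Show the model a noisy image plus its noise level; train it to predict the noise."""
    noise_level = torch.rand(clean.shape[0])      # random corruption amount per image
    noisy, noise = add_noise(clean, noise_level)
    pred_noise = model(noisy, noise_level)        # model is conditioned on the noise level
    loss = F.mse_loss(pred_noise, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```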
So you could just add a bit less noise, and then you run the model again, and add a bit of noise back — but a bit less noise — and then run the model again, and add a bit of noise back — but a bit less noise — and so forth. It's really neat. But it's like...a lot of it's done this way because of theoretical convenience, I guess. It's worked really well because we can use that theoretical convenience to figure out what good hyperparameters are, and get a lot of the details working pretty well. But there's totally different ways you can do things. And you can see even in the last week there's been two very significant papers that have dramatically improved the state of the art. Both of which don't run the same model each time during this boosting phase, during this diffusion phase. They have different models for different amounts of noise, or there are some which will have super resolution stages. You're basically creating something small, then making it bigger, and you have different models for those. Basically, what we're starting to see is that gradual move away from the stuff that's theoretically convenient to stuff that is more flexible, has more fiddly hyperparameters to tune. But then people are spending more time tuning those hyperparameters, creating a more complex mixture of experts or ensembles. I think there's going to be a lot more of that happening. And also, the biggest piece I think will be this whole question of, ""Well, how do we use them with humans in the loop most effectively?"" Because the purpose of these is to create stuff, and currently it's almost an accident that we can ask for a photo of a particular kind of thing, like a cute teddy bear. The models are trained with what's called ""conditioning"", where they're conditioned on these captions. But the captions are known to be wrong, because they come from the alt tags in HTML web pages, and those alt tags are very rarely accurate descriptions of pictures. So the whole thing...and then the way the conditioning is done has really got nothing to do with actually trying to create something that will respond to prompts. The prompts themselves are a bit of an accident, and the conditioning is kind of a bit of an accident. The fact that we can use prompts at all, it's a bit of an accident. As a result, it's a huge art right now to figure out like, ""trending on art station, 8k ultra realistic, portrait of Lukas Biewald looking thoughtful,"" or whatever. There's whole books of, ""Here's lots of prompts we tried, and here's what the outputs look like"". How do you customize that? Because, actually, you're trying to create a story book about Lukas Biewald's progress in creating a new startup, and you want to fit into this particular box here, and you want a picture of a robot in the background there. How do you get the same style, the same character content, the particular composition? It's all about this interaction between human and machine. There's so many things which we're just starting to understand how to do. And so, in the coming years I think it will turn into a powerful tool for computer-assisted human creativity, rather than what it is now, which is more of a, ""Hand something off to the machine and hope that it's useful."" Lukas: Do you think the same approach applies across domains? Or is there something about images — the way it's sort of obvious how to add noise — and maybe the data set that we have? 
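[And a correspondingly minimal sketch of the inference loop Jeremy describes above: start from pure noise, let the model make its best guess at a clean image, then mix a bit less noise back in so the next input stays in distribution, and repeat. The schedule and update rule here are simplified assumptions for readability, written to match the training sketch above, not how production samplers (DDPM, DDIM, or the newer multi-model pipelines he mentions) are implemented.]

```python
# Illustrative only: the iterative "denoise, then sprinkle back a bit less
# noise" sampling loop, matching the linear mixing convention in the
# training sketch above. Real samplers use more careful schedules and updates.
import torch

@torch.no_grad()
def sample(model, shape, n_steps=50):
    x = torch.randn(shape)                              # start from pure noise
    levels = torch.linspace(0.98, 0.0, n_steps + 1)     # decreasing noise schedule
    for i in range(n_steps):
        lvl, next_lvl = float(levels[i]), float(levels[i + 1])
        lvl_batch = torch.full((shape[0],), lvl)
        pred_noise = model(x, lvl_batch)                # model's best guess at the noise
        x0_est = (x - lvl * pred_noise) / (1 - lvl)     # rough estimate of the clean image
        # Mix a bit less noise back in, so the next input still looks like
        # something the model saw during training (stays "in distribution").
        x = (1 - next_lvl) * x0_est + next_lvl * torch.randn(shape)
    return x
```

[The key design property is the one Jeremy points to: each step's output, with a little noise mixed back in, is a valid input for the next step, so the model only ever has to make the image a little bit better than what it was handed.]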
I mean, certainly the way you described diffusion, there's a natural application to that to almost any domain, but- Jeremy: Correct. Lukas: -I guess Gaussian noise on text, it's a little unclear to me what that really means. Maybe it’s like... Jeremy: So, last week a paper showing diffusion for text came out. There's already diffusion models for proteins. There's already diffusion models for audio. The audio ones use β€” or some of them β€” use a fairly hacky obvious but neat approach of using diffusion to generate spectrograms β€” which are images β€” and then having something like a super resolution model. But it's not doing super resolution, it's doing spectrogram to sound. So yeah, these things are already starting to exist. They haven't had as much resources put into them yet, so they're still not that great. But yeah, that's the thing, Lukas, this is not just images at all. It'll be used in medicine, it'll be used in copywriting. The way we currently do generative text models, again, it's kind of a happy accident. When I did ULMFiT, the whole reason I created a language model was for the purpose of fine-tuning it to create a classifier. GPT then took that idea and scaled it up with Transformers. What Alec Radford was trying to do there was not ""generate text"", but try to solve other problems by fine-tuning it. There was this kind of discovery, almost, with GPT-3 that when you take this and you scale it far enough, it actually starts generating reasonable-sounding text. But the text is not necessarily correct. In fact, it's very often wildly incorrect. It'll...intentionally working on text generation approaches which are specifically designed for generating text is something that there's a lot of room to improve. Generally speaking, the way I see it is this. You've got a generative model that's trying to do something difficult and it's pretty good at it, or at least better than nothing. It'll be better at it if you can do it in a way that it runs multiple times during inference, because you're giving it more opportunities to do its thing. I think that means that these multi-step inference models β€” which may or may not be diffusion models, but kind of boosted generative models β€” are here to stay. Because no matter how good your generative model is, you can always make it better if you can find a way to run it multiple times. Lukas: I guess that is a good segue to another question I had, which is I think one of the really fun things about deep learning in the early days was it was so tangible. You have this fantastic class, where you can just kind of build these models and see how they work and play with them. I think we both have a very similar learning approach. But, one thing I've personally been struggling with, honestly, with these bigger models is just actually engaging with them in a meaningful way. It's fun to run the various image-generating models, but it feels kind of daunting. I'm not sure I have the money myself to buy the compute to make one that really works. We actually had one person on this podcast who did it for fun β€” Boris β€” which is a super fun episode, and I felt really jealous of how much fun he had building it. I'm curious how you turn that problem into something tractable, that you can actually engage with. Jeremy: Yeah. Well, Boris is one of our alumni. He's part of our fastai community, and he showed what is possible for a single, tenacious person to do. Lukas: Although I think Google donated like a hundred thousand dollars of compute to him. 
So it wasn't totally... Jeremy: Yeah, absolutely. If you can show that you're doing useful work, then there's plenty of compute out there which you can get donated to. But having said that, what he was largely trying to do β€” at least at the outset β€” was to replicate what OpenAI had done. I take a very different approach, which is I always assume that the best thing out there right now is far short of what the best thing could be. That in five to ten years time, there'll be something better, and I always look for improving that. So yeah, you should take our new course, Lukas- Lukas: I would love to. Jeremy: -which we're in the middle of, because what I've been working on is exactly what you describe. Which is, how to train and play with a state-of-the-art image-generative model in a notebook on a single GPU. As with all of these things, the trick is to start with an easier but equivalent problem. I'm doing all my work β€” just about β€” on the Fashion-MNIST dataset. Which, rather than being 512x512 pixel images of literally anything in the world, including artworks, in three channels, Fashion-MNIST is 28x28, single-channel images of 1 of 10 types of clothing. I always tell people β€” whether you're doing a Kaggle competition, or a project at work, or whatever β€” the most important two steps are to ""Create a rapid feedback loop where you can iterate and test fast"", and to ""Have a test which is highly correlated with the final thing you're going to be doing."" If you have those two things, you can quickly try lots of ideas, and see if they're probably going to work on the bigger dataset, or the harder problem, or whatever. It turns out Fashion-MNIST basically...I've kind of replicated a bunch of different approaches in the literature on Fashion-MNIST. The relative effectiveness of those different approaches on Fashion-MNIST mirrors basically exactly their relative effectiveness on COCO, or ImageNet, or LAION, or whatever. Lukas: Cool. Jeremy: But I can train a model on a single GPU to a point where I can see relative differences in about two minutes. Lukas: Wow. Jeremy: And that means I can very rapidly try things. I've started building notebooks where I show every single little step. And also, it helps a lot to use notebooks, which almost nobody working in the generative modeling field seems to be doing at the moment. What they do, is they have...the normal approach is to do ImageNet 64-pixel or CIFAR 32-pixel β€” which is still better than doing 512x512 LAION β€” but it still takes...ImageNet 64-pixel takes many hours on an 8-GPU machine. You can't do a fast iteration loop. In a notebook, I can run a single iteration of diffusion. I can see what the outputs look like because the pictures are all there in front of me. If you're not using this kind of approach, instead you're switching back and forth between a terminal, and then you need some way of actually viewing the images. And given that you're probably not sitting directly on that 8-GPU box, you're probably SSH-ing into it. So, now you've got to find a way to show those pictures. There are ways, by the way, of showing pictures in the terminal. For example, if you use iTerm2 there's something called imgcat. If you use other terminals, they probably support something called sixel, sixel graphics. But there's...they're not going to be as a good exploration environment for the kind of stuff than a notebook is. I think there's lots of opportunities for people like you and me to play in this field. 
I mean, I know there is because I've started spending time talking to some of the folks who were the primary researchers responsible for the key components of Stable Diffusion. And I'm already telling them things that they hadn't thought of before, by virtue of weird little experiments I've done with Fashion-MNIST on my single-GPU Jupyter Notebook. Lukas: Yeah, that makes sense. A fast feedback loop is so important. That's very cool. I was curious, broadly, if you have thoughts on Stable Diffusion in general. We're sitting here in November 2022, and I think they've done an amazing job of bringing awareness to generative models. What do you think about Stable Diffusion? Jeremy: It's been great for progress in the field, clearly. Generally speaking, I'm all about democratization and accessibility, as you know. I don't love the fact that before Stable Diffusion was released, a small number of people in the world had access to the full generative models. And then other people could pay for cut-down versions of them, use them in small quantities. The thing is, accessing these things through a web-based API is extremely limiting. When you've actually got the weights, you can really play with both the engineering and the artistic side of doing things that no one's done before. So yeah, I think that's great. I think it's important. I think — as with any of these things — you release a new, powerful technology out there and a whole bunch of people are going to be using it for, you know, not necessarily the things that you would have chosen to use it for. For example, for Stable Diffusion, it seems like a very large percentage of people who are using it to generate lots and lots of images are doing it to generate anime and specifically nearly entirely...very young women with very few clothes on, anime pictures. I'm sure there are people out there who are taking the clothes off entirely. That happens, I guess, with any technology. I don't necessarily have...I mean, I guess you can't stop that happening. But we certainly need appropriate laws around at least making sure the things that we don't want to be legal are in fact illegal. But yeah, there are obviously huge benefits. And you're not going to get stuff like protein diffusion models, or pharmaceutical diffusion models...none of those are going to develop if the technologies are in the hands of two or three big organizations. So it's certainly a very valuable step on the whole for society to have this stuff as open as possible. And to be clear, it was all trained at universities. The main one, most of the stuff we're using now for Stable Diffusion was trained in Germany, at German academic institutions, using donated hardware. Lukas: I guess it's interesting though that it was, I think, primarily ethics and AI considerations that made folks like OpenAI restrict access to their models. Or at least that's what they said. Do you think that you would know a priori that that was the wrong thing to do? Would you have pushed against that at the time? Jeremy: I actually wrote a blog post about that back when GPT-3 was just announced, and not released. Nearly universally, the feedback — at least from the AI community — was, ""Oh, this is lame. They're just doing it for profits."" In my blog post, I said, ""Well, not necessarily. There are genuine things to be thinking about here."" Which is not to say that that means that the motivation wasn't at least partially profit-driven. It might well have been.
It's certainly convenient that the ethical considerations read in this way entirely align with profit-driven motives as well. But, like I say, it doesn't necessarily mean they're not true. And I'm pretty sure it's for both reasons. If you look at the way OpenAI has behaved since then, they've behaved in a way that is very increasingly apparently profit-driven. So, I'm less generous in my interpretation now than I was then, based on their continuing patterns of behavior. I think also with the benefit of hindsight, it feels a lot more like, in the last couple of years, companies keeping models to themselves, the main impact that ends up being is to create a bigger bifurcation between haves and have-nots in terms of capability. Requiring more researchers to pay for API access to do things, a decreased amount of openness, and in fact even what could be argued as being kind of deceitful behavior. For example, we now know that the OpenAI models that you can pay to access are actually not the same as what's been described in their research papers. We've now had dozens of people write research papers comparing various work to the OpenAI models, and now we've learned that actually we're not comparing to what we thought we were comparing at all. You know, thousands of hours of researcher time being wasted and papers being published with what turns out now to actually be totally wrong information. I'm definitely more enthusiastic about the idea of being open than perhaps...more confident about that than I was a couple of years ago. Lukas: Do you have thoughts on the language side of things, like large language models? Do you think that...for example, do you think that prompt engineering is headed to be an important way of doing machine learning? You do see these models doing incredibly well in a wide variety of NLP tasks. Better than models trained specifically on these specific tasks, sometimes. Jeremy: Yeah. I think generative text models have both more opportunities and more threats than generative image models, for sure. Like I say, they're kind of...the fact that they work at all is in some ways a bit of an accident. They're far, far, far from being optimized for purpose at the moment. But they're already amazingly good, particularly if you do this kind of stuff where literally there are now dozens of papers. ""Just look at what kind of prompts happened to work on these models that we kind of accidentally made generative models,"" ""let's think step-by-step"", and whatever else. We're starting to find ways to actually get them to do a little bit more of what we actually want them to do. But so far we're using really, really basic things. You know, all this ""instruction tuning"". So, rather than just feeding it the entire internet, let's actually fine-tune it with some examples of things that are actually correct info, that actually represent outputs that we would want for these inputs, rather than just whatever somebody rando wrote on the internet 25 years ago. My worry is...I'm much more worried about misuse of text models and image models, because it wouldn't be at all hard to create a million Twitter or Facebook or whatever accounts, and program them to work together to impact the world's discourse in very substantial ways over time. And nobody would know. 
We could have...on Twitter, for example, some fairly small number of accounts β€” often where nobody actually knows the human who's behind it β€” can have very substantive effects on what people are talking about, and how people talk about that thing. Imagine a million of those accounts, which were actually bots that had been trained to be more compelling than humans β€” which already for years, we've had bots which humans rank as more compelling than actual humans β€” and that they've been trained to work together. You know, ""Take alternate points of view in exactly the right way,"" and this bot gradually gets convinced by that bot, and whatever else. It could cause a very small number of people in the world to programmably decide how they want humanity to think about a topic, and pay to make that happen. Lukas: Although if I remember right, it seemed like all of fast.ai's sort of broad mandate was to basically make a no-code interface into machine learning, so anyone could access it. And it does sort of seem like prompt engineering β€” to the extent that it works β€” is like a huge step in that direction. Isn’t it? Jeremy: Right. Yeah, that's what I'm saying. That's why I said it's both got more opportunities and more threats. The opportunities are vast. Take, for example, the recent thing that was released last week or so, explainpaper.com. Where our students are already...so, with our course we look at a paper or two each week. Last week I had told the class, as homework to re-implement the diff edit paper. Students were saying like, ""Oh, I didn't understand this paragraph. So I highlighted it in explainpaper.com, and here's a summary it gave, and that's a lot more clear now. And then I tried to understand that bit, so I asked for more information."" This is very, very valuable. I saw somebody on Twitter a couple of days ago saying they don't really use Stack Overflow anymore, because they created this tiny little, simple little script called ""ask"" where they type ""ask"" and then something as a prompt β€” sorry, in the bash shell repl β€” and it would feed that off to OpenAI GPT-3, and return the result, and they basically use that instead of searching the internet nowadays. Lukas: Wow. Jeremy: Yeah. People are definitely using this stuff and it's going to get much, much better. Lukas: Do you have a clever way β€” like with Fashion-MNIST and image generation β€” to play with large language models on kind of a bite-sized scale? Jeremy: Not yet, no. I'll get to that, maybe, in another part of the course, I guess. It's definitely a great question and something to think about. Lukas: Interesting. Okay, a question that I need to revisit β€” because this is unexpectedly, I think, one of the reasons that so many people listened to my interview with you last time β€” you sort of made an interesting comment that you felt like Python wasn't the future of ML. You sort of said maybe Julia is the future of ML, and that really seemed to strike a chord with the internet everywhere. I think it's kind of the most-discussed part of Gradient Dissent of all time. So, I'm just curious. Do you have any more thoughts on that? Do you still believe that Julia is the future? You were sort of on the fence about that. Jeremy: I was on the fence about that last time we spoke and- Lukas: Totally. Jeremy: -I would say I'm a little less bullish than I was then. I feel like the Julia ecosystem and culture, it's so focused on these HPC, huge compute, running things on national lab machines. 
It's all stuff that's very appealing to engineers. It feels good, but it's such a tiny audience. I don't care about whether I can run something on 5,000 nodes. I just want to run it on my laptop. And it's still not great for running on my laptop, really. And it's not great for creating software that I can send you. I can't...if I created a little CLI tool or whatever, well, it's not great for creating little CLI tools cause it's so slow to start up. And then how the hell am I going to send it to you to try out? It'd be like, ""Okay, Lukas. Well, install the entirety of Julia, and then run the REPL, and then type this to go into package management mode."" And then, ""Okay, now you've got this thing and now you can run it."" It's like, okay, that's not going to happen. Or even just deploying a website, it's a lot of fuss and bother, and uses more resources than it should. It's still got that potential. But...I guess the other thing that's become more clear, though, in the last couple of years is their grand experiment on type dispatch...it is more challenging to get that all working properly than perhaps I had realized, because it's still not really working quite properly. Good on them for trying to make it work properly. It's a vast research project. But there are a lot of weird little edge cases, and trying to make that all run smoothly is incredibly challenging. I suspect...something needs to replace Python, but maybe it's something that doesn't exist yet. Partly though...what we're seeing instead...everybody knows we have to replace Python. So, what instead's been happening is we're using Python to create non-Python artifacts. Most obviously JAX. JAX uses Python — or a subset of Python — with a kind of embedded DSL written as a library. Which only lets you create things that are expressible as XLA programs, and then XLA compiles that to run fast on a TPU. That works pretty well. It's very challenging, though, for research, or hacking, or learning, or whatever, because it's actually not Python that's running at all. So it's extremely difficult to profile — and debug, and so forth — that code. Very hard to run it really nicely in notebooks. In our little team working on diffusion models, we kind of all want to use JAX. But every time we try, it's always...because like everything I write is always wrong the first 14 times. And with Python, you know, I have 14 goes at making it better by finding all the stupid things I did. By running one line at a time, and checking things, and looking at pictures. With JAX, I wouldn't know how to fix my broken code, really. It's difficult. Lukas: But you don't think that that flexibility is fundamentally in conflict with making a language performant? I think we covered this last time. Jeremy: It is for Python. It is for Python, I think. For Python, that flexibility is to be able to actually run it as Python code. If you look at where PyTorch is going now, they've got this TorchDynamo stuff where they're working...they basically can interface with nvFuser, and you can interface with Triton, the OpenAI compiler-ish thing. I'm not exactly sure what you'd call it. Clearly PyTorch is heading the same direction as JAX. Which is, if you want it to run fast, you'll use TorchDynamo, or whatever it ends up being called. That's actually now integrated into the PyTorch tree. That's clearly where we're heading. And again, you end up with...probably you'll be using Triton. So you end up...Triton's amazing. Super cool, super fantastic.
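To make the ""it's actually not Python that's running"" point concrete, here is a small illustrative example of my own, assuming JAX is installed: once a function is wrapped in jax.jit, what executes is a traced, compiled XLA program, so a print statement inside the function shows abstract tracers rather than numbers, which is part of why line-by-line debugging gets harder.

```python
# Illustrative example (not from the conversation): a jitted JAX function is
# traced into a jaxpr and compiled by XLA, so the Python you wrote is not what
# actually runs.
import jax
import jax.numpy as jnp

@jax.jit
def step(w, x, y):
    pred = jnp.dot(x, w)
    print('inside trace:', pred)        # prints an abstract Tracer, not numbers
    return jnp.mean((pred - y) ** 2)

w = jnp.ones(3)
x = jnp.ones((4, 3))
y = jnp.zeros(4)
print(step(w, x, y))                    # runs the compiled XLA version
print(jax.make_jaxpr(step)(w, x, y))    # the intermediate program that really runs
```

Roughly the same tradeoff applies to TorchDynamo-style compilation in PyTorch: the speed comes from running a transformed artifact rather than the original source.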
But you still end up with this thing that's running compiled code. It's not the same code you wrote, but a version of it. More difficult to hack on. If you look at how this works, there's a whole world of software that's written in languages which were explicitly designed to work this way. They're compiled languages. Languages like C++, and Swift, and Rust. They have something very nice, which is they have flags you can pass the compiler. You can pass the -d flag to run it in the debugger, or you can pass the -o flag to run the optimized version. Basically, you get to choose how close the code that's actually running is to the actual lines of code that you wrote. So that for debugging, you can actually...it'll run slower, but it's actually running the lines of code that you wrote. And I think we want something like that, something that, ""Yeah, it looks like Python. It's pretty compatible with Python. You can still run it as Python, but you can also run it in an optimized way."" Maybe something that actually takes better advantage of these kinds of type hints that we can provide. That's my guess. What's going to happen is we'll see Python-esque languages...we'll continue to see these Python-esque languages appear, that may begin to look less and less like pure Python, and are designed to work better and better with these backend linear algebra accelerators and compilers. Lukas: Is there some language out there right now that has that feel for you? Jeremy: No, they're all basically these embedded DSLs. Like TVM or like Halide. We have the MLIR project, which is kind of providing the backend needed for these kinds of things. Chris Lattner has a new company, which presumably is going to be placed better than any other to create what we need for this kind of thing. He's the guy behind MLIR. It feels like a big open area to me, at the moment. Lukas: Interesting. Okay, on a totally different topic — that I kind of can't believe we didn't cover last time, I feel like we must have been right in the middle of it — I think I, along with many other people in the world, watched you advocate for wearing masks in the early days of COVID. I think you had some of the most high-profile articles on this — like the second-most popular on Preprints — and I was just kind of curious if you could sort of tell that story from your perspective. And maybe what you were seeing that other people were missing, and how you were kind of approaching that problem differently. Jeremy: It's hard for me, Lukas, because I don't understand why — and I still don't understand why — it's not reasonably obvious to everybody. Like, what's everybody else missing and why? Because from my point of view...well, okay, let me go back. So, February 2020 — mid-ish February 2020, late February 2020 — I had a course coming up at the University of San Francisco that I was going to be teaching. I had heard increasing chatter about this Chinese virus thing. What then happened was it hit Italy, and there was a lot more information in English about what was happening in Italy than there was about what was happening in China. So it suddenly was much more accessible to see what was going on, particularly because a lot of the Italian doctors were actually on Twitter and stuff, so you could read what was happening.
A whole bunch of people were saying like, ""This is a disaster"", ""The president of the Italian medical body just died of COVID,"" and, ""There's not enough hospital beds."" I knew it had kind of just started to get detected in New York. I thought, ""Oh, well, it seems like it might be quite likely to come here. What does that mean for our course?"" Not at all altruistic. Just, like, are we still going to do our course? My wife and I kind of started reading about it to try to figure out what should happen with the course. And as we did, we were...yeah, it was very obvious that it was going to be a global pandemic and it was going to sweep through San Francisco within weeks. And so like within two days, I wrote an email to everybody who had registered for the course, and put out a blog post, and said we're not doing the course live. We're going to do it virtually. This is well before our university — or I think any university — had decided to do that. Which again, I already thought was weird. Like I thought, ""Okay, it's not yet here, but obviously it's going to be. So why are people acting as if it's not going to be?"" Rachel and I ended up writing a long blog post. We were kind of like, ""Okay, it's not just our course."" We've got all these friends in San Francisco who are doing things that we're pretty sure they're going to look back on in hindsight and think, ""That's a terrible idea, because I put myself and my community at risk."" So we said...we didn't know much about it, so we just said, ""Look, as data scientists, here's what we can see so far in the data. It does seem to grow exponentially, at least at first. And, you know, this is the impact it's been having in Lombardy. Here's the early impact in New York. Here's how the math of these kinds of things works. Here's not just a prediction, but an almost certainty as to what's going to happen here."" That got a lot of attention. We had no idea how to avoid it ourselves. We were worried that...historically, when there are global pandemics, it can lead to violence. It can lead to societal disharmony, or whatever. We decided to get out of San Francisco for a while. We also...it was clear that there was going to be a lockdown at some point because, I mean, why wouldn't there be? Again, none of our friends seemed to believe any of this was going to happen. It's really...I thought it was weird, it just seemed very obvious. And then yeah, there was a lockdown like a week or two later. We had told our daughter's school, ""Oh, there's probably going to be a lockdown."" They sent back this rather annoyed email about interrupting learning or something. The schools were closed for a year in the end, in San Francisco. Then we were like, ""How do we not get COVID?"" Because we probably don't want to get COVID, because it seems like getting COVID can be bad. We started to hear from people who were saying maybe there could be longer-term implications of some of these kinds of SARS viruses. So I started looking into how it spread, and I discovered that there are all these countries around China that had avoided getting hit by COVID. Particularly Hong Kong, which is literally a train line away from Wuhan. And that just seemed amazing, you know. That's when I discovered that Mongolia, Taiwan, and Hong Kong all had this either universal mask policy or universal mask usage, kind of culturally. And I thought, ""Oh, that's weird."" Because I thought masks were this kind of weird thing.
For some reason, you go to Chinatown, you see people wearing masks and that's how it is, and that's weird. I didn't take much notice of it. But then I started learning it was this respiratory infection, and it kind of started to make sense. I wrote something in the Washington Post talking about how in the Czech Republic, particularly, the populace had independently decided to wear masks, heavily driven by a popular science YouTuber. Basically, within like three or four days, the whole country had made enough masks for everybody, and their president was talking about how proud he was. Again, their infection was going the opposite direction to other countries, and I thought that was interesting. So yeah, I kind of wrote an article about that. I talked to a guy who used to be very high up in the government on the science policy side, and I asked him what's going on with masks. He said like, ""Well, nobody thinks there's very convincing science about it."" He said if you want to convince people to wear masks, then you need to find some better science. So I contacted basically the 18 smartest scientific researchers I knew, everybody from Lex Fridman to Zeynep Tufekci — not just scientific researchers, in Zeynep's case a sociological researcher — and said like, ""Do you want to help me put together the evidence?"" That's where our paper came from. Basically, everybody said yes, they all agreed. Suddenly we had this huge author group, so we kind of set up a Slack channel. None of us had a really strong opinion going in. We had one of the world's best aerosol scientists; he probably had the strongest opinion going in because this is his job. He was like, ""Well, let me explain aerosols to you."" Then what happened was there was this amazing couple of papers that actually used this laser-scattering light chamber thing to actually literally take videos of respiratory particles suspended in the air. Not suspended, but they just float in the air. It showed that they float in the air for up to an hour. And it showed that when somebody wears a mask, they don't appear. That was the point where I went from ""curious and interested"" to ""100% convinced"". Because it'd be like if somebody said, ""I promise you, Lukas, if you throw this ball at that wall, it won't bounce off. It will go through."" You'd be like, ""Well, Jeremy, I'm not sure. But I'll give it a go."" And you throw the ball at the wall, and it bounces off, and you go like, ""Jeremy, I am very sure you're wrong about your theorem."" And that's how it was with masks. There were people who said masks don't provide respiratory protection from these airborne particles, and then here's a video of them not going through the mask. I was like, ""Okay, that's...I don't need any RCTs. There's a video. There's a picture of it working."" I kind of went all in on just trying to say to people, ""No, there's actually a thing that stops the thing that infects us. So we should wear them."" I found it extraordinarily bizarre that everybody didn't just go, ""Oh, look at that video of it working. Therefore, it works."" It was a super frustrating experience. I don't...there's nothing I enjoy about researching masks and there's nothing I enjoy about political advocacy. The former is boring and the latter is stressful. But when there's something that so obviously can save millions of lives — and also can avoid who knows what long-term harm — it just seems absolutely ethically required to act on that.
I spoke with all kinds of world leaders, and politicians, and celebrities, and whatever. In every jurisdiction, it was like a whole new conversation. It was like, ""Talk to people in South Africa; 'Oh, we don't believe in masks.'"" It was like, ""Talk to people in London; 'we don't believe in masks'. Talk to people in Australia; 'we don't believe in masks'. Talk to people in Florida; 'we don't believe in masks.'"" Each one, I discovered this horrible thing. Which is everybody decided they didn't believe in masks until their personal jurisdiction got hit hard by COVID. Until the hospitals started filling up. And then they would get back to me and say like, ""Oh, tell me more about this mask thing, Jeremy."" That was infuriating because of course the answer is, ""Well, if you had put in mask mandates two months ago, then this wouldn't have happened. Now it's too late because masks can reduce R by a bit, but not enough to reverse a full-on pandemic, once it's there."" Honestly, it...I got really burned out by the process. In some ways it was successful, but in the end, the pandemic still happened. And in the end, I'm still flabbergasted, particularly now that high-quality medical masks are widely available. Demand is so low that factories have been shutting down. I've never had COVID. Literally nobody I know who has worn a high-quality mask at all times indoors, none of them have got COVID. And everybody I know who doesn't has had COVID. There's a point at which you kind of say, ""Okay, I've done what I can. You do you."" Lukas: So you continue to wear a mask indoors, at all times? Jeremy: Of course. Yeah. Lukas: What would change...when would you stop wearing a mask indoors? Jeremy: I suspect it's the same as the answer to the question, ""When would I stop drinking clean water?"" I'd rather keep drinking clean water. We decided...I mean, remember, it took decades — even after the John Snow experiment — for big cities to decide to invest in clean water infrastructure. Presumably after some number of years, we will invest in clean air infrastructure. China's already done it. They now have, I believe, HEPA filters in pretty much all their public buildings, and they're putting in UV sterilization in pretty much all their public buildings. Hopefully, at some point, the West will do the same thing and then it'll be like, ""Okay, I'm in an environment with clean air,"" so I don't have to self-clean the air. That'd be one option. Another would be...again, China's ahead of us on this. They have nasal vaccines, which are probably much more effective. If we eventually get those, I think they can actually make a significant dent in transmission. The injected vaccines don't make much of an impact on transmission. So yeah, there are technologies that should allow us to be able to be pretty safe in indoor spaces. Lukas: But you don't wear masks in an outdoor space? Is that the... Jeremy: No, I mean, it's not exactly a hard and fast rule. We went to a birthday party recently, for example, where it was a karaoke thing. It was outdoors, but all the kids were singing, and they were tightly packed, and whatever. So, our family wore masks because there's a high amount of aerosolizing activity going on with a high density of people. But yeah, broadly speaking, I'm not too concerned about outdoors because the airborne particles disperse much more quickly. Lukas: I see.
I guess the interesting thing about that story is that there maybe was a fairly broad scientific consensus, but no one was really ready to advocate for it. Is that a better summary of what was happening? If you got all these scientists together and they actually all agreed with what you were saying... Jeremy: They didn't, unfortunately. What happened was it was highly polarized by areas. The people that actually understood this are the aerosol scientists. And the aerosol science community was basically 100% all on the same page. Like, ""Talking, breathing, these are aerosolizing activities. We have loads of evidence that this is transmitted through aerosols. We have loads of evidence that for the droplet nuclei — that are suspended in the air — masks block those from getting to your lungs."" All those were pretty much understood in that community. But then the challenge is, Lukas, that we haven't had a major respiratory pandemic in the West, really, since the Spanish flu. So, none of our infectious disease community has any background in that. I spent a lot of time advocating — including speaking directly to the WHO's infection control groups, the folks who kind of ran the response at the WHO — and they were overwhelmingly people who had a background in infectious diseases that are spread through contact. The kind of stuff that hand washing helps with. So they were just coming from a totally different direction, and had decades of experience in treating different kinds of diseases in a different way. They were doing their best to learn and understand. But for some, that was a very difficult experience. One in particular, John Conly, his financial stake in this fomite transfer was very high. That transmission is not through the air, but by contact, because he has financial interests in that being the case. So, very difficult for him to come to terms with the idea that this is a respiratory infection, through respiratory particles, requiring respiratory protection. That was a big challenge, this worldview difference between different scientific groups. The aerosol scientists, there were actually none of them on the WHO's infection protection committee...infection control, whatever it was. I noticed — when I was talking to the WHO — it was a total lack of diversity. Every single one had the same kind of academic background, and the same way of thinking about things, and they all knew each other very well. They were also...being involved in the WHO is a very strong status signal in their career, so everybody wants to be invited to those kinds of things. And so you really want to have all the other people on the committee think you're a good, nice person. It creates this real monoculture. So that was another big part of the problem. It was all...it definitely made me a lot more cynical than I was before it, to see how the WHO works. And even our big paper, how to get it published. It took a year from being written to being published. By the time it was published, it was basically too late. The process of getting it published was much more about politics than about science, you know. It was disappointing for me to discover that systems that I had thought of as being very much focused on rationality and data and correctness and rigor...so much of it turned out to be about politics, and networks, and stuff. I guess I was probably pretty naive before all that happened. Lukas: My sense is that people broadly believe that masks reduce the spread of COVID at this point.
I'm not sure that I know exactly to what degree...it sounds like you're saying to a really massive degree. But I think you had a part in that. Or maybe just...I just follow you on Twitter and we were just watching you talk about it. But I don't know. It does seem like it’s the mainstream... Jeremy: Yeah, I mean, I was leading the Masks4All group globally. We were the most substantive group doing that. Absolutely. Lukas: It feels like it was successful, though. I mean, I just...do you not... Jeremy: It was successful-ish. If you're in San Francisco, it'll look more successful than if you're in Australia, for example. In Australia...from time to time, we've had mask mandates and everybody wears them when they're told to. The rest of the time, it's strongly recommended, but nobody does. But in San Francisco, I'm told maybe 30% of kids at schools β€” or some schools β€” are wearing them. It's definitely...it's disappearing. And also people β€” a lot of people, maybe most people β€” I see wearing masks, at least in Australia, are wearing masks that don't work very well, even though the good masks are really easy to get. And a lot of people don't realize like if you get a high quality N95 respirator, you could wear that as many times as you like, until the straps wear out. A lot of people think, ""Oh, you can only wear it once."" A lot of people think it has to be fit-tested. A lot of people think it's like donning and doffing is some complicated thing. There's all this wrong information out there. And so the number of people actually wearing high-quality masks is...to me, it's surprisingly low. If everybody wore one whenever they were indoors, I think we might...particularly if we also had HEPA filters in indoor spaces, I suspect we would be done with a virus, that it would go away. Because how would a respiratory virus continue to transmit when you break the flow of respiratory particles? Yeah. I mean, even in China. All the pictures I see, everybody's wearing surgical masks. It's, like, weird to me. Lukas: Interesting. Well, look, we're almost out of a time and we always end with two questions. But you're a little bit of an unusual guest, I don't know exactly how all these will fit your worldview. We like to...I like to ask people, if you had some extra time to research something completely different, what might it be? I feel like you are just an unending font of this stuff. What are some things that you're interested in that you haven't had time to look into? Jeremy: Well, I'll answer a slightly different question because any time I'm interested in researching something, I just do. Lukas: Fair enough. Jeremy: The most recent thing I spent a lot of time researching is children's education. Our daughter missed the first year of school. Because of COVID, in San Francisco they were closed. That would have been her kind of transitional kindergarten year, as they call it in California. Then we came to Australia, and so she went to school β€” regular school β€” for the first year here. She was straight into grade one. She enjoyed it. She was always happy to go, and happy to stay there. But it felt like she had blossomed a lot more during her previous year when she was doing stuff over Zoom, and on apps, and stuff than the year that she was in-person in the classroom, which really surprised me. Instead, she had become much more of a perfectionist and was becoming much less resilient after her year at physical school. 
That all seemed really weird to me, because I thought that environment would be much more healthy than the previous one. I started investigating it really carefully and studying a lot of academic papers about education. I was stunned to discover that there's pretty broad consensus in parts of the academic community β€” or some very strong data β€” that suggests schools are not a particularly great place for most kids to really blossom, or at least entirely focus on school learning. In fact, tutoring...kids who get tutoring are in the very top, highest academic performers regardless of their previous background. It seems like all kids can be really successful given the right tutoring. Our daughter was doing all this stuff with apps, and on Zoom, and stuff during her first year. None of that is limited by the speed at which a teacher thinks a kid should go, but instead the computer is dynamically adjusting difficulty over time. So, weirdly enough, our daughter was basically at Grade 4 or Grade 5 of math after a few months of doing these apps. They're so much more effective than normal teaching. We were also trying to figure out, ""Well, how do you avoid her getting really bored and stuff?"" So I did this really deep dive into education and discovered there's all these fascinating, different ways of teaching and learning which are entirely different to what's done at normal schools. Eventually, we decided to take her out of school and instead switch to using these kind of more academically driven approaches in a homeschooling environment. Which also seemed to generally lead to better social outcomes, better mental outcomes β€” better mental health outcomes β€” and better learning outcomes. That's kind of been interesting to me, to discover this whole world of research that seems really important, for humanity. How kids should learn. It feels like, again, it's being largely ignored by the institutions that we send our kids to. Lukas: Let me just see if I got the summary of this: basically that tutors are much more effective than schools at actually teaching kids things. Is that what you’re saying? Jeremy: That would be part of it. But there's lots of...that's kind of one starting point. Yes, even kids that would otherwise have been doing pretty badly at school can be in the very top performers. That kind of is an existence proof, that pretty much all kids can be extremely successful. But then there's also this kind of interesting data point for us, which is when we gave our daughter an iPad, and some math and reading apps, and somebody on the other end of a Zoom to supervise them, she had a huge amount of fun and learned dramatically more quickly than I thought was possible. And then when she actually went to school, she basically learned nothing for the whole year and ended up becoming much less resilient. There are specific ways of learning that are not particularly compatible with the normal ways we teach at school. For example, we might have talked before about Anki and repetitive spaced learning. My daughter does Anki every day. Literally everything she learns, she will remember forever if she creates a card for it, or she decides she wants to know it. That's kind of quite difficult to do at a normal school because you'd need all of your grade levels to be doing Anki. So that in Grade 5, you've still got cards from Grade 1 or Grade 2 coming back. 
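As an aside for readers unfamiliar with how Anki-style scheduling produces that effect, the sketch below is a toy, hypothetical simplification loosely inspired by the SM-2 family of algorithms, not Anki's actual implementation: each successful review multiplies the card's interval, which is why material from years ago keeps resurfacing at ever-longer gaps.

```python
# Toy sketch of spaced-repetition scheduling, loosely in the spirit of SM-2.
# Hypothetical simplification for illustration; not Anki's real algorithm.
from dataclasses import dataclass

@dataclass
class Card:
    interval_days: float = 1.0   # days until the card is shown again
    ease: float = 2.5            # growth factor adjusted by review quality

def review(card: Card, remembered: bool) -> Card:
    if remembered:
        card.interval_days *= card.ease           # push the card further out
        card.ease = min(card.ease + 0.1, 3.0)
    else:
        card.interval_days = 1.0                  # forgotten: see it again tomorrow
        card.ease = max(card.ease - 0.2, 1.3)
    return card

card = Card()
for _ in range(8):                                # eight successful reviews
    card = review(card, remembered=True)
print(f'next review in about {card.interval_days / 365:.1f} years')
```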
But what happens at school is each year...for example in Australia, the Year 7 and Year 8 math curriculums are nearly entirely a refresh of the primary school curriculum, because they kind of assume the kids are going to need to see it again, because they've probably forgotten a lot of it. Things like, ""How would you incorporate spaced repetitive learning?"" Some schools in England have tried to do something like that using something they call ""retrieval practice"". I know there's a school called the Michaela school, which I believe had the highest results academically in the whole country. They do something like this. There's a few...there's a handful of schools here and there which are trying to use these kind of research results. But they're kind of the odd ones out. Lukas: All right. Finally...I don't know if this one really applies to you. We usually ask β€” because my company, and this interview, is all about making machine learning really work in the real world β€” we usually ask like what's a hard part that you've encountered in taking something from research to actually working for some purpose? That may not exactly apply to you, but you seem very good at sort of interpreting my questions in a useful way. So I pose it in its most abstract form. Jeremy: I mean, I've had lots of projects that I've tried to bring into the real world. Lukas: Of course, that's right. Yeah. Jeremy: It's difficult. I've been doing machine learning projects for over 25 years now, believe it or not. In the early days, it was such a challenge because managers didn't believe in the power of data at all. When I would try to tell them that it could be really valuable, they would always say like, ""Can you point to a role model of a company that's been successful because of their use of data?"" And there were none. That was tough. Lukas: Yeah. Jeremy: Then Google came along, which was great, because then I could point at this one company that was really working hard to use data and they've become very valuable because of it. Nowadays that bit's a lot easier. But actually, unfortunately, my answer is going to be that I've kind of β€” for a lot of companies β€” I've given up on even trying. Because I tried to get...particularly when I was at Singularity University, where all of our students were basically execs from giant companies. We were trying to convince them to be more data-focused and some of them really took that on board. And then they would invite me to come and talk to their VP groups and exec groups. I saw lots of big companies try to get more data-driven, try to use machine learning. I didn't see any being successful. The issue seemed to be that their entire management teams were people who...that was not their area of expertise. They were not promoted because they were good at that. They would have very smart, data-driven people down in their kind of business analyst levels, that they would have no idea which ones knew what they were talking about, and have no way to kind of curate what they were being told. All of the promotion systems were based on experience, and credentialing, and things other than analytical capabilities. So, in those kinds of companies, I eventually decided, ""Okay, maybe it's not possible for a legacy company to become a data-driven company."" And so nowadays I focus all of my attention on startups created by founders that are already data-driven and have a good understanding of analysis. 
What we're seeing is, increasingly, the most valuable companies β€” or particularly the most valuable companies in America β€” they're basically all now ""tech startups"". I mean, they're not startups anymore, but they're all companies that are created by engineers and data-driven people. I think for data scientists interested in making an impact, the best thing to do would be to try and make sure you're at a company where that kind of work is appreciated and understood by the executive team. Lukas: Interesting. Well, great to talk to you. That was super fun. Thanks for- Jeremy: You too, Lukas. Lukas: -answering my wide range of questions. It's always so inspiring to talk to you. I really appreciate it. If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned, supplemental material and a transcription that we work really hard to produce. So check it out. Jeremy: And how is everything going at Weights & Biases? I always hear nothing but good things about it. Everybody loves it. I've got to admit, actually, the other day I was talking to my friend β€” I think it was Tanishq β€” about like, ""Oh, what's going on with this learning rate here? I wonder if it's working properly."" And then he's like, ""Well, here's a graph of the learning rate."" I was like, ""Oh, that was quick and great. Where did that come from?"" He's like, ""Weights & Biases, it logs it."" Lukas: Yes! Oh, man. Are we still recording? Put that on the... Jeremy: I probably should have looked at the Weights & Biases team. Here I was with like ""plot.plot(x = ...)"", and he's already got it pasted into the Discord chat. Lukas: All right. Well, that made my day. Thanks. Jeremy: Cheers, mate.",10657,"In this podcast, Jeremy Howard, a founding researcher at fast.ai, discusses the spike in technological capability and the importance of being at the forefront of this change. He focuses on generative modeling, specifically in the Stable Diffusion or diffusion modeling space, and explains the profound insight behind diffusion models and how they can be used to improve previous outputs. The podcast also discusses the challenges of engaging with bigger models and the need for resources to make them work.
The conversation then shifts to the potential of language models and the risks of their misuse, such as creating bots to impact the world's discourse. Jeremy also discusses his frustration with the lack of belief in masks and the slow adoption of mask mandates during the COVID-19 pandemic. Despite a scientific consensus on the effectiveness of masks, there was a lack of advocacy for them. The Masks4All group, led by Jeremy, was successful in promoting the use of masks, but their effectiveness varies depending on location and the quality of the masks being used.","['What is the significance of the spike in technological capability that Jeremy Howard mentions?', 'What is Jeremy Howard currently learning and why is it important?', 'What is the profound insight behind diffusion models and how do they work?', 'What is the purpose of creating a function that returns a better version of an input?', 'How do current approaches deal with the problem of out-of-distribution inputs?', 'What are some of the challenges in using prompts to create outputs with generative models?', 'How do you customize the style and content of a story book using computer-assisted human creativity?', 'Is diffusion modeling only applicable to images or can it be used in other domains such as medicine and copywriting?', 'How can one engage with bigger models in a meaningful way and make them work without having to spend a lot of money on compute?', ""What is Jeremy Howard's approach to improving generative models?"", 'What is the importance of creating a rapid feedback loop and having a highly correlated test in improving models?', 'What is the advantage of using notebooks in generative modeling compared to other approaches?', 'What are the benefits and limitations of accessing web-based APIs for generative modeling?', 'What are the ethical considerations surrounding the restriction of access to models by companies like OpenAI?', 'What are the opportunities and threats associated with generative text models, and how are they currently optimized for purpose?', 'What are the potential risks of the misuse of text and image models?', 'How is fast.ai making machine learning more accessible to people without coding experience?', 'Is Julia still considered the future of machine learning, according to Jeremy Howard?', 'What are the challenges of using Python for creating software?', 'What is JAX and how does it work?', 'How do compiled languages like C++, Swift, and Rust differ from Python and JAX in terms of debugging and optimization?', 'What Python-esque languages are designed to work better with backend linear algebra accelerators and compilers?', 'Is there a language currently available that has the feel of what is needed for this kind of thing?', 'Can Jeremy Howard tell the story of how he advocated for wearing masks in the early days of COVID and what he was seeing that others were missing?', 'How did the Masks4All group come about and what was their goal?', 'What evidence convinced Jeremy Howard of the effectiveness of masks in preventing the spread of COVID-19?', 'What challenges did Jeremy Howard face in promoting the use of masks during the pandemic?', 'Why was there a lack of belief in masks during the COVID-19 pandemic despite scientific consensus on their effectiveness?', 'What technologies could allow us to be pretty safe in indoor spaces without wearing masks?', 'Was there a broad scientific consensus on the effectiveness of masks during the COVID-19 pandemic?', 'What challenges did the aerosol science community face 
in advocating for the use of masks during the COVID-19 pandemic?', ""What was the main issue with the WHO's infection control committee and their response to the pandemic?"", 'How successful was the Masks4All group in promoting the use of masks, and what factors affect their effectiveness in different locations?', 'What is the effectiveness of high-quality masks in preventing the spread of respiratory viruses?', 'What are some alternative approaches to traditional schooling that have been found to be effective?', 'How can tutoring and technology be used to improve learning outcomes for children?', 'How can spaced repetitive learning be incorporated into traditional schooling methods?', 'What are the challenges of implementing machine learning projects in legacy companies?', 'What is the best way for data scientists to make an impact in their work?', 'How is everything going at Weights & Biases?', 'What is the importance of being at the forefront of technological change?', 'What are the risks of misuse of language models?']"