Gladiator committed on
Commit 6db1eed • 1 Parent(s): 10b23b5

upd for a big run

Files changed (1)
  1. data/summarized_podcasts.csv +7 -0
data/summarized_podcasts.csv ADDED
@@ -0,0 +1,7 @@
+ ,title,url,duration,publish_date,transcript,total_words,summary
+ 0,Sarah Catanzaro β€” Remembering the Lessons of the Last AI Renaissance,https://www.youtube.com/watch?v=v3O20NMdOuA,4584,2023-02-02,"Sarah: I think people see the output of models like DALLΒ·E, GPT-3, et cetera, and they're amazed by what AI can do. And so the conversation doesn't even hinge on, ""We have access to this data set,"" or ""We have access to this talent pool."" It's more, ""AI is magical. What can we do with it? Come in and talk to us about this."" And again, I think that that is somewhat dangerous. Lukas: You're listening to Gradient Dissent, a show about machine learning in the real world. I'm your host, Lukas Biewald. Sarah Catanzaro was a practicing data scientist and then went into venture. She's currently a General Partner at Amplify Partners, and one of the leading investors in AI and ML. Her investments include a whole bunch of companies I admire, like RunwayML, OctoML, Gantry, and others. It's really interesting to talk to an investor who's also technical. She has insights both on how the technology is built and how it's being adopted by the market at large. This is a really fun conversation and I hope you enjoy it. Sarah, thanks so much for doing this. I've been looking forward to this one. I had a bunch of questions prepped and then I was looking at your Twitter and I was like, ""Oh, there's like a whole bunch of stuff that we should..."" Sarah: Yeah. I feel like I've been doing a lot of thinking out loud recently. Including in response to a lot of the hype around Stable Diffusion, LLMs, et cetera. I appreciate the fact that both of us were there in the 2013, 2014 phase where every company was claiming to be an AI company. It feels like we're kind of heading down that road again, which scares me a little bit. I hope at least there are enough companies β€” people β€” who remember the lessons of the last AI renaissance. But we'll see. Lukas: Well, let's get right into it then, because from my perspective, I totally remember at least one other AI bubble. Maybe more, depending on how you count it. I guess from where I sit, it feels like this one might be different in the sense that I feel like these challenges that were always...seemed super, super hard, seem like they're really working. And I feel like I see applications happening unbelievably fast after the paper comes out. Actually even maybe before there's time to even publish any paper on the topic. I think I might be more bullish about large language models and Stable Diffusion than you, which is great because we can actually have an interesting conversation here. But I thought it's interesting. You've invested in Runway, and just the other day Cris was showing me a natural language input into Runway where you could basically type what you want, and it would sort of set up the video editing to work that way. I thought, ""Oh my gosh,"" this might be a totally new kind of interface that lots of software might quickly adopt, I guess. But it sounds like β€” looking at your Twitter β€” it sounds like you were playing with large language models and finding it super frustrating and broken. Tell me about that. Sarah: Yeah, so I think my concern is less about the capabilities of large language models specifically, and more about some of the lessons that we learned during the last AI renaissance. Which I think was roughly like 2014 to maybe 2017, around the time that AlphaGo came out. People were really excited about the capabilities of GANs and RL. 
At the time, I remember companies like Airbnb, Uber, Lyft building these big research teams, but not really having a clear agenda for those research teams, or understanding how the objectives of their research teams might align with the objectives of the broader organization. And then similarly, you saw all of these startup founders emerge that were talking about changing healthcare with GANs or changing finance with RL, but didn't really have insights into the nuances of those industries. My feeling of why ML didn't work the last time around β€” or rather, why ML adoption didn't occur at the pace that we anticipated β€” was that it was not really a technical problem, but rather a product, go-to-market problem. I am hoping that this time around, we've both learned from our mistakes but also β€” in the intervening time period β€” created enough enabling technologies, such that two things can occur. One is that companies can fail fast. Frankly, one of the things that scares me is that back then I remember a bunch of companies reaching out and basically saying things like, ""Hey, we've got a bunch of data. We'd love for you to come in and talk to us about our AI strategy,"" and thinking, ""I don't care if you have a bunch of data. Let's talk about a bunch of problems that you have, and how ML can solve those problems."" I've come to believe that you can't fight that urge. Founders will always be enticed by the promise of AI. But if they're able to experiment with it quickly, then I think they can start to learn more about the infrastructure, and data, and other investments that they may need to make in order for their AI initiatives to be successful. At the same time, I think by creating these higher-level interfaces that make ML more accessible to potentially the domain expert, it allows people with a more thorough understanding of business problems to at least prototype AI solutions. I'm somewhat skeptical that these very high-level interfaces will allow them to build production ML at scale, but at least they can see, ""Does it work? Do I need to now hire a data/ML team to realize this initiative further?"" Lukas: Do you have companies in mind that you like, that are creating these higher-level interfaces off of ML technology, that makes them usable for real world applications? Sarah: Yeah. I think Runway is actually a perfect example of the phenomena that I see playing out. Some people may not know, but Runway actually started off more as a model marketplace. Their goal had been to make GANs and other types of models accessible to creative professionals, but they weren't really focused on building out the video editing tools, at least initially. They created these higher-level interfaces, such that various creative professionals β€” whether it was artists, or directors, or photographers β€” could start to experiment with ML models. What they saw was that some of the most popular models were models that automated routine tasks associated with video editing. Based on that user behavior, they decided to double down on video editing. In fact, a lot of the model architectures that they've since created β€” including Stable Diffusion β€” were really purpose-built to support the workflows of video editors. I like that sort of workflow, where you use a prototype, or you use these higher-level interfaces to get insight into what users need β€” as well as potentially the limitations of the underlying technology β€” and then you iterate from there. 
Lukas: I totally remember a time, I think, of the era you're talking about β€” 2014 to 2017 β€” when every company was like, ""Oh, we have this data. it must be valuable because we can build a model on top of it."" Do you see some analogy today to that? What's the common request of an ML team that's misguided, or should be thinking more about problems? Because I feel like data maybe isn't seeming quite as valuable, in the world of LLMs and big models. Sarah: I think that what we're seeing today is arguably more nefarious than what we saw back then, because at least at that point in time, companies had invested in collecting data. They had thought about possibly what data to collect. And so there was some understanding of how to work with data. I think people see the output of models like DALLΒ·E, GPT-3, et cetera, and they're amazed by what AI can do. And so the conversation doesn't even hinge on, ""We have access to this data set,"" or ""We have access to this talent pool,"" or ""We have this type of workflow that could benefit from these generative capabilities."" It's more, ""AI is magical. What can we do with it? Come in and talk to us about this."" And again, I think that that is somewhat dangerous. I was at a conference just last week. There was a presentation on ML infrastructure at a music company, and somebody in the audience asked, ""Does the AI listen to songs?"" It's a perfectly reasonable question. But I think it does kind of belie some of the misunderstanding of AI and how it works. Lukas: In what sense? Sarah: I think people think about AI as artificial agents. They think of AI as something that could listen to a song, not just something that could represent a song and make predictions based upon the content of that song. Again, I think better understanding of what LLMs are and what they can do will be really necessary to identify when they can be useful. Lukas: This might sound...this is a little bit of a soft ball β€” or might sound like a soft ball β€” but I was really genuinely interested in this. I feel like one of the things that you do really well, at least in my conversations with you, is maintain a pretty deep technical and current knowledge of what's going on in data stacks, basically. Or, data infrastructure and ML infrastructure. But yet you're not maintaining data infrastructure β€” as far as I know β€” so I'm kind of curious how you stay on top of a field that seems like it requires such hands-on engagement to understand it well. Or at least I feel like it does for me. Yeah, just curious what your process is. Sarah: Yeah. It's interesting because I'd say that, in some ways, that is one of my biggest concerns. I've been in venture now for about seven years, and so I can still say that I've spent most of my career in data. But it won't be long before that is no longer true. And certainly I have found that my practical, technical skills have gotten rustier. One comment on that is that I do think that losing my Python, SQL skills, etc. has actually enabled me to look at some of the tools and platforms that are available to users today, with a fresh set of eyes. I'm not as entrenched in the same patterns of behavior and workflows as I was when I was a practitioner. So it's been helpful to shed some of my biases. But I think what I've discovered is that you can understand how something works without using it. And therefore there are two things that are kind of critical to building technical understanding for me. 
One is just spending a lot of time with practitioners, and hearing about their experiences. How they're using various tools, how they're thinking about various sets of technologies. Frankly, just learning from them almost feels like a shortcut. Instead of trying to figure out what the difference is between automated prompting and prefix-tuning, just going to ask somebody and have a conversation with them. Which is kind of coincidental, and perhaps even ironic. Like, accelerate my learning by just learning from people with expertise in those areas. There's a lot that I just learned through conversation with practitioners. But I think going one level deeper β€” either reading white papers or reading research papers that give you kind of a high-level overview of an architecture, or how something works without getting into the nitty gritty of the underlying code or math β€” allows me to reason about these components at a practical level of abstraction. I can see how things fit together. I understand how they work. That doesn't necessarily mean that I'd be able to implement them. Definitely doesn't mean that I'd be able to iterate on them. But it's enough depth to reason about a component, and it's placed in a broader technical stack. Lukas: It's funny though, sometimes I feel like investors...I mean all investors do that to some extent, and I totally get why. But I think that I often feel also paranoid about losing my technical skills, because I feel like if all you can do is sort of figure out what box something belongs to, it's really hard for you to evaluate the things that don't fit into boxes. And I feel like almost all the interesting advances β€” actually, all the products that we want to come out with at Weights & Biases β€” generally is stuff where it doesn't fit neatly into one of those ML workflow diagrams that people make. Because if it was one of those boxes, then of course people are doing it, because it makes logical sense, but it's sort of when that stuff gets reshuffled...it does seem like you're able to maintain a much greater level of technical depth than the average investor, even in the data space. Which is why I wanted to have you on this podcast. I hope I'm not offending any of my current investors. Just a caveat there. You all are wonderful. I really do feel like you somehow maintained a much greater technical depth than most of your colleagues. Sarah: In many ways I'm amazed by my colleagues and what they do, because I think there are many investors that can reason about the growth of companies, and reason about sets of boxes and the relationships between those boxes without understanding what those boxes do. I don't think I could do that, but I've always also just been the type of person who needs to go a little bit deeper. As an example, I started my career in data science, but at Amplify I also invest in databases. And at some point β€” writing SQL queries, working with dataframes β€” I just wanted to better understand what was happening. When I write a SQL query and data shows up in my SQL workbench, what is happening on my computer? I think a lot of people take that stuff for granted. And they can. That is the beauty of abstractions. That is the beauty of technology. We are able to have this video conference β€” we are able to connect over the Internet β€” without understanding how the Internet works. My personality is such that I want to understand how the Internet works. 
I want to understand why I have service in some places and why I don't have service, and why my dataframe is slower than my SQL query. I do think that that makes me think about technical systems in different ways. Lukas: It’s funny, my co-founder Shawn is obsessed with β€” in technical interviews β€” assessing if someone understanding how a computer works, in his words. Which I think is really interesting, because I feel like I'm actually not... That's kind of a weakness of mine, I always wonder about a lot of the details there, but it is sort of an interesting perspective. I love working with all of my colleagues who have that same drive to understand how everything works. Okay, here's another question that I was wondering, I was thinking about. If I were to come to you, and I had a company in the data/ML space, and I had a bunch of customers that were really who we think of as tech-forward β€” like Airbnb, and Google, and that genre β€” would that be more impressive? Or would you be more thinking I'm likely to succeed if I came to you with a set of customers who we don't normally think of as tech-forward? Like an insurance company β€” a large insurance company β€” and a large pharma company. Which would you look at and say, ""Oh, that seems like that company is going to succeed""? Because part of me watches technology flow from the more tech-forward companies everywhere. But another part of me is like, ""Wow, these kind of less tech-forward companies have a whole set of different needs and often a different tech stack. And certainly there's more of them and they have more budget for this stuff."" So which would be the more impressive pitch for you? Sarah: Yeah, it's funny because I think in many ways the way that VCs make decisions β€” the way that we think about deals β€” is actually super similar to some of the patterns that we observe with neural networks. And that of course means that we have bias. It also means that we learn from patterns that we've observed. So, I can give you the honest answer, and then I can also give you the rational answer. The honest answer is that I would be more impressed by a company that has engaged with tech-forward customers. For the reasons that you described. In the past, we have generally seen that tech will spread from the Airbnbs and Ubers and FAANGs of the world into the enterprise, and not the other way around. We also have a bias that these more traditional enterprises tend to move slower. There tends to be a lot of bureaucratic red tape that you need to navigate. And as such, those markets tend to be less attractive. So, on its face, if you just said...you don't have any additional information about the velocity of sales, about the quality of the tech or team, etc. But like you're- Lukas: -holding them equal, I guess. Equivalent. Sarah: Yeah. That said, I think that is one of the biases that can cause us to make poor decisions. What really matters are some of the things that I just alluded to. If you're able to sell into insurance companies repeatedly β€” and with high velocity β€” that is arguably a better business than a company that spends 6 to 12 months trying to sell into tech companies. So it's less about ""To whom do you sell?"" and more about, ""Is that a big market? Are you able to sell efficiently? Are you able to sell scalably?"" I think sometimes we need to be aware of our biases and the impact that marquee logos can have on our decision-making. Lukas: Well, I can't tell if you think it's a rational bias or not. 
I mean, in some sense, you could call all pattern-matching biases. Do you really think it would be rational to sort of be less enamored with tech-forward customers than you actually are? Sarah: I think we need to ask ourselves and probe on, ""Under what circumstances might enterprises move quickly?"" A great example of this is a company called Afresh, which was one of the companies that did use RL to disrupt an industry. At that time that so many companies were trying to do the same thing, but didn't have as much insight into what was happening within an industry. They offer tech solutions β€” including things like inventory management and forecasting β€” to companies in the grocery space. Now, you might think that grocery is going to be a super outdated, slow-moving industry. And therefore that selling into grocery chains would be long and tedious. And perhaps not very scalable. But, at the time, a lot of grocery stores were responding to β€” and/or otherwise just terrified by β€” the acquisition of Whole Foods by Amazon. This was then [followed] by the pandemic, which certainly put a lot of stress on their online and multi channel-delivery and e-commerce capabilities. So there were these exogenous shocks which made what might have been slow-moving market participants move a lot faster. Those are the phenomena that we're sometimes blind to, because we just hear ""grocery"" or ""healthcare"" or ""manufacturing"" and think ""slow"", rather than thinking, ""What would it take for the participants in that sector to move fast?"" Lukas: That makes sense. Here's another point that you made on Twitter, that I was contemplating. I actually don't think I have a strong point of view on this, although I really should β€” given the company that I'm running β€” but you mentioned a lot of VCs have been saying that you expect the point solution MLOps space to consolidate. One thing that's interesting about that, is that I think you've invested in some MLOps tools. Do you sort of expect them to expand in scope and eat the other companies? Is that something that you need to bet on when you invest in them? Or would you be happy to see them get bought by other tools? How do you think about investment then, in MLOps tools companies, with that worldview? That's my practical question. And then the other thing that I observe, is that it doesn't necessarily seem like developer tools in general is consolidating. So I think I might even agree with you, but I wonder how you sort of pattern match that against developer tools. Or even maybe the data stack... I don't know. Do you think that the data stack is also consolidating? Or what's going on there? Sorry, I just dumped a whole bunch of different questions on you, but... Sarah: Those are great questions. So, I do think that in general most technical tools and platforms will go through phases of consolidation and decoupling. Or, as people love to say today, bundling and unbundling. I think it's just the nature of point solutions versus end-to-end platforms. You have a bunch of point solutions, they're difficult to maintain, they may be challenging to integrate. You then kind of bias towards end-to-end platforms, you adopt an end-to-end platform. It doesn't address a certain edge case or use case that you're experiencing, so you buy a new tool for that edge case, and unbundling happens. I think the pendulum will always swing back and forth between bundling and unbundling, for that reason. Or coupling and decoupling, for that reason. 
To be clear, as a former buyer, I don't think that point solutions or end-to-end platforms are the best solutions for a company. I think there's space in the middle, where you have a product that can solve a few adjacent problems. That's typically what I look for when I invest. I want to make sure that the company in which I'm investing is solving an urgent β€” and often point β€” problem. They're solving an urgent and specific problem. However, I typically also want to see that the founder has a hypothesis about how they would expand into adjacent problem areas. It's not that I think solving point problems is bad, but I do think given the pendulum of coupling and decoupling, having some hypotheses about the areas that you can expand into becomes critical. It's interesting to consider why this may or may not happen in the world of developer tools. I'd argue that you still see consolidation. However, the consolidation tends to happen across layers of the stack, versus across the workflow. Lukas: Interesting. What are you...tell me...what are you thinking of there? Sarah: Things like serverless, where you're no longer reasoning about resources and config. That might not be impacting other parts of your developer workflow. That might not be eating into your git-based development workflows, or your testing processes, and things like that. But it is eating into how you think about managing VMs or containers. It is possibly eating into how you think about working with cloud vendors, and deciding upon underlying hardware, and things like that. So it might be the case, that it's like in software development, we've seen companies β€” or we've seen vendors β€” solve specific problems, but solve those all the way down the stack. I haven't really thought about that as deeply. But I think it's a worthwhile question to ask. I would say that one of the big differences, though, that I see β€” and that we of course need to be mindful of β€” is that there are far more developers than there are data practitioners. And so, when you're trying to answer the question, ""How does this thing get big?"", those building developer tools can arguably solve a specific problem for a larger number of people versus data teams when you're trying to answer this question of, ""How does this get big?"", you could potentially get stumped just by the number of people for whom a tool is actually applicable. Lukas: Is that what gives the intuition that we're in a moment of bundling? That there's just all these point solutions that you feel kind of can't survive on their own, just given the size of the market that they're in? Sarah: I think it's a combination of things. On one hand, I see a lot of...the slivers are getting tinier. You start to see things like ""model deployment solutions for computer vision,"" and perhaps some subset of computer vision architectures. Where, you might think to yourself, ""Okay, I understand why the existing tools are maybe not optimal for that specific use case, but that's really narrow."" To my point about thinking about these orthogonal problems, it's unclear how you go from that to something meatier. That's one phenomena that I observed. I think the other is just that practitioners are really, really struggling to stitch things together. 
The way a friend put it to me about a year ago, he basically said he feels like vendors are handing him a steering wheel, and an engine, and a dashboard, and a chassis, and saying ""Build a fast, safe car."" Those components might not even fit together, and there's no instruction manual. It's easy to cast shade on the startups that are building these tools and platforms, but I think one of the things that is more challenging in the ML and AI space than even like data and analytics, is that a lot of the ML engineering and ML development workflows are really heterogeneous now. If you're a vendor and you're trying to think about, ""With whom should I partner? With whom should I integrate? Do I spend time on supporting this integration?"", it's tougher to make those decisions when practices and workflows are so fragmented and heterogeneous. I do think that creating more of a cohesive ecosystem has been difficult not because vendors are dumb, but because there's just a lot going on. Lukas: Well, I think the other challenge maybe is that when there's so many different technologies that people want to integrate into what they're doing β€” because there's so much exciting research and things that come along, based on different frameworks and so on β€” it's hard to imagine an end-to-end system that would actually be able to absorb every possible model architecture immediately, as fast as companies want to actually use it. Sarah: Yeah, yeah 100%. I have been thinking about this in the context of LLMs. We don't yet know how the consumers or users of pre-trained models are going to interact with those who create the pre-trained models. Will they be doing their own fine-tuning? Will they be doing their own prompt engineering? Will they just be interacting with the LLM via API? Without insight into those interaction models, it's really hard to think about building the right set of tools. It's also unclear to me that the adoption of LLMs would actually imply that we need a new set of tools, both for model development and deployment, and management in production. I have a lot of empathy for people who are building ML tools and platforms because it's a constantly moving target. Yet, there's the expectation that you're able to support heterogeneity in all regards. In all regards, whether it's the model architecture, or the data type, or the hardware backend, or the team structure, or the user skill sets. There's so much that is different from org to org. I think building great tools is really challenging right now. Lukas: I guess that's a good segue to a question I was going to ask you. When you look at LLMs, do you have an intuition on if a new set of tools are needed to make these functional? Sarah: I think one of the bigger questions that I have is, again, on how the consumers of LLMs β€” or how the users of LLMs β€” will actually interact with those LLMs. And more specifically, who will own fine-tuning. I imagine that there are certain challenges that will need to be addressed, both with regards to how we collaborate on the development of the LLMs, but also how we think about the impact of iterations on LLMs. If OpenAI wants to retrain one of their models β€” or otherwise tweak the architecture β€” how do they evaluate the impact of that change on all of the people who are interfacing with the GPT-3 API, or with any of their other products? 
I think a lot of the tools that were built for model development and deployment today kind of assumed that the people who were developing models would be the same set of people β€” or at least within the same corporate umbrella β€” as those who are deploying and managing models in production. And if LLMs drive a shift β€” wherein those who are developing models and those who are deploying and building applications around models are two completely separate parties β€” then some of the tools that we have today might be ill-suited for that context. Lukas: Do you think we're headed towards a world like that, where there's a small number of companies generating foundational models? And then mostly what other companies are doing is fine-tuning them or doing some kind of prompt engineering to get good results out of them? Sarah: Here we're getting a little bit into the technical nitty gritty, but my impression from tracking the research community so far has been not all...though LLMs are great for what we typically think of as unstructured data β€” primarily images, text, video, et cetera, audio too β€” they have not outperformed gradient boosting or more traditional methods on structured data sets, including tabular and time series data. Although there's some work on time series that I think is pretty compelling. This is one of those areas where I feel like the research community just completely underestimates how many businesses operate on structured data. While it's possible that adoption of LLMs will drive this new interaction model or new market model β€” wherein some companies built these large foundation models and others interact with those β€” I don't see gradient boosting or more classical approaches going anywhere. Because I don't see structured data going anywhere. Arguably, structured data powers many of the most critical use cases within organizations, ranging from search and recommendation engines to fraud detection. I think it would be a tragedy to neglect the needs of those who are using...I don't want to say simpler approaches, but certainly simpler approaches and more complex approaches, by using architectures that are not perhaps attention-based, when working with these specific data sets. Lukas: Interesting. Do you have an opinion on...how to say this? I feel like many investors especially, but I think many smart people looking at the space of ML and data, they think, ""Wow, this is gonna commoditize. This is going to get...tools are gonna make this easier. Less companies are going to want to do this internally and spend money on expensive resources."" But I guess when I look at what companies actually do, it seems like they spend more and more, and even kind of push up the salaries. And they have this fight for scarce, specific talent. Which way do you sort of predict things are going? Do you think like 10 years down the road, ML salaries go up or do they go down? Maybe it's a more concrete way of putting it. Sarah: Yeah, that's a great question. I probably expect that the variance would increase. My guess is that there are certain applications that may be commoditized β€” or at least that may be commoditized for some subset of the market β€” while others continue to be pursued in-house. Search is perhaps a very interesting example. For some businesses, they may be more than happy to rely upon a vendor to provide those semantic or vector-based search capabilities. 
While search may have an impact on their bottom line, perhaps it's not the most critical or most impactful thing to their business, but rather just a capability that they have. This is not to say that Slack actually uses a vendor or should use a vendor, but as far as I can tell, Slack doesn't really monetize on search. You'd contrast that, however, with an e-commerce business or something like Google, where their ability to deliver the highest quality search results and their ability to improve search β€” just marginally β€” could be a huge impact on revenue. Those companies are probably likely to develop their own models. I think we'll see that some companies do their own model development. Some use cases are not commoditized, and those companies for those use cases you see very high ML salaries. But then, perhaps for others, you're really just a software engineer who knows a little bit about ML, and can interface with some of these models through APIs, and can reason about the output of experiments and behavior that you might see in production. Lukas: I guess in that vein β€” and you sort of alluded to this earlier a little bit β€” what do you think about all these sort of low-code and no-code interfaces into exploring data, building ML models? You mentioned earlier that you think that's generally a really exciting trend. Sarah: My opinions on this category are pretty nuanced, so I was thinking about where to start. Generally speaking, I'm very skeptical of no-code, low-code solutions. I find that many of these tools β€” no matter what the sector or what the use case β€” they end up shifting the burden of work. Not necessarily removing that burden, or even lightening that burden. A great example is self-service analytics. My own belief is that in general, most self-service analytics tools don't actually reduce the burden that the data team or analytics team bears, but rather shifts the work of the data team from building analytics products to debugging, explaining, or fixing analytics products. And I think the same can be true in the ML space. Why I'm excited about some of these tools in the ML space is that I actually think that in ML, failing fast is really critical. Some of these tools that enable users to prototype ML-driven solutions might help them better understand, ""Is this going to work? What additional investments do I need? What do my users expect from the system before they make a decision to invest further?"" It enables that kind of quick prototyping, learning, and failing fast. The other thing that I feel quite strongly about, is that we need to explore ways to decouple model development and ML-driven app development. Whenever I talk to companies about their ML architectures or their ML stack, it becomes so obvious that ML is just this one tiny component in a much larger app architecture. The prediction service might be connecting with other databases, or stream processing systems, or other microservices, tools for authorization, and so on and so forth. I think it's really important to be able to build applications around a prediction service while independently iterating on the model that powers that prediction service. So, I am somewhat long on tools that enable engineers to prototype ML-driven systems, so that they can build those application architectures. 
And then, once they have a better understanding of the full system requirements β€” including some of the latency associated with things like moving data around β€” they can kind of pass off a fuller spec to a data scientist who will iterate on the model and model architecture, armed with the knowledge that these are the attributes that we need in order to make this project successful. Lukas: That makes sense. Okay, another question. When you invest in a company that is providing some kind of ML or data service, does it cross your mind, ""What if AWS does that?"" Or GCP or Azure. Is that an important thing to consider, do you think, or is that irrelevant? Sarah: Yeah, yeah. I smile because I feel like this question, it comes up somewhere between like one to five times a week. Given the areas that Amplify invests in β€” we're primarily focused on data, ML tools and platforms, enterprise infrastructure, and developer tools β€” we're constantly fielding this question of, ""What if AWS or GCP or Azure does this? Won't that company β€” won't that market, et cetera β€” get crushed?"" In the past, what I've told people is that I have found that startups tend to be better at building developer experiences. Anecdotally, this is just something that we observe. People complain a lot about the experience of using AWS tools, the experience of using things like SageMaker. I've thought a little bit more about why that's the case. I think, generally speaking, the cloud vendors need to develop for their most spendy customers, their highest-paying customers. And their highest-paying customers tend to be enterprises, shockingly. As such, they're developing for an enterprise user who probably has fairly strict privacy/security requirements, who may have a very distinct way of organizing their teams, who may be bringing in a persona with a specific skill set into data science or ML roles. If I had to present a hypothesis about why they haven't been able to compete on developer experiences, I think it's because often they are creating tools and platforms for a developer who is not as representative of the rest of the market. But, to be honest, with the passage of time, I've just seen enough examples of companies that have been able to out-compete the cloud vendors where I just don't worry about it that much anymore. Lukas: Have you ever seen anyone get crushed? Sarah: Crushed? Lukas: Has that happened in your career? Sarah: No. I mean, I'm sure it has. But it's hard for me to think of an example, whereas it's easy to think of many, many examples of companies that were not crushed by the cloud vendors. If anything, I think sometimes we see that start-ups get...they sell too soon. The way in which the cloud vendors out-compete them is putting some juicy acquisition offer in front of them and then they don't have to compete. That's the only example that I could see or think of, off the top of my head, of the cloud vendors crushing a potential competitor. They crush it with their dollars. Suffocate companies with their acquisition offers. Lukas: R&D through M&A, yeah. I saw an interview or a conversation that you had with Andrew Ng. I thought you had an interesting point that academic benchmarks...they often don't really reflect industry use cases. But you were kind of pointing out that industry has some share of the blame for this. Can you say more on that topic? Sarah: Oh, absolutely. I am really grateful to Andrew for actually drawing my attention to this issue. 
We often think about the gap between research and industry, but we don't as often think about the gap between industry and research. Andrew and I had been talking about this challenge of structured data versus unstructured data. I think I said to him, ""What I see in industry is that most ML teams are working with tabular and time series data. What I see in the research community is that most researchers are building new model architectures for unstructured data."" There's a big mismatch between what model architectures people in industry need β€” given the data that is available to them, as well as given the types of problems that they're trying to solve β€” and the research that's becoming available. Now he pointed out to me β€” and this is something that I hadn't really thought about before β€” researchers have access to unstructured data. They have access to things like ImageNet. They don't have access to high volumes of data on user sessions, or logs, metrics, and events. The data sets that tend to be the lifeblood of most companies. It is very difficult to innovate on AI techniques for data sets to which you have zero access. I think it's easy to point to that research and be like, ""Oh, there's such a big gap between what they're building and what we need."" I think we also need to be mindful of what the research community can do, given the resources that they have available to them. I've seen a couple of efforts by a few organizations to open source their data sets, but it's tough because oftentimes the most valuable data sets are the most sensitive ones. What company wants to share their click-through data that probably reveals the state of their business, some of the experiments that they're running, and so on so forth. Lukas: Well, there's also not a lot of upside. I remember the Netflix contest was such a popular, awesome thing. Got so many people involved, so much attention to research to Netflix β€” still a seminal data set β€” but they didn't do a second one because they felt like...there are user privacy issues, that they couldn't get around to release it. I don't know if you remember when AOL released a subset of their query logs. It was so exciting to actually have that. I was in research at the time and I was like, ""This data set is like gold."" And then like the next day, they fired the person that released it. And their boss β€” I think their boss' boss, right? β€” because there was some personal identifying information in that. It's hard to see a lot of upside for corporations, even if they were sort of neutral on the impact of...on the company secrets, IP issue. Sarah: Yeah. One of the things that I have seen β€” that has been very encouraging β€” is more and more interview studies or meta analyses coming out of the research community. Where it's clear that the researchers are interested in better understanding the problems that practitioners face in industry. One critique that I've had of those studies in the past, is that the authors tend to interview people to whom they have immediate access, which means that they often interview practitioners at some of their funding organizations. The organizations that are sponsoring their labs, which means that they tend to bias more towards larger enterprises or big FAANG companies. They're interviewing people at Facebook, Apple, Tesla on their data and ML tools, platforms, practices, and then drawing conclusions about all of industry. 
But I think that recently I've seen a couple of studies come out where there's been a more focused effort to get a more random β€” or at least more diverse β€” sample of practitioners from both smaller startups, more traditional companies, bigger tech companies, et cetera, to really better understand both the similarities and differences between how they approach model development and deployment. I hope that continues. Lukas: Do you have a study that's top of mind, that you could point us to? Sarah: So, Shreya Shankar, who had actually been a university associate. Lukas: Yeah, I saw that. Totally. Nice. Sarah: I was really thrilled because Shreya actually reached out to us and said, ""Hey, can you connect us to people at different types of companies? I've got connections to people at Instagram, Facebook, Apple, et cetera et cetera, but I want to talk to people at mid-market companies, or early-stage startups, and B2B companies, and better understand some of the nuances of their workflows."" Lukas: What was the name of the paper? I think I just saw it. Sarah: ""Operationalizing Machine Learning: An Interview Study"". Lukas: Thank you. Yeah, I agree. That was an excellent paper. Sarah: Yeah, yeah. The other thing that I had said...I sent Shreya a text message after reading through it. The other thing that I really appreciated about the interview study was that she didn't cherry pick the insights that were most likely to drive interesting research questions or solutions. I think she took a really genuine and unbiased approach to thinking about, ""What are the problems that people are talking about? What are the ways in which they're there solving them? Let's highlight that there are a bunch of problems that people are just solving in practical β€” albeit hacky β€” ways, but ways that they're content with."" I thought it was a very honest study. Lukas: Totally. I totally agree. Well, I guess if we are possibly headed towards another bubble in machine learning β€” or machine intelligence, as you sometimes call it β€” do you have any advice for a startup founder like me? Or maybe an ML practitioner, which is most of our audience. Having gone through another bubble, how would you think about it? What would you do if you started to...I think we're already seeing bubble-esque behavior. What are the lessons? Sarah: I think the most critical lesson that I saw/learned the last time around was, ""Focus on your users,"" or ""Focus on the strategic problems that you're trying to solve."" And ""Really, really understand if and why ML is the best tool to solve that problem."" I think it's critical to think about machine learning as a very important tool in our toolkit. But one of several tools. I was catching up with a friend a couple of weeks ago, and she had mentioned to me that the way in which she prioritizes ML projects is through regular conversations with their product leadership, and engineering leadership β€” and her representing ML leadership β€” about the product roadmap, about the user behaviors that they're trying to unlock. And then thinking about whether ML or traditional software development approaches are a better tool for achieving those things. I think as long as we continue to think about ML as a tool to solve problems β€” and as long as we have the tools that enable us to better understand if ML is solving those problems, and how to improve upon its ability to solve those problems β€” then ML can be a super powerful tool. And one that we learn to wield in more powerful ways too. 
But β€” I feel almost like a broken record saying this, given the lessons learned in the past β€” if we treat ML like a silver bullet, if we treat it like a hammer looking for a nail...that was the pattern that I think led to failure. Don't think about ""What ML can do for you"", think about ""What you can do for your country,"" and if ML is the right way to do that, I guess. That's the lesson that we learned and I hope it's the lesson that we will carry forth. Lukas: Love it. We always end with two open-ended questions. The first of the two is, if you had extra time, what's something that you'd like to spend more time researching? Or, put another way, what's an underrated topic in data or machine learning? Sarah: Oh man, that one is very easy for me: programming languages. I would love to spend more time learning about programming languages. I am definitely not convinced that Python is the right interface for data science, or that SQL is the right interface for analytics work. I would really love to learn more about programming language design, so that I could better diagnose if and why Python and SQL are the wrong tools, and how one might go about building a better PL interface for data scientists, ML engineers, and analysts. Lukas: Okay, a question that I didn't ask β€” because I thought it was a little weird or maybe nosy β€” is why you're asking on Twitter if anyone knew any female Rust developers. Because I will say Rust comes up just a shocking amount on this podcast, and I was wondering what's driving the interest in Rust, and then if there was some reason behind looking for a female Rust developer, and if you actually found one. Sarah: Yeah, yeah. So, full transparency β€” and I think I maybe put some of this on on Twitter too β€” quick background is that certainly earlier in my career, I felt like oftentimes I wasn't getting invited to the same set of events, et cetera, as some of my male peers, and therefore I wasn't getting exposure to the same set of conversations β€” maybe even the same opportunities β€” to potentially see deals, and things like that. I feel pretty strongly that we need to have women in the room when we host events, to ensure that they're getting exposed to the same set of opportunities. That we're not doing things to hamper their progress in the industries in which they they operate. We were hosting a Rust developer dinner, and looked at the guest list, and there weren't that many women, and it felt like we could do better. Thus the origins of my question. Lukas: I see. Sarah: Why Rust? See, I wish I spent more time studying programming languages, so I could better understand why people are shifting from C++ to Rust. Luca Palmieri β€” who I believe is now at AWS, actually β€” has a great blog post on why Rust might be a more appropriate backend for Python libraries that often have C++ backends. Things like pandas, where we experience it as Python but in fact it has a C++ backend. I've heard that Rust is more accessible than C++ and therefore could perhaps invite more data practitioners to actually contribute to some of those those projects. But I don't know enough to really say why Rust is so magical, other than a lot of smart people β€” apparently, like Linus Torvald too β€” believe it is. If it's good enough for him, it's good enough for us. I don't know. Lukas: Fair enough. My final question for you is, when you look at the ML workflow today going from research into deployment into production, where do you see the biggest bottlenecks? 
Or maybe where do you see the most surprising bottlenecks for your portfolio companies? Sarah: I generally think that...there are two bottlenecks that I would call attention to. Actually three, sorry, I'm being kind of indecisive here. One pattern that I've observed with ML is that we often iterate on ML-driven applications β€” or ML-driven features β€” more frequently than we iterate on more traditional software features. To give an example, we may iterate on a pricing algorithm far more frequently than we would iterate on a navigation panel, or an onboarding flow, or something like that. Earlier I was talking about understanding how ML can solve user and company problems. I don't really think we have enough insight into the way in which model performance correlates with behavioral data β€” or the product engagement β€” to iterate super effectively on models. I think that has been a limitation, and one that could have nefarious effects in the future. Another big challenge that I see β€” and I alluded to this before β€” is the challenge of building software applications around a prediction service, or around a model. In the past, people might have talked about this as a model deployment problem. The problem isn't containerizing your model and implementing a prediction service in production. I think that has gotten significantly easier. The problem is connecting to five different databases, each which have different sets of ACID guarantees, latency profiles...also connecting to a UI service, potentially connecting to other application services. The problem is the software development. What you've got is a trained model, but now you actually have to build a software application. I don't think we have great tools to facilitate that process, either for ML engineers or for software engineers. And then around the same space, I also think that the transition from research to production β€” and back β€” can still be challenging. Perhaps what a company wants to do β€” upon seeing an issue associated with the model in production β€” is actually see the experiment runs associated with that model, so that they might get more insight into what is now happening in that production environment. That shouldn't be difficult to do. But, in the past I think we really developed tools either for model development or for MLOps, and we're starting to see some of the pain points that arise when those sets of tools are not coupled together. Lukas: Cool. Yeah, that all definitely resonates with me. Sarah: Lest I sound too cynical, I am really optimistic about the future of ML. I think we just need to do it in a sane and rational way and be mindful of what we're trying to accomplish here, instead of just focusing on flashy press releases and cool demos. Lukas: I was thinking as you were talking about the hype cycle, and large language models, and stuff. I was thinking VCs probably feel the hype cycle the fastest. I'm like, ""Man, we've basically solved the Turing test and, like, no one cares. My parents are like, ""What even is this,"" you know. It's like, ""Come on, this is awesome, look at it."" But I think every investor knows about Stable Diffusion but I don't think...I even come across Chief Data Officers at Fortune 500 companies who are like, ""What's Stable Diffusion?"" It's like, ""Come on, you should know about this."" Anyway... Sarah: Yeah, yeah. But I think there's this awareness, though, of ""This is where the hard work starts."" Lukas: Yeah, totally. 
Sarah: ""Great, we're able to generate beautiful artistic renderings based on textual prompts. Okay, how do we generate photos that are equivalent to that which a professional photographer would produce?"" Because that's what it's going to take to get a Getty Images or Flickr to adopt something like Stable Diffusion. How do we make automated rotoscoping so good that a video editor doesn't need to correct the mask at all? Because that's what it's going to take for Runway to compete with some of the more traditional video editors. I saw, through Runway, that the research is not good enough. They've had to do a lot of engineering, as well as their own research, in order to operationalize some of these things. I am so optimistic about the potential of the technologies, but I also am realistic that reining them in, and actually leveraging these technologies to do good in the world β€” or to build great products β€” is hard. Short anecdote, but I've been talking to a founder who was working on brain-computer interfaces and actually developed this technology where, effectively, it's able to read minds. You had to put on some big helmet thing, but once the helmet was on, it could kind of transcribe thoughts. And they were able to get it to work. Now, the founder subsequently shifted focus to the gaming space, doing more work with haptic interfaces. I was asking him like, ""Why didn't you pursue the mind reading tech further?"" And he said to me, ""We couldn't find any great use cases."" Isn't that crazy? But I think, this is tech. Sometimes you can do absolutely remarkable things with technology. But it doesn't matter. It doesn't matter unless you figure out how to appeal to people, and get them to use it, and how to align that technology with an important set of problems. I think that is the thing β€” as VCs β€” we need to continue to remind ourselves. Tech is not easy. Tech is not easy, but people are not easy either. Both are really hard. Unlocking new sets of technologies often means that we are granted the opportunity to solve really hard human problems. I guess...TL;DR if GPT-3 starts reading minds. Maybe we'll be able to find some applications for it. But, we'll see. Lukas: Thanks so much, Sarah. That was super fun. Sarah: Yeah, for sure. Bye! Lukas: If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned, supplemental material and a transcription that we work really hard to produce. So, check it out.",9519,"In a podcast, Sarah Catanzaro, a General Partner at Amplify Partners, discusses the hype surrounding AI and ML and the importance of understanding the necessary infrastructure and data investments. She also emphasizes the need for a better understanding of LLMs and their capabilities and warns against the dangerous misunderstanding of AI as something magical. The podcast also discusses biases in venture capital decision-making and the challenges of creating a cohesive ecosystem in the ML and AI space.
+
+ Sarah also discusses the importance of failing fast and decoupling model development and ML-driven app development. She addresses concerns about cloud vendors potentially crushing startups in the ML and data service space and the gap between industry and research in ML benchmarks. The podcast also touches on the challenge of structured data versus unstructured data in machine learning and the potential benefits of using Rust as a backend for Python libraries with C++ backends. Finally, the podcast emphasizes the importance of finding practical applications for new technologies in order to solve real human problems."
+ 1,CristΓ³bal Valenzuela β€” The Next Generation of Content Creation and AI,https://www.youtube.com/watch?v=wbonGgk-_Gk,2426,2023-01-19,"Cris: I think a big mistake of research β€” specifically in the area of computer creativity β€” is this idea that you're going to automate it entirely. You see one-click off solutions to do X, Y, or Z. I think that's missing the bigger picture of how most creative workflows actually work. That probably means that you've never actually worked with an agency where the client was asking you to change things every single hour, or make it bigger, make it smaller, right? Lukas: You're listening to Gradient Dissent, a show about machine learning in the real world. I'm your host, Lukas Biewald. Cris Valenzuela is an artist, and technologist, and entrepreneur, and CEO and founder of a company called Runway, which is a maker of ML-powered video editing software. But I feel that description doesn't even do justice to how incredible and innovative his product is. This interview actually starts off with a live demo of his product. I really recommend switching to video if you're listening to this on audio only, because his demo is absolutely incredible. Well, all right, Cris, we don't normally do this, but I thought it would be fun to start with a product demo if you're down for it. You have such a cool, compelling product. Would you be up for that? Cris: Sure. What do you want me to demo? There's a lot I can do. I want to make sure I can focus on what you want to see. Lukas: Well, this is an ML podcast. So I think people would probably be interested in the most flashy ML features. How about that? Cris: In short, Runway is a full video creation suite. It allows you to do things that you might be able to do in more traditional video editing software. The main difference is that everything that runs behind the scenes...so, most of the core components of Runway are ML-driven. The reason for that, it has two main kind of modes or uniqueness about making everything ML-based. One is, it helps editors, and content creators, and video makers automate and simplify really time-consuming and expensive processes when making video or content. There are a lot of stuff that you're doing in traditional software that are very repetitive in nature, that are very time-consuming or expensive. Runway aims basically to simplify and reduce the time of doing this stuff. If you have a video you want to edit, an idea you want to execute, spending the time, and the minutes, and the hours, and sometimes days on this very boring stuff is not the thing that you really want to do. So we build algorithms and systems that help you just do that in a very easy way. And then there's another aspect of Runway, that it's not only about automation, but it's about generation. We build models, and algorithms, and systems that allow our users and customers to create content on demand. > And everything...baseline for us is that everything happens on the browser. It's web-based and cloud native, which means that you don't rely any more on native computers, or native applications, or desktop compute. You have access to our GPU cluster on-demand, and you can render videos on 4k, 6k pretty much in real time. Plus you can do all of this AI stuff also in real time as well. 
A lot of the folks are using Runway now — CBS, The Late Night Show with Colbert, or the folks who edit Top Gear, or sometimes creators who do stuff for Alicia Keys or for just TikTok or movies — they're all leveraging these AI things via this web-based, cloud-based editor. So that's a short, five-minute intro to what the product does and how ML or AI plays a role in the product itself. But I'm happy to now show you how everything goes together and the experience of using the editor, if that makes sense. Lukas: Please, yeah. Cris: Cool. Any questions before we do that? I can double down, or if you want me to clarify? Lukas: Well, I actually didn't realize that professional video teams like The Colbert Show use Runway. Do they use it for all of their video processing or is there a certain part where they use it? How does that work? Cris: It depends. Some editors and some folks are using it as an end-to-end tool to create videos. Some other folks use a combination of different software to make something. The folks who use it for movies sometimes add in Nuke or Flame. We have a big Flame community, so Runway becomes a part of that workflow. It's replacing either something you do on a very manual basis. It's sometimes replacing a contractor you hired to make that work for you, or it's sometimes replacing your own work of trying to do it yourself in this old software. But you still use other aspects of it, or other software to combine [with] it. It really depends on the type of content you have and the level of outcomes that you need. But we do have folks that use it as an end-to-end content creation and editing tool. Lukas: Cool. Well, I mean the extent of my video editing is basically modifying videos of my daughter to take out the boring parts and send them to my parents. That's as far as I go. Maybe you could sort of give me a little bit of an overview of the cool stuff you can do with Runway. Cris: Totally. You can do all of that in Runway on the browser which is...you might be...you might start using Runway for that. The one thing I would emphasize is, everything is running on the cloud, on the web. You can just open any project with a URL. You can also create teams, and you have this baseline collaboration aspect that just runs out-of-the-box. Cool. Anything else? No, just go demo? Lukas: Yeah, let's see a demo. Totally, yeah. Show me the cool stuff. Cris: Perfect. So, this is what Runway looks like. If you've ever edited video before, it's a very common interface. We have tracks on the bottom. We have a multi-editing system with audio tracks, and keyframe animations, and text layers, and image support. You can preview your assets on the main window and have a bunch of effects and filters on the right. Again, everything running pretty much on the cloud in real time. The idea here is that there are a lot of things that you can do that are very similar to stuff that you can do in other applications, plus there are things that you can't do anywhere else. Let me give you an example of something that a lot of folks are using Runway for. I'm going to start with a fresh composition here. I'm going to click one of the demo assets that I have here. I'm going to click this. I have a surfer, right? On that shot, let's say I want to apply some sort of effect or transformation to the background of this shot. Or I want to maybe replace the person here and take it somewhere else. 
The way we do that today would be a combination of frame-by-frame editing, where you're basically segmenting and creating an outline of your subject, and every single frame you move you have to do it one more time. For that, we built our video object segmentation model — which we actually published a blog post and a paper around — that allows you to do real-time video segmentation. In film, this is actually called rotoscoping. You can just literally go here, guide the model with some sort of input reference. I tell the model this is what I want to rotoscope, and it can go as deep as I need. I can select the whole surf layer here at deeper...more control over it. Once the model has a good understanding of what you want to do, it would propagate that single keyframe or single layer to all the frames of video in real time. You get a pretty smooth, consistent segmentation mask that you can either export as a single layer, or export as a PNG layer, or you can use...go back to your editing timeline and start modifying. Say you want to cut it, you want to compose it, you want to do some sort of transformation...from here, you can do that directly from here. Let's say I have my baseline — or my base video — here, I have my mask on top of that, and now I can just literally move it around like this. I have two layers, right, with a surfer. So, something that looks very simple and in traditional software may take you a couple of hours of work, here you can do pretty much in real time. Again, it's something that most editors know how to do, but it just takes them a lot of time to actually do. Lukas: And did you just run that in the browser? Cris: Yeah. Lukas: That segmentation mask, it figured out in the browser and it's calculating all...it doesn't go to the server? Cris: No, it goes to the server. Yeah, there's an inference pipeline that we built that processes real-time videos and allows you to do those things. The compute part is everything running on the cloud. You just see the previews and sometimes — depending on your connection — you can see a downsampled version of it, so it runs really smoothly and plays really nicely. Also, for every single video there's a few layers that we run, that help either guide something like a segmentation mask. For instance, we get depth maps and we estimate depth maps for every single video layer. You can also export these depth maps as independent layers and use them for specific workflows. That's also something very useful for folks to leverage. So you have this and you can export this. Behind the scenes, we're using this for a bunch of things. Lukas: Cool. Cris: That's one of the things that you can do. You can go very complex on stuff. Let's say, instead of the surfer, I just want the — let me refresh this — I just want the background. I don't want the surfer. I can inpaint or remove that surfer from the shot. So I'm just gonna paint over it. Again, I'm giving the model one single keyframe layer, and the model is able to propagate those consistently for the entirety of the video. That's also something we — as a product philosophy — really want to think about. Which is, you need to have some layer of control of input. The hard part of that should just be handled by the model itself, but there's always some level of human-in-the-loop process, where you're guiding the model. You're telling it, ""Hey, this is what I want to remove. 
Just go ahead and do the hard work of actually doing that for the whole video sequence."" Lukas: Wow, that's really amazing. That's like magic, right there. The surfer’s really just gone. Cris: Yeah. That's something we see a lot, when people find out about it, or when they start using it. ""Magic"" is a word we hear a lot. It's something that...again, if you're editing or you've worked in film or content before, you know how hard, and time-consuming, just painful it is. Just seeing it work so instantaneously really triggers that idea of magic in everyone's minds. Which is something for...that's great, because we've really thought of the product as something very magical to use. So, there's stuff like that. There are a few things like green screen and inpainting — which I'm showing you now — plus motion tracking, that we consider as baseline models in Runway. Those are just...you can use them as unique tools, as I'm showing you right now. You can also combine them to create all sorts of interesting workflows and dynamics. There's the idea of, ""You want to transform or generate this video, and take this surfer into another location,"" you can actually generate the background, and have the camera track the position of the object in real time, and then apply the background that you just generated in a consistent manner, so everything looks really smooth. The way you do that is by combining all of these models in real time, behind the scenes. You might have seen some of those demos on Twitter, which we've been announcing and releasing. This is a demo of running a few of those underlying models, combined. There's a segmentation model that's rotoscoping the tennis player in real time. There's a motion-tracking model that's tracking the camera movement, and then there's an image-generation model behind the scenes that is generating the image in real time. Those are all composed at the same time. Does that make sense? Lukas: Yeah, yeah. Totally. Cris: Those are, I would say, underlying baseline models and then you can combine them in all sorts of interesting and different ways. Lukas: Totally. Alright, well, thanks for the demo. That was so cool. We'll switch to the interview format. Although now I really want to modify this video in all kinds of crazy ways. Cris: We should replace the background with some stuff while we're talking. Lukas: Totally. Get this microphone out. One question I really wanted to ask you is, I think your background is actually not in machine learning originally, right? I always think it's really interesting how people enter the machine learning space. I'd just love to hear your story, a little bit, of how you ended up running this super cool machine learning company. It seems you're very technically deep, also. And so how you managed to get that depth mid-career. Cris: Totally. Long story short, I'm originally from Chile. I studied econ in Chile and I was working on something completely unrelated. But it was 2016 or 2017, I think, and I just randomly fell into a rabbit hole of ML- and AI-generated art. It was very early days of Deep Dream and ConvNets and AlexNet, and people were trying to make sense of how to use this new stuff in the context of art making. There were some people like Mike Tyka, and Mario Klingemann, and Gene Kogan who were posting these very mind-blowing demos. That now feel like things that you can run on your iPhone in real time. 
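To make the keyframe-propagation idea from the rotoscoping demo concrete, here is a minimal sketch in PyTorch, assuming a hypothetical stand-in network (SimpleMaskPropagator) rather than Runway's actual video object segmentation model: a single user-drawn mask is fed forward frame by frame, with each predicted mask conditioning the next one.

```python
import torch
import torch.nn as nn

class SimpleMaskPropagator(nn.Module):
    """Hypothetical stand-in: predicts the current frame's mask from the
    current RGB frame plus the previous frame's mask (3 + 1 = 4 channels)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame, prev_mask):
        # Concatenate the frame and the previous mask along the channel axis.
        return self.net(torch.cat([frame, prev_mask], dim=1))

@torch.no_grad()
def propagate(model, frames, keyframe_mask):
    """Propagate one user-drawn keyframe mask across every frame of the clip."""
    masks = [keyframe_mask]
    for frame in frames[1:]:
        masks.append(model(frame, masks[-1]))
    return masks

if __name__ == "__main__":
    model = SimpleMaskPropagator().eval()
    clip = [torch.rand(1, 3, 64, 64) for _ in range(5)]       # dummy video frames
    user_mask = (torch.rand(1, 1, 64, 64) > 0.5).float()      # dummy "brush" keyframe
    propagated = propagate(model, clip, user_mask)
    print(len(propagated), propagated[-1].shape)               # 5 torch.Size([1, 1, 64, 64])
```

The point of the structure, not the toy network, is that the previous mask conditions the next prediction, which is why a single guided keyframe is enough to carry the whole clip; a production system would swap in a real VOS model and stream frames through a server-side pipeline.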
But around that time it was someone...I remember Kyle McDonald — who is an artist — walking around with his laptop, just showing people a livestream of a camera. He had basically...I think an ImageNet model running in real time, and just describing what it saw. And it just blew my mind. Again, it's 2016. Now it's pretty obvious, but around that time it was pretty special. I just went into a rabbit hole of that for too long. It was too much, I was just fascinated by it. I actually decided to quit my job, I decided to leave everything I had. I got a scholarship to study at NYU and just spent two years just really going very deep into this. Specifically in the context of, I would say, creativity. My area of interest was the idea of computational creativity. How do you use technology? How do you use deep learning or ML for really creative tool-making and art-making? That two-year-long research process and exploration ended up with Runway. Runway was my thesis at school. It was a very different version of what you see now. But the main idea was very much pretty much the same. It's like, ""Hey, ML and AI are basically a new compute platform. They offer new ways of either manipulating or creating content. And so there needs to be some sort of new tool-making suite that leverages all of this, and allows people to tap into those kinds of systems in a very accessible and easy way."" The first version of Runway was a layer of abstraction on top of Docker, where you could run different algorithms and different models in real time on this Electron app. You could click and run models in real time and connect those models via either sockets, or UDP, or a web server to Unity or Photoshop. We started building all these plugins where you can do the stuff that you are able to see now on Twitter. Like, ""Here, I built a Photoshop or Figma plugin that does image generation."" We were building all that stuff running Docker models on your computer locally, and you can stream those. It was 2018, 2019. Lukas: Interesting. It must have been a much more technical audience at the time then, right? If you have to run Docker on your local machine. That's not something everyone can do, right? Cris: Totally, totally. I think that that also tells a lot about how much progress the field has made, and how mainstream and how more accessible things have become. Trying to put this set of new platforms and compute ideas for creators, and video makers, and filmmakers required you to know how to install CUDA and manage cuDNN. I don't know if it's just too much. But people were still wanting to do it. There were some folks who were like, ""Hey, this is really unique. I want to understand how to use this."" But then we realized it wasn't enough. You need to go [to] higher layers of abstraction on top of that to really enable creative folks to play with this, without having to spend months trying to set up their GPU machines. Runway has really evolved, and we have a really experiment-driven thesis and way of working on the product. But it's all about trying ideas and testing them out with people really fast. We're building something that hasn't been done before. And so it's really easy to get sidetracked into things that you think are going to work, or ideas that you think are going to be impactful. But since you're working with new stuff all the time, being close to your user base for us has been kind of really, really important. 
Every time we iterate on the product, I think one consistent line of evolution has been this idea of simplifying...making higher abstraction layers on top of it. The first versions of rotoscoping or inpainting required you to select the underlying model architecture, and understand what a mask was, and [how] propagation works. If you're really a filmmaker, you don't care about any of that stuff. You just want to click once, and you want to get a really good result. For us, it's ""How do you build from there, using what we're building behind the scenes?"" Lukas: Were you surprised how well these approaches have worked to generate images? It sounds like you started your work in 2017, 2018. The space has changed so much. Do you feel you saw it coming, or have things unfolded differently than you thought? Cris: I mean, things have definitely accelerated. But I think our thesis — when we started Runway three and a half years ago — was pretty much the same. It was, we're entering literally a new paradigm of computation and content. We're not going to be...we're soon going to be able to generate every single piece of content and multimedia content that we see online. I've been demo-ing generative models for creative use cases for the last three years. What I was showing three years ago, people were like...it was like, ""Hey, this is how it works. This is how you train a model. This is what the outcome of the model is."" Of course, at that time, it was a blurry 100x100 pixels image. Some sort of representation of what you were describing. Most people took it as a joke, like, ""Oh yeah, cool. Very cool. Cool thing."" Or as a toy, like, ""That's a fun thing, right? You kind of use it once. But of course, I will never use this in production."" I remember speaking with this huge...one of the biggest ad agencies in the world, and I was presenting to their executives. Here's the future of content, type anything you want. And something blurry came out and they're like, ""Cool, not for now."" And they reached out three weeks ago being like, ""Hey, how many licenses can we get for this, tomorrow?"" Because the models are getting just so much better, that it's obvious. It's transforming their industries and a lot of other things. I think what has changed for us is pretty much the speed. Now we're entering a really nice moment where things are converging, and there's a good understanding of what's going to be possible, and where things are going. Scaling laws are getting to a good point. And so we're continuing the same, but the thesis of the company was always built on the idea that this would happen, and it's happening sooner rather than later. Lukas: Do you have a perspective on if this acceleration will continue, or if we're just seeing a breakthrough, and then we're going to need new breakthroughs to get to the next level of quality? Cris: Sure. I think there's definitely more compute that needs to be added to this, more data sets. I think we're still scratching the surface of what it will become. There's still this...I was discussing this with a friend the other day, this idea of a curiosity phase where people are entering the realm of what's possible and coming up with all these solutions and ideas, but there's still a difference between those concepts, and explorations, and ideas and meaningful products that are long-term built upon those. What I'm interested in seeing is how many of those ideas will actually convert, over time, into meaningful products. 
I think that conversion into products is not just pure research or pure new models, there needs to be a layer of infrastructure to support those things. It's great that you can run one single model to do one single thing at X percent. But if you're trying to do that at scale, on a real-time basis, for 10 people who then use it on a team and depend on it for their work, then that's a slightly different thing. But I think we're about to see way more stuff around video, specifically. I think image might be solved in a couple more months and video is starting to now catch up with that. It's a really exciting time for that. Lukas: What does something being solved mean to you? Like, you could just get any image that you would ever want or imagine? Cris: Yeah, that's a good one. That's a good question. I would say that I would consider being solved [as] being able to translate something like words or a description into a meaningful image or content that pretty much matches where you're trying to...what you're imagining. And if it doesn't, you're able to control it really quickly and easily to get to the point where you can arrive at your final idea. That's why the combination of models really makes sense. It's going to be hard to have a full model that does exactly what you want. For instance, for image generation. I think it's a combination of, you have a model that does the first step, which is you generate something. There's no pixels, you generate the pixels. Second step is, you're able to quickly modify it, or inpaint it, or grade it in some way, and start it in some other way. But that whole thing just happens in a few seconds or a few minutes, right? If you speak with anyone in the industry, VFX, or ad agencies or content creation, post-production companies, this is stuff these guys do all the time. This is what they do for a living, right? They're able to create content out of nothing. The thing is, it's just really expensive. It's really, really expensive. And it involves a lot of time and rendering and skilled people to get to that point. I think for me, ""solved"" is, anyone can have access to that professional-grade, VFX-type of content from their computers and from a browser. Lukas: Do you ever think about making a version of Photoshop, instead of video editing software? If you think images are closer to being solved. Certainly I can't go into Photoshop and get exactly the image I want. I love to play with all the image generation tools out there. But I do think they're amazing at first, but then you kind of hit this point where if you really want the image to look like you want, it gets kind of frustrating. It seems there's also room for an image version of what you're doing. Is that something you'd consider doing? Or, why not make that? Cris: Totally. Yeah. The answer is absolutely. I think, a few things. One, I think we're converging more to this idea of multi-modal systems where you can transfer between images, and videos, and audio. I think the idea that we've been...we built software to deal with each medium independently. There's audio editing software, and video editing software, and image editing software, and text-based...you have models that can quickly translate between all of those. Content — let's say video — it's a combination of different things. You have images, you have videos, you have audio, you have voice. All of those things are now possible. 
I think for us, when I think about the product philosophy of Runway, it's less about, ""How do you build a better Photoshop or a better Premiere?"" Fundamentally, these models are just allowing you to do the things that none of those others can do. If you think about marginal integrations of those things...yeah, you build a better Photoshop that has a better paintbrush, or a better content-aware tool. But ultimately, when you combine them in new ways, you create a new thing. It's completely new. It's not Photoshop, it's just a new way of making videos, and editing images, and editing audio. All in one, single component or tool. For me, what's really interesting is the multi-modal aspect of things, and translating also into those. And 3D, for instance, it's one of the filters...you're going to start to see a lot of translation between images and videos and 3D. Lukas: Totally. So, I have to ask you your thoughts on deep fakes and things like that. I'm sure everyone asks you that, but I'm really curious what you think about that. Do you think that you would want to put limitations into your software to not allow certain things? Do you think this is about to change the way we view videos, as this technology gets more standardized and available to everyone? Cris: For sure. As [with] every major technology breakthrough, there are always social concerns about how it might be misused or used in not the right, intended ways. It's a good exercise to look at history to see what has happened before. There's this really good YouTube video about Photoshop when it was first released, I think around the early 90s. They were like...it's kind of a late night show, and they're discussing the ethical implications of manipulating images in magazines. And they're like, should we allow people to manipulate images and put them in magazines? Half of the panel was like, ""No, we shouldn't."" It breaks the essence of what photography is, right? 20 years after that, it makes no sense to think about not doing something like that, right? There's always an adaptation process, I would say, where people need to...we need to collectively ask, ""Hey, how is it going to be used?"" But I think ultimately, you understand what the limitations are, and you also fine-tune your eyes and your understanding of the world to make sense of that thing. Now everyone knows that ""Photoshop"" is a verb that you can use to describe something that's manipulated. You do that same exercise, and you go back in time, and you see the same. When film just started to appear, there was this story, interesting story about...one of the first films that was made was of a train arriving at a station. They were, like, projecting that in a room. When people saw the train coming to a station, everyone ran away because they thought a train was literally coming toward them. But then you make sense of it, and you're like, ""Yeah, this is not true. I understand that this is an actual representation of something."" Ultimately, I think with AI and with generated content, we'll enter a similar phase, where it's going to become commonplace and something people are familiar with. Of course, there's going to be misuses and bad uses. Of course, people can use Photoshop in all sorts of evil ways. But 99% of people are just like, their lives have been changed forever in a positive way because of this. Lukas: Interesting. Well, look, I'd love to hear more about your tech stack. This is a show for ML nerds of all types. I think you're doing pretty hardcore ML at scale. 
What have been the challenges of making this work, making the interface as responsive as it was? What were the key things to scale up your models? Cris: Sure. There's a lot of things that we had to kind of come up [with] creatively, to make this work in real time. On the one hand — on the ML side — we mostly use PyTorch for all of our models. We have a cluster — basically, an AWS cluster — that scales based on compute and demand, where we're running all those models for training. We sometimes use Lightning and, of course, Weights & Biases to follow up and understand better what's working in our model training. For serving, we optimize for different GPU levels or compute platforms, depending on availability. We've made some systems to scale up depending on demand. On the frontend side of things, everything's TypeScript and React-based. There's some WebGL acceleration stuff we're doing to make things really smooth. And then the inference pipeline, where we're writing everything in C++ to make it super, super efficient and fast, specifically since you're decoding and encoding videos in real time. We also built this streaming system that passes frames or video frames through different models to do the things that I just showed you. And so we also had to come up creatively with that. That's kind of a big picture of our tech stack. Lukas: One challenge that I'm seeing some of our customers run into — as these models kind of get bigger and more important — is that the actual serving cost of the application increases. Is that an issue for you? Do you do things like quantization? Is lowering your inference costs an important project for you all? Cris: For sure. Yeah, for sure. I mean, we're running...our biggest cost right now is AWS, GPU costs, and inference costs, and serving these models. There are two main areas for sure. We have an HPC, we're doing large-scale training of language models and video models. That takes a lot of resources and time. But just serving on...I would say the tradeoff between precision and speed really matters. Quantizing models is great. But also you need to make sure that you're not affecting the quality of the model because if you're affecting something on a pixel level, it might change the result from being okay to bad. And that might mean users churning. And so, if you're going to spend a few more seconds rendering, that might actually be better. There's always a tradeoff of how much. But yeah, we always try to figure out what's the right balance there. We're still exploring some stuff on the browser. I think the browser is becoming really powerful. The only constraint about the browser is just memory and RAM. And you get...it's a sandbox, so you can't really do a lot of things specifically with video. But you can run some stuff on the browser. And so we would send some things specifically, and convert some things, and make them smooth enough. But I think we're not 100% there yet. Lukas: But you're also training your own large language models and large image models. That sounds like training would be a major cost for you as well. Cris: Yeah, for sure. Retraining some stuff to make sure it works in the domain of what we have is one of our core competencies. Now we're training...starting a huge job on our HPC. That's going to take a big percentage of our costs for the next few months. Lukas: Wow. I have to ask. That language interface that you showed me was so compelling and cool. 
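The precision/speed tradeoff Cris describes can be illustrated with a minimal, hypothetical PyTorch sketch (TinyHead and the 0.05 tolerance are made up for illustration, not Runway's setup): quantize a model for cheaper serving, then measure how far its output drifts from the full-precision reference before deciding whether to ship it.

```python
import torch
import torch.nn as nn

class TinyHead(nn.Module):
    """Hypothetical stand-in model with Linear layers, the kind dynamic quantization targets."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 64))

    def forward(self, x):
        return self.fc(x)

model_fp32 = TinyHead().eval()
# Dynamic quantization converts Linear layers to int8 for cheaper CPU inference.
model_int8 = torch.quantization.quantize_dynamic(model_fp32, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(8, 256)
with torch.no_grad():
    ref = model_fp32(x)
    fast = model_int8(x)

# Element-wise drift check before serving the cheaper model.
max_err = (ref - fast).abs().max().item()
print(f"max abs error after quantization: {max_err:.4f}")
if max_err > 0.05:  # illustrative tolerance only
    print("quality hit too large -- keep serving the fp32 model")
```

This captures the point as stated in the conversation: the cheaper model is only worth serving if the quality drop stays below whatever a given workflow can tolerate, which for pixel-level outputs may be very little.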
But I have been seeing language interfaces for the past 20 years, and the challenge with these language interfaces is when they don't work, they're just enraging. Actually, you sort of addressed that. Showing how it creates these things, and you can undo them, and you can kind of modify them. Do you feel that that kind of conversational interface is at the point where, for you, it's an interface that you really want to use? Cris: I like to think [of] it as a tool. It's not the sole answer to everything you need. This is not going to be a replacement for all of the workflows in making content, video, images, or sound, or whatever it is. It's just a speed up in the way you can do those kinds of things. I think the sweet spot is a combination of both. Being able to have that constant feedback loop with the system, where you're stating something out [and] the system is reacting in some way that matches your idea. And then you have that level of control so you're going the direction you want and doing what you want. Or, if it's not working, you just do it yourself, right? I think a big mistake of research — specifically in the area of computer creativity — is this idea that you're going to automate it entirely. You see one-click solutions to do X, Y, or Z. I think that's missing the bigger picture of how most creative workflows actually work. That probably means that you've never actually worked with an agency where the client was asking you to change things every single hour, or make it bigger, make it smaller, right? Lukas: Right. Cris: It's hard for me to imagine a world where you have a one-click solution for everything. That feels boring, to be honest. You want to have that control. I think language interfaces are a huge step towards accelerating the speed at which you can execute. Are they the final answer for everything? I'm not sure, but they do make you move faster on your ideas. Lukas: Did I understand you right that you want to build your own large language model? I would assume you would take one of the many off-the-shelf language models today. Are you actually training your own? Cris: Yeah, I think it's...we are, but it's also the fact that ML...the infra for models and models themselves are becoming commodities. It's great for companies like us, because some stuff we kind of need to build on our own. There's a lot of things in Runway that you won't find anywhere else. But there's a lot of stuff, large language models that you can just use off the shelf. You have all these companies offering similar services. It's a great...as a consumer of those, if we want to use those, it's just a cost situation where whoever offers the best model, we'll use. And to a point, it might make sense to do our own. So yeah, sometimes we don't have to do everything ourselves. You can just buy it off the shelf. But some other times, you just need to do it because it doesn't exist. Lukas: Sorry, large language models you think you might do it yourself, even? Cris: We're doing a combination of both. We're using APIs but also re-training some of our own. Lukas: I see, I see. Have you experimented with all the large models out there? Do you have a favorite of the existing offerings? Cris: I think GPT-3 works. I think, actually, the model is Davinci. It's probably GPT-4 by now. I think OpenAI has been making- Lukas: -right, right. Cris: -that silently behind the scenes, it works really well. That's the one I'd say we're experimenting with the most, and we get the best results. Lukas: Cool. 
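For readers curious how an off-the-shelf completions model like the Davinci one mentioned here might sit in front of an editor, this is a minimal sketch using the pre-1.0 openai-python client available in early 2023; the prompt, the command schema, and the idea of mapping requests to edit commands are hypothetical illustrations, not Runway's actual interface.

```python
import json
import os

import openai  # pre-1.0 openai-python API

openai.api_key = os.environ["OPENAI_API_KEY"]

PROMPT = """Convert the user's request into a JSON edit command with keys
"action" (one of: remove_object, replace_background, add_text) and "target".

Request: {request}
JSON:"""

def request_to_command(request: str) -> dict:
    """Turn a natural-language editing request into a structured command (hypothetical schema)."""
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=PROMPT.format(request=request),
        max_tokens=64,
        temperature=0,
    )
    return json.loads(resp.choices[0].text.strip())

if __name__ == "__main__":
    cmd = request_to_command("get rid of the surfer in this shot")
    print(cmd)  # e.g. {"action": "remove_object", "target": "surfer"}
```

The LLM only produces an intermediate command here; the actual pixel work would still be done by the segmentation, inpainting, and generation models, which matches the "tool, not sole answer" framing above.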
Well, look, we always end with two questions. I want to make sure I get them in. The second-to-last question is, what is a topic that you don't get to work on, that you wish you had more time to work on? Or, what's something that's sort of underrated for you in machine learning right now? I realize it's a funny question to ask an obsessed ML founder. But I’ll ask it anyway. Cris: I think, audio generation. I think it's catching up now, but it's not...no one really has been paying a lot of attention. There's some really interesting open source models from Tacotron to a few things out there. I think that's going to be really, really transformative for a bunch of applications. We're already kind of stepping into some stuff there. But, it's hard to focus as an industry — or as a research community — on a lot of things at the same time. And now that image understanding has kind of been solved away, people are moving to other specific fields. I think one of the ones that we're going to start seeing very soon is audio generation. So yeah, excited for that for sure. Lukas: Yeah, I totally agree. Do you have a favorite model out there? We just recently talked to Dance Diffusion, or HarmonAI, who were doing some cool audio generation stuff. Cris: Yeah, there's one — let me search for it — that just blew my mind. tortoise-tts, I don't know if you've seen that one. Lukas: No. Cris: Yeah. tortoise-tts is, I think, the work of just one single person, James Betker. It works really well and he's been...someone used it to create the Lex Fridman...generative podcast. I'll share with you the audio. It's a whole podcast series that goes every week, where everything is generated. The script is generated by GPT-3 and the audio is generated by tortoise. And you can hear it's like, it's a podcast. You can't really tell. Yeah, really excited for stuff like that. Lukas: Cool. The final question is for you, what's been the hardest part about getting the actual ML to work in the real world? Going from these ideas of models or research to deployed and working for users. Cris: I think these models — and things like image generation and video generation — require a different mental model of how you can leverage this in creative ways. I think a big mistake has been to try to use existing principles of image or video generation and patch them with this stuff. I think, ultimately, you need to think about it in very different ways. Navigating a latent space is not the same as editing an image, right? What are the metaphors and the abstractions they need to have? We've come up with those before, in the software pipeline that we have right now. You have a brush, and a paint bucket, and a content-aware tool, and you're editing stuff. But when you have large language models that are able to translate ideas into content, and you navigate and move across a specific space or vector direction in ways you want, you need new metaphors and you need new abstractions. What's been really interesting and challenging is, what are those metaphors? What are those interfaces? How do you make sure the systems you're building are really expressive? I think two things that drive a lot of what we do are control and expressiveness. ""Control"" as in you, as a creator, want to have full control over your making. That's really important. How do you make it, so you also are expressive? You can move in specific ways as you are intending to do. 
So yeah, that's also really...it's really exciting for us, and we're passionate about inventing some of that stuff. Lukas: Well, it’s really impressive what you did. Thanks so much for the interview. Cris: Of course, thanks so much for hosting me. Lukas: If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description where you can find links to all the papers that are mentioned, supplemental material, and a transcription that we work really hard to produce. So check it out.",6898,"In this podcast, Cris Valenzuela, CEO and founder of Runway, discusses the capabilities of their cloud-based video editing tool that uses machine learning algorithms and systems. The tool allows for real-time video segmentation and object manipulation, and can be used for end-to-end content creation and editing. Valenzuela also discusses the potential for multi-modal systems that can transfer between images, videos, and audio, and the challenges of using machine learning at scale for a real-time image and video editing tool. The podcast also touches on the combination of human creativity and machine learning in computer creativity, and the ethical implications of deep fakes and other manipulated content.
6
+
7
+ The podcast provides insights into the evolution of Runway, the challenges of deploying machine learning models in the real world, and the potential of audio generation in machine learning. Valenzuela believes that a constant feedback loop with the system and having control over the direction of the project are important, and that language interfaces are a huge step towards accelerating the speed at which ideas can be executed. It also points listeners to show notes, papers, and supplemental material for further learning."