Why Does MCP Matter? What is Google's Strategic Approach to MCP?

Community Article · Published October 13, 2025

David Smooke: Why does MCP matter? What is Google's strategic approach to MCP? And more specifically, what is the Data Commons strategic approach to MCPs?

Prem Ramaswami, the Head of Data Commons at Google: MCP creates an open, standardized way for AI agents and applications to access data sources. As AI systems become more prevalent, the reliability and transparency of their outputs depend on how well they can ground their answers in real data – and Data Commons delivers that real data with the benefit of MCP. Rather than having to know the ins and outs of our API, or our data model, you can use the “intelligence” of the LLM to help interact with the data at the right moment.
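
To make this concrete, here is a minimal sketch of an MCP client connecting to a Data Commons MCP server and listing the tools it exposes, written against the open-source MCP Python SDK. The launch command and package name are placeholders for illustration, not the official way to run the server.

```python
# Minimal sketch of an MCP client session (Python MCP SDK).
# The server command below is a placeholder; substitute however you
# actually launch the Data Commons MCP server in your environment.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="uvx",                     # placeholder launcher
    args=["datacommons-mcp"],          # hypothetical package name
)

async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools the server exposes; an LLM uses these
            # descriptions to decide when and how to query public data.
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```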

We believe that an open ecosystem, where multiple organizations contribute and adopt shared standards, leads to better quality, more reliable AI applications, and broader societal benefit.

We have built an MCP server that makes our vast repository of public data easily accessible to AI models, and we are collaborating with partners to set best practices. Our goal is to empower developers, NGOs, journalists, governments, and anyone who needs reliable data, while building a foundation of trust and transparency into the next generation of AI-powered tools.

Why did Google choose to build on Anthropic's open-source MCP standard rather than create its own? What was the internal debate like regarding a proprietary vs. open-source protocol? And how did Google owning a 14% stake in Anthropic impact the decision?

As a small team working on an open-source effort, Data Commons' goal is primarily to ensure broad interoperability and accelerate the development of reliable, data-grounded AI applications. In addition to MCP, we’ve recently integrated the Statistical Data and Metadata Exchange (SDMX) format, and most of our ontology is an extension of Schema.org, another open web standard. Many Google products, including Google Cloud databases like BigQuery, as well as industry products, have already integrated with MCP, making it an easy choice.

What are the unit economics here? How expensive is a query? Is this a free product forever, or are there future plans for a paid tier based on usage? What is to prevent Data Commons from being sunset in the future?

Data Commons is open source, so we hope for a thriving community of users and developers to help it grow, and Google has shown commitment to that success. Currently, Data Commons is focused on maximizing access. Data Commons helps provide data to Search and, separately, is actively researching different ways we can make LLMs more reliable and trustworthy. It is also free to use at DataCommons.org.

One of the clever aspects of MCP is that our users can use their own LLM to interact with the MCP server. Said differently, the user’s LLM is what translates the human language query into a set of API calls and then interprets the result back to the user. Google’s compute isn’t involved!
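
A rough sketch of that flow, with the model entirely on the user's side: `ask_llm`, the `ToolCall` type, and the loop below are hypothetical stand-ins for whatever LLM and MCP client the user brings, not part of Data Commons or the MCP SDK.

```python
# Illustrative only: how a user's own LLM mediates between a natural-language
# question and the Data Commons MCP server. `ask_llm` and the loop below are
# hypothetical stand-ins, not a real API.
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    arguments: dict

def ask_llm(prompt: str, tools: list[dict]) -> ToolCall | str:
    """Placeholder for the user's LLM: returns either a tool call or a final answer."""
    raise NotImplementedError

def answer(question: str, session, tools: list[dict]) -> str:
    """Run the LLM-driven tool loop until the model produces a final answer."""
    step = ask_llm(question, tools)
    while isinstance(step, ToolCall):
        # The LLM chose an MCP tool and filled in its arguments;
        # the client forwards that call to the Data Commons MCP server.
        result = session.call_tool(step.name, step.arguments)
        step = ask_llm(f"{question}\nTool result: {result}", tools)
    return step  # the LLM's final, data-grounded answer
```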

I should note that we do set a cap on the number of API requests to Data Commons: we want to encourage broad use, but also want to ensure there isn’t abuse or pure scraping.

What techniques does the Data Commons API use to make its data cleaner, more structured and more accessible than the average public data dump? And what general advice do you have for usefully structuring unstructured data?

One of our key innovations is to transform data into a common knowledge graph. We import raw public data from thousands of sources into a single, canonical ontology: if one column in one dataset says “Type 2 diabetes” and a column in another dataset has the ICD code “E11,” we can understand they are both referring to the same thing. We can also normalize units to make them more easily comparable. This allows data scientists to focus on data analysis instead of the busy work.
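
As a toy illustration of that reconciliation step, the alias table and canonical identifier below are invented for this example; they are not Data Commons' actual ontology or IDs.

```python
# Toy illustration of mapping source-specific labels onto one canonical entity.
# The canonical ID and aliases are invented for illustration only.
ALIASES = {
    "type 2 diabetes": "dc/disease/Type2Diabetes",          # hypothetical canonical ID
    "diabetes mellitus type 2": "dc/disease/Type2Diabetes",
    "e11": "dc/disease/Type2Diabetes",                       # ICD-10 code from another source
}

def canonicalize(raw_label: str) -> str | None:
    """Map a raw column value from any source dataset to a canonical graph node ID."""
    return ALIASES.get(raw_label.strip().lower())

# Two differently labeled columns now resolve to the same graph node.
assert canonicalize("Type 2 diabetes") == canonicalize("E11")
```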

Every data point is accompanied by detailed metadata and provenance, so users always know where information comes from. Focusing on these principles allows users to turn disparate data into valuable, actionable resources.

What types of verticals and companies do you see using this MCP server to grow their business? And what specific datasets are you most excited to see developers build on, and why?

Many of the world’s pressing challenges are holistic problems. In other words, I can’t just look at one dataset from one government agency but need to combine multiple datasets.

As an example of this, the One Campaign recently launched the ONE Data Agent, an interactive platform for health financing data. This new tool enables users to quickly search through tens of millions of health financing data points in seconds, using plain language. They can then visualize that data and download clean datasets, saving time while helping to improve advocacy, reporting and policy-making.

I’m excited to see developers build new understanding on datasets imported into Data Commons in public health, climate, economics, education, and many other fields. These are foundational datasets that, when made more accessible and actionable, can drive real-world impact—helping communities measure progress and hopefully more clearly understand which interventions lead to which outcomes. It can help us achieve sustainability goals, spot economic changes early, or supercharge advocacy organizations. The MCP server lowers barriers for innovators in these fields, and I’m eager to see the creative solutions that emerge.

How do you define "trustworthy" data in a way that is verifiable and auditable for a developer building an application on top of your platform?

For us, “trustworthy” data comes from authoritative and reputable organizations such as government agencies, academic institutions, and civil society groups. Every data point on our platform is accompanied by detailed metadata, including its original source.

For developers, this means you can always trace any number or statistic back to its origin, review the context in which it was collected, and understand any limitations. Our platform surfaces this provenance transparently through the API, making it easy to build applications that not only deliver answers, but also provide users with the evidence and audit trail behind every result.
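
To make the audit trail concrete, here is a small sketch of how an application could carry provenance alongside every value it shows a user. The field names are assumptions for illustration, not the exact schema of the Data Commons API response.

```python
# Illustrative shape for a statistic plus its provenance; the field names are
# assumptions, not the exact schema returned by the Data Commons API.
from dataclasses import dataclass

@dataclass
class Observation:
    variable: str                       # e.g. a population or health statistic
    place: str                          # entity the value describes
    date: str                           # observation period
    value: float
    source_name: str                    # publishing organization
    source_url: str                     # where the original data lives
    measurement_method: str | None = None

def audit_trail(obs: Observation) -> str:
    """Render the evidence behind a single number for an end user."""
    trail = (f"{obs.variable} for {obs.place} ({obs.date}) = {obs.value}\n"
             f"Source: {obs.source_name} <{obs.source_url}>")
    if obs.measurement_method:
        trail += f"\nMethod: {obs.measurement_method}"
    return trail
```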

We don’t try to pass judgment on those datasets or specific values. Instead, we want potential disagreements in this data to be more easily visible. Each one of these differences is another story to be told.

The industry has a problem with AI "hallucinations." Is Google's long-term bet that the future of credible AI will be built on verifiable data layers like Data Commons, rather than on models with ever-larger training sets?

Not at all. We are very early in our work with LLMs. Google’s transformer paper was released in 2017! At the moment, I believe the answer to hallucinations is to try all of the above.

Data Commons is attempting to ground outputs in verifiable data with provenance.

Our long-term bet is that the most reliable AI systems will combine the strengths of these models with robust, auditable data sources. By making it easy for AI agents to reference authoritative data sources, we can deliver answers that are trustworthy, transparent and reliable.

What's on the Data Commons MCP Server roadmap for next year? Are there specific data sources or capabilities you're planning to add that developers should be excited about?

If you asked me to tell you my roadmap 9 months ago, I wouldn’t have been talking to you about MCP! The rate of development and change right now in the AI space is dizzying. That said, a few areas we will focus on:

Currently, Data Commons data has a lot of depth and coverage in the U.S., then India, then OECD countries, and then the coverage thins out; the team is now aggressively working to close that gap. One of our goals is to work with more national statistical agencies, international organizations, and civil society organizations, both to build capacity for creating the data and to source that data, so that the grounded AI systems we build are more globally representative.

We want Data Commons to be easier to use. For example, we’ve recently worked to make it compatible with the Statistical Data and Metadata Exchange (SDMX) format and hope to continue to increase the ability of Data Commons to work with different open standards more seamlessly.

Read the Full Interview on HackerNoon:

"We Are Very Early in Our Work With LLMs," - Prem Ramaswami, Head of Data Commons at Google
