
From Data to Decisions: Stanford’s Jure Leskovec on Foundation Models for Enterprise Data
Insights
- Foundation models must evolve to learn from enterprise-specific, semi-structured data to deliver meaningful business insights and competitive advantage.
- As generative AI commoditizes skilled labor, proprietary data becomes the new differentiator—placing data strategy at the center of enterprise transformation.
- Industries like financial services must navigate both data complexity and regulatory trust requirements when deploying AI at scale.
How can foundation models unlock real enterprise value from internal data?
Recorded at The Business and Economics of AI workshop co-hosted by Stanford University and Infosys on May 14, 2025, this thought-provoking interview features Jure Leskovec, Professor of Computer Science at Stanford University and an expert in foundation models and AI systems.
Jure explains why the next frontier for enterprise AI lies in reasoning over internal, semi-structured data—customer records, financial ledgers, supply chains—and how foundation models can be trained to understand and act on this uniquely valuable information.
Key takeaways include:
- Why GenAI is commoditizing skilled work—and shifting competitive advantage to proprietary enterprise data
- The challenges and opportunities of modeling complex, heterogeneous datasets in regulated industries like financial services
- What business leaders need to consider when building trustworthy, data-grounded AI systems
A compelling perspective for AI, data, and transformation leaders looking to move from experimentation to strategic differentiation.
Jure Leskovec:
I'm Jure Leskovec, and I'm a professor in the Computer Science Department, in the AI Lab, at Stanford University.
Can you summarize the key focus of your session?
I was talking about foundation models for enterprise data. The key point is that enterprise data is usually organized in data warehouses in a semi-structured, tabular form, and that we need foundation models that can learn over this type of structured and semi-structured data to be able to make effective decisions on top of it. And these models need to be grounded in that data, because it is the most valuable data of every enterprise.
What does AI commoditize, and what new constraints does it introduce?
The rise of GenAI, large language models, and so on, what is it commoditizing? It's basically commoditizing skilled work. So the constraint that is popping out now is: how do we get value out of our own data? Because that's what makes each of our businesses unique. If we are all using the same skilled work on the other side, we are all the same. So what is the differentiator now? It's not the work. It's actually the information, the data, and how we extract value from the data. How do we make accurate decisions based on our internal enterprise data? When I say internal enterprise data, I mean customer records, transaction records, financial ledgers, product catalogs, supply chain records, all the kinds of data that are used to make decisions.
Do financial services face unique challenges in extracting value from their data?
There are several, I would say, unique aspects. First, data in financial organizations is very complex and usually very large, right? Because you have multiple touchpoints with your client, with your customer, all the way from the transaction level to business interactions, person-to-person interactions, and so on. So I think that complexity and that heterogeneity is both interesting and very valuable, so that we can actually build AI that learns from all these heterogeneous interactions to give us value. I would say that's an example of the uniqueness. And then, of course, the level of trust that we need in these models, because it's a regulated industry, is also very, very important.
What stood out in your conversations with business leaders today?
I think what is interesting in talking to them is the diversity of the problems they have, and the fact that all these problems require them to bring together and reason over this heterogeneous, multi-touchpoint, long-term data.