SaaS 2.0

Building Agents On Data Lakes

Mar 10, 2025

Hey friends, I’m Akash! Software Synthesis analyses the evolution of software companies - from how they're built and scaled, to how they go to market and create enduring value. Join thousands of founders, operators and investors that are building, scaling, or investing in software companies. Reach me at akash@earlybird.com.

SaaS 2.0

We’ve discussed in the past how the future of application software may be apps sitting on top of data lakehouses - this week, Klarna CEO Sebastian Siemiatkowski revealed what churning from systems of record like Salesforce really looked like:

While people nowadays can criticize things like Wikipedia, we also reflected on the fact that it is a remarkable achievement—having over 20,000 people collaborate on the largest graph of knowledge that is still fundamentally of high quality, accessibility, and accuracy. What could we learn from this?

A Swedish company, Neo4j, and Emil Eifrem, introduced us to the beautiful world of graphs.

We further explored data modeling, ontology, and, of course, vectors, RAGs, and many things.

Key to our explorations became the conclusion that the utilization of SaaS to store all forms of knowledge of what Klarna is, why it exists (docs), what it tries to accomplish (slides, tickets, kanban boards), how it is doing (sheets, analytics), who is it dealing with (CRM, supplier management), who works here (ERP, HR) and what it has learnt was fragmented over these SaaS—most of them having their own ideas and concepts and creating an unnavigable web of knowledge that required a tremendous amount of Klarna specific expertise to operate and utilize.

We also recognized that enterprise software has a standard set of features that are vital for it to operate—features such as audit, versioning, access and edit management, and similar universal needs. We need them as well, but that fragmentation again adds friction, admin overhead, and more.

So, we decided to start consolidating; to put things together, connect our knowledge, and remove the silos. The side consequence of this was the liquidation of SaaS—not all of them, but a lot of them. And not for the license fees, even though those savings have been nice, but for the unification and standardisation of our knowledge and data.

Sebastian goes on to say that Klarna’s case may not be representative of wider IT purchasing, but that there may well be a consolidation of point solutions - the latter point of course aligns with companies going multi-product earlier and faster as their R&D efforts get leverage from coding agents.

The more important kernel here is the notion of a unified data lake that stores unstructured and structured enterprise data and that is easily consumable - previously by humans, and now by agents.

Many AI agent startups are architected this way from inception.

Take Rox, for example, a company that Packy McCormick profiled brilliantly.

Rox (for Revenue Operating System) is founded on the premise that best-in-class GTM teams build an in-house stack on top of their data warehouse, where as much as 40% of data is customer data.

Rox is first a data infrastructure company, and second an AI application/agent company.

Rox’s data lake combines structured data (CRM records, product usage logs, ticketing systems), unstructured data (news articles, blog posts, call transcripts), and semi-structured data (emails, Slack messages, API logs), and constructs a knowledge graph that is updated in real time, at scale, as events occur. This knowledge graph underpins a swarm of agents that meet AEs wherever they are - mobile, desktop or even notifications during meetings - with context-rich information to drive better engagement and results.
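To make the idea of an event-driven knowledge graph concrete, here is a minimal sketch in Python. It is not Rox's implementation: the `Event` fields, the relation labels and the `KnowledgeGraph` class are all hypothetical names chosen for illustration, and a production system would sit on a real graph store and streaming pipeline rather than an in-memory dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One incoming record. All field names here are illustrative assumptions."""
    source: str    # where it came from, e.g. "crm", "call_transcript", "slack"
    subject: str   # the entity the event is about, e.g. an account
    relation: str  # edge label, e.g. "has_contact"
    obj: str       # the entity or value at the other end of the edge


class KnowledgeGraph:
    """Toy in-memory graph: subject -> relation -> set of objects."""

    def __init__(self):
        self._edges = defaultdict(lambda: defaultdict(set))
        self._provenance = defaultdict(set)

    def apply(self, event):
        # Upsert the edge and record which source asserted it.
        self._edges[event.subject][event.relation].add(event.obj)
        key = (event.subject, event.relation, event.obj)
        self._provenance[key].add(event.source)

    def context_for(self, subject):
        # Everything known about one entity, in a form an agent could consume.
        relations = self._edges.get(subject, {})
        return {rel: sorted(objs) for rel, objs in relations.items()}

    def sources_for(self, subject, relation, obj):
        # Which systems back a given fact (useful for audit and trust).
        return sorted(self._provenance.get((subject, relation, obj), set()))


# Events from structured, unstructured and semi-structured sources
# all land in the same graph as they occur.
kg = KnowledgeGraph()
kg.apply(Event("crm", "Acme", "has_contact", "Dana"))
kg.apply(Event("call_transcript", "Acme", "mentioned_competitor", "Globex"))
kg.apply(Event("slack", "Acme", "has_contact", "Dana"))

acme_context = kg.context_for("Acme")
dana_sources = kg.sources_for("Acme", "has_contact", "Dana")
```

The point of the sketch is the shape of the data flow: heterogeneous sources are normalised into the same edge format, so an agent asking for context on an account gets one merged view plus the provenance of each fact.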

The R&D that forms the foundation of the product spans real-time data pipelines, LLM-based scoring, a knowledge graph construction and storage system, agent evaluations and monitoring, and more. Knowledge graphs are continuing to gain traction beyond the initial adoption driven by RAG, as they emerge as a means to capture reasoning processes as well.

These underlying AI engineering and infrastructure investments are prerequisites to delivering the promise of agents.

Yaz at Emergence Capital recently said to me that the great AI application companies of the future will look like the infra companies of the past, when it comes to team composition and R&D. I tend to agree.

The rhetoric about ‘wrapper’ companies has inverted over the last 2.5 years as it’s become abundantly clear that off-the-shelf models are only as good as the data they have access to. For most AI applications, this means hoovering up as much customer data as possible and instrumenting the protocols for injecting this data into models for performance. For some AI applications, there’s the further step of combining this first-party data with third-party data. In the case of Rox, this means building scalable data pipelines that refresh data for over 100k companies daily, with thousands of companies added each day, ensuring that every public artifact is scraped and processed.
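As a rough illustration of what a daily refresh over a growing company list involves, the sketch below picks out the companies that are due and splits them into batches for workers. The function names, the 24-hour threshold and the batch size are assumptions made for the example, not details of Rox's pipeline.

```python
from datetime import datetime, timedelta


def due_for_refresh(last_refreshed, now, max_age=timedelta(days=1)):
    """Return company ids that were never refreshed or are older than max_age.

    Never-refreshed (newly added) companies come first, then the stalest ones.
    """
    never = [c for c, ts in last_refreshed.items() if ts is None]
    stale = [
        c for c, ts in last_refreshed.items()
        if ts is not None and now - ts >= max_age
    ]
    stale.sort(key=lambda c: last_refreshed[c])
    return never + stale


def batches(items, size):
    """Yield fixed-size chunks so scraping work can be spread across workers."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


now = datetime(2025, 3, 10, 12, 0)
state = {
    "acme": now - timedelta(days=2),
    "globex": now - timedelta(hours=3),   # refreshed recently, skipped
    "initech": None,                      # newly added, never refreshed
    "umbrella": now - timedelta(days=5),
}

due = due_for_refresh(state, now)
work = list(batches(due, 2))
```

At 100k+ companies the hard parts are elsewhere (rate limits, retries, deduplication, parsing quality), but the scheduling core stays this simple: track freshness per entity and always work on the stalest first.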

A data asset can be constructed either by building robust data extraction pipelines for public data (a data engineering challenge) or through business development, i.e. striking licensing deals (something all of the foundation model companies are doing and where there are economies of scale to be had). ZoomInfo, for all its woes, relies on public data for only 5% of its data assets - the rest is proprietary.

Another emerging data asset unique to AI companies is evaluations data - companies that are able to run rigorous evaluations against a comprehensive dataset can produce better results in production.
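A minimal sketch of what running evaluations against a dataset means in practice is shown below. The `evaluate` function, the exact-match scoring rule and the stub model are all hypothetical. Real evaluation suites typically use richer scoring (rubrics, LLM judges, task-specific metrics), but the loop has the same structure.

```python
from typing import Callable


def evaluate(model: Callable[[str], str], dataset: list) -> dict:
    """Run the model over (prompt, expected) pairs and summarise the results.

    Scoring is a case-insensitive exact match, chosen only to keep the sketch small.
    """
    failures = []
    for prompt, expected in dataset:
        got = model(prompt)
        if got.strip().lower() != expected.strip().lower():
            failures.append((prompt, expected, got))
    total = len(dataset)
    passed = total - len(failures)
    return {
        "total": total,
        "passed": passed,
        "accuracy": passed / total if total else 0.0,
        "failures": failures,  # kept so regressions can be inspected
    }


def stub_model(prompt: str) -> str:
    # Stand-in for a real model call; the answers are made up for the example.
    answers = {"Is Acme a customer?": "yes", "Acme HQ city?": "Berlin"}
    return answers.get(prompt, "unknown")


dataset = [
    ("Is Acme a customer?", "Yes"),
    ("Acme HQ city?", "Munich"),
    ("Globex tier?", "unknown"),
]

report = evaluate(stub_model, dataset)
```

The dataset is the asset here: every failure that gets captured and added back as a new test case makes the next model or prompt change safer to ship.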

Data assets combined with AI engineering (and in most cases, opinionated UI/UX) are a potent combination that ought to help the best executing teams reach escape velocity and potentially attain some form of a moat in the SaaS 2.0 era.

Jobs

Companies in my network are actively looking for talent:

An AI startup founded by repeat unicorn founders and researchers from Meta/Google is building 3D foundation models and is looking for an ML Training / Inference Infrastructure Engineer with 3+ years of experience in a cloud-related role, preferably ML-related (London or Munich).
A Series A company founded by early Revolut employees is building a social shopping platform and is looking for Senior Backend, Frontend and Fullstack Engineers and Business Development Managers across Europe (London, Toronto, France, Germany, Spain, Italy, Remote).
A vertical AI startup tackling the accounting industry is looking for a founding GTM hire (London).

Reach out to me at akash@earlybird.com if you or someone you know is a fit!
