Talk · 2026
Data-first or Contract-first?
Jochen Christ (CTO & Co-Founder, Entropy Data) · June 11, 2026
When you build a data product, where do you start? Jochen walks two routes to the same destination. The traditional data-first path pulls production data, explores it, and ships what is there. The contract-first path starts with a conversation, writes the contract before the data exists, and lets a coding agent build the product from it. Along the way: data contracts, the Open Data Contract Standard, the Data Contract CLI, and a live build of the same shelf-warmers product that runs in about ten minutes.
Annotated slide notes from Jochen Christ's conference talk. The text below is an edited summary.
Q&A
Selected questions from the audience after the talk.
Q: If the agent generates the dbt models, who maintains them afterwards? Do I edit the dbt code, or go back to the contract and regenerate?
It depends on your culture and engineering maturity, but personally I would not look too much into the dbt code anymore. I would change the requirements in the contract, and if the output is not right, improve the skills. Build a feedback loop into your skill repository and optimise from there, rather than hand-patching generated code.
Q: Do you distinguish a marketplace from a data catalog, or treat them as the same?
We distinguish them. A catalog is a technical index of every dataset that exists, often millions of assets. A marketplace only holds data products that are meant to be shared and consumed by other teams, usually a few hundred. The marketplace is where the contextual, contract-level information lives.
Q: Data-first often surfaces insights nobody asked for. Doesn't always going contract-first mean you only ever answer what people already know to ask?
A fair point, and you do want to keep explorative capability. The key is the boundary: inside your own domain, where you already understand what the data means, keep simple explorative, data-first access. Data contracts matter when you cross an organisational boundary, because that is where a new bounded context begins and shared data needs an agreed, documented interface.
Q: Should the contract specify the physical schema up front, or just the purpose and conceptual model, and let the agent figure out the physical schema?
In the contract I would define the conceptual and logical model and leave the physical, technical implementation to the skill or the target data platform. The business knowledge goes into the model; the platform-specific detail goes into the skills.
Q: We see data contracts at the edges of the data flow, on sources and before consumption. Do you see them used in the middle, on every step of the lineage?
My view is that a data contract provides a new boundary of trust. You probably do not need a contract back to the very first source on every hop; you need lineage back to the last data contract you rely on. Between contracts you can use technical lineage, which is the internal wiring of a data product's pipeline. Trust is anchored at the source contract.