Who is speaking: Dr. Simon Harrer — software engineer into data, co-founder and CEO of Entropy Data, maintainer of the Data Contract CLI and data-landscape.com, Bitol TSC member, OSI contributor

The Speaker

Simon Harrer describes himself as a software engineer into data: he builds tools that help data, people, and agents work together. He co-authored "Java by Comparison," translated Zhamak Dehghani's "Data Mesh" into German, and authored the Data Mesh Architecture website.

He maintains the open-source Data Contract CLI and data-landscape.com, sits on the Technical Steering Committee for the Bitol standards (ODCS & ODPS), and contributes to the Open Semantic Interchange (OSI).

His day job is Entropy Data — a data product marketplace built on data contracts and semantics, making high-quality data available to agents and humans alike. ISO 27001 certified, six product engineers, customers in the US, EU, CH, and AU. No VC, and profitable despite the tokens.

The Plan

Three movements. First, peek into the future — watch a coding agent build a data product from nothing but a contract. Then learn the standards that make that possible, the unglamorous plumbing that turns a demo into something you can run in production.

And finally, make history — because these standards are still being written, and the room can help decide whether they stick.

Building data products upon open standards: a coding agent fed a Data Contract (ODCS) and Data Product (ODPS) YAML plus a prompt, given Skills for how and Tools for with-what, produces a Data Product

A Glimpse Into the Future

The framing question: what if we already were using open standards for data products? Simon's answer is to build one live — with a coding agent and nothing hand-written.

The recipe is four ingredients around a coding agent (Claude Code, Codex, Copilot). The what comes from machine-readable specs — a Data Contract in ODCS YAML, a Data Product in ODPS YAML, and a prompt. The how comes from Skills that encode your conventions. The with what comes from Tools connecting the agent to your systems.

Out the other side: a real data product — a dataset, a dbt project, a Databricks Asset Bundle. The rest of the talk is the standards under each of those boxes.

What is Data Mesh? The four principles — domain ownership, data as a product, self-serve data platform, federated governance — across strategic, socio-technical, and technology layers

Where Data Products Come From

A quick grounding: the data product is the second of the four Data Mesh principles — domain ownership, data as a product, the self-serve data platform, and federated governance.

"Data as a product" means applying product thinking to data: clear ownership, a consumer mindset, and interoperable interfaces. That last word — interoperable — is exactly where open standards earn their keep.

The Data Product Canvas from datamesh-architecture.com, laying out domain, output ports, data contract, consumers, use cases, and ubiquitous language

Data Product Architecture: a data product with input ports, output ports, a discovery port, and inner building blocks like ownership, transformation code, tests, storage, policies, CI/CD, and observability

What a Data Product Actually Is

Before standardizing anything, you need a shared picture of the thing. The free Data Product Canvas captures the purpose, the output ports, the data contract, and the consumers — the conversation you have before writing code.

Architecturally, a data product is a self-contained unit with input ports, output ports, and a discovery port for metadata, wrapping the transformation code, storage, tests, policies, CI/CD, and observability inside. Each of those edges and pieces is a candidate for a standard.

That was the future.

Now let's learn the standards.

What is a Standard? And why do we need them?

The xkcd comics on standards: a public service announcement about date formats, deprecated counting habits, and 'how standards proliferate' ending with 15 competing standards

What Is a Standard — and Why Bother?

A standard is just an agreement on how to do something, so that different people and tools can interoperate without negotiating it each time. ISO 8601 dates (2013-02-27) are the canonical example — one format everyone can parse.

And the canonical risk is the obligatory xkcd: someone looks at 14 competing standards, decides to unify them, and now there are 15. Avoiding that outcome is the whole game in the data product space right now.

Four dimensions of a standard: De Facto vs De Jure (usage), Initiative vs Vendor (owner), Many vs One (control), Open vs Paywall (cost)

Four Dimensions to Judge a Standard

De facto vs. de jure — is it standard because people actually use it, or because a body declared it so? (Usage)
Initiative vs. vendor — is it stewarded by a neutral community, or owned by one company? (Owner)
Many vs. one — is control spread across contributors, or held by a single party? (Control)
Open vs. paywall — can anyone read and adopt it for free, or does it cost money? ($$$)

The standards worth betting on lean left on all four: used in the wild, community-owned, broadly governed, and open. Keep this scorecard in mind for everything that follows.

The Data Landscape at data-landscape.com — an opinionated, interactive map of the open standards organized by what they describe: contracts, data products, schema, and semantics

A Map of the Open Standards

To see the whole field at once, Simon maintains data-landscape.com — an opinionated, interactive map of the open standards that power a modern data architecture, organized by what they describe: contracts, data products, schema, and semantics.

Inspired by the CNCF Landscape and the ThoughtWorks Tech Radar, each standard is rated adopt / situational / assess / caution. The rest of the talk walks the three columns that matter most for data products: contracts, products, and semantics.

Standards for Data Contracts

"Think of an API — but for data."

A data contract diagram: a data producer owns the contract, the consumer trusts it, it specifies the data, and the consumer accesses the data. Definition: a document that defines ownership, structure, semantics, quality, and terms of use.

What Is a Data Contract?

A data contract is a document that defines the ownership, structure, semantics, quality, and terms of use for exchanging data between a data producer and their consumers. Think of an API, but for data.

The producer owns the contract and provides the data; the consumer trusts the contract and accesses the data. Everything the consumer needs to rely on is written down — and, crucially, machine-readable. (More in What is a Data Contract?)

Once upon a time, every company had their own format

The Great Data Contract Format Merge timeline 2022-2026: the DCS and ODCS lineages converging into a single Open Data Contract Standard under Bitol at the Linux Foundation

The Great Format Merge

Once upon a time, every company rolled its own data contract format — the exact xkcd trap. Then two lineages emerged: the Data Contract Specification (used widely thanks to the Data Contract CLI) and ODCS (donated by PayPal as a template).

Rather than make it 15 standards, the two camps merged. ODCS found a neutral home at Bitol in the Linux Foundation, Simon joined the TSC, and the Data Contract CLI now fully supports ODCS. One standard to rule them all — de facto and de jure.

Open Data Contract Standard v3 anatomy: fundamentals, schema, data quality, pricing, team, security, SLA, infrastructure, support, business rules, and custom properties, governed under Bitol / LF AI & Data

datacontract.com walkthrough: the Fundamentals section of an ODCS YAML — apiVersion, kind, id, name, version, status

datacontract.com walkthrough: the Data Quality section of an ODCS YAML, with library checks and custom SQL that the Data Contract CLI can run

The Open Data Contract Standard

ODCS v3 is the building block. One YAML file carries the fundamentals (version, name, status), the schema (tables, columns, semantics), data quality rules, team and ownership, terms of use, SLAs, servers, and arbitrary custom properties.

Each section is just a few lines of readable YAML — fundamentals first, then schema, then quality rules that can be executed automatically. It is enough for a human to understand the data and enough for a machine to act on it. (Background: Open Data Contract Standard.)

Okay, we've created that YAML. Now what?

Automate all the things: code generation, test, metadata distribution, infrastructure provisioning, collaboration, and governance — all driven from the data contract

Now What? Automate All the Things

A YAML file is only worth writing if something reads it. Once a contract is machine-readable, it becomes the single source that drives the rest:

Code generation: Java, Pydantic, dbt models, SQL DDL
Testing: compare the contract against real data, catch breaking changes in PRs, monitor continuously
Metadata distribution: push to metastores, catalogs (Collibra), and marketplaces (Entropy Data)
Infrastructure provisioning: output ports, input ports, anonymization, access control
Collaboration & governance: naming conventions, schema evolution, usage agreements, approval workflows

One file in, a whole toolchain out.

The Data Contract CLI: import from SQL DDL, JSON Schema, Iceberg, dbt, BigQuery and more, and export to SQL, HTML, dbt, Pydantic, and run tests against AWS S3, BigQuery, Azure, Databricks, Snowflake, and Kafka

GitHub star history chart for datacontract/cli and bitol-io/open-data-contract-standard, both climbing toward 1,000 stars by 2026

Open Source Tooling: the CLI

The workhorse is the open-source Data Contract CLI. It imports contracts from existing sources (SQL DDL, JSON Schema, Iceberg, dbt, BigQuery), exports to a dozen targets (SQL, HTML, dbt, Pydantic), and tests a contract against the real data in S3, BigQuery, Azure, Databricks, Snowflake, or Kafka.

The star-history curve tells the adoption story: both the CLI and the ODCS repo are climbing steeply toward 1,000 GitHub stars — the kind of de-facto traction that keeps a standard alive.

The open-source Data Contract Editor at editor.datacontract.com, editing an orders data contract with live preview and validation

The ODCS Excel template for capturing a data contract's schema in a spreadsheet

Four open-source tools for ODCS: the standard itself, the Data Contract Editor, the Data Contract CLI, and the Excel template

Write It However You Like

Nobody should hand-edit YAML if they don't want to. The open-source Data Contract Editor gives a standard-compliant authoring experience with live preview and validation.

And for the spreadsheet crowd, the ODCS Excel template — born of popular demand — captures the schema in Excel and converts to YAML later. The standard, the editor, the CLI, and the template: four free tools, one format.

Demo Time

"Author a contract, validate it, test it against real data — live."

Contracts describe an interface.

Standards for Data Products wire them together.

The Open Data Product Standard (ODPS): fundamentals, input ports, output ports, management ports, custom properties, with input and output ports referencing ODCS data contracts

An ODPS YAML example for an Orders data product, with input ports referencing data contracts and multiple versioned output ports (orders_pii, orders_npii)

An ODPS lineage graph: an Order Service application feeding Orders and Customers data products, which feed Funnel Analytics, a Monthly Target Performance Report, and Customer Cohorts

The Open Data Product Standard

If ODCS describes a single interface, ODPS — the Open Data Product Standard — describes the product that exposes them. It is the primary configuration and documentation for a data product, and it builds upon ODCS: its input and output ports reference data contracts by ID.

One ODPS YAML can declare several versioned output ports — say an orders_pii port and an orders_npii port — each backed by its own contract, plus the input ports it consumes.

Because ports point at contracts, you get lineage for free: applications feed data products, which feed other data products and consumers, in a graph you can actually trace. (See What is a Data Product?)

Upcoming standards in Bitol: ODCS 3.1 and 3.2, ODPS 1.0 and 1.1, OORS for observability results, and further standards for data domain (ODDS), data mesh (ODMS), access agreement (OAAS), and orchestration & control (OOCS)

What's Coming in Bitol

The Bitol family is growing. Beyond ODCS (3.1 now, 3.2 upcoming) and ODPS (1.0 now, 1.1 upcoming), there is OORS for observability results that feeds back into a contract's quality and SLA checks.

Further out sit drafts for data domains (ODDS), data mesh (ODMS), access agreements (OAAS), and orchestration & control (OOCS) — a coordinated set rather than 15 rival ones.

A warning about the name clash: 'ODPS' refers to both the Open Data Product Standard (part of the Bitol family, links to ODCS via ports) and a separate Open Data Product Specification (standalone, strong on pricing and i18n)

Mind the Name Clash

A real-world wrinkle, delivered with a grin: "ODPS" is two different things. There is the Open Data Product Standard (part of the Bitol family, links to ODCS via input and output ports, simple pricing, no i18n) and the Open Data Product Specification (standalone, limited output-port support, strong on pricing plans and i18n).

Both abbreviate to ODPS, both sit under LF AI & Data. So when you tell a coding agent to "use ODPS," be aware which one you mean — the messy reality of standards still being sorted out.

Standards for Semantics

"So the agent knows what the data actually means."

An OSI semantic model in YAML at opensemantic.com, defining datasets, relationships, metrics, and custom extensions

Open Semantic Interchange

The third pillar is meaning. Open Semantic Interchange (OSI, still v0.2.0.dev) defines a shared semantic model — datasets, relationships, and metrics — that both AI agents and BI tools (Tableau, Qlik, MicroStrategy) can consume from one place.

It is the layer that tells a tool what an "order" or an "average order value" actually is, rather than leaving every consumer to guess. Data contracts and data products point at it. (More: Semantics.)

A comparison table of Data Contract (ODCS) versus Semantic Model (OSI) across schema, relationships, metrics, dynamic fields, custom properties, terms of use, quality & SLAs, and AI context

Contract vs. Semantic Model

ODCS and OSI overlap, but they pull in different directions. Both cover schema, relationships, and custom properties. OSI adds first-class metrics, dynamic fields, and AI context; ODCS adds terms of use and quality & SLAs.

The two are complementary — the contract governs the exchange, the semantic model carries the meaning — and the gaps on each side (metrics for ODCS, governance for OSI) are exactly what the working groups are closing.

OSI working groups on advanced metrics & expression language, composability, catalog integration, ontology representation (already in 0.2.0.dev), and model converters & developer tools

Announcement that Open Semantic Interchange has been accepted into the Apache Incubator under the new name Apache Ossie

A Lot of Power Behind It

OSI has serious momentum — working groups on advanced metrics and an expression language, composability, catalog integration, ontology representation (already landing in 0.2.0.dev), and model converters.

The headline: OSI has been accepted into the Apache Incubator under a new name, Apache Ossie — moving to a vendor-neutral home so it can keep growing as the shared business-context layer that AI agents and BI tools both need.

A terminal opened in a dp_demo_new_snowflake_dp project, ready for a coding agent to build a data product on Snowflake from a contract

The Future Is Now

Back to the opening question — what if we already were using open standards for data products? — and the answer lands: we do. Simon drops into a fresh project, points a coding agent at an ODCS/ODPS spec, and lets it build a real data product on Snowflake.

The contract says what, ODPS wires the ports, OSI supplies the meaning, skills carry the how, and tools provide the access. Every box from the first slide is now a real, open standard — and the agent does the typing.

We peeked into the future. We learned the standards.

Now let's make history together.

A call to action: ODCS needs 1,000 GitHub stars to graduate from the Linux Foundation — scan the QR code and give it a star

Help ODCS Make History

Here is how the room can help write the next chapter: ODCS needs 1,000 GitHub stars to graduate within the Linux Foundation. Graduation is a signal of adoption — exactly the de-facto traction that keeps a de-jure standard alive.

If the standards in this talk are useful to you, give the ODCS repo a star. It costs a click and helps an open standard stay open.

Thank You

That is the case: data products built on a stack of open standards — ODCS for contracts, ODPS for products, OSI for semantics — let agents and humans build and consume data the same way. The future isn't coming; it's a git clone away.

Try the contract-based data product marketplace yourself at demo.entropy-data.com, reach Simon at simon.harrer@entropy-data.com or on LinkedIn, and give datacontract-cli a star on GitHub if it has been useful.

Open Standards for Data Products