
What is a Data Contract?

[Figure: Data contract diagram showing data provider, data contract, and data consumer]

A data contract is a document that defines the ownership, structure, semantics, quality, and terms of use for data exchanged between a data producer and its consumers. Think of it as an API, but for data.

Why Data Contracts?

Organizations struggle to maintain good metadata. Their data pipelines are fragile and break on upstream changes, data producers and consumers communicate poorly, and data engineers are overwhelmed by competing demands. Data contracts address these problems by setting explicit expectations for data along these dimensions:

  • Ownership – Responsibility for providing correct data
  • Schema – Column names, data types, structure
  • Semantics – Descriptions and business meaning
  • Quality – Validation rules, freshness, completeness
  • Terms of Use – Usage rights, SLAs, access policies

Data contracts are a communication tool for expressing a shared understanding of how data should be structured and interpreted. Data providers and consumers can write them collaboratively, even before the data product is implemented; this is known as a contract-first approach. In development and production, contracts serve as the basis for code generation, testing, schema validation, quality checks, monitoring, and computational governance, ensuring that data products match the agreed-upon expectations.
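
To make the idea of contract-driven schema validation concrete, here is a minimal Python sketch. It assumes a heavily simplified, hypothetical contract held as a plain dictionary (`CONTRACT`) and a hypothetical helper `validate_rows`; neither is part of ODCS or of any tool, and real validators cover far more than the three checks shown.

```python
# Hypothetical, simplified contract: field name -> expectations.
# This is an illustration only, not the ODCS format.
CONTRACT = {
    "order_id": {"type": str, "required": True, "unique": True},
    "customer_id": {"type": str, "required": True, "unique": False},
    "order_total": {"type": int, "required": True, "unique": False},
}


def validate_rows(rows, contract):
    """Return a list of violations; an empty list means the rows conform."""
    violations = []
    # Track the values already seen for every field declared unique.
    seen = {name: set() for name, spec in contract.items() if spec["unique"]}
    for index, row in enumerate(rows):
        for name, spec in contract.items():
            value = row.get(name)
            if value is None:
                if spec["required"]:
                    violations.append(f"row {index}: missing required field '{name}'")
                continue
            if not isinstance(value, spec["type"]):
                violations.append(f"row {index}: '{name}' should be {spec['type'].__name__}")
                continue
            if spec["unique"]:
                if value in seen[name]:
                    violations.append(f"row {index}: duplicate value in unique field '{name}'")
                seen[name].add(value)
    return violations
```

Calling `validate_rows(rows, CONTRACT)` on a batch of dictionaries returns one message per violated expectation, which a pipeline could use to reject the batch before it reaches consumers.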

Open Data Contract Standard (ODCS)

The Open Data Contract Standard (ODCS) is the open standard for defining data contracts in a machine-readable YAML format. Originally developed as the Data Contract Template at PayPal, it is now governed by Bitol, a Linux Foundation AI & Data project.

Here's a simplified example of an ODCS data contract:

apiVersion: v3.1.0
kind: DataContract
id: orders
name: Orders
version: 1.0.0
status: active
description:
  purpose: "Provides order and line item data for analytics and reporting"
  usage: "Used by analytics team for sales analysis and business intelligence"
  limitations: "Contains only the last 2 years of data"
  customProperties:
    - property: "sensitivity"
      value: "secret"
      description: "Data contains personally identifiable information"
  authoritativeDefinitions:
    - url: "https://entropy-data.com/policies/gdpr-compliance"
      type: "businessDefinition"
      description: "GDPR compliance policy for handling customer data"
schema:
  - name: orders
    physicalType: TABLE
    description: All historic web shop orders since 2020-01-01. Includes successful and cancelled orders.
    properties:
      - name: order_id
        logicalType: string
        description: The internal order id for every order. Do not show this to a customer.
        businessName: Internal Order ID
        physicalType: UUID
        examples:
          - 99e8bb10-3785-4634-9664-8dc79eb69d43
        primaryKey: true
        classification: internal
        required: true
        unique: true
      - name: customer_id
        logicalType: string
        description: A reference to the customer number
        businessName: Customer Number
        physicalType: TEXT
        examples:
          - c123456789
        required: true
        unique: false
        logicalTypeOptions:
          minLength: 10
          maxLength: 10
        authoritativeDefinitions:
          - type: definition
            url: https://example.com/definitions/sales/customer/customer_id
        tags:
          - pii:true
        classification: internal
        criticalDataElement: true
      - name: order_total
        logicalType: integer
        description: The order total amount in cents, including tax, after discounts.
          Includes shipping costs.
        physicalType: INTEGER
        examples:
          - 9999
        quality:
          - type: text
            description: The order_total equals the sum of all related line items.
        required: true
        businessName: Order Amount
      - name: order_timestamp
        logicalType: timestamp
        description: The time including timezone when the order payment was successfully
          confirmed.
        physicalType: TIMESTAMPTZ
        businessName: Order Date
        examples:
          - "2025-03-01 14:30:00+01"
      - name: order_status
        businessName: Status
        description: The business status of the order
        logicalType: string
        physicalType: TEXT
        examples:
          - shipped
        quality:
          - type: library
            description: Ensure that there are no other status values.
            metric: invalidValues
            arguments:
              validValues:
                - pending
                - paid
                - processing
                - shipped
                - delivered
                - cancelled
                - refunded
            mustBe: 0
    quality:
      - type: library
        metric: rowCount
        mustBeGreaterThan: 100000
        description: If there are fewer than 100k rows, something is wrong.
  - name: line_items
    physicalType: table
    description: Details for each item in an order
    properties:
      - name: line_item_id
        logicalType: string
        description: Unique identifier for the line item
        physicalType: UUID
        examples:
          - 12c9ba21-0c44-4e29-ba72-b8fd01c1be30
        logicalTypeOptions:
          format: uuid
        required: true
        primaryKey: true
      - name: sku
        logicalType: string
        businessName: Stock Keeping Unit
        description: Identifier for the purchased product
        physicalType: TEXT
        examples:
          - "111222333"
        required: true
      - name: price
        logicalType: integer
        description: Price in cents for this line item including tax
        physicalType: INTEGER
        examples:
          - 9999
        required: true
      - name: order_id
        required: false
        primaryKey: false
        logicalType: string
        physicalType: UUID
        relationships:
          - type: foreignKey
            to: orders.order_id
servers:
  - server: production
    environment: prod
    type: postgres
    host: aws-1-eu-central-2.pooler.supabase.com
    port: 6543
    database: postgres
    schema: dp_orders_v1
team:
  name: sales
  description: This data product is owned by the "Sales" team
  members:
    - username: john@example.com
      name: John Doe
      role: Owner
  authoritativeDefinitions:
    - type: slack
      url: https://slack.example.com/teams/sales
roles:
  - role: analyst_us
    description: Read access for analytics to US orders
  - role: analyst_eu
    description: Read access for analytics to EU orders
slaProperties:
  - property: availability
    value: 99.9%
    description: Data platform uptime guarantee
  - property: retention
    value: "1"
    unit: year
    description: Data will be deleted after 1 year
  - property: freshness
    value: "24"
    unit: hours
#    element: orders.order_timestamp # enable this to check freshness with Data Contract CLI
    description: Within 24 hours of order placement
  - property: support
    value: business hours
    description: Support only during business hours
price:
  priceAmount: 0
  priceCurrency: USD
  priceUnit: monthly
tags:
  - e-commerce
  - transactions
  - pii
customProperties:
  - property: dataPlatformRole
    value: role_orders_v1
contractCreatedTs: "2025-01-15T10:00:00Z"
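
The two `library` quality rules in the contract above (`invalidValues` on `order_status` and `rowCount` on `orders`) can be read as "compute a metric, then compare it with a threshold". The Python sketch below illustrates that reading on in-memory rows. The rules are transcribed by hand into dictionaries, and the helpers `measure` and `passes` are hypothetical names; how a real tool treats nulls, casing, or other metrics is not modeled here.

```python
# Quality rules transcribed by hand from the contract above.
ORDER_STATUS_RULE = {
    "metric": "invalidValues",
    "arguments": {
        "validValues": [
            "pending", "paid", "processing", "shipped",
            "delivered", "cancelled", "refunded",
        ]
    },
    "mustBe": 0,
}
ROW_COUNT_RULE = {"metric": "rowCount", "mustBeGreaterThan": 100000}


def measure(rule, rows, column=None):
    """Compute the metric value for a rule. Only two metrics are sketched."""
    if rule["metric"] == "rowCount":
        return len(rows)
    if rule["metric"] == "invalidValues":
        valid = set(rule["arguments"]["validValues"])
        return sum(1 for row in rows if row[column] not in valid)
    raise ValueError(f"unsupported metric: {rule['metric']}")


def passes(rule, measured):
    """Compare a measured value with the rule's threshold operator."""
    if "mustBe" in rule:
        return measured == rule["mustBe"]
    if "mustBeGreaterThan" in rule:
        return measured > rule["mustBeGreaterThan"]
    raise ValueError("rule has no supported threshold operator")
```

For example, a table containing one row with `order_status` set to `lost` yields `measure(ORDER_STATUS_RULE, rows, "order_status") == 1`, so `passes` reports a failure because the rule demands `mustBe: 0`.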

Learn more about the full specification, examples, and tooling at datacontract.com.

Tooling

The Data Contract CLI is an open-source command-line tool for working with data contracts. It can lint and validate contracts, connect to data sources to execute schema and quality tests, detect breaking changes in CI/CD pipelines, and export to various formats.
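
As a rough illustration of what detecting breaking changes between two contract versions can involve, the Python sketch below compares two simplified field mappings. The function `breaking_changes` and its rule set (removed field, changed type, required field turned optional) are assumptions made for this example and do not describe the CLI's actual logic.

```python
def breaking_changes(old_fields, new_fields):
    """Compare two {field_name: {"type": ..., "required": ...}} mappings.

    Returns descriptions of changes that could break existing consumers.
    The rule set is an illustrative assumption, not a tool's real behavior.
    """
    changes = []
    for name, old in old_fields.items():
        new = new_fields.get(name)
        if new is None:
            # Consumers that read this field would fail.
            changes.append(f"field removed: {name}")
            continue
        if new["type"] != old["type"]:
            changes.append(f"type changed: {name} ({old['type']} -> {new['type']})")
        if old["required"] and not new["required"]:
            # Consumers may now encounter nulls they did not expect.
            changes.append(f"field became optional: {name}")
    # Fields that exist only in new_fields are treated as non-breaking.
    return changes
```

A CI job could run such a comparison between the contract on the main branch and the one in a pull request, and fail the build when the returned list is not empty.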

The Data Contract Editor by Entropy Data is a browser-based editor for authoring data contracts with live preview and validation.

Manage Data Contracts with Entropy Data

Entropy Data provides a web-based, self-service platform for managing data products, data contracts, and data usage agreements. An event-based API enables integration with any data platform, and every change is recorded in an audit trail.

[Figure: Screenshot of a data contract in Entropy Data]

Features include a data product catalog, request and approval workflows for data usage agreements, automated permission management, and visualization of your data mesh as an interactive map.

Sign up for free, or explore the interactive demo.