
Resources

Resources are Pydantic models that describe the desired state of a DSS object. They are pure data — handlers know how to create, read, update, and delete (CRUD) them.

Resource model

Every resource has:

| Field | Description |
| --- | --- |
| `name` | Unique name within the DSS project |
| `description` | Optional description (stored as DSS metadata) |
| `tags` | Optional list of tags |
| `depends_on` | Explicit dependencies on other resources |
| `address` | Computed as `{resource_type}.{name}` (e.g., `dss_filesystem_dataset.raw_data`) |

The `address` is the primary key used in state tracking and plan output.
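
When the engine cannot infer an ordering, `depends_on` makes it explicit. A minimal sketch, assuming `depends_on` accepts full addresses in the `{resource_type}.{name}` form shown above (dataset names are illustrative):

```yaml
datasets:
  - name: audit_copy
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/audit"
    depends_on:
      - dss_filesystem_dataset.raw_data   # assumed address form; check the reference for the exact syntax
```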

Variables resource

The variables resource manages DSS project variables — a singleton key-value store per project. Variables are split into two scopes:

  • `standard`: shared across all instances
  • `local`: instance-specific overrides
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | `variables` | Resource name (singleton — rarely overridden) |
| `standard` | `dict[str, Any]` | `{}` | Standard project variables |
| `local` | `dict[str, Any]` | `{}` | Local project variables |

```yaml
variables:
  standard:
    env: prod
    data_root: /mnt/data
  local:
    debug: "false"
```

Variables use merge (partial) semantics: only the keys declared in the config are managed. Keys that exist in DSS but are not declared are left untouched.
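
A comment-annotated sketch of these semantics (not tool output; `legacy_flag` is hypothetical):

```yaml
variables:
  standard:
    env: prod            # declared, so managed: created/updated to this value
    data_root: /mnt/data # declared, so managed
    # a pre-existing key such as legacy_flag is neither updated nor
    # deleted, because it is not declared in this config
  local:
    debug: "false"
```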

Variables have `plan_priority: 0`, so they are always applied before other resources (zones, datasets, and recipes all have priority `100`). This ensures that variables referenced via `${…}` in other resources are set before those resources are created.
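
For example, a variable declared here can be consumed by a dataset in the same config; because variables apply at priority 0, the value exists before the dataset is created. A sketch assuming standard DSS `${…}` substitution:

```yaml
variables:
  standard:
    data_root: /mnt/data

datasets:
  - name: raw_events
    type: filesystem
    connection: filesystem_managed
    path: "${data_root}/events"   # substituted from the variable applied first
```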

Zone resources

Zones partition a project's flow into logical sections (e.g., `raw`, `curated`, `reporting`). They are provisioned before datasets and recipes so that those resources can reference them via the `zone` field.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | | Zone identifier (used by the `zone` field on datasets/recipes) |
| `color` | `str` | `#2ab1ac` | Hex color displayed in the flow graph |

```yaml
zones:
  - name: raw
    color: "#4a90d9"
  - name: curated
    color: "#7b61ff"
```
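
Datasets and recipes are then placed into a zone via their `zone` field. A minimal sketch using the `raw` zone above (the dataset details are illustrative):

```yaml
datasets:
  - name: raw_data
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/raw"
    zone: raw   # must match a declared zone name
```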

Note

Flow zones require DSS Enterprise. On Free Edition the zone API returns 404: read and delete degrade gracefully (return `None` / no-op), while create and update raise a clear `RuntimeError`, since the zone cannot actually be provisioned.

Git library resources

Git libraries import external Git repositories into a project's library, making shared code available to recipes. Each library entry maps to a Git reference in DSS's `external-libraries.json`.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | | Local target path in the library hierarchy (unique key) |
| `repository` | `str` | | Git repository URL |
| `checkout` | `str` | `main` | Branch, tag, or commit hash |
| `path` | `str` | `""` | Subpath within the Git repository |
| `add_to_python_path` | `bool` | `true` | Add to `pythonPath` in `external-libraries.json` |

```yaml
libraries:
  - name: shared_utils
    repository: git@github.com:org/dss-shared-lib.git
    checkout: main
    path: python
```

Libraries have `plan_priority: 10`, so they are applied after variables (`0`) but before datasets and recipes (`100`). This ensures library code is available before recipes that import from it are created.
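
For example, a recipe can import from the `shared_utils` library declared above, because the library lands on the Python path before the recipe is created. A sketch; the import path and module contents are assumptions:

```yaml
recipes:
  - name: clean_customers
    type: python
    inputs: customers_raw
    outputs: customers_clean
    code: |
      # resolves because shared_utils was added to pythonPath at priority 10
      from shared_utils import cleaning
```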

Note

`add_to_python_path` is a create-time-only field. Changing it after creation requires deleting and recreating the library. Credentials (SSH keys) are configured at the DSS instance level — no login/password fields are needed in the YAML config.

Dataset resources

All datasets share common fields from `DatasetResource`:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `connection` | `str` | | DSS connection name |
| `managed` | `bool` | `false` | Whether DSS manages the data lifecycle |
| `format_type` | `str` | | Storage format (e.g., `parquet`, `csv`) |
| `format_params` | `dict` | `{}` | Format-specific parameters |
| `columns` | `list[Column]` | `[]` | Schema columns |
| `zone` | `str` | | Flow zone (Enterprise only) |

Supported types

| Type | YAML `type` | Extra required fields |
| --- | --- | --- |
| Snowflake | `snowflake` | `connection`, `schema_name`, `table` |
| Oracle | `oracle` | `connection`, `schema_name`, `table` |
| Filesystem | `filesystem` | `connection`, `path` |
| Upload | `upload` | |

Snowflake datasets

```yaml
datasets:
  - name: my_table
    type: snowflake
    connection: snowflake_prod
    schema_name: RAW
    table: CUSTOMERS
    catalog: MY_DB          # optional
    write_mode: OVERWRITE   # OVERWRITE (default), APPEND, or TRUNCATE
```

Filesystem datasets

```yaml
datasets:
  - name: raw_data
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/raw"
    format_type: parquet
    managed: true
```

The `path` field supports DSS variable substitution — `${projectKey}` is resolved transparently during plan comparison.

Upload datasets

```yaml
datasets:
  - name: lookup_table
    type: upload
    format_type: csv
    format_params:
      separator: ","
      charset: utf-8
```

Upload datasets are always managed (`managed` defaults to `true`).

Recipe resources

All recipes share common fields from `RecipeResource`:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `inputs` | `str \| list[str]` | `[]` | Input dataset name(s) |
| `outputs` | `str \| list[str]` | `[]` | Output dataset name(s) |
| `zone` | `str` | | Flow zone (Enterprise only) |

Recipe inputs and outputs create implicit dependencies — the engine automatically orders recipes after their input datasets.
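
For example, no explicit `depends_on` is needed in the following sketch: the engine creates `orders_raw` first, then `aggregate_orders`, because the recipe lists the dataset as an input (names reuse the SQL example below):

```yaml
datasets:
  - name: orders_raw
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/orders_raw"

recipes:
  - name: aggregate_orders
    type: sql_query
    inputs: orders_raw          # implicit dependency on the dataset above
    outputs: orders_summary
    code_file: ./recipes/aggregate_orders.sql
```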

Supported types

| Type | YAML `type` | Extra fields |
| --- | --- | --- |
| Python | `python` | `code` or `code_file`, `code_env`, `code_wrapper` |
| SQL Query | `sql_query` | `code` or `code_file` |
| Sync | `sync` | |

Python recipes

```yaml
recipes:
  - name: clean_customers
    type: python
    inputs: customers_raw
    outputs: customers_clean
    code_file: ./recipes/clean_customers.py  # loaded at plan time
    code_env: py311_pandas                   # optional code environment
```

You can provide code inline via `code` or externally via `code_file`. If both are set, `code_file` takes precedence. The `code_wrapper` flag controls whether the code runs in DSS's managed I/O wrapper.
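
A minimal inline variant, equivalent in shape to the `code_file` form above (the recipe body is illustrative):

```yaml
recipes:
  - name: clean_customers
    type: python
    inputs: customers_raw
    outputs: customers_clean
    code: |
      # recipe body embedded directly in the config
      import pandas as pd
      # ... transform customers_raw into customers_clean ...
```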

SQL query recipes

```yaml
recipes:
  - name: aggregate_orders
    type: sql_query
    inputs: orders_raw
    outputs: orders_summary
    code_file: ./recipes/aggregate_orders.sql
```

Sync recipes

```yaml
recipes:
  - name: sync_customers
    type: sync
    inputs: customers_raw
    outputs: customers_synced
```

Columns

Define schema columns on datasets:

```yaml
datasets:
  - name: customers
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/customers"
    columns:
      - name: id
        type: int
        description: Customer ID
      - name: email
        type: string
      - name: score
        type: double
        meaning: customer_score  # optional DSS meaning
```

Supported column types: `string`, `int`, `bigint`, `float`, `double`, `boolean`, `date`, `array`, `object`, `map`.