# Resources
Resources are Pydantic models that describe the desired state of a DSS object. They are pure data; separate handlers know how to create, read, update, and delete (CRUD) them.
## Resource model

Every resource has:

| Field | Description |
|---|---|
| `name` | Unique name within the DSS project. Must match `^[a-zA-Z0-9_]+$` (letters, digits, underscores) |
| `description` | Optional description (stored as DSS metadata) |
| `tags` | Optional list of tags (each element must be a non-empty string) |
| `depends_on` | Explicit dependencies on other resources. Validated at plan time — each address must exist |
| `address` | Computed as `{resource_type}.{name}` (e.g., `dss_filesystem_dataset.raw_data`) |
The address is the primary key used in state tracking and plan output.
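As a sketch, a filesystem dataset declaring the common fields might look like this (the values and the upstream resource name are illustrative; the `depends_on` entry follows the `{resource_type}.{name}` address format above):

```yaml
datasets:
  - name: raw_data
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/raw"
    description: Raw landing data
    tags:
      - ingest
      - raw
    depends_on:
      - dss_filesystem_dataset.seed_data   # hypothetical upstream resource address
```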
## Variables resource

The variables resource manages DSS project variables — a singleton key-value store per project. Variables are split into two scopes:

- `standard`: shared across all instances
- `local`: instance-specific overrides
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | `variables` | Resource name (singleton — rarely overridden) |
| `standard` | `dict[str, Any]` | `{}` | Standard project variables |
| `local` | `dict[str, Any]` | `{}` | Local project variables |
Variables use merge/partial semantics: only declared keys are managed. Extra keys in DSS are left alone.
Variables have plan_priority: 0, so they are always applied before other resources (zones, datasets, recipes all have priority 100). This ensures variables referenced via ${…} in other resources are set before those resources are created.
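A minimal declaration might look like this (a sketch; the keys and values are illustrative):

```yaml
variables:
  standard:
    data_root: /data/lake      # shared across all instances
  local:
    data_root: /tmp/dev-lake   # instance-specific override
```

Because of the merge/partial semantics above, only `data_root` would be managed here; any other variables already set in DSS are left alone.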
## Code environment defaults
The code_envs resource selects existing instance-level code environments as the project defaults. Code environments are not created or managed by the provisioner — only the project-level default pointers are set.
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | `code_envs` | Fixed singleton name |
| `default_python` | `str \| None` | `None` | Default Python code environment name |
| `default_r` | `str \| None` | `None` | Default R code environment name |
Both fields are optional. Omitting a field means "don't manage this default." Only fields that are set are validated and applied — the provisioner calls client.list_code_envs() at plan time to verify referenced environments exist on the DSS instance.
Code environment defaults have plan_priority: 5, so they are applied after variables (0) but before libraries (10) and other resources.
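A typical declaration might look like this (a sketch; the environment name is illustrative and must already exist on the DSS instance):

```yaml
code_envs:
  default_python: py311_pandas
  # default_r omitted: the R default is left unmanaged
```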
## Zone resources
Zones partition a project's flow into logical sections (e.g. raw, curated, reporting). They are provisioned before datasets and recipes so that resources can reference them via the zone field.
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | — | Zone identifier (must match `^[a-zA-Z0-9_]+$`) |
| `color` | `str` | `#2ab1ac` | Hex color in `#RRGGBB` format |
> **Note**
> Flow zones require DSS Enterprise. On Free Edition the zone API returns 404: read and delete degrade gracefully (return `None` / no-op), while create and update raise a clear `RuntimeError` since the zone cannot actually be provisioned.
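On Enterprise, a zones block might look like this (a sketch, assuming a top-level `zones` key; the zone names and the second color are illustrative):

```yaml
zones:
  - name: raw                # color defaults to #2ab1ac
  - name: curated
    color: "#4caf50"
```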
## Git library resources
Git libraries import external Git repositories into a project's library, making shared code available to recipes. Each library entry maps to a Git reference in DSS's external-libraries.json.
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | — | Library key / single directory name in the project library (must match `^[a-zA-Z0-9_]+$`; nested paths not supported) |
| `repository` | `str` | — | Git repository URL (non-empty) |
| `checkout` | `str` | `main` | Branch, tag, or commit hash |
| `path` | `str` | `""` | Subpath within the Git repository |
| `add_to_python_path` | `bool` | `true` | Add to `pythonPath` in `external-libraries.json` |
```yaml
libraries:
  - name: shared_utils
    repository: git@github.com:org/dss-shared-lib.git
    checkout: main
    path: python
```
Libraries have plan_priority: 10, so they are applied after variables (0) but before datasets and recipes (100). This ensures library code is available before recipes that import from it are created.
> **Note**
> `add_to_python_path` is a create-time-only field. Changing it after creation requires deleting and recreating the library. Credentials (SSH keys) are configured at the DSS instance level — no login/password fields are needed in the YAML config.
## Managed folder resources
Managed folders store arbitrary files (models, reports, artifacts) and are commonly used as recipe I/O. Unlike datasets, DSS accesses managed folders by an internal ID — the provisioner resolves names automatically.
All managed folders share common fields from ManagedFolderResource:
| Field | Type | Default | Description |
|---|---|---|---|
| `connection` | `str` | — | DSS connection name |
| `zone` | `str` | — | Flow zone (Enterprise only) |
Metadata (description/tags) is stored inside the managed folder settings, unlike datasets which use a separate metadata API.
### Supported types

| Type | YAML `type` | Extra required fields |
|---|---|---|
| Filesystem | `filesystem` | `connection`, `path` |
| Upload | `upload` | — |
### Filesystem managed folders
```yaml
managed_folders:
  - name: trained_models
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/models"
```
The path field supports DSS variable substitution — ${projectKey} is resolved transparently during plan comparison.
### Upload managed folders
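Upload managed folders require no extra fields beyond the common ones, so a declaration can be minimal (a sketch; the name and description are illustrative):

```yaml
managed_folders:
  - name: manual_uploads
    type: upload
    description: Files uploaded by analysts
```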
## Exposed object resources
Exposed objects are project-level sharing rules: they declare which local datasets or managed folders this project makes available to other projects.
| Field | Type | Default | Description |
|---|---|---|---|
| `type` | `dataset \| managed_folder` | — | Required. Exposed object type |
| `target_projects` | `list[str]` | — | Required. Target project keys (deduplicated) |
```yaml
exposed_objects:
  - name: curated_customers
    type: dataset
    target_projects:
      - ANALYTICS
      - REPORTING
  - name: model_artifacts
    type: managed_folder
    target_projects:
      - ML_SERVING
```
Exposed object resources have plan_priority: 150, so they run after local datasets/folders are in place.
## Foreign object resources
Foreign resources declare cross-project inputs this project consumes. They do not create datasets/folders in the source project; they validate that the source object exists and is exposed to this project.
### Foreign datasets
| Field | Type | Default | Description |
|---|---|---|---|
| `source_project` | `str` | — | Required. Source project key |
| `source_name` | `str` | — | Required. Source dataset name |
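For example, the `shared_customers` alias used later on this page could be declared like this (a sketch matching the names in the foreign-reference example below):

```yaml
foreign_datasets:
  - name: shared_customers        # local alias usable in recipe inputs
    source_project: DATA_LAKE
    source_name: curated_customers
```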
### Foreign managed folders
| Field | Type | Default | Description |
|---|---|---|---|
| `source_project` | `str` | — | Required. Source project key |
| `source_name` | `str` | — | Required. Source managed folder name |
```yaml
foreign_managed_folders:
  - name: shared_models
    source_project: MODELING
    source_name: model_artifacts
```
Foreign resources use the same namespace as local objects (`dataset` / `managed_folder`), so names must be unique across local and foreign declarations.
Recipes can reference foreign aliases (`shared_customers`) in inputs/outputs; the engine resolves them to DSS full refs (`DATA_LAKE.curated_customers`) during planning/apply.
## Dataset resources
All datasets share common fields from DatasetResource:
| Field | Type | Default | Description |
|---|---|---|---|
| `connection` | `str` | — | DSS connection name |
| `managed` | `bool` | `false` | Whether DSS manages the data lifecycle |
| `format_type` | `str` | — | Storage format (e.g., `parquet`, `csv`) |
| `format_params` | `dict` | `{}` | Format-specific parameters |
| `columns` | `list[Column]` | `[]` | Schema columns |
| `zone` | `str` | — | Flow zone (Enterprise only) |
### Supported types

| Type | YAML `type` | Extra required fields |
|---|---|---|
| Snowflake | `snowflake` | `connection`, `schema_name`, `table` (table mode) or `query`/`query_file` (query mode) |
| Oracle | `oracle` | `connection`, `schema_name`, `table` (table mode) or `query`/`query_file` (query mode) |
| Filesystem | `filesystem` | `connection`, `path` |
| Upload | `upload` | — |
### Snowflake datasets
Snowflake datasets support two modes: table (default) and query.
Table mode — reads from or writes to a specific table:
```yaml
datasets:
  - name: my_table
    type: snowflake
    connection: snowflake_prod
    mode: table              # table (default) or query
    schema_name: RAW
    table: CUSTOMERS
    catalog: MY_DB           # optional
    write_mode: OVERWRITE    # OVERWRITE (default), APPEND, or TRUNCATE
```
Query mode — defines the dataset via a SQL query. Provide the SQL inline with query or load it from a file with query_file:
```yaml
datasets:
  - name: active_customers
    type: snowflake
    connection: snowflake_prod
    mode: query
    schema_name: RAW
    query_file: queries/active_customers.sql
```
If neither query nor query_file is set, the provisioner looks for queries/{name}.sql by convention.
| Field | Type | Modes | Description |
|---|---|---|---|
| `table` | `str` | table | Required in table mode. Target table name |
| `query` | `str` | query | Inline SQL query text |
| `query_file` | `str` | query | Path to SQL file (relative to config). Mutually exclusive with `query` |
| `catalog` | `str` | both | Snowflake database/catalog |
| `write_mode` | `str` | both | `OVERWRITE` (default), `APPEND`, or `TRUNCATE` |
### Oracle datasets
Oracle datasets also support table and query modes, with the same query/query_file fields as Snowflake.
```yaml
datasets:
  - name: hr_query
    type: oracle
    connection: oracle_prod
    mode: query
    schema_name: HR
    query: "SELECT * FROM employees WHERE active = 1"
```
### Filesystem datasets
```yaml
datasets:
  - name: raw_data
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/raw"
    format_type: parquet
    managed: true
```
The path field supports DSS variable substitution — ${projectKey} is resolved transparently during plan comparison.
### Upload datasets
```yaml
datasets:
  - name: lookup_table
    type: upload
    format_type: csv
    format_params:
      separator: ","
      charset: utf-8
```
Upload datasets are always managed (managed: true by default).
## Recipe resources
All recipes share common fields from RecipeResource:
| Field | Type | Default | Description |
|---|---|---|---|
| `inputs` | `str \| list[str]` | `[]` | Input dataset name(s). Required for SQL recipes (min 1) |
| `outputs` | `str \| list[str]` | — | Required. Output dataset name(s) (min 1 element) |
| `zone` | `str` | — | Flow zone (Enterprise only). Validated at plan time — must reference a known zone |
Recipe inputs and outputs create implicit dependencies — the engine automatically orders recipes after their input datasets.
### Supported types

| Type | YAML `type` | Extra fields |
|---|---|---|
| Python | `python` | `code` or `code_file`, `code_env`, `code_wrapper` |
| SQL Query | `sql_query` | `code` or `code_file` |
| Sync | `sync` | — |
### Python recipes
```yaml
recipes:
  - name: clean_customers
    type: python
    inputs: customers_raw
    outputs: customers_clean
    code_file: ./recipes/clean_customers.py   # loaded at plan time
    code_env: py311_pandas                    # optional code environment
```
You can provide code inline or via code_file. If both are set, code_file takes precedence. The code_wrapper flag controls whether the code runs in DSS's managed I/O wrapper.
### SQL query recipes
```yaml
recipes:
  - name: aggregate_orders
    type: sql_query
    inputs: orders_raw
    outputs: orders_summary
    code_file: ./recipes/aggregate_orders.sql
```
SQL recipes must have at least one SQL-capable input. Inputs declared as foreign refs (`PROJECT.dataset`) or via `foreign_datasets` are accepted and validated by DSS at runtime.
### Sync recipes
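Sync recipes take no extra fields beyond the common ones, so a declaration is minimal (a sketch; the dataset names are illustrative):

```yaml
recipes:
  - name: sync_customers
    type: sync
    inputs: customers_raw
    outputs: customers_synced
```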
## Scenario resources
Scenarios define automated workflows in DSS — triggers (when to run) and actions (what to do). They are provisioned after datasets and recipes (plan_priority: 200) since scenario steps often reference them.
All scenarios share common fields from ScenarioResource:
| Field | Type | Default | Description |
|---|---|---|---|
| `active` | `bool` | `true` | Whether the scenario is enabled |
| `triggers` | `list[dict]` | `[]` | Trigger definitions (temporal, dataset change, etc.) |
### Supported types

| Type | YAML `type` | Extra fields |
|---|---|---|
| Step-based | `step_based` | `steps` |
| Python | `python` | `code` or `code_file` |
### Step-based scenarios
```yaml
scenarios:
  - name: daily_build
    type: step_based
    active: true
    triggers:
      - type: temporal
        params:
          frequency: Daily
          hour: 2
          minute: 0
    steps:
      - type: build_flowitem
        name: Build all datasets
        params:
          builds:
            - type: DATASET
              itemId: my_dataset
              partitionsSpec: ""
```
### Python scenarios
You can provide code inline or via code_file. If neither is set, the provisioner looks for scenarios/{name}.py by convention.
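For example, with an explicit `code_file` (a sketch; the scenario and file names are illustrative):

```yaml
scenarios:
  - name: nightly_check
    type: python
    active: true
    code_file: scenarios/nightly_check.py
```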
> **Note**
> Triggers and steps use a desired-echo strategy: the provisioner stores your declared values and echoes them on read, rather than reading back from DSS. This avoids false drift from auto-generated fields that DSS adds internally.
## Columns
Define schema columns on datasets:
```yaml
datasets:
  - name: customers
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/customers"
    columns:
      - name: id
        type: int
        description: Customer ID
      - name: email
        type: string
      - name: score
        type: double
        meaning: customer_score   # optional DSS meaning
```
Supported column types: `string`, `int`, `bigint`, `float`, `double`, `boolean`, `date`, `array`, `object`, `map`.