YAML configuration

Complete reference for the dss-provisioner.yaml configuration file.

Minimal example

provider:
  project: MY_PROJECT

datasets:
  - name: raw_data
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/raw"

Full example

provider:
  host: https://dss.company.com
  # api_key: omit from YAML — set DSS_API_KEY env var instead
  project: ANALYTICS

state_path: .dss-state.json

variables:
  standard:
    env: prod
    data_root: /mnt/data
  local:
    debug: "false"

zones:
  - name: raw
    color: "#4a90d9"
  - name: curated
    color: "#7b61ff"

libraries:
  - name: shared_utils
    repository: git@github.com:org/dss-shared-lib.git
    checkout: main
    path: python

datasets:
  - name: customers_raw
    type: snowflake
    connection: snowflake_prod
    schema_name: RAW
    table: CUSTOMERS
    description: Raw customer data from Snowflake

  - name: customers_clean
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/clean/customers"
    managed: true
    format_type: parquet
    columns:
      - name: id
        type: int
      - name: name
        type: string
      - name: email
        type: string
    tags:
      - production
      - pii

recipes:
  - name: clean_customers
    type: python
    inputs: customers_raw
    outputs: customers_clean
    code_file: ./recipes/clean_customers.py

  - name: sync_customers
    type: sync
    inputs: customers_clean
    outputs: customers_synced
    depends_on:
      - dss_python_recipe.clean_customers

Provider

| Field | Env var | Required | Default | Description |
| --- | --- | --- | --- | --- |
| host | DSS_HOST | Yes | | DSS instance URL |
| api_key | DSS_API_KEY | Yes | | API key |
| project | DSS_PROJECT | Yes | | Target project key |
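
For example, host and project can be set in YAML while the API key comes from the DSS_API_KEY environment variable (the host and project key below are placeholders):

provider:
  host: https://dss.example.com   # placeholder instance URL
  project: MY_PROJECT             # placeholder project key
  # api_key omitted: the provisioner reads DSS_API_KEY instead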

Top-level fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| provider | object | | DSS connection settings (required) |
| state_path | string | .dss-state.json | Path to the state file |
| variables | object | | Project variables (singleton, applied first) |
| zones | list | [] | Flow zone definitions (provisioned before datasets/recipes) |
| libraries | list | [] | Git library references (applied after variables, before datasets/recipes) |
| datasets | list | [] | Dataset resource definitions |
| recipes | list | [] | Recipe resource definitions |

Variables fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | variables | Resource name (singleton; rarely overridden) |
| standard | dict | {} | Standard project variables (shared across instances) |
| local | dict | {} | Local project variables (instance-specific) |
| description | string | "" | Not used by DSS variables (ignored) |
| tags | list | [] | Not used by DSS variables (ignored) |
| depends_on | list | [] | Explicit resource dependencies (addresses) |

Variables use partial semantics: only declared keys are managed. Extra keys already in DSS are preserved.

Variables are always applied before other resource types due to their plan_priority: 0 (other resources default to 100).
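
A sketch of the partial semantics: only the keys declared below are managed, so any other variables already set in DSS stay untouched.

variables:
  standard:
    env: prod              # managed by the provisioner
    data_root: /mnt/data   # managed by the provisioner
  # standard or local keys not declared here are preserved in DSS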

Zone fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Required. Zone identifier (referenced by the dataset/recipe zone field) |
| color | string | #2ab1ac | Hex color in #RRGGBB format |
| description | string | "" | Not used by DSS zones (ignored) |
| tags | list | [] | Not used by DSS zones (ignored) |
| depends_on | list | [] | Explicit resource dependencies (addresses) |

Note

Flow zones require DSS Enterprise. On Free Edition the zone API is unavailable.
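
Zones are attached to datasets and recipes through their zone field. A minimal sketch (the dataset below is illustrative):

zones:
  - name: raw
    color: "#4a90d9"

datasets:
  - name: raw_events                 # illustrative dataset
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/raw/events"
    zone: raw                        # places the dataset in the raw flow zone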

Library fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Required. Local target path in the library hierarchy |
| repository | string | | Required. Git repository URL |
| checkout | string | main | Branch, tag, or commit hash to check out |
| path | string | "" | Subpath within the Git repository |
| add_to_python_path | bool | true | Add to pythonPath in external-libraries.json |
| description | string | "" | Not used by DSS libraries (ignored) |
| tags | list | [] | Not used by DSS libraries (ignored) |
| depends_on | list | [] | Explicit resource dependencies (addresses) |

Note

add_to_python_path is a create-time-only field. To change it, delete and recreate the library.
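
For example, a library pinned to a tag and kept off the Python path (the repository URL and paths are placeholders):

libraries:
  - name: shared_sql                 # placeholder target path
    repository: git@github.com:org/dss-sql-snippets.git   # placeholder URL
    checkout: v1.2.0                 # a tag instead of a branch
    path: sql                        # subpath within the repository
    add_to_python_path: false        # create-time only; recreate the library to change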

Dataset fields

Common fields (all types)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Required. Dataset name in DSS |
| type | string | | Required. One of: snowflake, oracle, filesystem, upload |
| connection | string | | DSS connection name |
| managed | bool | false | Whether DSS manages the data lifecycle |
| format_type | string | | Storage format (parquet, csv, etc.) |
| format_params | dict | {} | Format-specific parameters |
| columns | list | [] | Schema column definitions |
| zone | string | | Flow zone (Enterprise only) |
| description | string | "" | Dataset description (metadata) |
| tags | list | [] | DSS tags |
| depends_on | list | [] | Explicit resource dependencies (addresses) |

Snowflake-specific fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| connection | string | | Required. Snowflake connection name |
| schema_name | string | | Required. Snowflake schema |
| table | string | | Required. Table name |
| catalog | string | | Snowflake database/catalog |
| write_mode | string | OVERWRITE | One of: OVERWRITE, APPEND, TRUNCATE |
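
A sketch of a Snowflake dataset using the optional catalog and write_mode fields (connection, database, schema, and table names are placeholders):

datasets:
  - name: orders_snapshot
    type: snowflake
    connection: snowflake_prod       # placeholder connection name
    catalog: ANALYTICS_DB            # placeholder Snowflake database
    schema_name: RAW
    table: ORDERS
    write_mode: APPEND               # default is OVERWRITE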

Oracle-specific fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| connection | string | | Required. Oracle connection name |
| schema_name | string | | Required. Oracle schema |
| table | string | | Required. Table name |
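
An Oracle dataset looks the same apart from its type (connection, schema, and table names are placeholders):

datasets:
  - name: invoices_raw
    type: oracle
    connection: oracle_erp           # placeholder connection name
    schema_name: FINANCE
    table: INVOICES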

Filesystem-specific fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| connection | string | | Required. Filesystem connection name |
| path | string | | Required. File path (supports ${projectKey}) |

Upload-specific fields

Upload datasets have no additional required fields. They default to managed: true.
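
A minimal sketch (the dataset name is illustrative):

datasets:
  - name: reference_rates            # illustrative dataset name
    type: upload
    # managed defaults to true for upload datasets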

Recipe fields

Common fields (all types)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Required. Recipe name in DSS |
| type | string | | Required. One of: python, sql_query, sync |
| inputs | string or list | [] | Input dataset name(s) |
| outputs | string or list | [] | Output dataset name(s) |
| zone | string | | Flow zone (Enterprise only) |
| description | string | "" | Recipe description |
| tags | list | [] | DSS tags |
| depends_on | list | [] | Explicit resource dependencies (addresses) |

Note

inputs and outputs accept either a single string or a list of strings. A single string is automatically converted to a one-element list.
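
For example, a multi-input recipe uses the list form, while a single output can stay a plain string (the recipe name and output dataset are illustrative):

recipes:
  - name: join_customers             # illustrative recipe
    type: python
    inputs:
      - customers_raw
      - customers_clean
    outputs: customers_joined        # single string, normalized to a one-element list
    code_file: ./recipes/join_customers.py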

Python-specific fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| code | string | "" | Inline Python code |
| code_file | string | | Path to a Python file (relative to the config file) |
| code_env | string | | DSS code environment name |
| code_wrapper | bool | false | Use the DSS managed I/O wrapper |
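
A sketch using inline code and a code environment instead of code_file (the code environment name and recipe body are illustrative):

recipes:
  - name: score_customers            # illustrative recipe
    type: python
    inputs: customers_clean
    outputs: customers_scored
    code_env: py39_scoring           # placeholder DSS code environment name
    code_wrapper: true               # use the DSS managed I/O wrapper
    code: |
      # illustrative inline body
      print("scoring customers")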

SQL query-specific fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| code | string | "" | Inline SQL code |
| code_file | string | | Path to a SQL file (relative to the config file) |
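
A sketch of a SQL query recipe with inline code (the dataset names and query are illustrative):

recipes:
  - name: summarize_customers        # illustrative recipe
    type: sql_query
    inputs: customers_raw
    outputs: customer_counts
    code: |
      SELECT COUNT(*) AS n_customers
      FROM CUSTOMERS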

Sync-specific fields

Sync recipes have no additional fields beyond the common recipe fields.

Column definition

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Required. Column name |
| type | string | | Required. One of: string, int, bigint, float, double, boolean, date, array, object, map |
| description | string | "" | Column description |
| meaning | string | | DSS column meaning |
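
For example, a columns block with a description and a meaning (the meaning identifier is illustrative; use one that exists in your DSS instance):

columns:
  - name: email
    type: string
    description: Customer email address
    meaning: Email                   # illustrative; must match a meaning defined in DSS
  - name: signup_date
    type: date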

Dependencies

Resources can depend on each other in two ways:

Explicit dependencies

Use depends_on with full resource addresses:

recipes:
  - name: aggregate
    type: python
    depends_on:
      - dss_python_recipe.clean_data

Implicit dependencies

Recipe inputs and outputs automatically create dependencies on the referenced datasets. You don't need to add depends_on for these.

datasets:
  - name: raw_data
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/raw"

recipes:
  - name: process
    type: python
    inputs: raw_data      # automatically depends on dss_filesystem_dataset.raw_data
    outputs: clean_data