YAML configuration
Complete reference for the dss-provisioner.yaml configuration file.
Minimal example
```yaml
provider:
  project: MY_PROJECT

datasets:
  - name: raw_data
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/raw"
```
Full example
```yaml
provider:
  host: https://dss.company.com
  # api_key: omit from YAML; set the DSS_API_KEY env var instead
  project: ANALYTICS

state_path: .dss-state.json

variables:
  standard:
    env: prod
    data_root: /mnt/data
  local:
    debug: "false"

zones:
  - name: raw
    color: "#4a90d9"
  - name: curated
    color: "#7b61ff"

libraries:
  - name: shared_utils
    repository: git@github.com:org/dss-shared-lib.git
    checkout: main
    path: python

datasets:
  - name: customers_raw
    type: snowflake
    connection: snowflake_prod
    schema_name: RAW
    table: CUSTOMERS
    description: Raw customer data from Snowflake

  - name: customers_clean
    type: filesystem
    connection: filesystem_managed
    path: "${projectKey}/clean/customers"
    managed: true
    format_type: parquet
    columns:
      - name: id
        type: int
      - name: name
        type: string
      - name: email
        type: string
    tags:
      - production
      - pii

recipes:
  - name: clean_customers
    type: python
    inputs: customers_raw
    outputs: customers_clean
    code_file: ./recipes/clean_customers.py

  - name: sync_customers
    type: sync
    inputs: customers_clean
    outputs: customers_synced
    depends_on:
      - dss_python_recipe.clean_customers
```
Provider
| Field | Env var | Required | Default | Description |
|---|---|---|---|---|
| `host` | `DSS_HOST` | Yes | — | DSS instance URL |
| `api_key` | `DSS_API_KEY` | Yes | — | API key |
| `project` | `DSS_PROJECT` | Yes | — | Target project key |
Top-level fields
| Field | Type | Default | Description |
|---|---|---|---|
| `provider` | object | — | DSS connection settings (required) |
| `state_path` | string | `.dss-state.json` | Path to state file |
| `variables` | object | — | Project variables (singleton, applied first) |
| `zones` | list | `[]` | Flow zone definitions (provisioned before datasets/recipes) |
| `libraries` | list | `[]` | Git library references (applied after variables, before datasets/recipes) |
| `datasets` | list | `[]` | Dataset resource definitions |
| `recipes` | list | `[]` | Recipe resource definitions |
Variables fields
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | `variables` | Resource name (singleton; rarely overridden) |
| `standard` | dict | `{}` | Standard project variables (shared across instances) |
| `local` | dict | `{}` | Local project variables (instance-specific) |
| `description` | string | `""` | Not used by DSS variables (ignored) |
| `tags` | list | `[]` | Not used by DSS variables (ignored) |
| `depends_on` | list | `[]` | Explicit resource dependencies (addresses) |
Variables use partial semantics: only declared keys are managed. Extra keys already in DSS are preserved.
Variables are always applied before other resource types because their `plan_priority` is 0, while other resources default to 100.
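To illustrate the partial semantics described above, the sketch below declares two standard keys. The `owner` key mentioned in the comment is hypothetical and stands for any key that already exists in DSS but is not declared in the file.

```yaml
variables:
  standard:
    env: prod              # declared, so managed by the provisioner
    data_root: /mnt/data   # declared, so managed by the provisioner
  # An undeclared key that already exists in DSS (for example a
  # hypothetical "owner" key) is preserved rather than removed.
```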
Zone fields
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | — | Required. Zone identifier (referenced by dataset/recipe `zone` field) |
| `color` | string | `#2ab1ac` | Hex color in `#RRGGBB` format |
| `description` | string | `""` | Not used by DSS zones (ignored) |
| `tags` | list | `[]` | Not used by DSS zones (ignored) |
| `depends_on` | list | `[]` | Explicit resource dependencies (addresses) |
Note
Flow zones require DSS Enterprise. On Free Edition the zone API is unavailable.
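A minimal sketch of a zone and a dataset placed in it, assuming the dataset `zone` field takes the zone's `name` as the table above states. The zone and dataset values are reused from the full example.

```yaml
zones:
  - name: raw
    color: "#4a90d9"

datasets:
  - name: customers_raw
    type: snowflake
    connection: snowflake_prod
    schema_name: RAW
    table: CUSTOMERS
    zone: raw   # refers to the zone declared above
```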
Library fields
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | — | Required. Local target path in the library hierarchy |
| `repository` | string | — | Required. Git repository URL |
| `checkout` | string | `main` | Branch, tag, or commit hash to check out |
| `path` | string | `""` | Subpath within the Git repository |
| `add_to_python_path` | bool | `true` | Add to `pythonPath` in `external-libraries.json` |
| `description` | string | `""` | Not used by DSS libraries (ignored) |
| `tags` | list | `[]` | Not used by DSS libraries (ignored) |
| `depends_on` | list | `[]` | Explicit resource dependencies (addresses) |
Note
add_to_python_path is a create-time-only field. To change it, delete and recreate the library.
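As a sketch, a library pinned to a tag and kept off the Python path could be declared as follows. The tag `v1.2.0` is a hypothetical value; the repository is the one used in the full example.

```yaml
libraries:
  - name: shared_utils
    repository: git@github.com:org/dss-shared-lib.git
    checkout: v1.2.0            # hypothetical tag; a branch or commit hash is also accepted
    path: python
    add_to_python_path: false   # create-time only, so choose the value before the first apply
```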
Dataset fields
Common fields (all types)
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | — | Required. Dataset name in DSS |
| `type` | string | — | Required. One of: `snowflake`, `oracle`, `filesystem`, `upload` |
| `connection` | string | — | DSS connection name |
| `managed` | bool | `false` | Whether DSS manages the data lifecycle |
| `format_type` | string | — | Storage format (`parquet`, `csv`, etc.) |
| `format_params` | dict | `{}` | Format-specific parameters |
| `columns` | list | `[]` | Schema column definitions |
| `zone` | string | — | Flow zone (Enterprise only) |
| `description` | string | `""` | Dataset description (metadata) |
| `tags` | list | `[]` | DSS tags |
| `depends_on` | list | `[]` | Explicit resource dependencies (addresses) |
Snowflake-specific fields
| Field | Type | Default | Description |
|---|---|---|---|
| `connection` | string | — | Required. Snowflake connection name |
| `schema_name` | string | — | Required. Snowflake schema |
| `table` | string | — | Required. Table name |
| `catalog` | string | — | Snowflake database/catalog |
| `write_mode` | string | `OVERWRITE` | `OVERWRITE`, `APPEND`, or `TRUNCATE` |
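A sketch of a Snowflake dataset that sets the two optional fields from the table above. The dataset name, the `ANALYTICS_DB` catalog and the `ORDERS` table are hypothetical.

```yaml
datasets:
  - name: orders_raw
    type: snowflake
    connection: snowflake_prod
    catalog: ANALYTICS_DB   # hypothetical Snowflake database
    schema_name: RAW
    table: ORDERS
    write_mode: APPEND      # overrides the OVERWRITE default
```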
Oracle-specific fields
| Field | Type | Default | Description |
|---|---|---|---|
| `connection` | string | — | Required. Oracle connection name |
| `schema_name` | string | — | Required. Oracle schema |
| `table` | string | — | Required. Table name |
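An Oracle dataset needs only the three required fields listed above in addition to `name` and `type`. In this sketch the connection, schema and table names are hypothetical.

```yaml
datasets:
  - name: invoices_raw
    type: oracle
    connection: oracle_prod   # hypothetical connection name
    schema_name: BILLING
    table: INVOICES
```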
Filesystem-specific fields
| Field | Type | Default | Description |
|---|---|---|---|
| `connection` | string | — | Required. Filesystem connection name |
| `path` | string | — | Required. File path (supports `${projectKey}`) |
Upload-specific fields
Upload datasets have no additional required fields. They default to managed: true.
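Under that description, the smallest upload dataset would need only the common required fields. The dataset name below is hypothetical.

```yaml
datasets:
  - name: reference_codes
    type: upload
    # managed is left unset because upload datasets default to managed: true
```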
Recipe fields
Common fields (all types)
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | — | Required. Recipe name in DSS |
| `type` | string | — | Required. One of: `python`, `sql_query`, `sync` |
| `inputs` | string or list | `[]` | Input dataset name(s) |
| `outputs` | string or list | `[]` | Output dataset name(s) |
| `zone` | string | — | Flow zone (Enterprise only) |
| `description` | string | `""` | Recipe description |
| `tags` | list | `[]` | DSS tags |
| `depends_on` | list | `[]` | Explicit resource dependencies (addresses) |
Note
inputs and outputs accept either a single string or a list of strings. A single string is automatically converted to a one-element list.
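The two accepted forms can be compared side by side. The `join_orders` recipe, its datasets and its code file are hypothetical names used only to show the list form.

```yaml
recipes:
  # String form: each value is converted to a one-element list
  - name: sync_customers
    type: sync
    inputs: customers_clean
    outputs: customers_synced

  # List form: needed when a recipe has several inputs or outputs
  - name: join_orders
    type: python
    inputs:
      - customers_clean
      - orders_raw
    outputs:
      - customer_orders
    code_file: ./recipes/join_orders.py
```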
Python-specific fields
| Field | Type | Default | Description |
|---|---|---|---|
| `code` | string | `""` | Inline Python code |
| `code_file` | string | — | Path to Python file (relative to config file) |
| `code_env` | string | — | DSS code environment name |
| `code_wrapper` | bool | `false` | Use DSS managed I/O wrapper |
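A sketch of a Python recipe that uses inline `code` in place of the `code_file` shown in the full example. The code environment name `py39_pandas` and the `email` column in the input are assumptions. The recipe body relies on the standard `dataiku` package available inside DSS.

```yaml
recipes:
  - name: clean_customers
    type: python
    inputs: customers_raw
    outputs: customers_clean
    code_env: py39_pandas   # hypothetical code environment name
    code: |
      import dataiku

      # Read the input, drop rows without an email, write the output.
      df = dataiku.Dataset("customers_raw").get_dataframe()
      df = df.dropna(subset=["email"])
      dataiku.Dataset("customers_clean").write_with_schema(df)
```

A YAML block scalar (`|`) keeps the line breaks and relative indentation of the Python source, which is why it suits inline code.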
SQL query-specific fields
| Field | Type | Default | Description |
|---|---|---|---|
| `code` | string | `""` | Inline SQL code |
| `code_file` | string | — | Path to SQL file (relative to config file) |
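A sketch of a SQL query recipe with inline code. The recipe name, the `customers_active` output dataset and the column names are hypothetical; the schema and table follow the Snowflake dataset in the full example.

```yaml
recipes:
  - name: active_customers
    type: sql_query
    inputs: customers_raw
    outputs: customers_active   # hypothetical output dataset
    code: |
      SELECT ID, NAME, EMAIL
      FROM "RAW"."CUSTOMERS"
      WHERE EMAIL IS NOT NULL
```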
Sync-specific fields
Sync recipes have no additional fields beyond the common recipe fields.
Column definition
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | — | Required. Column name |
| `type` | string | — | Required. One of: `string`, `int`, `bigint`, `float`, `double`, `boolean`, `date`, `array`, `object`, `map` |
| `description` | string | `""` | Column description |
| `meaning` | string | — | DSS column meaning |
Dependencies
Resources can depend on each other in two ways:
Explicit dependencies
Use `depends_on` with full resource addresses, such as `dss_python_recipe.clean_customers` in the full example above.
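A minimal sketch, reusing the sync recipe from the full example. The address form `dss_python_recipe.clean_customers` is taken from that example; address prefixes for other resource types are not listed in this reference, so none are shown.

```yaml
recipes:
  - name: sync_customers
    type: sync
    inputs: customers_clean
    outputs: customers_synced
    depends_on:
      - dss_python_recipe.clean_customers   # wait for the Python recipe
```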
Implicit dependencies
Recipe inputs and outputs automatically create dependencies on the referenced datasets. You don't need to add depends_on for these.
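For instance, the Python recipe from the full example depends on both of its datasets without declaring `depends_on`:

```yaml
recipes:
  - name: clean_customers
    type: python
    inputs: customers_raw      # implicit dependency on dataset customers_raw
    outputs: customers_clean   # implicit dependency on dataset customers_clean
    code_file: ./recipes/clean_customers.py
    # no depends_on entry is needed for the two datasets above
```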