Business Workflow: Config in S3 with Local Fallback
A lightweight, non-technical playbook for keeping configs auditable, versioned, and tied to specific code releases, with automatic fallback to packaged defaults.
How Config Loads at Runtime
flowchart LR
A[CLI Flag or .env] -->|CONFIG_S3_URI set| B[Load from S3 via boto3]
B --> C[Log sha256 + ETag + VersionId]
A -->|No S3 config| D[Use packaged config.toml]
C --> E[Config available to Spark job]
D --> E
Store Configs in a Stable Location
Use a consistent key so jobs always know where to find the latest config:
s3://my-biz-configs/<env>/<pipeline>/config.toml
If you generate date-stamped configs, also upload the same file
to .../config.toml as a stable alias.
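One way to keep the alias in sync is to write the same bytes to both keys in a single step. The sketch below assumes a boto3 S3 client is passed in; the function name and the `config-<date>.toml` naming for the date-stamped copy are illustrative assumptions, not part of this workflow.

```python
import datetime


def upload_with_alias(s3_client, bucket, env, pipeline, body, today=None):
    """Upload a date-stamped config and the same bytes under the stable key.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    The date-stamped key layout is a hypothetical naming choice.
    """
    today = today or datetime.date.today()
    prefix = f"{env}/{pipeline}"
    dated_key = f"{prefix}/config-{today.isoformat()}.toml"  # hypothetical naming
    alias_key = f"{prefix}/config.toml"  # stable key that jobs read from

    # Write identical bytes to both keys so the alias never drifts.
    for key in (dated_key, alias_key):
        s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return dated_key, alias_key
```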
Version by Release
On each deploy, copy the config to a release folder and pass that exact object to the job:
s3://my-biz-configs/<env>/<pipeline>/releases/<deployment_id>/config.toml
Then set --config-s3 (or CONFIG_S3_URI) to that URI -- this couples config
to the specific code build.
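The copy step could look roughly like this. It is a sketch that assumes a boto3 S3 client and that the stable `config.toml` is the source of the copy; both function names are hypothetical.

```python
def release_key(env, pipeline, deployment_id):
    """Build the per-release key described above."""
    return f"{env}/{pipeline}/releases/{deployment_id}/config.toml"


def pin_config_to_release(s3_client, bucket, env, pipeline, deployment_id):
    """Copy the stable config into the release folder and return its s3:// URI.

    s3_client is assumed to be a boto3 S3 client.
    """
    source_key = f"{env}/{pipeline}/config.toml"
    target_key = release_key(env, pipeline, deployment_id)
    s3_client.copy_object(
        Bucket=bucket,
        Key=target_key,
        CopySource={"Bucket": bucket, "Key": source_key},
    )
    # This URI is the value to pass as --config-s3 or CONFIG_S3_URI.
    return f"s3://{bucket}/{target_key}"
```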
Example config.toml
[general]
pipeline_name = "dom-pipeline"
input_path = "s3://my-raw-bucket/data/"
output_path = "s3://my-processed-bucket/results/"
[processing]
max_partitions = 200
enable_deduplication = true
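A config like this can be checked before upload with a small type check. The sketch below is one possible shape for such a check, not the project's actual validation schema: the `SCHEMA` mapping and `validate_config` name are assumptions, and it relies on `tomllib` (Python 3.11+) or the `tomli` backport.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # older interpreters: assumes the tomli backport is installed
    import tomli as tomllib

# Hypothetical schema: section -> key -> expected type.
SCHEMA = {
    "general": {"pipeline_name": str, "input_path": str, "output_path": str},
    "processing": {"max_partitions": int, "enable_deduplication": bool},
}


def validate_config(text):
    """Parse TOML text and return a list of problems (empty if valid)."""
    config = tomllib.loads(text)
    problems = []
    for section, keys in SCHEMA.items():
        values = config.get(section)
        if not isinstance(values, dict):
            problems.append(f"missing section [{section}]")
            continue
        for key, expected in keys.items():
            if key not in values:
                problems.append(f"missing {section}.{key}")
            # Exact type match, so a bool is not accepted where an int is expected.
            elif type(values[key]) is not expected:
                problems.append(f"{section}.{key} should be {expected.__name__}")
    return problems
```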
Production Checklist
- Config uploaded to the correct path and accessible by the EMR role.
- Bucket versioning enabled.
- sha256 and ETag logged on deploy.
- CHANGELOG updated with what changed and why.
- Validation schema matches config file.
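The first two checklist items can be checked from a script. This is a rough sketch, assuming a boto3 S3 client created under the EMR role and that the role is also allowed to call `get_bucket_versioning`; the `preflight` name is hypothetical.

```python
def preflight(s3_client, bucket, key):
    """Return a list of failed checklist items for one config object.

    s3_client is assumed to be a boto3 S3 client using the EMR role's credentials.
    """
    failures = []

    # get_bucket_versioning omits "Status" when versioning was never enabled.
    status = s3_client.get_bucket_versioning(Bucket=bucket).get("Status")
    if status != "Enabled":
        failures.append(f"versioning is {status or 'not configured'} on {bucket}")

    try:
        s3_client.head_object(Bucket=bucket, Key=key)
    except Exception as exc:  # boto3 raises botocore.exceptions.ClientError here
        failures.append(f"cannot read s3://{bucket}/{key}: {exc}")

    return failures
```

An empty list means both checks passed; anything else can be printed and used to stop the deploy.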
Common Pitfalls
- Wrong prefix: s3:// included twice in the path will cause a 404.
- IAM perms missing: ensure s3:GetObject and s3:GetObjectVersion are allowed for the EMR role.
- Cached packaged config: if the S3 config isn't being picked up, check that --config-s3 or .env is actually being read in EMR.
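The doubled-prefix mistake can be caught before any S3 call is made. A minimal sketch using only the standard library; `parse_s3_uri` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an s3:// URI into (bucket, key), rejecting common mistakes."""
    if uri.count("s3://") > 1:
        raise ValueError(f"'s3://' appears more than once: {uri!r}")
    parsed = urlparse(uri)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not bucket or not key:
        raise ValueError(f"expected s3://<bucket>/<key>, got {uri!r}")
    return bucket, key
```

Failing early with a clear message is usually easier to debug than a 404 surfacing from inside the EMR job.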
Running with Config from S3
Option 1: CLI flag (highest priority)
uv run deploy-to-emr --config-s3 s3://my-biz-configs/prod/dom/config.toml
Option 2: .env file
# .env
CONFIG_S3_URI=s3://my-biz-configs/prod/dom/config.toml
Then run:
uv run deploy-to-emr
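The priority between the two options can be sketched as follows. This is an illustration of the precedence only, not the actual `deploy-to-emr` implementation; it assumes the .env file has already been loaded into the environment, and `resolve_config_uri` is a hypothetical name.

```python
import argparse
import os


def resolve_config_uri(argv, environ=None):
    """Return the S3 URI to load, or None to use the packaged config.

    Priority: --config-s3 flag, then CONFIG_S3_URI, then None.
    Loading the .env file into the environment is assumed to happen
    before this is called.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-s3", default=None)
    # Ignore any other flags the real CLI may define.
    args, _unknown = parser.parse_known_args(argv)
    return args.config_s3 or environ.get("CONFIG_S3_URI") or None
```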
Runtime Behaviour
- If --config-s3 or CONFIG_S3_URI is set: fetch from S3, parse TOML in-memory, log sha256 / ETag / VersionId.
- Otherwise: use the packaged config.toml from the emr_dummy module.
- Config is broadcast to executors as needed.
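The first two behaviours above can be sketched as a single loader. It assumes a boto3 S3 client, `tomllib` (Python 3.11+) or the `tomli` backport, and that `config.toml` ships as package data inside `emr_dummy`; the function names and log format are assumptions, and the broadcast step is left out. The sketch falls back only when no URI is set, as the list describes; it does not fall back when an S3 read fails.

```python
import hashlib
import logging
from importlib import resources
from urllib.parse import urlparse

try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # assumes the tomli backport on older versions
    import tomli as tomllib

log = logging.getLogger(__name__)


def load_packaged():
    """Read config.toml shipped inside the emr_dummy package."""
    return resources.files("emr_dummy").joinpath("config.toml").read_bytes()


def load_config(uri, s3_client=None, fallback=load_packaged):
    """Load config from S3 when uri is set, else from the packaged default."""
    if not uri:
        log.info("no S3 config set; using packaged config.toml")
        return tomllib.loads(fallback().decode("utf-8"))

    parsed = urlparse(uri)
    response = s3_client.get_object(Bucket=parsed.netloc, Key=parsed.path.lstrip("/"))
    raw = response["Body"].read()  # kept in memory, never written to disk
    log.info(
        "config %s sha256=%s etag=%s version_id=%s",
        uri,
        hashlib.sha256(raw).hexdigest(),
        response.get("ETag"),
        response.get("VersionId"),  # absent unless bucket versioning is on
    )
    return tomllib.loads(raw.decode("utf-8"))
```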