Deployment

This is the detailed deployment reference for the Dory Orchestrator. The orchestrator's own README covers the quick build/deploy; this page documents the full build pipeline, manifests, RBAC, database schema, secrets, and Karpenter integration.

See Configuration for the flags/env wired in here, HTTP API Reference and Metrics for the exposed surfaces, and Edge Failover for the failover behavior these manifests enable.

Prerequisites

| Requirement | Version / value |
| --- | --- |
| Go | 1.25+ |
| k8s.io/client-go | v0.35 |
| EKS | 1.33+ (tested 1.35) |
| Karpenter | v1.7+ |
| System nodegroup | role=system label, taint node-role=system:NoSchedule (hosts the orchestrator) |
| ECR registry | <ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com (region us-east-2) |

Build

# Binary
go build -o orchestrator ./cmd/orchestrator

# Container image (amd64)
docker build --platform linux/amd64 -t dory-orchestrator:latest ./cmd/orchestrator

Run locally against a cluster:

export DORY_DATABASE_URL=postgres://...
export KUBECONFIG=~/.kube/config
go run ./cmd/orchestrator --namespace default --poll-interval 30s --log-level info

The pushed image is referenced as <ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/dory-orchestrator:latest.
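The build step above produces a local image; getting it to the referenced ECR path needs a login, a tag, and a push. The following is a minimal sketch, assuming the AWS CLI and Docker are installed and an ECR repository named dory-orchestrator already exists. The account ID default and the push_image helper name are placeholders, not part of the project.

```shell
# Placeholder account ID: substitute your own.
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
REGION="us-east-2"
REGISTRY="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
IMAGE="${REGISTRY}/dory-orchestrator:latest"

# push_image (hypothetical helper) logs Docker in to ECR, then tags and
# pushes the locally built dory-orchestrator:latest image.
push_image() {
  aws ecr get-login-password --region "$REGION" \
    | docker login --username AWS --password-stdin "$REGISTRY" || return 1
  docker tag dory-orchestrator:latest "$IMAGE" || return 1
  docker push "$IMAGE"
}
```

Run push_image after the docker build step. Nothing executes until the function is called.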

Applying manifests

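The manifests live in deploy/. Apply them in the order RBAC, ConfigMap, Deployment, Services, so that the ServiceAccount and ConfigMap exist before the pod is created. The dory-db-secret Secret must also exist, since the Deployment reads DATABASE_URL from it.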
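As a rough sketch of that apply sequence, assuming kubectl points at the target cluster and the dory-system namespace already exists: the file names come from the sections below, while the apply_manifests helper and the 180s timeout are illustrative choices.

```shell
# Manifests in the order listed above: RBAC, ConfigMap, Deployment, Services.
MANIFESTS="deploy/rbac.yaml deploy/configmap.yaml deploy/deployment.yaml deploy/service.yaml"

# apply_manifests (hypothetical helper) applies each file in order and then
# waits for the single orchestrator replica to finish rolling out.
apply_manifests() {
  for f in $MANIFESTS; do
    kubectl apply -f "$f" || return 1
  done
  kubectl -n dory-system rollout status deployment/dory-orchestrator --timeout=180s
}
```

Stopping at the first failed apply avoids creating a Deployment whose ServiceAccount or ConfigMap is missing.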

Deployment

deploy/deployment.yaml — Deployment dory-orchestrator in namespace dory-system:

| Setting | Value |
| --- | --- |
| replicas | 1 |
| strategy | Recreate |
| ServiceAccount | dory-orchestrator |
| Pod securityContext | runAsNonRoot, runAsUser 65534, fsGroup 65534 |
| nodeSelector | role: system |
| toleration | node-role=system:NoSchedule |
| image | <ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/dory-orchestrator:latest |
| imagePullPolicy | Always |
| Container securityContext | allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, runAsUser 65534, capabilities drop ALL |
| Port | metrics 8080 |
| Resources (requests) | cpu 100m, mem 128Mi |
| Resources (limits) | cpu 500m, mem 512Mi |
| Liveness | GET /livez (initialDelay 30s / period 10s / timeout 5s / failureThreshold 3) |
| Readiness | GET /readyz (initialDelay 10s / period 5s / timeout 3s / failureThreshold 3) |
| terminationGracePeriodSeconds | 90 |

Args:

--config-db=$(DATABASE_URL) --namespace=$(NAMESPACE) --poll-interval=$(POLL_INTERVAL) --log-level=$(LOG_LEVEL) --enable-monitor

Env:

| Env var | Source |
| --- | --- |
| DATABASE_URL | Secret dory-db-secret, key database-url |
| NAMESPACE | ConfigMap dory-orchestrator-config |
| POLL_INTERVAL | ConfigMap dory-orchestrator-config |
| LOG_LEVEL | ConfigMap dory-orchestrator-config |
| DORY_STATE_TOKEN | Secret dory-state-secret, key state-token (optional) |

Services

deploy/service.yaml — two ClusterIP Services:

| Service | Port mapping | Notes |
| --- | --- | --- |
| dory-orchestrator-metrics | 8080 → metrics | Prometheus scrape annotations; edge pods reach the orchestrator here. |
| dory-orchestrator | 80 → metrics | |

Edge pods address the orchestrator at dory-orchestrator-metrics.dory-system.svc.cluster.local:8080.
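To confirm that address is reachable, one option is to call the health endpoints used by the probes from a pod inside the cluster. This is a sketch only: it assumes curl is available in that pod and that the endpoints are served over plain HTTP, and the probe helper is a hypothetical name.

```shell
ORCH_ADDR="dory-orchestrator-metrics.dory-system.svc.cluster.local:8080"

# probe (hypothetical helper) requests one health endpoint, e.g. livez or
# readyz, and fails on any non-2xx response or after 5 seconds.
probe() {
  curl -fsS --max-time 5 "http://${ORCH_ADDR}/$1"
}
```

For example, probe readyz exits non-zero when the orchestrator is not ready or the Service name does not resolve.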

ConfigMap

deploy/configmap.yaml defines ConfigMap dory-orchestrator-config:

| Key | Value |
| --- | --- |
| NAMESPACE | default |
| POLL_INTERVAL | 10s |
| LOG_LEVEL | info |

RBAC

deploy/rbac.yaml defines ServiceAccount dory-orchestrator (ns dory-system) and a ClusterRoleBinding that binds it to ClusterRole dory-orchestrator, which grants:

| Resource | Verbs |
| --- | --- |
| namespaces | get, list, watch |
| pods | get, list, watch, create, delete, patch, update |
| pods/status, pods/log | get |
| pods/eviction | create |
| nodes | get, list, watch, patch, update, delete |
| nodes/status | get, patch, update |
| events | create, patch |
| configmaps | get, list, watch, create, update (drain sentinel) |
| secrets | get, list, watch |
| karpenter.sh: nodepools, nodeclaims | get, list, watch |
| karpenter.k8s.aws: ec2nodeclasses | get, list, watch |

A separate ServiceAccount for processors, dory-processor (ns default), is paired with a Role that grants full configmaps verbs, so the SDK can persist processor state to ConfigMaps named dory-state-{processor_id}.
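One way to spot-check the orchestrator's permissions after applying RBAC is kubectl auth can-i with impersonation. The sketch below assumes your own user is allowed to impersonate ServiceAccounts; the check_rbac helper name is hypothetical and the three checks are only a sample of the table above.

```shell
SA="system:serviceaccount:dory-system:dory-orchestrator"

# check_rbac (hypothetical helper) prints yes or no for a few permissions.
check_rbac() {
  kubectl auth can-i delete nodes --as="$SA"
  kubectl auth can-i list nodepools.karpenter.sh --as="$SA"
  # Secrets are read-only in the ClusterRole, so "no" is the expected answer.
  kubectl auth can-i create secrets --as="$SA" -n dory-system
  return 0
}
```

If the ClusterRole matches the table, the first two checks should answer yes and the last should answer no, unless another binding in the cluster grants more.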

Secrets

| Secret | Key | Use |
| --- | --- | --- |
| dory-db-secret | database-url | PostgreSQL connection string. |
| dory-state-secret | state-token | Optional state transfer token (DORY_STATE_TOKEN). |
| ecr-registry-secret | (type docker-registry) | Image pull secret (refreshed by CronJob, see below). |
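The first two Secrets have to be created by hand (or by your own tooling) before the Deployment can start. A minimal sketch, assuming they live in dory-system alongside the Deployment; the create_secrets helper and its argument order are illustrative only.

```shell
# create_secrets (hypothetical helper) creates the Secrets the Deployment reads.
# $1: PostgreSQL connection string
# $2: state transfer token (optional; skipped when empty)
create_secrets() {
  kubectl -n dory-system create secret generic dory-db-secret \
    --from-literal=database-url="$1" || return 1
  if [ -n "${2:-}" ]; then
    kubectl -n dory-system create secret generic dory-state-secret \
      --from-literal=state-token="$2"
  fi
}
```

Note that values passed with --from-literal end up in shell history; reading them from a file or environment variable is the safer habit.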

Database schema

PostgreSQL backs the orchestrator. Key tables:

| Table | Notes |
| --- | --- |
| processors | One row → one pod. Columns: id UUID, processor_template_id FK→processor_templates.id, node_type, node_id FK→edge_nodes.id, k8s_namespace, k8s_pod_name, node_ip VARCHAR(45), sensor_id FK→sensors.id, status (pending/starting/running/terminated/failed/failover), health_status jsonb, last_health_check_at, consecutive_failures, terminated_at, created_at, updated_at. |
| processor_templates | The application / "slug" table (formerly processing_applications; PK still named processing_applications_pkey). Columns: id, slug, name. |
| processor_template_versions | Columns: processor_template_id FK, image_uri, digest, version, runtime_config_template JSON, build_config JSON, is_active bool. |
| sensors | Columns: id, sensor_type, connection_config jsonb, metadata jsonb, location_point PostGIS. |
| edge_nodes | Columns: id, organization_id, name, status (online/offline/decommissioned), failover_enabled, failover_target_node_id, last_heartbeat_at. |
| edge_node_apps | Columns: id, edge_node_id, processor_config_id FK→processor_templates.id, failover_enabled, state_storage_path, status (active/failover/stopped), current_processor_id. |
| edge_node_events | Columns: id, edge_node_id, processor_config_id (nullable), event_type, details jsonb, created_at. |

Warning

The edge_node_apps and edge_node_events tables name this column processor_config_id, while processors and processor_template_versions name it processor_template_id. Both are foreign keys to processor_templates.id. Using the wrong name for a table fails with column ... does not exist (SQLSTATE 42703).
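The naming difference is easiest to see side by side. The sketch below runs the same lookup against both kinds of table, assuming psql is installed and DORY_DATABASE_URL points at the orchestrator database; the rows_for_template helper is a hypothetical name, and the selected columns are taken from the schema table above.

```shell
# rows_for_template (hypothetical helper) lists rows that reference one
# processor_templates.id value, passed as $1.
rows_for_template() {
  psql "$DORY_DATABASE_URL" -v tid="$1" <<'SQL'
-- processors names the foreign key processor_template_id
SELECT id, status FROM processors WHERE processor_template_id = :'tid';
-- edge_node_apps names the same foreign key processor_config_id
SELECT id, status FROM edge_node_apps WHERE processor_config_id = :'tid';
SQL
}
```

Swapping the two column names in either query reproduces the 42703 error described above.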

Karpenter

Manifests in deployments/karpenter/.

NodePool dory-app-pool (karpenter.sh/v1)

| Field | Value |
| --- | --- |
| Template labels | workload-type=application, managed-by=dory-orchestrator |
| Taints | none |
| Requirements | arch amd64, os linux, capacity-type on-demand, instance-types [t3.small, t3.medium, t3a.small, t3a.medium] |
| nodeClassRef | dory-app-nodeclass |
| Limits | cpu 100, memory 400Gi |
| Disruption | consolidationPolicy WhenEmpty, consolidateAfter 3m, budget 20% |
| Weight | 10 |

EC2NodeClass dory-app-nodeclass (karpenter.k8s.aws/v1)

| Field | Value |
| --- | --- |
| amiFamily | AL2023, alias al2023@latest |
| IAM role | KarpenterNodeRole-dory-demo |
| Subnet & SG tag | karpenter.sh/discovery: dory-demo |
| kubelet | reservations configured |
| blockDeviceMappings | /dev/xvda gp3 30Gi |
| IMDS | IMDSv2 required |
| detailedMonitoring | true |
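A possible way to apply and verify the two Karpenter resources is sketched below. It assumes Karpenter and its CRDs are already installed and that deployments/karpenter/ contains only these manifests; the apply_karpenter helper name is hypothetical.

```shell
# apply_karpenter (hypothetical helper) applies the Karpenter manifests and
# confirms that both resources exist afterwards.
apply_karpenter() {
  kubectl apply -f deployments/karpenter/ || return 1
  kubectl get nodepools.karpenter.sh dory-app-pool || return 1
  kubectl get ec2nodeclasses.karpenter.k8s.aws dory-app-nodeclass
}
```

Both resources are cluster-scoped, so no namespace flag is needed.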

ECR token refresh

CronJob ecr-token-refresh (ns kube-system):

| Field | Value |
| --- | --- |
| Schedule | 0 */6 * * * (ECR tokens expire after 12h) |
| Action | aws ecr get-login-password (region us-east-2, registry <ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com) |
| Result | Recreates docker-registry secret ecr-registry-secret in namespaces default, kube-system, dory-system. |
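With a 6-hour schedule and a 12-hour token lifetime, the stored token is never more than about 6 hours old, which leaves room for one missed run. The job's actual script is not shown here; the following is only a sketch of what the action and result amount to, with a placeholder account ID and a hypothetical refresh_ecr_secret helper.

```shell
# Placeholder account ID: substitute your own.
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
REGION="us-east-2"
REGISTRY="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"

# refresh_ecr_secret (hypothetical helper) fetches a fresh ECR token and
# recreates the image pull secret in each namespace that pulls from ECR.
refresh_ecr_secret() {
  token="$(aws ecr get-login-password --region "$REGION")" || return 1
  for ns in default kube-system dory-system; do
    kubectl -n "$ns" delete secret ecr-registry-secret --ignore-not-found || return 1
    kubectl -n "$ns" create secret docker-registry ecr-registry-secret \
      --docker-server="$REGISTRY" \
      --docker-username=AWS \
      --docker-password="$token" || return 1
  done
}
```

Delete-then-create leaves a brief window with no secret in each namespace; that only matters for pods that pull an image at that exact moment.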

Warning

The orchestrator must run on a system node. System pods require nodeSelector: {role: system} and a toleration for the node-role=system:NoSchedule taint — the Deployment already sets both.
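To check placement on a live cluster, the commands below print the Deployment's scheduling constraints, the nodes that carry the role=system label, and where the pods in dory-system actually landed. This is a sketch; the show_placement helper name is hypothetical.

```shell
# show_placement (hypothetical helper) prints the nodeSelector and
# tolerations on the Deployment, the matching nodes, and pod placement.
show_placement() {
  kubectl -n dory-system get deployment dory-orchestrator \
    -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}{.spec.template.spec.tolerations}{"\n"}'
  kubectl get nodes -l role=system
  kubectl -n dory-system get pods -o wide
}
```

The NODE column of the last command should name one of the nodes listed by the second.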