Deployment¶
This is the detailed deployment reference for the Dory Orchestrator. The orchestrator's own README covers the quick build-and-deploy path; this page documents the full build pipeline, manifests, RBAC, database schema, secrets, and Karpenter integration.
See Configuration for the flags/env wired in here, HTTP API Reference and Metrics for the exposed surfaces, and Edge Failover for the failover behavior these manifests enable.
Prerequisites¶
| Requirement | Version / value |
|---|---|
| Go | 1.25+ |
| k8s.io/client-go | v0.35 |
| EKS | 1.33+ (tested 1.35) |
| Karpenter | v1.7+ |
| System nodegroup | role=system label, taint node-role=system:NoSchedule (hosts the orchestrator) |
| ECR registry | <ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com (region us-east-2) |
Build¶
# Binary
go build -o orchestrator ./cmd/orchestrator
# Container image (amd64)
docker build --platform linux/amd64 -t dory-orchestrator:latest ./cmd/orchestrator
Run locally against a cluster:
export DORY_DATABASE_URL=postgres://...
export KUBECONFIG=~/.kube/config
go run ./cmd/orchestrator --namespace default --poll-interval 30s --log-level info
The pushed image is referenced as <ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/dory-orchestrator:latest.
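Getting the locally built image to that ECR path takes a login, a tag, and a push. The following is a minimal sketch, assuming the AWS CLI and Docker are installed and that a `dory-orchestrator` repository already exists in the registry. The helper names `ecr_registry` and `push_orchestrator_image` are illustrative and not part of the project.

```shell
# Hypothetical helpers, not part of the Dory repository.
REGION="us-east-2"

# Print the ECR registry host for an AWS account ID.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$REGION"
}

# Log in to ECR, then tag and push the locally built image.
# Usage: push_orchestrator_image <ACCOUNT_ID>
push_orchestrator_image() {
  registry="$(ecr_registry "$1")"
  aws ecr get-login-password --region "$REGION" |
    docker login --username AWS --password-stdin "$registry" &&
  docker tag dory-orchestrator:latest "$registry/dory-orchestrator:latest" &&
  docker push "$registry/dory-orchestrator:latest"
}
```

Run `push_orchestrator_image <ACCOUNT_ID>` after the `docker build` step; nothing is contacted until the function is called.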
Applying manifests¶
The manifests live in deploy/. Apply RBAC, ConfigMap, Deployment, and Services.
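One way to apply them is sketched here. The file names are the ones this page documents; the apply order, the explicit namespace creation, and the function name `apply_orchestrator_manifests` are assumptions, so adjust them if the manifests already create `dory-system`.

```shell
# Manifest files documented on this page; the order is an assumption.
MANIFESTS="deploy/rbac.yaml deploy/configmap.yaml deploy/deployment.yaml deploy/service.yaml"

# Hypothetical helper: apply everything, then wait for the rollout.
apply_orchestrator_manifests() {
  # Assumption: the manifests do not create the namespace themselves.
  kubectl create namespace dory-system --dry-run=client -o yaml |
    kubectl apply -f - || return 1
  for f in $MANIFESTS; do
    kubectl apply -f "$f" || return 1
  done
  # Block until the single replica is available, or fail after two minutes.
  kubectl -n dory-system rollout status deployment/dory-orchestrator --timeout=120s
}
```

Call `apply_orchestrator_manifests` from the repository root so the relative `deploy/` paths resolve.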
Deployment¶
deploy/deployment.yaml — Deployment dory-orchestrator in namespace dory-system:
| Setting | Value |
|---|---|
| replicas | 1 |
| strategy | Recreate |
| ServiceAccount | dory-orchestrator |
| Pod securityContext | runAsNonRoot, runAsUser 65534, fsGroup 65534 |
| nodeSelector | role: system |
| toleration | node-role=system:NoSchedule |
| image | <ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com/dory-orchestrator:latest |
| imagePullPolicy | Always |
| Container securityContext | allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, runAsUser 65534, drop ALL |
| Port | metrics 8080 |
| Resources (requests) | cpu 100m, mem 128Mi |
| Resources (limits) | cpu 500m, mem 512Mi |
| Liveness | GET /livez (initialDelay 30 / period 10 / timeout 5 / failure 3) |
| Readiness | GET /readyz (initialDelay 10 / period 5 / timeout 3 / failure 3) |
| terminationGracePeriodSeconds | 90 |
Args:
--config-db=$(DATABASE_URL) --namespace=$(NAMESPACE) --poll-interval=$(POLL_INTERVAL) --log-level=$(LOG_LEVEL) --enable-monitor
Env:
| Env var | Source |
|---|---|
| DATABASE_URL | Secret dory-db-secret key database-url |
| NAMESPACE | ConfigMap dory-orchestrator-config |
| POLL_INTERVAL | ConfigMap dory-orchestrator-config |
| LOG_LEVEL | ConfigMap dory-orchestrator-config |
| DORY_STATE_TOKEN | Secret dory-state-secret key state-token (optional) |
Services¶
deploy/service.yaml — two ClusterIP Services:
| Service | Port mapping | Notes |
|---|---|---|
| dory-orchestrator-metrics | 8080 → metrics | Prometheus scrape annotations; edge pods reach the orchestrator here. |
| dory-orchestrator | 80 → metrics | |
Edge pods address the orchestrator at dory-orchestrator-metrics.dory-system.svc.cluster.local:8080.
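To confirm from a workstation that the Service routes to a healthy pod, the probe endpoints can be reached through a port-forward. This is a rough sketch: it assumes the endpoints are served over plain HTTP on the metrics port, and `check_orchestrator_probes` is an illustrative name.

```shell
# In-cluster address used by edge pods (from the Services table).
ORCHESTRATOR_ADDR="dory-orchestrator-metrics.dory-system.svc.cluster.local:8080"

# Hypothetical helper: forward the metrics Service locally and hit both probes.
check_orchestrator_probes() {
  kubectl -n dory-system port-forward svc/dory-orchestrator-metrics 8080:8080 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2   # crude wait for the tunnel to come up
  rc=0
  for path in /livez /readyz; do
    # Assumption: plain HTTP; -f makes curl fail on non-2xx responses.
    curl -fsS -o /dev/null "http://127.0.0.1:8080$path" || rc=1
  done
  kill "$pf_pid" 2>/dev/null
  return "$rc"
}
```

A zero exit status from `check_orchestrator_probes` means both `/livez` and `/readyz` answered successfully.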
ConfigMap¶
deploy/configmap.yaml — dory-orchestrator-config:
| Key | Value |
|---|---|
| NAMESPACE | default |
| POLL_INTERVAL | 10s |
| LOG_LEVEL | info |
RBAC¶
deploy/rbac.yaml defines ServiceAccount dory-orchestrator (ns dory-system) and a ClusterRoleBinding that binds it to ClusterRole dory-orchestrator, which grants:
| Resource | Verbs |
|---|---|
| namespaces | get, list, watch |
| pods | get, list, watch, create, delete, patch, update |
| pods/status, pods/log | get |
| pods/eviction | create |
| nodes | get, list, watch, patch, update, delete |
| nodes/status | get, patch, update |
| events | create, patch |
| configmaps | get, list, watch, create, update (drain sentinel) |
| secrets | get, list, watch |
| karpenter.sh nodepools, nodeclaims | get, list, watch |
| karpenter.k8s.aws ec2nodeclasses | get, list, watch |
A separate ServiceAccount for processors, dory-processor (ns default), is bound to a Role that grants all configmaps verbs so the SDK can persist state to dory-state-{processor_id}.
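The orchestrator's grants can be spot-checked with `kubectl auth can-i` and impersonation. This sketch covers a sample of the rules rather than every verb, assumes your own user is allowed to impersonate ServiceAccounts, and uses the illustrative helper names `rbac_can_i` and `check_orchestrator_rbac`.

```shell
# ServiceAccount identity as the API server sees it.
SA="system:serviceaccount:dory-system:dory-orchestrator"

# Hypothetical helper. $1 verb, $2 resource, $3 optional subresource.
rbac_can_i() {
  if [ -n "${3:-}" ]; then
    kubectl auth can-i "$1" "$2" --subresource="$3" --as="$SA"
  else
    kubectl auth can-i "$1" "$2" --as="$SA"
  fi
}

# Spot-check a sample of the ClusterRole rules; stops at the first denial.
check_orchestrator_rbac() {
  rbac_can_i list namespaces &&
  rbac_can_i create pods &&
  rbac_can_i create pods eviction &&
  rbac_can_i get pods log &&
  rbac_can_i delete nodes &&
  rbac_can_i patch nodes status &&
  rbac_can_i create events &&
  rbac_can_i update configmaps &&
  rbac_can_i watch secrets &&
  rbac_can_i list nodepools.karpenter.sh &&
  rbac_can_i list ec2nodeclasses.karpenter.k8s.aws
}
```

Subresources go through `--subresource` because `kubectl auth can-i` reads `pods/eviction` as a pod named `eviction`.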
Secrets¶
| Secret | Key | Use |
|---|---|---|
| dory-db-secret | database-url | PostgreSQL connection string. |
| dory-state-secret | state-token | Optional state transfer token (DORY_STATE_TOKEN). |
| ecr-registry-secret | docker-registry | Image pull secret (refreshed by CronJob, see ECR token refresh). |
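The first two secrets can be created by hand. The sketch below places them in `dory-system`, because the Deployment resolves secret references in its own namespace. The helper name `create_orchestrator_secrets` is illustrative, and how the state token is generated is left to you.

```shell
# Hypothetical helper.
# $1: PostgreSQL connection string
# $2: state transfer token (optional; the secret is skipped when empty)
create_orchestrator_secrets() {
  # create --dry-run | apply makes the call idempotent on re-runs.
  kubectl -n dory-system create secret generic dory-db-secret \
    --from-literal=database-url="$1" \
    --dry-run=client -o yaml | kubectl apply -f - || return 1
  if [ -n "${2:-}" ]; then
    kubectl -n dory-system create secret generic dory-state-secret \
      --from-literal=state-token="$2" \
      --dry-run=client -o yaml | kubectl apply -f -
  fi
}
```

For example, `create_orchestrator_secrets "$DORY_DATABASE_URL"` creates only `dory-db-secret`.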
Database schema¶
PostgreSQL backs the orchestrator. Key tables:
| Table | Notes |
|---|---|
| processors | One row → one pod. Columns: id UUID, processor_template_id FK→processor_templates.id, node_type, node_id FK→edge_nodes.id, k8s_namespace, k8s_pod_name, node_ip VARCHAR(45), sensor_id FK→sensors.id, status (pending/starting/running/terminated/failed/failover), health_status jsonb, last_health_check_at, consecutive_failures, terminated_at, created_at, updated_at. |
| processor_templates | The application / "slug" table (formerly processing_applications; PK still named processing_applications_pkey). Columns: id, slug, name. |
| processor_template_versions | processor_template_id FK, image_uri, digest, version, runtime_config_template JSON, build_config JSON, is_active bool. |
| sensors | id, sensor_type, connection_config jsonb, metadata jsonb, location_point PostGIS. |
| edge_nodes | id, organization_id, name, status (online/offline/decommissioned), failover_enabled, failover_target_node_id, last_heartbeat_at. |
| edge_node_apps | id, edge_node_id, processor_config_id FK→processor_templates.id, failover_enabled, state_storage_path, status (active/failover/stopped), current_processor_id. |
| edge_node_events | id, edge_node_id, processor_config_id (nullable), event_type, details jsonb, created_at. |
Warning
The two edge_node_* tables key on column processor_config_id, while processors and processor_template_versions use processor_template_id. Both FK to processor_templates.id. Mixing them up causes column ... does not exist (SQLSTATE 42703) errors.
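The difference is easiest to see in two joins against processor_templates, each using the column its table actually has. This is an illustrative sketch that assumes `psql` is installed and `DORY_DATABASE_URL` is set as in the local-run example; the variable and function names are hypothetical.

```shell
# processors joins on processor_template_id.
SQL_PROCESSORS="SELECT p.id, p.status, t.slug
FROM processors p
JOIN processor_templates t ON t.id = p.processor_template_id;"

# edge_node_apps joins on processor_config_id (same target table).
SQL_EDGE_APPS="SELECT a.id, a.status, t.slug
FROM edge_node_apps a
JOIN processor_templates t ON t.id = a.processor_config_id;"

# Hypothetical helper: run both queries against the orchestrator database.
run_schema_checks() {
  psql "$DORY_DATABASE_URL" -c "$SQL_PROCESSORS" &&
  psql "$DORY_DATABASE_URL" -c "$SQL_EDGE_APPS"
}
```

Swapping the two column names between these queries reproduces the SQLSTATE 42703 error described in the warning.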
Karpenter¶
The Karpenter manifests live in deployments/karpenter/.
NodePool dory-app-pool (karpenter.sh/v1)¶
| Field | Value |
|---|---|
| Template labels | workload-type=application, managed-by=dory-orchestrator |
| Taints | none |
| Requirements | arch amd64, os linux, capacity-type on-demand, instance-types [t3.small, t3.medium, t3a.small, t3a.medium] |
| nodeClassRef | dory-app-nodeclass |
| Limits | cpu 100, memory 400Gi |
| Disruption | consolidationPolicy WhenEmpty, consolidateAfter 3m, budget 20% |
| Weight | 10 |
EC2NodeClass dory-app-nodeclass (karpenter.k8s.aws/v1)¶
| Field | Value |
|---|---|
| amiFamily | AL2023, alias al2023@latest |
| IAM role | KarpenterNodeRole-dory-demo |
| Subnet & SG | tag karpenter.sh/discovery: dory-demo |
| kubelet | reservations configured |
| blockDeviceMappings | /dev/xvda gp3 30Gi |
| IMDS | IMDSv2 required |
| detailedMonitoring | true |
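Applying and checking the Karpenter objects might look like the sketch below. It assumes the Karpenter v1 CRDs are already installed and that every file in the directory is safe to apply as-is; the helper names are illustrative.

```shell
# Hypothetical helper: apply the manifests and confirm both objects exist.
apply_karpenter_manifests() {
  kubectl apply -f deployments/karpenter/ &&
  kubectl get nodepool dory-app-pool &&
  kubectl get ec2nodeclass dory-app-nodeclass
}

# Hypothetical helper: list nodes carrying the NodePool's template labels.
list_app_nodes() {
  kubectl get nodes -l workload-type=application,managed-by=dory-orchestrator -o wide
}
```

`list_app_nodes` returns nothing until a pending pod causes Karpenter to provision a node from `dory-app-pool`.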
ECR token refresh¶
CronJob ecr-token-refresh (ns kube-system):
| Field | Value |
|---|---|
| Schedule | 0 */6 * * * (ECR tokens expire after 12h) |
| Action | aws ecr get-login-password (region us-east-2, registry <ACCOUNT_ID>.dkr.ecr.us-east-2.amazonaws.com) |
| Result | Recreates docker-registry secret ecr-registry-secret in namespaces default, kube-system, dory-system. |
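The refresh amounts to fetching a new token and recreating the secret in each namespace. The sketch below is a reconstruction from the table, not the CronJob's actual script; the helper names and the delete-then-create approach are assumptions.

```shell
# Namespaces that receive the pull secret (from the table above).
REGISTRY_NAMESPACES="default kube-system dory-system"

# Hypothetical helper. $1: AWS account ID.
refresh_ecr_secret() {
  server="$1.dkr.ecr.us-east-2.amazonaws.com"
  token="$(aws ecr get-login-password --region us-east-2)" || return 1
  for ns in $REGISTRY_NAMESPACES; do
    kubectl -n "$ns" delete secret ecr-registry-secret --ignore-not-found &&
    kubectl -n "$ns" create secret docker-registry ecr-registry-secret \
      --docker-server="$server" \
      --docker-username=AWS \
      --docker-password="$token" || return 1
  done
}

# Hypothetical helper: run the CronJob once instead of waiting for the schedule.
trigger_ecr_refresh() {
  kubectl -n kube-system create job --from=cronjob/ecr-token-refresh \
    "ecr-token-refresh-manual-$(date +%s)"
}
```

`trigger_ecr_refresh` is useful right after a fresh install, when no scheduled run has happened yet.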
Warning
The orchestrator must run on a system node. System pods require nodeSelector: {role: system} and a toleration for the node-role=system:NoSchedule taint — the Deployment already sets both.
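Placement can be checked after a rollout with a quick listing. This sketch lists every pod in `dory-system` because the orchestrator's pod labels are not documented here; the helper name is illustrative.

```shell
# Hypothetical helper: show the system nodes, then which node each pod landed on.
check_system_placement() {
  kubectl get nodes -l role=system &&
  kubectl -n dory-system get pods \
    -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName
}
```

The orchestrator pod's NODE value should match one of the nodes in the first listing.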