System Architecture
This page maps the Dory platform end to end: the components, how the Orchestrator's control loop turns desired state into running pods, how state moves between pods, and how the platform recovers from node drains and edge outages. Deep detail lives in the SDK and Orchestrator sections, which this page links to.
Components & responsibilities
| Component | Role |
|---|---|
| Orchestrator (Go, v0.1.0) | Control plane. Single HTTP server on :8080. Reads desired processors from PostgreSQL and reconciles Kubernetes pods. |
| PostgreSQL config DB | Source of truth for desired state: processors joined to processor_templates and processor_template_versions (with runtime_config_template). |
| Kubernetes API | Where the Orchestrator creates, observes, and deletes processor pods. |
| Processor pods | SDK-based workloads, labeled managed-by=dory-orchestrator, running on managed (cloud) or edge nodes. |
| Karpenter | Provisions app nodes on demand when no existing node has room. NodePool dory-app-pool, EC2NodeClass dory-app-nodeclass. |
| RabbitMQ | Output bus. Processors publish a versioned envelope to the topic exchange dory.output. |
| Subscribers | Consume the envelope and match on its major version. Documented in the Subscriber SDK guide. |
The Orchestrator HTTP server exposes:
| Endpoint | Purpose |
|---|---|
| GET /metrics | Prometheus metrics |
| GET /healthz, /readyz, /livez | Orchestrator health, readiness, and liveness |
| POST /api/v1/edge/heartbeat | Edge pod heartbeats (returns directive continue or shutdown) |
| POST /api/v1/edge/nodes | Register an edge node |
| POST /api/v1/edge/nodes/decommission | Decommission an edge node |
Architecture diagram
flowchart LR
DB[(PostgreSQL<br/>config DB)] --> ORCH[Orchestrator<br/>Go v0.1.0 :8080]
ORCH -->|reconcile pods| K8S[Kubernetes API]
ORCH -.->|provision_node| KARP[Karpenter<br/>dory-app-pool]
KARP -->|new app node| MN
subgraph Managed cloud nodes
MN[Managed pods<br/>nodeSelector workload-type=application]
end
subgraph Edge nodes
EN[Edge pods<br/>nodeSelector node-type=edge]
end
K8S --> MN
K8S --> EN
EN -->|POST /api/v1/edge/heartbeat| ORCH
ORCH <-->|GET/POST /state<br/>Bearer DORY_STATE_TOKEN| MN
ORCH <-->|GET/POST /state| EN
MN -->|envelope| MQ[(RabbitMQ<br/>exchange dory.output)]
EN -->|envelope| MQ
MQ --> SUB[Subscribers]
The control loop
The Orchestrator runs a config-watcher-driven reconcile loop.
- A config-watcher polls the PostgreSQL DB on a fixed interval (default 30s). Reconcile fires every interval even when nothing changed in the DB, so the scheduler can also act on consolidation opportunities.
- Desired state is keyed by the processor-id label. One processor ⇒ one pod.
- Pods are immutable: any change to a processor's config is applied as delete + recreate, not an in-place update.
- A DB row (processors → processor_templates → processor_template_versions.runtime_config_template) maps to a pod spec: image {image_uri}@{digest}, env vars, resources, health probes (readiness GET /ready, liveness GET /health), prestop hook (GET /prestop), labels, and node placement.
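The reconcile rules above can be illustrated with a small Python sketch. It is not the Orchestrator's Go implementation: `PodSpec` and `plan_reconcile` are hypothetical names, and the spec is reduced to two fields for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodSpec:
    """Hypothetical, simplified pod spec; the real one carries many more fields."""
    image: str        # "{image_uri}@{digest}"
    env: tuple = ()   # (name, value) pairs


def plan_reconcile(desired: dict, actual: dict) -> dict:
    """Diff desired and actual state, both keyed by processor-id.

    A changed spec appears in both lists because pods are immutable:
    the old pod is deleted and a new one is created.
    """
    create, delete = [], []
    for pid, spec in desired.items():
        if pid not in actual:
            create.append(pid)            # missing pod
        elif actual[pid] != spec:
            delete.append(pid)            # config changed: delete + recreate
            create.append(pid)
    for pid in actual:
        if pid not in desired:
            delete.append(pid)            # no longer desired
    return {"create": sorted(create), "delete": sorted(delete)}
```

Because the plan is recomputed on every interval, a tick with no DB change simply yields two empty lists.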
Node placement rules:
| Pod type | Placement |
|---|---|
| Managed | nodeSelector {workload-type: application} |
| Edge | toleration edge-node=true:NoSchedule + nodeSelector {node-type: edge} |
| All pods | node affinity node-role NotIn [system]; ServiceAccount dory-processor; ImagePullSecret ecr-registry-secret |
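As a rough illustration, the placement table could translate into the scheduling fields of a Kubernetes pod spec as follows. The `placement` helper is hypothetical, and expressing the affinity as a required (rather than preferred) rule is an assumption; the table does not say which is used.

```python
def placement(edge: bool) -> dict:
    """Return the scheduling-related pod spec fields as a plain dict."""
    spec = {
        "serviceAccountName": "dory-processor",
        "imagePullSecrets": [{"name": "ecr-registry-secret"}],
        "affinity": {
            "nodeAffinity": {
                # Assumption: a hard requirement; the source only says "NotIn [system]".
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "node-role",
                            "operator": "NotIn",
                            "values": ["system"],
                        }]
                    }]
                }
            }
        },
    }
    if edge:
        spec["nodeSelector"] = {"node-type": "edge"}
        # Tolerate the edge-node=true:NoSchedule taint.
        spec["tolerations"] = [{
            "key": "edge-node",
            "operator": "Equal",
            "value": "true",
            "effect": "NoSchedule",
        }]
    else:
        spec["nodeSelector"] = {"workload-type": "application"}
    return spec
```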
The scheduler bin-packs with first-fit on the most-utilized healthy node (10% resource buffer). If nothing fits, it emits a provision_node decision that creates a Pending pod, which Karpenter then satisfies by provisioning an app node. See Orchestrator architecture for scheduler and reconciler internals.
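A minimal sketch of that placement decision, under stated assumptions: only CPU is considered, the 10% buffer is applied to node capacity, and `Node` and `schedule` are hypothetical names. The real scheduler is documented in Orchestrator architecture.

```python
from dataclasses import dataclass

BUFFER = 0.10  # keep 10% of each node's capacity free


@dataclass
class Node:
    name: str
    cpu_capacity: int   # millicores
    cpu_used: int       # millicores
    healthy: bool = True

    @property
    def utilization(self) -> float:
        return self.cpu_used / self.cpu_capacity


def schedule(nodes: list, cpu_request: int) -> str:
    """Return the chosen node name, or "provision_node" if nothing fits."""
    # Try the most-utilized healthy nodes first so that pods pack tightly.
    candidates = sorted(
        (n for n in nodes if n.healthy),
        key=lambda n: n.utilization,
        reverse=True,
    )
    for node in candidates:
        usable = node.cpu_capacity * (1 - BUFFER)
        if node.cpu_used + cpu_request <= usable:
            return node.name
    return "provision_node"
```

Trying the fullest nodes first tends to leave lightly used nodes empty, which is what makes later consolidation possible.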
State transfer
When a pod moves, the Orchestrator transfers its state between pods over HTTP: it captures the state from the old pod and restores it into the new one.
sequenceDiagram
participant O as Orchestrator
participant Old as Old pod :8080
participant New as New pod :8080
O->>Old: GET /state (Bearer DORY_STATE_TOKEN)
Old-->>O: state body
O->>New: POST /state (Bearer DORY_STATE_TOKEN)
New-->>O: 200 OK
| Property | Value |
|---|---|
| Capture | GET http://<podIP>:8080/state |
| Restore | POST http://<podIP>:8080/state |
| Auth | Authorization: Bearer <DORY_STATE_TOKEN> |
| HTTP timeout | 30s |
| Max state size | 10 MB |
| Retries | exponential backoff (1s base ×2, cap 30s) |
Note
On the SDK side, capture and restore must finish within 25s and stay under 8 MB, keeping a buffer beneath the Orchestrator's 30s / 10 MB limits. The SDK persists state to its configured backend (ConfigMap / S3 / PVC / Local). See Core Concepts & Glossary.
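The retry schedule and the SDK-side budget can be worked through in a few lines. This is a sketch only: the number of retry attempts is not stated on this page, so `attempts` is a caller-supplied assumption, and treating 1 MB as 1024 × 1024 bytes is likewise an assumption.

```python
SDK_MAX_SECONDS = 25.0
SDK_MAX_BYTES = 8 * 1024 * 1024   # assumption: MB read as MiB


def backoff_delays(attempts: int, base: float = 1.0,
                   factor: float = 2.0, cap: float = 30.0) -> list:
    """Delay before each retry: 1s base, doubled each time, capped at 30s."""
    return [min(base * factor ** i, cap) for i in range(attempts)]


def within_sdk_budget(size_bytes: int, elapsed_s: float) -> bool:
    """True if a capture or restore stays inside the SDK's 25s / 8 MB limits."""
    return size_bytes < SDK_MAX_BYTES and elapsed_s <= SDK_MAX_SECONDS
```

With seven attempts the delays would be 1, 2, 4, 8, 16, 30, and 30 seconds: the sixth doubling (32s) is where the cap takes effect.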
Zero-downtime migration on node drain
When the Orchestrator runs with --enable-monitor and an application node is cordoned:
- The Orchestrator captures state from each pod on the draining node.
- It creates a replacement pod named <app>-drain-<ts> on a healthy node, or with an empty NodeName so Karpenter provisions one (with an extended 5m readiness wait).
- It restores state into the replacement.
- Only then does it let kubectl drain evict the old pod.
Tip
A sentinel ConfigMap dory-controller-ref is attached to processor pods as an owner reference, making bare pods drain-eligible without --force.
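The ordering of the drain steps in this section is the important part: eviction is allowed only after the restore succeeds. The sketch below shows that ordering with injected callables. All function names are hypothetical, and using Unix seconds for `<ts>` is an assumption.

```python
import time
from typing import Callable, Optional


def replacement_name(app: str, ts: Optional[int] = None) -> str:
    """Build <app>-drain-<ts>; Unix seconds for <ts> is an assumption."""
    return f"{app}-drain-{int(time.time()) if ts is None else ts}"


def migrate_on_drain(
    app: str,
    capture: Callable[[str], bytes],
    create: Callable[[str], str],
    restore: Callable[[str, bytes], None],
    allow_eviction: Callable[[str], None],
    ts: Optional[int] = None,
) -> str:
    """Run the drain steps in order; any exception stops before eviction."""
    state = capture(app)                          # 1. capture from the old pod
    new_pod = create(replacement_name(app, ts))   # 2. create the replacement
    restore(new_pod, state)                       # 3. restore into the replacement
    allow_eviction(app)                           # 4. only now release the old pod
    return new_pod
```

If `restore` raises, `allow_eviction` is never called, so the old pod keeps serving.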
Edge ↔ cloud failover
Edge pods send heartbeats to POST /api/v1/edge/heartbeat; the response carries a directive of continue or shutdown.
The Orchestrator marks an edge node failed when either:
- The node is Kubernetes NotReady for more than a 30s grace period, or
- Its DB heartbeat is stale for more than 60s.
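The two failure conditions reduce to a simple predicate. This is a sketch with hypothetical names and inputs (seconds spent NotReady, and seconds since the last recorded heartbeat), not the Orchestrator's actual detection code.

```python
from typing import Optional

NOT_READY_GRACE_S = 30.0
HEARTBEAT_STALE_S = 60.0


def edge_node_failed(not_ready_for_s: Optional[float],
                     heartbeat_age_s: float) -> bool:
    """Return True when either failure condition holds.

    not_ready_for_s is None while Kubernetes reports the node as Ready.
    """
    not_ready = not_ready_for_s is not None and not_ready_for_s > NOT_READY_GRACE_S
    stale = heartbeat_age_s > HEARTBEAT_STALE_S
    return not_ready or stale
```

Either signal alone is enough, so a node that Kubernetes still reports as Ready fails over once its heartbeat goes stale.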
On failover, the app is recreated as a managed (cloud) pod with:
| Label / env | Value |
|---|---|
| workload-location | edge |
| migrated-from-edge | true |
| original-edge-node | <node> |
| env DORY_MIGRATED_FROM_EDGE | true |
| env DORY_STATE_RESTORE_PATH | <state_storage_path> |
The SDK restores state from its ConfigMap-backed store. Failback recreates the edge pod once the edge node returns.
flowchart LR
EP[Edge pod] -- heartbeat --> ORCH[Orchestrator]
ORCH -- NotReady >30s OR<br/>heartbeat stale >60s --> FAIL{Edge node failed?}
FAIL -- yes --> MP[Managed pod<br/>migrated-from-edge=true]
MP -- edge node returns --> FB[Failback recreates edge pod]
See Orchestrator architecture for the failover/failback state machine and fencing details.
The output path
Processors publish results to RabbitMQ.
- Exchange: topic exchange dory.output.
- Envelope (JSON): { schema_version: "0.1", message_id: <uuid4>, timestamp: <ISO8601 UTC>, payload: { ... } }.
- Routing key: <processor_id>.<event_type>.<geohash-segments>.
{
  "schema_version": "0.1",
  "message_id": "f81d4fae-7dec-41d0-a765-00a0c91e6bf6",
  "timestamp": "2026-06-17T12:00:00Z",
  "payload": {}
}
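For illustration, an envelope and routing key of this shape could be built as follows. This is a standalone sketch, not the Processor SDK's API: `build_envelope` and `routing_key` are hypothetical helpers, and passing the geohash segments as a pre-split list is an assumption, since this page does not describe how a geohash is segmented.

```python
import json
import uuid
from datetime import datetime, timezone


def build_envelope(payload: dict) -> dict:
    """Wrap a payload in the versioned output envelope."""
    return {
        "schema_version": "0.1",
        "message_id": str(uuid.uuid4()),
        # ISO 8601 in UTC with a trailing "Z", second precision.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "payload": payload,
    }


def routing_key(processor_id: str, event_type: str, geohash_segments: list) -> str:
    """Join the parts as <processor_id>.<event_type>.<geohash-segments>."""
    return ".".join([processor_id, event_type, *geohash_segments])


if __name__ == "__main__":
    body = json.dumps(build_envelope({"count": 3}))
    print(routing_key("proc-1", "detection", ["9q", "8y"]), body)
```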
Subscribers bind to dory.output with routing-key patterns and match on the envelope's major version. The Processor SDK constructs and publishes the envelope; see Processor SDK getting started.
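Matching on the major version can be sketched as below. The helper names are hypothetical and the Subscriber SDK may expose this differently; the sketch assumes schema_version is always a dotted "major.minor" string, as in the example envelope.

```python
def major_version(schema_version: str) -> int:
    """Extract the major component from a "major.minor" version string."""
    return int(schema_version.split(".", 1)[0])


def accepts(envelope: dict, supported_major: int) -> bool:
    """Accept any envelope whose major version matches, whatever the minor."""
    return major_version(envelope["schema_version"]) == supported_major
```

A subscriber built for major version 0 would accept "0.1" and a later "0.9" but reject "1.0". On the binding side, AMQP topic patterns use `*` for exactly one word and `#` for zero or more, so a pattern such as `proc-1.#` would receive every event from a processor with the illustrative id `proc-1`.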