Dory Platform Overview

Dory is a platform for running stateful, fault-tolerant processors on Kubernetes. It provides zero-downtime migration, automatic state persistence, and edge↔cloud failover so that a long-running processor can survive node drains, spot reclamation, and edge outages without losing its in-memory state.

A processor is a continuously running workload: it reads input, maintains state across iterations, and publishes results. Dory's job is to keep that processor running on the right node, preserve its state across pod moves, and recover it when its node disappears.

The parts of the platform

Component | Language | Responsibility
Processor SDK | Python | Library used to build processors. A processor subclasses BaseProcessor, declares state, and implements an async run() loop. The SDK serves health, metrics, and state endpoints and handles automatic save/restore. See Processor SDK getting started.
Orchestrator | Go | Control plane. Reads desired processors from a PostgreSQL config DB and reconciles Kubernetes pods (creating, migrating, and failing over processor pods). See Orchestrator architecture.
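
To make the Processor SDK row concrete, here is a minimal sketch of the pattern it describes: a subclass with an async run() loop whose state is restored on startup and saved on shutdown. The BaseProcessor defined here is a hypothetical stand-in written for this example, not the real SDK class; its constructor, the restore()/save()/start() methods, and the JSON state file are all assumptions. The real SDK also serves health, metrics, and state endpoints, which this sketch omits.

```python
import asyncio
import json
import tempfile
from pathlib import Path


class BaseProcessor:
    """Hypothetical stand-in for the SDK base class (not the real API)."""

    def __init__(self, state_path: Path) -> None:
        self.state_path = state_path
        self.state: dict = {}

    def restore(self) -> None:
        # Load previously saved state on startup, if any exists.
        if self.state_path.exists():
            self.state = json.loads(self.state_path.read_text())

    def save(self) -> None:
        # Persist state so a later instance can resume from it.
        self.state_path.write_text(json.dumps(self.state))

    async def run(self) -> None:
        raise NotImplementedError

    async def start(self) -> None:
        # Restore before the loop and save after it, even if the loop fails.
        self.restore()
        try:
            await self.run()
        finally:
            self.save()


class CounterProcessor(BaseProcessor):
    """Example processor: counts and sums its inputs across restarts."""

    def __init__(self, state_path: Path, inputs: list) -> None:
        super().__init__(state_path)
        self.inputs = inputs

    async def run(self) -> None:
        for value in self.inputs:
            self.state["count"] = self.state.get("count", 0) + 1
            self.state["total"] = self.state.get("total", 0) + value
            await asyncio.sleep(0)  # yield to the event loop between iterations


def run_twice() -> dict:
    """Run two instances against the same state file, simulating a restart."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "state.json"
        asyncio.run(CounterProcessor(path, [1, 2, 3]).start())
        second = CounterProcessor(path, [10])
        asyncio.run(second.start())
        return second.state
```

In this sketch the second instance starts from the first instance's saved counters rather than from zero, which is the behaviour the save/restore hooks are meant to provide.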

Note

A Web Portal and a Subscriber SDK are additional platform components. They are documented elsewhere (see the Guides section) and are out of scope for this System section.

Who should read what

  • Operators — focus on the Orchestrator architecture and deployment material. The Orchestrator is the control plane you run and observe; it manages nodes, drains, failover, and the config DB.
  • Developers — focus on the Processor SDK getting started guide. You write processors against the SDK; the platform handles placement, state, and recovery.

Both audiences should read Core Concepts & Glossary for shared terminology.

How a processor runs

  1. A processor's desired configuration lives as a row in the PostgreSQL config DB (image, version, resources, environment).
  2. The Orchestrator reconciles that row into a Kubernetes pod, keyed by a processor-id label. Outside of a pod move, one processor maps to exactly one pod.
  3. The pod runs your SDK-based processor. The SDK exposes health, metrics, and state endpoints on port 8080 and publishes output to RabbitMQ.
  4. When the pod must move — node drain, spot reclamation, or edge failover — the Orchestrator captures state from the old pod, creates a replacement, restores state into it, and only then removes the old pod.
  5. The SDK auto-saves state on shutdown and restores it on startup, so the processor resumes where it left off.
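
The ordering in step 4 (capture, create, restore, then remove) can be sketched with an in-memory model. This is only an illustration of the sequence, not the Orchestrator's implementation (which is written in Go): Pod, FakeCluster, migrate(), and the pod naming scheme are hypothetical names invented for this example, and no Kubernetes API is called.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    processor_id: str
    node: str
    state: dict = field(default_factory=dict)


@dataclass
class FakeCluster:
    """In-memory stand-in for the cluster API (hypothetical)."""

    pods: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def create(self, pod: Pod) -> None:
        self.pods[pod.name] = pod
        self.events.append(f"create {pod.name}")

    def delete(self, name: str) -> None:
        del self.pods[name]
        self.events.append(f"delete {name}")


def migrate(cluster: FakeCluster, old_name: str, target_node: str) -> Pod:
    """Move a processor to target_node without dropping its state."""
    old = cluster.pods[old_name]

    # 1. Capture state from the old pod while it still exists.
    snapshot = dict(old.state)
    cluster.events.append(f"capture {old.name}")

    # 2. Create the replacement with the same processor id.
    new = Pod(
        name=f"{old.processor_id}-{target_node}",
        processor_id=old.processor_id,
        node=target_node,
    )
    cluster.create(new)

    # 3. Restore the captured state into the replacement.
    new.state = snapshot
    cluster.events.append(f"restore {new.name}")

    # 4. Only now remove the old pod.
    cluster.delete(old.name)
    return new


def demo() -> list:
    """Migrate one processor from node-a to node-b and return the event log."""
    cluster = FakeCluster()
    cluster.create(Pod("p1-node-a", "p1", "node-a", {"offset": 42}))
    new_pod = migrate(cluster, "p1-node-a", "node-b")
    assert new_pod.state == {"offset": 42}
    return cluster.events
```

Deleting the old pod last means a failure in any earlier step leaves the original pod, and its state, in place.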

Where to go next