State Migration & Node Drain¶
The Dory Orchestrator preserves processor state across pod moves by capturing it over HTTP from the old pod and restoring it into a replacement. This page covers the state-transfer protocol, the node-drain migration path, the two migrator implementations, and consolidation.
State-transfer protocol¶
TransferManager (pkg/state/transfer.go) moves state between pods over HTTP.
- Port: statePort, default 8080.
- Auth: bearer token from env DORY_STATE_TOKEN. If unset, requests are unauthenticated and a warning is logged.
- Timeout: DefaultHTTPTimeout = 30s.
- Body limit: MaxResponseBodySize = 10MB (enforced via io.LimitReader).
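A minimal sketch of how these settings could fit together in Go. The helper names (newStateClient, tokenFromEnv, buildStateRequest, readCapped) and the errBodyTooLarge sentinel are hypothetical and are not taken from pkg/state/transfer.go; only the constants, the env variable, and the use of io.LimitReader come from the list above. Whether the real code rejects or truncates an oversized body is not stated, so rejecting is an assumption here.

```go
package main

import (
	"bytes"
	"errors"
	"io"
	"log"
	"net/http"
	"os"
	"time"
)

const (
	DefaultHTTPTimeout  = 30 * time.Second
	MaxResponseBodySize = 10 << 20 // "10MB", taken here as 10 MiB (assumption)
)

// errBodyTooLarge is a hypothetical sentinel; the real error is not documented.
var errBodyTooLarge = errors.New("state response exceeds MaxResponseBodySize")

// newStateClient returns an HTTP client with the documented 30s timeout.
func newStateClient() *http.Client {
	return &http.Client{Timeout: DefaultHTTPTimeout}
}

// tokenFromEnv reads DORY_STATE_TOKEN and logs a warning when it is unset.
func tokenFromEnv() string {
	token := os.Getenv("DORY_STATE_TOKEN")
	if token == "" {
		log.Println("warning: DORY_STATE_TOKEN is unset; state requests are unauthenticated")
	}
	return token
}

// buildStateRequest prepares a request for a pod's /state endpoint.
// The bearer header is added only when a token is present.
func buildStateRequest(method, url, token string, body []byte) (*http.Request, error) {
	var r io.Reader
	if body != nil {
		r = bytes.NewReader(body)
	}
	req, err := http.NewRequest(method, url, r)
	if err != nil {
		return nil, err
	}
	if body != nil {
		req.Header.Set("Content-Type", "application/json")
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

// readCapped reads up to limit bytes through io.LimitReader. It reads one
// extra byte so that an oversized body can be detected and rejected.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errBodyTooLarge
	}
	return data, nil
}
```

A caller would typically build the request with tokenFromEnv(), send it with newStateClient().Do(req), and pass resp.Body to readCapped with MaxResponseBodySize.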
Capture¶
CaptureState issues GET http://<oldPodIP>:8080/state. A 401 response logs "check DORY_STATE_TOKEN". The body is unmarshalled into ApplicationState:
| Fields |
|---|
| PodName, AppName, CapturedAt, StateVersion, Data, Metrics, Connections, ActiveSessions, SessionData, Uptime, RequestCount, LastHealthTime |
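For orientation, the struct might look roughly like the sketch below. The field names come from the table; every field type is an assumption, and fields whose shape is not documented are left as json.RawMessage. The JSON key names on the wire are not documented either, so the sketch uses no struct tags and relies on encoding/json's default case-insensitive matching. The stateURL and decodeState helpers are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strconv"
	"time"
)

// ApplicationState mirrors the documented field list. All types are assumed.
type ApplicationState struct {
	PodName        string
	AppName        string
	CapturedAt     time.Time
	StateVersion   string
	Data           json.RawMessage // shape not documented
	Metrics        json.RawMessage // shape not documented
	Connections    json.RawMessage // shape not documented
	ActiveSessions int
	SessionData    map[string]json.RawMessage // a collection, since its count is validated
	Uptime         json.RawMessage            // shape not documented
	RequestCount   int64
	LastHealthTime time.Time
}

// stateURL builds the /state URL for a pod IP and state port.
func stateURL(podIP string, port int) string {
	return "http://" + net.JoinHostPort(podIP, strconv.Itoa(port)) + "/state"
}

// decodeState unmarshals a captured response body into ApplicationState.
func decodeState(body []byte) (*ApplicationState, error) {
	var s ApplicationState
	if err := json.Unmarshal(body, &s); err != nil {
		return nil, fmt.Errorf("decode application state: %w", err)
	}
	return &s, nil
}
```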
Restore¶
TransferState issues POST http://<newPodIP>:8080/state with Content-Type: application/json and the bearer token.
Validate¶
ValidateState re-fetches /state from the new pod and checks AppName, StateVersion, and the SessionData count match.
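The comparison itself can be sketched as follows. The stateSummary type and compareState function are hypothetical names; the sketch only covers the three documented checks and assumes each one is a plain equality test.

```go
package main

import "fmt"

// stateSummary holds only the values ValidateState is described as comparing.
type stateSummary struct {
	AppName      string
	StateVersion string
	SessionCount int // number of entries in SessionData
}

// compareState returns an error describing the first mismatch between the
// state that was sent and the state re-fetched from the new pod.
func compareState(sent, got stateSummary) error {
	if sent.AppName != got.AppName {
		return fmt.Errorf("app name mismatch: sent %q, got %q", sent.AppName, got.AppName)
	}
	if sent.StateVersion != got.StateVersion {
		return fmt.Errorf("state version mismatch: sent %q, got %q", sent.StateVersion, got.StateVersion)
	}
	if sent.SessionCount != got.SessionCount {
		return fmt.Errorf("session count mismatch: sent %d, got %d", sent.SessionCount, got.SessionCount)
	}
	return nil
}
```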
Readiness and retries¶
- WaitForPodReady polls GET /health every 500ms for up to 15s before capture.
- Retry helpers CaptureWithRetry, RestoreWithRetry, and TransferWithRetry use exponential backoff: base 1s, ×2, capped at 30s.
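The backoff schedule described above can be sketched like this. The names backoffDelay and retry are hypothetical, and the sketch assumes that "N retries" means one initial attempt plus up to N further attempts; the real helpers may count differently.

```go
package main

import "time"

const (
	backoffBase   = 1 * time.Second
	backoffFactor = 2
	backoffCap    = 30 * time.Second
)

// backoffDelay returns the wait before retry number attempt (0-based):
// 1s, 2s, 4s, 8s, 16s, then 30s for every later attempt.
func backoffDelay(attempt int) time.Duration {
	d := backoffBase
	for i := 0; i < attempt; i++ {
		d *= backoffFactor
		if d >= backoffCap {
			return backoffCap
		}
	}
	return d
}

// retry calls op until it succeeds or the retries are used up, sleeping
// backoffDelay between attempts. sleep is injected so callers can pass
// time.Sleep in production and a stub in tests.
func retry(retries int, sleep func(time.Duration), op func() error) error {
	var err error
	for attempt := 0; ; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt >= retries {
			return err
		}
		sleep(backoffDelay(attempt))
	}
}
```

With three retries, a persistently failing operation would wait 1s, 2s, and 4s between its four attempts under these assumptions.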
Node-drain migration¶
When the event monitor (--enable-monitor) sees a NoSchedule taint on an application node, it invokes the DrainManager asynchronously with a 5-minute context.
Warning
Without --enable-monitor, node drains are handled only by Karpenter/Kubernetes — this state-preserving migration path does not run.
HandleNodeDrain(ctx, nodeName):
- Cooldown: DefaultDrainHandlingCooldown = 30s between drains for a node.
- getPodsOnNode: pods labeled managed-by=dory-orchestrator, fieldSelector spec.nodeName, Running and not terminating.
- getHealthyNodes: nodes labeled workload-type=application, excluding draining nodes, Ready with no NoSchedule taint.
- If zero healthy nodes → needsKarpenter: create replacements with an empty targetNode so they go Pending and Karpenter provisions a node.
- Per pod → migratePodWithStateTransfer.
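The node filter and the Karpenter fallback can be illustrated with plain Go types. This is a sketch only: the node and taint structs stand in for the Kubernetes API objects, the function names are hypothetical, and choosing the first healthy node as the target is an assumption, since the selection rule is not documented here.

```go
package main

// taint and node are simplified stand-ins for the Kubernetes API types.
type taint struct {
	Key    string
	Effect string
}

type node struct {
	Name   string
	Labels map[string]string
	Ready  bool
	Taints []taint
}

// hasNoScheduleTaint reports whether the node carries any NoSchedule taint.
func hasNoScheduleTaint(n node) bool {
	for _, t := range n.Taints {
		if t.Effect == "NoSchedule" {
			return true
		}
	}
	return false
}

// healthyNodes returns the names of application nodes that are Ready,
// untainted, and not the node currently being drained.
func healthyNodes(nodes []node, draining string) []string {
	var out []string
	for _, n := range nodes {
		if n.Name == draining {
			continue
		}
		if n.Labels["workload-type"] != "application" {
			continue
		}
		if !n.Ready || hasNoScheduleTaint(n) {
			continue
		}
		out = append(out, n.Name)
	}
	return out
}

// pickTarget returns an empty targetNode and needsKarpenter=true when no
// healthy node exists, so the replacement goes Pending for Karpenter.
// Picking the first healthy node otherwise is an assumption.
func pickTarget(healthy []string) (targetNode string, needsKarpenter bool) {
	if len(healthy) == 0 {
		return "", true
	}
	return healthy[0], false
}
```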
migratePodWithStateTransfer(oldPod, targetNode)¶
| Step | Action |
|---|---|
| 1. Capture | Capture state from the old pod IP before replacement (CaptureWithRetry, 30s ctx, 3 retries). Failure → fresh start; never fails the migration. |
| 2. Create | Create a replacement named {app}-drain-{unixts}. Pull processor config for port/resources/env, build PodSpecConfig, ensure the sentinel ConfigMap dory-controller-ref and attach its owner ref (Controller: true, BlockOwnerDeletion: false). targetNode == "" leaves NodeName empty for Karpenter. |
| 3. Wait | WaitForPodReady — 2m normally, 5m when waiting on Karpenter. |
| 4. Restore | Restore state (RestoreWithRetry); non-fatal. |
| 5. Finalize | markPodMigrated + DB updates (UpdateProcessorPodName with hostIP, status running, health). |
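The control flow of the five steps, with capture and restore treated as non-fatal, might be sketched as below. The migrationSteps type, the migrate function, and the errStepFailed sentinel are all hypothetical; each step is a function field so the sketch stays independent of the Kubernetes client. Treating create, wait, and finalize failures as fatal is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errStepFailed is a hypothetical sentinel used to simulate failures.
var errStepFailed = errors.New("step failed")

// migrationSteps bundles the five steps as injectable functions.
type migrationSteps struct {
	Capture   func() ([]byte, error)
	Create    func() error
	WaitReady func() error
	Restore   func(state []byte) error
	Finalize  func() error
}

// migrate runs the steps in order. A capture or restore failure is logged
// and the replacement starts fresh; other failures abort the migration.
func migrate(s migrationSteps) (restored bool, err error) {
	state, err := s.Capture()
	if err != nil {
		log.Printf("capture failed, replacement starts fresh: %v", err)
		state = nil
	}
	if err := s.Create(); err != nil {
		return false, fmt.Errorf("create replacement: %w", err)
	}
	if err := s.WaitReady(); err != nil {
		return false, fmt.Errorf("wait for replacement: %w", err)
	}
	if state != nil {
		if err := s.Restore(state); err != nil {
			log.Printf("restore failed (non-fatal): %v", err)
		} else {
			restored = true
		}
	}
	if err := s.Finalize(); err != nil {
		return restored, fmt.Errorf("finalize: %w", err)
	}
	return restored, nil
}
```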
Note
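The old pod is not deleted by the drain manager; the kubectl drain eviction removes it. State is captured from the old pod and the replacement is created before that eviction, so state carries over whenever capture and restore succeed. If either fails, the replacement starts fresh.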
Tip
The sentinel ConfigMap owner ref with Controller: true, BlockOwnerDeletion: false makes kubectl drain treat each managed pod as controller-owned, so it evicts cleanly without --force — even though these are otherwise bare pods.
Drain constants: DefaultStateTransferTimeout=30s, DefaultPodReadyTimeout=2m, DefaultStateTransferRetries=3, DefaultDrainHandlingCooldown=30s, DefaultKarpenterPodReadyTimeout=5m, MigratedPodTTL=30m.
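Expressed as Go declarations, the constants could look like the sketch below. The values and names are from the line above; the time.Duration typing and the two helpers (podReadyTimeout, drainPodName) are assumptions added for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const (
	DefaultStateTransferTimeout     = 30 * time.Second
	DefaultPodReadyTimeout          = 2 * time.Minute
	DefaultStateTransferRetries     = 3
	DefaultDrainHandlingCooldown    = 30 * time.Second
	DefaultKarpenterPodReadyTimeout = 5 * time.Minute
	MigratedPodTTL                  = 30 * time.Minute
)

// podReadyTimeout picks the longer wait when Karpenter must provision a node.
func podReadyTimeout(needsKarpenter bool) time.Duration {
	if needsKarpenter {
		return DefaultKarpenterPodReadyTimeout
	}
	return DefaultPodReadyTimeout
}

// drainPodName builds the replacement name in the {app}-drain-{unixts} form.
func drainPodName(app string, unixTS int64) string {
	return fmt.Sprintf("%s-drain-%d", app, unixTS)
}
```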
sequenceDiagram
participant Mon as Event Monitor
participant DM as DrainManager
participant Old as Old Pod
participant K8s as K8s API
participant Karp as Karpenter
participant New as Replacement Pod
Mon->>DM: NoSchedule taint on application node
DM->>DM: cooldown (30s) + list pods on node
DM->>DM: getHealthyNodes
alt no healthy node
DM->>DM: needsKarpenter (empty targetNode)
end
DM->>Old: GET /state (capture, 3 retries)
Old-->>DM: ApplicationState (or fresh start)
DM->>K8s: create {app}-drain-{ts} + sentinel ownerRef
opt targetNode == ""
K8s->>Karp: Pending pod triggers provisioning
Karp-->>K8s: new node
end
DM->>New: WaitForPodReady (2m / 5m Karpenter)
DM->>New: POST /state (restore, non-fatal)
DM->>K8s: markPodMigrated + DB update (hostIP, running)
Note over Old: kubectl drain evicts old pod
Migrator paths¶
The migrator (pkg/migrator) offers two implementations. Both are create-before-delete.
Default — Migrate() (no HTTP state transfer)¶
Relies on the SDK's own ConfigMap persistence rather than HTTP state transfer:
- Preserve image and labels; new pod name toggles a -m suffix.
- Create the new pod from DB config.
- WaitForPodRunning (on failure: rollback and delete the new pod).
- Update DB pod name and nodeIP (pod.Status.HostIP).
- Delete the old pod and wait for deletion.
Constants: DefaultMaxConcurrentMigrations=3, PodDeletionTimeout=50s, migrator HTTP client timeout 5s (health checks only). MigrateBatch runs at most 3 migrations concurrently.
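Two pieces of this path lend themselves to a short sketch: the name toggle and the concurrency cap. The code below assumes that "toggles" means adding -m when it is absent and stripping it when present, and it bounds concurrency with a buffered-channel semaphore. Both functions are hypothetical, and the peak counter exists only to make the cap observable in this example.

```go
package main

import (
	"strings"
	"sync"
)

const DefaultMaxConcurrentMigrations = 3

// togglePodName adds the -m suffix, or removes it if already present, so
// the old and new pod never share a name (assumed behaviour).
func togglePodName(name string) string {
	if strings.HasSuffix(name, "-m") {
		return strings.TrimSuffix(name, "-m")
	}
	return name + "-m"
}

// migrateBatch runs migrate for every pod with at most limit calls in
// flight. It returns per-pod errors in input order and the peak
// concurrency observed.
func migrateBatch(pods []string, limit int, migrate func(pod string) error) (errs []error, peak int) {
	if limit < 1 {
		limit = 1
	}
	errs = make([]error, len(pods))
	sem := make(chan struct{}, limit) // one slot per allowed migration
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		inFlight int
	)
	for i, pod := range pods {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit migrations are running
		go func(i int, pod string) {
			defer wg.Done()
			defer func() { <-sem }()
			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()
			errs[i] = migrate(pod)
			mu.Lock()
			inFlight--
			mu.Unlock()
		}(i, pod)
	}
	wg.Wait()
	return errs, peak
}
```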
Enhanced — MigrateWithValidation() (with state transfer)¶
Five (+1) phases:
- Create the new pod and wait until ready.
- Validate /health.
- Phase 2.5: transferState via state.TransferManager.TransferWithRetry (old → new, 3 retries). Failure is logged but non-fatal.
- Gradual traffic shift (no real load balancer).
- Drain the old pod (10s).
- Delete the old pod.
Consolidation¶
Consolidation bin-packs running pods onto fewer nodes to let Karpenter reclaim emptied capacity. It is cooldown-gated (default 1m) and requires at least two workload-type=application nodes. scheduler.ConsolidatePods plans the moves and migrator.MigrateBatch executes them (≤3 concurrent). Emptied nodes are decommissioned by Karpenter, not the orchestrator. See Architecture for where consolidation sits in the reconcile cycle.
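The two gating conditions can be written as a single predicate. This is a sketch under assumptions: the constant name DefaultConsolidationCooldown and the function shouldConsolidate are hypothetical, and the planning logic in scheduler.ConsolidatePods is not shown because its algorithm is not described here.

```go
package main

import "time"

// DefaultConsolidationCooldown is a hypothetical name for the 1m default.
const DefaultConsolidationCooldown = time.Minute

// shouldConsolidate reports whether a consolidation pass may run: the
// cooldown has elapsed and at least two application nodes exist.
func shouldConsolidate(sinceLast time.Duration, applicationNodes int) bool {
	return sinceLast >= DefaultConsolidationCooldown && applicationNodes >= 2
}
```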