HTTP API Reference¶

The Dory Orchestrator exposes a single HTTP server on port 8080 serving the edge API, health/readiness probes, and Prometheus metrics. Edge pods reach it in-cluster at dory-orchestrator-metrics.dory-system.svc.cluster.local:8080.

See Configuration for MetricsPort, Metrics for the /metrics payload, Edge Failover for how the heartbeat drives health detection, and Deployment for the Services that front this port.

Edge API¶

`POST /api/v1/edge/heartbeat`¶

Edge processors call this periodically to report liveness and receive a directive.

Non-POST requests → 405 Method Not Allowed.
Request body limit: 64 KB.
Invalid JSON → 400 Bad Request.

Request body:

{
  "processor_id": "string",
  "node_id": "string",
  "role": "string",
  "epoch": 0,
  "timestamp": 0.0,
  "status": "string",
  "last_state_sync": 0.0,
  "state_size_bytes": 0,
  "uptime_sec": 0.0
}

epoch, last_state_sync, and state_size_bytes are pointer fields, so each is optional and may be omitted or sent as null.

Behavior:

If processor_id is set:
- Update edge_nodes.last_heartbeat_at = NOW() for the node linked to that processor.
- Update processors.last_health_check_at and processors.health_status (healthy when status == "healthy").

Directive: defaults to "continue". If processor_id is set and GetProcessorConfigByID fails (the processor is not found, or its status is terminated/failed), the directive is "shutdown".

Note

Only continue and shutdown are ever returned. There is no demote or promote directive.
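The directive rule can be sketched as a small pure function. This is an illustration only, not the orchestrator's code: the `processors` dictionary is a hypothetical in-memory stand-in for the lookup performed by GetProcessorConfigByID, and the status value "running" used in the usage note is likewise made up.

```python
from typing import Optional

# Statuses that, per the rule above, cause the lookup to fail.
INACTIVE_STATUSES = {"terminated", "failed"}


def choose_directive(processor_id: Optional[str], processors: dict) -> str:
    """Return "continue" or "shutdown" following the documented rule.

    `processors` is a hypothetical mapping of processor ID -> status string,
    standing in for the real database lookup.
    """
    if not processor_id:
        # No processor_id in the heartbeat: nothing to look up.
        return "continue"
    status = processors.get(processor_id)
    if status is None or status in INACTIVE_STATUSES:
        # Processor unknown, or already terminated/failed.
        return "shutdown"
    return "continue"
```

For example, `choose_directive("p1", {"p1": "running"})` yields "continue", while an unknown ID or a failed processor yields "shutdown".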

Response — always 200, application/json:

{
  "acknowledged": true,
  "orchestrator_time": 0.0,
  "directive": "continue"
}

orchestrator_time is Unix seconds (float). directive is "continue" or "shutdown".
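A minimal client sketch for this endpoint, using only the Python standard library. Several details are assumptions rather than documented behavior: plain `http://` on the in-cluster address, `timestamp` being Unix seconds like `orchestrator_time`, the `Content-Type` header, and the helper names `build_heartbeat` and `send_heartbeat`.

```python
import json
import time
import urllib.request
from typing import Optional

# In-cluster address from the top of this page; the http scheme is an assumption.
BASE_URL = "http://dory-orchestrator-metrics.dory-system.svc.cluster.local:8080"


def build_heartbeat(processor_id: str, node_id: str, role: str, status: str,
                    started_at: float, epoch: Optional[int] = None,
                    now: Optional[float] = None) -> dict:
    """Build a heartbeat body; the optional epoch field is omitted when unset."""
    now = time.time() if now is None else now
    body = {
        "processor_id": processor_id,
        "node_id": node_id,
        "role": role,
        "timestamp": now,  # assumed to be Unix seconds
        "status": status,
        "uptime_sec": now - started_at,
    }
    if epoch is not None:
        body["epoch"] = epoch
    return body


def send_heartbeat(body: dict, base_url: str = BASE_URL,
                   timeout: float = 5.0) -> str:
    """POST the heartbeat and return the directive from the response."""
    req = urllib.request.Request(
        base_url + "/api/v1/edge/heartbeat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    return reply["directive"]
```

A caller would typically loop, send a heartbeat at its chosen interval, and begin a clean exit when `send_heartbeat` returns "shutdown".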

`POST /api/v1/edge/nodes`¶

Register an edge node.

Request body:

{
  "name": "string",
  "organization_id": "uuid",
  "failover_enabled": true
}

name is required → 400 if missing.
organization_id must be a valid UUID → 400 otherwise.

Calls RegisterEdgeNode (INSERT into edge_nodes with status='online').

Response — 201 Created:

{
  "id": "uuid",
  "organization_id": "uuid",
  "name": "string",
  "status": "online",
  "failover_enabled": true,
  "failover_target_node_id": null,
  "last_heartbeat_at": null
}
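A client can check the two documented 400 conditions before sending the request. The sketch below is a hypothetical pre-flight helper, not part of the API. It treats a missing organization_id as invalid, which is an assumption, and Python's `uuid.UUID` accepts a few formats (for example, no hyphens) that the server may reject.

```python
import uuid


def validate_registration(body: dict) -> list:
    """Return the problems that would likely cause a 400; empty if none."""
    problems = []
    if not body.get("name"):
        problems.append("name is required")
    try:
        # str(None) is "None", which uuid.UUID rejects with ValueError.
        uuid.UUID(str(body.get("organization_id")))
    except ValueError:
        problems.append("organization_id must be a valid UUID")
    return problems
```

An empty list means the body passes both checks; the server remains the authority on what it accepts.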

`POST /api/v1/edge/nodes/decommission`¶

Decommission an edge node and stop its active apps.

Request body:

{ "node_name": "string" }

node_name is required → 400 if missing.

Sets edge_node_apps.status='stopped' for active apps and edge_nodes.status='decommissioned'.

Node not found → 500 Internal Server Error (not 404).

Response — 200:

{ "status": "decommissioned", "node_name": "<name>" }
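The state change and status codes described above can be modelled in memory as follows. This is a sketch under stated assumptions: `nodes` and `apps` are hypothetical stand-ins for the edge_nodes and edge_node_apps tables, and "active" as the literal status of an active app is assumed, not confirmed by this page.

```python
from typing import Optional, Tuple


def decommission(node_name: str, nodes: dict, apps: list) -> Tuple[int, Optional[dict]]:
    """Return (http_status, response_body) for a decommission request.

    nodes: node name -> status string (stand-in for edge_nodes).
    apps: list of {"node_name": ..., "status": ...} (stand-in for edge_node_apps).
    """
    if not node_name:
        return 400, None  # node_name is required
    if node_name not in nodes:
        return 500, None  # node not found is reported as 500
    for app in apps:
        # "active" is an assumed status value for running apps.
        if app["node_name"] == node_name and app["status"] == "active":
            app["status"] = "stopped"
    nodes[node_name] = "decommissioned"
    return 200, {"status": "decommissioned", "node_name": node_name}
```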

Observability & health¶

`GET /metrics`¶

Prometheus exposition via promhttp. See Metrics.

`GET /healthz`¶

Runs all health checkers. Overall status is unhealthy if any checker is unhealthy, degraded if any is degraded, otherwise healthy.

healthy or degraded → 200.
unhealthy → 503.

Checkers: database (Ping), kubernetes (GetServerVersion), watcher (GetHealth), health-poller.

Response body:

{
  "status": "healthy",
  "timestamp": "...",
  "version": "...",
  "components": {
    "database": { "status": "...", "message": "...", "last_check": "..." }
  }
}
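The aggregation rule and status-code mapping for /healthz can be expressed in a few lines. The function names are illustrative, and the input mirrors the `components` object in the response body above.

```python
def overall_status(components: dict) -> str:
    """Combine per-component statuses: unhealthy wins, then degraded."""
    statuses = {c["status"] for c in components.values()}
    if "unhealthy" in statuses:
        return "unhealthy"
    if "degraded" in statuses:
        return "degraded"
    return "healthy"


def http_status(overall: str) -> int:
    """Only an unhealthy result maps to 503; degraded still returns 200."""
    return 503 if overall == "unhealthy" else 200
```

Because degraded maps to 200, a monitor that needs to tell degraded from healthy has to read the `status` field in the body, not just the status code.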

`GET /readyz`¶

Readiness probe.

If the orchestrator has not been marked ready (MarkReady is set after setup completes) → 503, with a startup component reported as unhealthy.
Otherwise it runs the readiness checkers (database, kubernetes); ready is true only if all are healthy → 200, else 503.

Response body:

{
  "ready": true,
  "timestamp": "...",
  "components": { }
}
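The readiness decision is stricter than /healthz, as this sketch shows: any checker that is not healthy, including a degraded one, makes the probe fail. The function name and the boolean `marked_ready` argument are illustrative stand-ins for the orchestrator's internal state.

```python
from typing import Tuple


def readiness(marked_ready: bool, components: dict) -> Tuple[int, bool]:
    """Return (http_status, ready) following the documented readiness rule."""
    if not marked_ready:
        # Setup has not completed yet: fail without consulting the checkers.
        return 503, False
    ready = all(c["status"] == "healthy" for c in components.values())
    return (200 if ready else 503), ready
```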

`GET /livez`¶

Liveness probe — always 200.

{
  "status": "alive",
  "timestamp": "<RFC3339 UTC>",
  "version": "v0.1.0"
}

Tip

The Deployment uses /livez for liveness and /readyz for readiness. See Deployment.
