HTTP API Reference

The Dory Orchestrator exposes a single HTTP server on port 8080 serving the edge API, health/readiness probes, and Prometheus metrics. Edge pods reach it in-cluster at dory-orchestrator-metrics.dory-system.svc.cluster.local:8080.

See Configuration for MetricsPort, Metrics for the /metrics payload, Edge Failover for how the heartbeat drives health detection, and Deployment for the Services that front this port.

Edge API

POST /api/v1/edge/heartbeat

Edge processors call this periodically to report liveness and receive a directive.

  • Non-POST requests → 405 Method Not Allowed.
  • Request body limit: 64 KB.
  • Invalid JSON → 400 Bad Request.

Request body:

{
  "processor_id": "string",
  "node_id": "string",
  "role": "string",
  "epoch": 0,
  "timestamp": 0.0,
  "status": "string",
  "last_state_sync": 0.0,
  "state_size_bytes": 0,
  "uptime_sec": 0.0
}

epoch, last_state_sync, and state_size_bytes are optional: each is a pointer on the server side, so it may be omitted or sent as null.
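As an illustration of the request shape above, the following is a minimal client-side sketch in Python. The helper names build_heartbeat and encode_heartbeat are hypothetical, and two things are assumed rather than stated by this page: that timestamp is Unix seconds (as orchestrator_time is), and that the 64 KB limit means 64 * 1024 bytes.

```python
import json
import time
from typing import Any, Dict, Optional

# Assumption: "64 KB" is interpreted as 64 * 1024 bytes.
MAX_BODY_BYTES = 64 * 1024


def build_heartbeat(
    processor_id: str,
    node_id: str,
    role: str,
    status: str,
    uptime_sec: float,
    epoch: Optional[int] = None,
    last_state_sync: Optional[float] = None,
    state_size_bytes: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a heartbeat body, leaving out optional fields that are unset."""
    body: Dict[str, Any] = {
        "processor_id": processor_id,
        "node_id": node_id,
        "role": role,
        "timestamp": time.time(),  # assumption: Unix seconds as a float
        "status": status,
        "uptime_sec": uptime_sec,
    }
    optional = {
        "epoch": epoch,
        "last_state_sync": last_state_sync,
        "state_size_bytes": state_size_bytes,
    }
    # Omit unset optional fields instead of sending explicit nulls.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


def encode_heartbeat(body: Dict[str, Any]) -> bytes:
    """Serialize the body and refuse to exceed the documented size limit."""
    data = json.dumps(body).encode("utf-8")
    if len(data) > MAX_BODY_BYTES:
        raise ValueError("heartbeat body exceeds the 64 KB request limit")
    return data
```

Checking the size on the client is optional; it only surfaces an oversized payload before the request is sent.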

Behavior:

  • If processor_id is set:
    • Update edge_nodes.last_heartbeat_at = NOW() for the node linked to that processor.
    • Update processors.last_health_check_at and processors.health_status (healthy when status == "healthy").
  • Directive: defaults to "continue". If processor_id is set and GetProcessorConfigByID fails — i.e. the processor is not found or its status is terminated/failed — the directive is "shutdown".

Note

Only continue and shutdown are ever returned. There is no demote or promote directive.
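The directive rule can be summarized in a few lines. This is a Python stand-in for the behavior described above, not the orchestrator's own code: choose_directive and ProcessorLookupError are hypothetical names, and the lookup callable plays the role of GetProcessorConfigByID.

```python
from typing import Callable, Optional


class ProcessorLookupError(Exception):
    """Hypothetical error: processor not found, or terminated/failed."""


def choose_directive(
    processor_id: Optional[str],
    get_processor_config_by_id: Callable[[str], object],
) -> str:
    """Return "continue" unless the processor lookup fails."""
    if not processor_id:
        # No processor_id in the heartbeat: nothing to look up.
        return "continue"
    try:
        get_processor_config_by_id(processor_id)
    except ProcessorLookupError:
        # Unknown, terminated, or failed processor: tell it to stop.
        return "shutdown"
    return "continue"
```

Note that a heartbeat without a processor_id always receives "continue" under this rule.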

Response — always 200, application/json:

{
  "acknowledged": true,
  "orchestrator_time": 0.0,
  "directive": "continue"
}

orchestrator_time is Unix seconds (float). directive is "continue" or "shutdown".
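A client might post the heartbeat and read the directive roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the plain http scheme is assumed for the in-cluster address given at the top of this page, and the 5-second timeout is an arbitrary choice.

```python
import json
import urllib.request

# Assumption: plain HTTP to the in-cluster Service address.
HEARTBEAT_URL = (
    "http://dory-orchestrator-metrics.dory-system.svc.cluster.local:8080"
    "/api/v1/edge/heartbeat"
)

KNOWN_DIRECTIVES = ("continue", "shutdown")


def parse_heartbeat_response(raw: bytes) -> str:
    """Extract the directive and reject anything outside the documented set."""
    reply = json.loads(raw)
    directive = reply["directive"]
    if directive not in KNOWN_DIRECTIVES:
        raise ValueError(f"unexpected directive: {directive!r}")
    return directive


def send_heartbeat(body: dict, url: str = HEARTBEAT_URL, timeout: float = 5.0) -> str:
    """POST the heartbeat body as JSON and return the directive."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_heartbeat_response(response.read())
```

A caller would typically keep running on "continue" and begin a clean exit on "shutdown"; how a processor handles a failed request is outside what this page specifies.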

POST /api/v1/edge/nodes

Register an edge node.

Request body:

{
  "name": "string",
  "organization_id": "uuid",
  "failover_enabled": true
}

  • name is required → 400 if missing.
  • organization_id must be a valid UUID → 400 otherwise.

Calls RegisterEdgeNode (INSERT into edge_nodes with status='online').

Response — 201 Created:

{
  "id": "uuid",
  "organization_id": "uuid",
  "name": "string",
  "status": "online",
  "failover_enabled": true,
  "failover_target_node_id": null,
  "last_heartbeat_at": null
}
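The validation rules for this endpoint can be mirrored on the client before sending. The sketch below is illustrative only: validate_register_request and its error strings are hypothetical, a missing organization_id is assumed to be rejected like an invalid one, and Python's uuid.UUID accepts a few forms (for example, no hyphens) that the server may not.

```python
import uuid
from typing import Tuple


def validate_register_request(body: dict) -> Tuple[int, str]:
    """Return the expected HTTP status and a hypothetical error message."""
    if not body.get("name"):
        return 400, "name is required"
    try:
        # uuid.UUID raises ValueError for strings that are not UUIDs.
        uuid.UUID(str(body.get("organization_id", "")))
    except ValueError:
        return 400, "organization_id must be a valid UUID"
    return 201, ""
```

For example, a body with a name but organization_id set to "not-a-uuid" yields 400 here, matching the rule above.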

POST /api/v1/edge/nodes/decommission

Decommission an edge node and stop its active apps.

Request body:

{ "node_name": "string" }

  • node_name is required → 400 if missing.

Sets edge_node_apps.status='stopped' for active apps and edge_nodes.status='decommissioned'.

  • Node not found → 500.

Response — 200:

{ "status": "decommissioned", "node_name": "<name>" }
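To make the status transitions and response codes concrete, here is an in-memory model of the decommission flow. It is a sketch, not the orchestrator's implementation: plain dicts and lists stand in for the edge_nodes and edge_node_apps tables, the "active" status value and the error body shape are assumptions, and decommission is a hypothetical name.

```python
from typing import Dict, List, Tuple


def decommission(
    nodes: Dict[str, dict],
    apps: List[dict],
    node_name: str,
) -> Tuple[int, dict]:
    """Model the endpoint: stop the node's active apps, then mark the node."""
    if not node_name:
        return 400, {"error": "node_name is required"}  # hypothetical body
    node = nodes.get(node_name)
    if node is None:
        # Per this page, an unknown node surfaces as 500, not 404.
        return 500, {"error": "node not found"}  # hypothetical body
    for app in apps:
        # Assumption: "active" is the status value of a running app.
        if app["node_name"] == node_name and app["status"] == "active":
            app["status"] = "stopped"
    node["status"] = "decommissioned"
    return 200, {"status": "decommissioned", "node_name": node_name}
```

Because an unknown node returns 500, a client cannot distinguish "node not found" from other server errors by status code alone.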

Observability & health

GET /metrics

Prometheus exposition via promhttp. See Metrics.

GET /healthz

Runs all health checkers. Overall status is unhealthy if any checker is unhealthy, degraded if any is degraded, otherwise healthy.

  • healthy or degraded → 200.
  • unhealthy → 503.

Checkers: database (Ping), kubernetes (GetServerVersion), watcher (GetHealth), health-poller.

Response body:

{
  "status": "healthy",
  "timestamp": "...",
  "version": "...",
  "components": {
    "database": { "status": "...", "message": "...", "last_check": "..." }
  }
}
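The aggregation rule for /healthz, worst status wins, can be expressed as a short sketch. The function names are hypothetical; the components argument follows the shape of the response body above.

```python
from typing import Dict


def overall_status(components: Dict[str, dict]) -> str:
    """Combine per-component statuses: unhealthy beats degraded beats healthy."""
    statuses = {component["status"] for component in components.values()}
    if "unhealthy" in statuses:
        return "unhealthy"
    if "degraded" in statuses:
        return "degraded"
    return "healthy"


def healthz_http_status(status: str) -> int:
    """Only an unhealthy overall status maps to 503."""
    return 503 if status == "unhealthy" else 200
```

A degraded component therefore changes the reported status but still returns 200, so a probe that inspects only the status code does not see it.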

GET /readyz

Readiness probe.

  • If the orchestrator has not yet been marked ready (MarkReady is set after setup completes) → 503, reporting startup as unhealthy.
  • Otherwise it runs the readiness checkers (database, kubernetes). ready is true only if every checker is healthy → 200; otherwise → 503.

Response body:

{
  "ready": true,
  "timestamp": "...",
  "components": { }
}
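The readiness decision differs from /healthz in that a degraded checker is not enough to pass. A minimal sketch follows, with a hypothetical function name, a simplified response body (no timestamp), and the assumption that the not-yet-ready case is reported as a component named startup.

```python
from typing import Dict, Tuple


def readiness(marked_ready: bool, components: Dict[str, dict]) -> Tuple[int, dict]:
    """Return (HTTP status, body) for a readiness probe."""
    if not marked_ready:
        # Assumption: startup appears as an unhealthy component.
        body = {"ready": False, "components": {"startup": {"status": "unhealthy"}}}
        return 503, body
    # Ready only if every readiness checker reports exactly "healthy".
    ready = all(c["status"] == "healthy" for c in components.values())
    return (200 if ready else 503), {"ready": ready, "components": components}
```

Under this rule, a degraded database check returns 200 from /healthz but 503 from /readyz.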

GET /livez

Liveness probe — always 200.

{
  "status": "alive",
  "timestamp": "<RFC3339 UTC>",
  "version": "v0.1.0"
}

Tip

The Deployment uses /livez for liveness and /readyz for readiness. See Deployment.