Self-Hosting HyperDX on Railway: Configuration Gotchas and Operational Workflow

Context

HyperDX OSS v2 was deployed on Railway with three companion services (ClickHouse v2, MongoDB, OTel collector v2) and one image-based service for the HyperDX application. The initial deployment had infrastructure running and traces flowing (792K+ in ClickHouse), but several configuration mistakes blocked login, registration, and trace querying. Each symptom traced to a specific env-var misconfiguration — debugging them sequentially resolved all issues.

Guidance

1. HyperDX Platform: The Critical Env Var Trio

The HyperDX unified image (hyperdx/hyperdx:latest) runs three internal processes: Next.js frontend (port 8080), Express API server (port 8000), and an alert-task worker. These three env vars must be set:

SERVER_URL=http://0.0.0.0:8000       # Internal API port — NOT the public domain
FRONTEND_URL=https://your.domain.com  # Public URL users visit in the browser
EXPRESS_SESSION_SECRET=<random>       # Required for session cookies

SERVER_URL must point to the internal http://0.0.0.0:8000 — the address where the Express API process listens inside the container. When set to the external domain (https://analytics.brainforge.ai), the Next.js frontend proxy cannot resolve API requests internally, returning 404 on /api/register/password and /api/login/password.

FRONTEND_URL must be the public-facing custom domain. When unset, the login form POST redirects to http://localhost:8080 (the default), causing ERR_CONNECTION_REFUSED in the browser. This is the most common “login doesn’t work” symptom.
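
The rules above can be sketched as a small pre-deploy check. This is a minimal illustration, not part of HyperDX: checkHyperdxEnv and isLocalhost are hypothetical helper names, and the strict comparison against http://0.0.0.0:8000 assumes the unified image listens on the port described in this guide.

```javascript
// checkHyperdxEnv is a hypothetical helper, not part of HyperDX.
// It returns a list of human-readable problems; an empty list means the trio looks sane.
const EXPECTED_SERVER_URL = "http://0.0.0.0:8000"; // internal API address from this guide

function checkHyperdxEnv(env) {
  const problems = [];

  if (env.SERVER_URL !== EXPECTED_SERVER_URL) {
    problems.push(`SERVER_URL should be ${EXPECTED_SERVER_URL}, got ${env.SERVER_URL}`);
  }

  if (!env.FRONTEND_URL) {
    // An unset FRONTEND_URL is what sends the login POST to http://localhost:8080.
    problems.push("FRONTEND_URL is not set");
  } else if (isLocalhost(env.FRONTEND_URL)) {
    problems.push("FRONTEND_URL points at localhost instead of the public domain");
  }

  if (!env.EXPRESS_SESSION_SECRET) {
    problems.push("EXPRESS_SESSION_SECRET is not set");
  }

  return problems;
}

// True when the value parses as a URL whose host is a loopback name.
function isLocalhost(value) {
  try {
    return ["localhost", "127.0.0.1"].includes(new URL(value).hostname);
  } catch {
    return false; // unparseable values are not treated as localhost
  }
}

// Example: the misconfiguration described above.
console.log(checkHyperdxEnv({ SERVER_URL: "https://analytics.brainforge.ai" }));
```

Inside the container, the same function could be called as checkHyperdxEnv(process.env) before restarting the service.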

2. HyperDX v2 Uses the hyperdx Database, Not default

The v2 unified image with BETA_CH_OTEL_JSON_SCHEMA_ENABLED=true creates tables in the hyperdx database:

SELECT count() FROM hyperdx.otel_traces;   -- traces
SELECT count() FROM hyperdx.otel_logs;      -- logs
SELECT count() FROM hyperdx.otel_metrics_sum; -- metrics

Auto-created data sources in the HyperDX UI default to default.<table> — switching them to hyperdx.<table> resolves “0 results” even when data exists. This applies to the Team Settings → Data tab for each source type (Logs, Traces, Metrics, Sessions).
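
Before editing the data sources, it can help to confirm where the tables actually live. The sketch below assumes you run the discovery query yourself (for example in clickhouse-client) and pass the resulting rows in; locateTables is a hypothetical helper, and the only ClickHouse features it relies on are the system.tables table and the startsWith function.

```javascript
// Tables named in the section above.
const EXPECTED_TABLES = ["otel_traces", "otel_logs", "otel_metrics_sum"];

// Query to run in clickhouse-client; system.tables lists every table with its database.
const DISCOVERY_SQL =
  "SELECT database, name FROM system.tables WHERE startsWith(name, 'otel_')";

// rows: [{ database, name }, ...] as returned by DISCOVERY_SQL.
// Returns an object mapping each expected table to the databases that contain it.
function locateTables(rows, expected = EXPECTED_TABLES) {
  const found = new Map(expected.map((table) => [table, []]));
  for (const { database, name } of rows) {
    if (found.has(name)) found.get(name).push(database);
  }
  return Object.fromEntries(found);
}

// Example with hand-written rows (not real query output).
console.log(DISCOVERY_SQL);
console.log(locateTables([{ database: "hyperdx", name: "otel_traces" }]));
```

A table that maps to an empty list, or only to a database other than the one configured in the UI, points at the mismatch described above.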

3. OTel Exporter Endpoint — Never Include the Signal Path

OpenTelemetry SDKs exporting over OTLP/HTTP append /v1/traces, /v1/logs, or /v1/metrics to OTEL_EXPORTER_OTLP_ENDPOINT automatically. Including the signal path in the env var doubles it:

# BROKEN — resolves to .../v1/traces/v1/traces
OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.up.railway.app/v1/traces

# CORRECT — SDK appends the path automatically
OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.up.railway.app

This applies to ALL services (Platform, Slack Assistant, OpenCode Worker) and must match between Railway env vars and .env.example documentation. The browser collector URL (NEXT_PUBLIC_HYPERDX_COLLECTOR_URL) also omits the suffix.
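
One way to guard against the doubled path is to normalize the value before it reaches the SDK. The helper below is a sketch only: normalizeOtlpEndpoint is a hypothetical name, not an OpenTelemetry API, and it strips a single trailing signal path rather than repairing an already doubled one.

```javascript
// Matches a trailing OTLP/HTTP signal path, with or without a trailing slash.
const SIGNAL_PATH = /\/v1\/(traces|logs|metrics)\/?$/;

// normalizeOtlpEndpoint is a hypothetical helper, not an OpenTelemetry API.
// It returns the base endpoint the SDK expects, without signal path or trailing slash.
function normalizeOtlpEndpoint(endpoint) {
  return endpoint.trim().replace(SIGNAL_PATH, "").replace(/\/+$/, "");
}

console.log(normalizeOtlpEndpoint("https://collector.up.railway.app/v1/traces"));
// prints "https://collector.up.railway.app"
```

A startup check could compare process.env.OTEL_EXPORTER_OTLP_ENDPOINT with its normalized form and log a warning when they differ.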

4. HyperDX v2 Has No Programmatic Search API

The v1 API paths (/v1/search, /api/search, /v1/traces) return 404 on the v2 unified image. The only working v2 endpoints are:

  • POST /v2/traces — trace ingestion (internal)
  • GET /health — health check

For verifying ingestion, query ClickHouse directly:

# Via Railway SSH
ssh analytics.brainforge.ai@ssh.railway.com
clickhouse-client --query "SELECT count(), ServiceName FROM hyperdx.otel_traces GROUP BY ServiceName"

Or via the Railway dashboard Shell tab on hyperdx-clickhouse-v2.
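
The per-service counts can also be checked mechanically. The sketch below assumes the query output is tab-separated "count, ServiceName" lines (the usual clickhouse-client format for --query) and uses the service names from the OTEL_SERVICE_NAME values in this document; parseServiceCounts and missingServices are hypothetical helpers.

```javascript
// Service names taken from the OTEL_SERVICE_NAME values in this document.
const EXPECTED_SERVICES = [
  "brainforge-platform",
  "brainforge-slack-assistant",
  "brainforge-opencode-worker",
];

// Parses "count<TAB>ServiceName" lines into a Map of service name -> count.
function parseServiceCounts(tsv) {
  const counts = new Map();
  for (const line of tsv.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const [count, service = ""] = line.split("\t");
    counts.set(service.trim(), Number(count));
  }
  return counts;
}

// Returns the expected services that have no rows in hyperdx.otel_traces.
function missingServices(tsv, expected = EXPECTED_SERVICES) {
  const counts = parseServiceCounts(tsv);
  return expected.filter((service) => !counts.get(service));
}

// Example with made-up counts (not real query output).
console.log(missingServices("10\tbrainforge-platform\n"));
```

An empty result means every expected service has at least one trace row; any name returned is a service whose exporter is worth checking first.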

5. Railway Operations for Image-Based Services

Image-based Railway services (deployed from hyperdx/hyperdx:latest) differ from repo-based services:

Action                        Working CLI command
Restart (pick up env vars)    railway service restart --service <service-id> --yes
SSH in                        ssh <service-public-domain>@ssh.railway.com
Set env vars                  railway variable set --service <name> KEY=VALUE
DOES NOT WORK                 railway redeploy (repo-based only); railway variables set for image services in some CLI versions

To run MongoDB operations inside the HyperDX container (which has internal network access):

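ssh analytics.brainforge.ai@ssh.railway.com
# Inside the container. DESTRUCTIVE: this drops the entire users collection.
node -e "
const mongoose = require('mongoose');
(async () => {
  // node -e runs the script as CommonJS by default, where top-level await is a syntax error
  await mongoose.connect(process.env.MONGO_URI);
  await mongoose.connection.db.collection('users').drop();
  await mongoose.disconnect();
})().catch((err) => { console.error(err); process.exit(1); });
"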

6. Complete Env Var Checklist for HyperDX on Railway

HyperDX service (image-based):

SERVER_URL=http://0.0.0.0:8000
FRONTEND_URL=https://analytics.brainforge.ai
EXPRESS_SESSION_SECRET=<generated>
CLICKHOUSE_HOST=hyperdx-clickhouse-v2.railway.internal
CLICKHOUSE_PORT=9000
CLICKHOUSE_PASSWORD=<from-1password>
MONGO_URI=mongodb://root:<pass>@hyperdx-mongo.railway.internal:27017
BETA_CH_OTEL_JSON_SCHEMA_ENABLED=true
HYPERDX_LOG_LEVEL=info
HYPERDX_API_KEY=<ingestion-key>
INGESTION_API_KEY=<ingestion-key>

Application services (Platform, Slack, Worker):

HYPERDX_API_KEY=<ingestion-key>
HYPERDX_ENABLED=true
OTEL_SERVICE_NAME=brainforge-<platform|slack-assistant|opencode-worker>
OTEL_EXPORTER_OTLP_ENDPOINT=https://hyperdx-otel-collector-v2.up.railway.app
# Browser-only (Platform):
NEXT_PUBLIC_HYPERDX_ENABLED=true
NEXT_PUBLIC_HYPERDX_API_KEY=<browser-key>
NEXT_PUBLIC_HYPERDX_SERVICE=brainforge-platform-web
NEXT_PUBLIC_HYPERDX_COLLECTOR_URL=https://hyperdx-otel-collector-v2.up.railway.app
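
The checklist can be turned into a presence check. This is a sketch under the assumption that every key listed above is required; missingKeys and the grouping into "hyperdx", "app", and "browser" are hypothetical, and the function checks only that a value exists, not that it is correct.

```javascript
// Keys copied from the checklist above, grouped by where they apply.
const REQUIRED = {
  hyperdx: [
    "SERVER_URL", "FRONTEND_URL", "EXPRESS_SESSION_SECRET",
    "CLICKHOUSE_HOST", "CLICKHOUSE_PORT", "CLICKHOUSE_PASSWORD",
    "MONGO_URI", "BETA_CH_OTEL_JSON_SCHEMA_ENABLED", "HYPERDX_LOG_LEVEL",
    "HYPERDX_API_KEY", "INGESTION_API_KEY",
  ],
  app: [
    "HYPERDX_API_KEY", "HYPERDX_ENABLED",
    "OTEL_SERVICE_NAME", "OTEL_EXPORTER_OTLP_ENDPOINT",
  ],
  // Browser-only keys, needed by the Platform service in addition to "app".
  browser: [
    "NEXT_PUBLIC_HYPERDX_ENABLED", "NEXT_PUBLIC_HYPERDX_API_KEY",
    "NEXT_PUBLIC_HYPERDX_SERVICE", "NEXT_PUBLIC_HYPERDX_COLLECTOR_URL",
  ],
};

// Returns the checklist keys for `kind` that are unset or empty in `env`.
function missingKeys(kind, env) {
  const keys = REQUIRED[kind];
  if (!keys) throw new Error(`unknown service kind: ${kind}`);
  return keys.filter((key) => !env[key]);
}

// Example: an application service with only two of its four keys set.
console.log(missingKeys("app", { HYPERDX_API_KEY: "key", HYPERDX_ENABLED: "true" }));
```

For the Platform service, running the check for both "app" and "browser" covers the server-side and browser-side variables.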

Why This Matters

Each configuration mistake manifests as a silent or misleading failure rather than a clear error. The login and registration failures caused by FRONTEND_URL and SERVER_URL, the OTel endpoint path doubling, and the ClickHouse database mismatch each produce exactly one symptom. Knowing which symptom maps to which misconfiguration lets you fix a deployment without guessing.

When to Apply

  • Deploying HyperDX (or similar frontend/backend-split Docker apps) on Railway with private DNS
  • Login redirects to localhost or registration returns 404 after self-hosted deployment
  • OTel collector accepts traces (partialSuccess: {}) but HyperDX UI shows “0 results”
  • Need to SSH into a Railway image-based service for debugging or data manipulation
  • Cleaning up stale v1 infrastructure services after a v2 migration

Examples

Symptom: Login → localhost connection refused

Before:

# FRONTEND_URL not set, SERVER_URL set to external domain
SERVER_URL=https://analytics.brainforge.ai

After:

SERVER_URL=http://0.0.0.0:8000
FRONTEND_URL=https://analytics.brainforge.ai

Symptom: Traces accepted but “0 results” in UI

Before (in HyperDX UI data source):

Database: default
Table: otel_traces

After:

Database: hyperdx
Table: otel_traces

Symptom: OTel traces silently failing despite collector accepting them

Before:

OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.railway.app/v1/traces  # SDK doubles this

After:

OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.railway.app  # SDK appends /v1/traces

Related
  • knowledge/standards/03-knowledge/engineering/runbooks/hyperdx-operations.md — ongoing operations (health checks, key rotation, retention, troubleshooting, alerting)
  • PR #913 — productionize HyperDX observability across all services
  • PR #918 — complete verification, deployment, cleanup
  • PR #881 — initial PoC
  • apps/hyperdx/Dockerfile — HyperDX unified image configuration
  • apps/platform/deploy/hyperdx-otel-collector/custom.config.yaml — OTel collector config
  • apps/platform/docs/analytics/hyperdx-validation.md — end-to-end validation procedures