Self-Hosting HyperDX on Railway: Configuration Gotchas and Operational Workflow
Context
HyperDX OSS v2 was deployed on Railway with three companion services (ClickHouse v2, MongoDB, OTel collector v2) and one image-based service for the HyperDX application. The initial deployment had infrastructure running and traces flowing (792K+ in ClickHouse), but several configuration mistakes blocked login, registration, and trace querying. Each symptom traced back to one specific misconfiguration (mostly env vars, plus one UI data-source setting), and fixing them one at a time resolved all issues.
Guidance
1. HyperDX Platform: The Critical Env Var Trio
The HyperDX unified image (hyperdx/hyperdx:latest) runs three internal processes: Next.js frontend (port 8080), Express API server (port 8000), and an alert-task worker. These three env vars must be set:
SERVER_URL=http://0.0.0.0:8000 # Internal API port — NOT the public domain
FRONTEND_URL=https://your.domain.com # Public URL users visit in the browser
EXPRESS_SESSION_SECRET=<random> # Required for session cookies
SERVER_URL must point to the internal http://0.0.0.0:8000, the address where the Express API process listens inside the container. When it is set to the external domain (https://analytics.brainforge.ai), the Next.js frontend proxy cannot resolve API requests internally and returns 404 on /api/register/password and /api/login/password.
FRONTEND_URL must be the public-facing custom domain. When unset, the login form POST redirects to http://localhost:8080 (the default), causing ERR_CONNECTION_REFUSED in the browser. This is the most common “login doesn’t work” symptom.
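A minimal pre-deploy check for the trio, written as a TypeScript sketch. The function names (checkHyperdxTrio, generateSessionSecret) are hypothetical, the expected values come from the rules above, and requiring https for FRONTEND_URL is an assumption that fits a Railway custom domain. It only inspects a plain object of env vars, so it can run against process.env or a copied variable dump.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical pre-deploy check for SERVER_URL, FRONTEND_URL and
// EXPRESS_SESSION_SECRET. The rules encode the failure modes described in
// this note; they are not HyperDX's own validation logic.
type Env = Record<string, string | undefined>;

function parseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null;
  }
}

function checkHyperdxTrio(env: Env): string[] {
  const problems: string[] = [];

  // The Express API listens on port 8000 inside the container; a public
  // domain here made /api/login/password and /api/register/password 404.
  const server = env.SERVER_URL ? parseUrl(env.SERVER_URL) : null;
  if (!server || server.hostname !== "0.0.0.0" || server.port !== "8000") {
    problems.push(`SERVER_URL should be http://0.0.0.0:8000, got ${env.SERVER_URL ?? "(unset)"}`);
  }

  // Unset falls back to http://localhost:8080, so the login POST redirects the
  // browser to localhost. Requiring https is an assumption (Railway custom domain).
  const frontend = env.FRONTEND_URL ? parseUrl(env.FRONTEND_URL) : null;
  if (!frontend || frontend.hostname === "localhost" || frontend.protocol !== "https:") {
    problems.push(`FRONTEND_URL should be the public https domain, got ${env.FRONTEND_URL ?? "(unset)"}`);
  }

  if (!env.EXPRESS_SESSION_SECRET) {
    problems.push("EXPRESS_SESSION_SECRET is unset");
  }
  return problems;
}

// One way to produce a value for EXPRESS_SESSION_SECRET: 32 random bytes, hex-encoded.
function generateSessionSecret(): string {
  return randomBytes(32).toString("hex");
}
```

For example, checkHyperdxTrio({ SERVER_URL: "https://analytics.brainforge.ai" }) reports three problems: the external SERVER_URL, the missing FRONTEND_URL, and the missing session secret.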
2. HyperDX v2 Uses the hyperdx Database, Not default
The v2 unified image with BETA_CH_OTEL_JSON_SCHEMA_ENABLED=true creates tables in the hyperdx database:
SELECT count() FROM hyperdx.otel_traces; -- traces
SELECT count() FROM hyperdx.otel_logs; -- logs
SELECT count() FROM hyperdx.otel_metrics_sum; -- metrics
Auto-created data sources in the HyperDX UI default to default.<table>; switching them to hyperdx.<table> resolves “0 results” even when data exists. This applies to each source type (Logs, Traces, Metrics, Sessions) in the Team Settings → Data tab.
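When a data source shows “0 results”, it helps to confirm which database actually holds rows before editing Team Settings → Data. The sketch below assumes ClickHouse's system.tables reports total_rows for these tables (it prints \N for engines that do not track it) and that clickhouse-client emits its default tab-separated output; LOCATE_OTEL_TABLES, parseTableLocations and databasesWithData are hypothetical names.

```typescript
// Hypothetical helper for the "0 results" case. Run LOCATE_OTEL_TABLES with
// clickhouse-client --query and pass its tab-separated stdout to
// parseTableLocations; databasesWithData then shows where a table has rows.
const LOCATE_OTEL_TABLES =
  "SELECT database, name, total_rows FROM system.tables " +
  "WHERE startsWith(name, 'otel_') ORDER BY database, name";

interface TableLocation {
  database: string;
  table: string;
  rows: number | null; // null when ClickHouse prints \N (row count unknown)
}

function parseTableLocations(tsv: string): TableLocation[] {
  return tsv
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [database, table, rows] = line.split("\t");
      return { database, table, rows: rows === "\\N" ? null : Number(rows) };
    });
}

// Databases that contain `table` with at least one row.
function databasesWithData(locations: TableLocation[], table: string): string[] {
  return locations
    .filter((loc) => loc.table === table && (loc.rows ?? 0) > 0)
    .map((loc) => loc.database);
}
```

If databasesWithData(locations, "otel_traces") returns ["hyperdx"] while the Traces source still points at default, switching the source to hyperdx.otel_traces is the fix described above.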
3. OTel Exporter Endpoint — Never Include the Signal Path
The OpenTelemetry SDK's OTLP/HTTP exporters auto-append /v1/traces, /v1/logs, or /v1/metrics to OTEL_EXPORTER_OTLP_ENDPOINT. Including the path in the env var doubles it:
# BROKEN — resolves to .../v1/traces/v1/traces
OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.up.railway.app/v1/traces
# CORRECT — SDK appends the path automatically
OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.up.railway.app
This applies to ALL services (Platform, Slack Assistant, OpenCode Worker) and must match between Railway env vars and .env.example documentation. The browser collector URL (NEXT_PUBLIC_HYPERDX_COLLECTOR_URL) also omits the suffix.
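To make the doubling concrete, here is a small model of how an OTLP/HTTP exporter builds the per-signal URL from the base endpoint, plus a guard that strips an accidental suffix. Both functions are hypothetical illustrations: real SDKs do the appending internally, and the join rule used here (trim trailing slashes, then add /v1/<signal>) is a simplification.

```typescript
type OtlpSignal = "traces" | "logs" | "metrics";

// Simplified model of the SDK behaviour: base endpoint + /v1/<signal>.
function signalUrl(base: string, signal: OtlpSignal): string {
  return base.replace(/\/+$/, "") + `/v1/${signal}`;
}

// Hypothetical pre-deploy guard: remove a trailing /v1/<signal> (and any
// trailing slash) so OTEL_EXPORTER_OTLP_ENDPOINT is a clean base URL.
function normalizeOtlpBase(endpoint: string): string {
  return endpoint
    .replace(/\/v1\/(traces|logs|metrics)\/?$/, "")
    .replace(/\/+$/, "");
}
```

signalUrl("https://collector.up.railway.app/v1/traces", "traces") reproduces the broken .../v1/traces/v1/traces URL, while passing the endpoint through normalizeOtlpBase first yields the correct base.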
4. HyperDX v2 Has No Programmatic Search API
The v1 API paths (/v1/search, /api/search, /v1/traces) return 404 on the v2 unified image. Apart from the UI's own /api routes (login, registration), the only working v2 endpoints found were:
- POST /v2/traces: trace ingestion (internal)
- GET /health: health check
For verifying ingestion, query ClickHouse directly:
# Via Railway SSH
ssh analytics.brainforge.ai@ssh.railway.com
clickhouse-client --query "SELECT count(), ServiceName FROM hyperdx.otel_traces GROUP BY ServiceName"
Or run the same query from the Shell tab of hyperdx-clickhouse-v2 in the Railway dashboard.
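The ClickHouse query above returns one tab-separated line per service (count first, then ServiceName). A hypothetical helper like the one below turns that output into a list of services that have not reported yet. The expected names follow the OTEL_SERVICE_NAME pattern in the env checklist of this note; adjust them if your services are named differently.

```typescript
// Parse "count<TAB>ServiceName" lines from clickhouse-client into a map.
function parseServiceCounts(tsv: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of tsv.split("\n")) {
    if (line.trim() === "") continue;
    const [count, service] = line.split("\t");
    counts.set(service, Number(count));
  }
  return counts;
}

// Expected services with no traces yet (absent or zero count).
function missingServices(counts: Map<string, number>, expected: string[]): string[] {
  return expected.filter((service) => (counts.get(service) ?? 0) === 0);
}

// Assumed names, derived from brainforge-<platform|slack-assistant|opencode-worker>.
const EXPECTED_SERVICES = [
  "brainforge-platform",
  "brainforge-slack-assistant",
  "brainforge-opencode-worker",
];
```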
5. Railway Operations for Image-Based Services
Image-based Railway services (deployed from hyperdx/hyperdx:latest) differ from repo-based services:
| Action | Working CLI Command |
|---|---|
| Restart (pick up env vars) | railway service restart --service <service-id> --yes |
| SSH in | ssh <service-public-domain>@ssh.railway.com |
| Set env vars | railway variable set --service <name> KEY=VALUE |
| Avoid (does not work here) | railway redeploy (repo-based services only); railway variables set (fails for image services in some CLI versions) |
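
The set-then-restart workflow from the table can be scripted. This is a sketch: the argument shapes are copied from the table (railway variable set --service <name> KEY=VALUE, then railway service restart --service <service-id> --yes) and should be checked against your installed Railway CLI, and setVarArgs, restartArgs and applyEnvChange are hypothetical helper names.

```typescript
import { spawnSync } from "node:child_process";

// Argument lists copied from the table above (verify against your CLI version).
function setVarArgs(service: string, key: string, value: string): string[] {
  return ["variable", "set", "--service", service, `${key}=${value}`];
}

function restartArgs(serviceId: string): string[] {
  return ["service", "restart", "--service", serviceId, "--yes"];
}

// Hypothetical helper: set one variable, then restart so the running
// container picks it up. Runs the CLI directly (no shell), so values with
// spaces or quotes are passed through unchanged.
function applyEnvChange(service: string, serviceId: string, key: string, value: string): void {
  for (const args of [setVarArgs(service, key, value), restartArgs(serviceId)]) {
    const result = spawnSync("railway", args, { stdio: "inherit" });
    if (result.status !== 0) {
      // Only the subcommand is reported, so a secret value is never printed.
      throw new Error(`railway ${args[0]} ${args[1]} failed (status ${result.status})`);
    }
  }
}
```

The error message deliberately omits the full argument list; a failed variable set would otherwise echo the secret into logs.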
To run MongoDB operations inside the HyperDX container (which has internal network access):
ssh analytics.brainforge.ai@ssh.railway.com
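node -e "
const mongoose = require('mongoose');
// node -e code runs as CommonJS here (it uses require), so await needs an async wrapper
(async () => {
  await mongoose.connect(process.env.MONGO_URI);
  // Destructive: drops the users collection, removing every user account
  await mongoose.connection.db.collection('users').drop();
  await mongoose.disconnect();
})().catch((err) => { console.error(err); process.exit(1); });
"
6. Complete Env Var Checklist for HyperDX on Railway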
HyperDX service (image-based):
SERVER_URL=http://0.0.0.0:8000
FRONTEND_URL=https://analytics.brainforge.ai
EXPRESS_SESSION_SECRET=<generated>
CLICKHOUSE_HOST=hyperdx-clickhouse-v2.railway.internal
CLICKHOUSE_PORT=9000
CLICKHOUSE_PASSWORD=<from-1password>
MONGO_URI=mongodb://root:<pass>@hyperdx-mongo.railway.internal:27017
BETA_CH_OTEL_JSON_SCHEMA_ENABLED=true
HYPERDX_LOG_LEVEL=info
HYPERDX_API_KEY=<ingestion-key>
INGESTION_API_KEY=<ingestion-key>
Application services (Platform, Slack, Worker):
HYPERDX_API_KEY=<ingestion-key>
HYPERDX_ENABLED=true
OTEL_SERVICE_NAME=brainforge-<platform|slack-assistant|opencode-worker>
OTEL_EXPORTER_OTLP_ENDPOINT=https://hyperdx-otel-collector-v2.up.railway.app
# Browser-only (Platform):
NEXT_PUBLIC_HYPERDX_ENABLED=true
NEXT_PUBLIC_HYPERDX_API_KEY=<browser-key>
NEXT_PUBLIC_HYPERDX_SERVICE=brainforge-platform-web
NEXT_PUBLIC_HYPERDX_COLLECTOR_URL=https://hyperdx-otel-collector-v2.up.railway.app
Why This Matters
Each configuration mistake shows up as a silent failure rather than a clear error: a wrong FRONTEND_URL or SERVER_URL breaks login and registration (a redirect to localhost, or a 404), a doubled OTel endpoint path makes exports fail without any visible error, and the ClickHouse database mismatch makes the UI report “0 results” while data exists. Because each symptom maps to exactly one misconfiguration, working through them in order fixes a deploy-and-forget setup without guesswork.
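Because these failures are silent, one way to surface them early is to check a service's variables against the section 6 checklist before deploying. The sketch below assumes a KEY=VALUE dump (for example, copied from the service's Variables tab in the Railway dashboard) and treats any value still shaped like <placeholder> as missing; parseEnv, checklistGaps and HYPERDX_REQUIRED are hypothetical names, and the comment stripping is simplistic (a value containing " #" would be cut short).

```typescript
// Required keys for the HyperDX service, taken from the checklist in section 6.
const HYPERDX_REQUIRED = [
  "SERVER_URL", "FRONTEND_URL", "EXPRESS_SESSION_SECRET",
  "CLICKHOUSE_HOST", "CLICKHOUSE_PORT", "CLICKHOUSE_PASSWORD",
  "MONGO_URI", "BETA_CH_OTEL_JSON_SCHEMA_ENABLED",
  "HYPERDX_LOG_LEVEL", "HYPERDX_API_KEY", "INGESTION_API_KEY",
];

// Parse KEY=VALUE lines, dropping full-line comments and " # ..." annotations.
function parseEnv(text: string): Map<string, string> {
  const env = new Map<string, string>();
  for (const raw of text.split("\n")) {
    const line = raw.replace(/\s+#.*$/, "").trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    env.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return env;
}

// Keys that are missing, empty, or still a <placeholder>.
function checklistGaps(env: Map<string, string>, required: string[]): string[] {
  return required.filter((key) => {
    const value = env.get(key);
    return value === undefined || value === "" || /^<.*>$/.test(value);
  });
}
```

The same two functions work for the application services by passing their own list of required keys (HYPERDX_API_KEY, HYPERDX_ENABLED, OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT).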
When to Apply
- Deploying HyperDX (or similar frontend/backend-split Docker apps) on Railway with private DNS
- Login redirects to localhost or registration returns 404 after self-hosted deployment
- OTel collector accepts traces (partialSuccess: {}) but HyperDX UI shows “0 results”
- Need to SSH into a Railway image-based service for debugging or data manipulation
- Cleaning up stale v1 infrastructure services after a v2 migration
Examples
Symptom: Login → localhost connection refused
Before:
# FRONTEND_URL not set, SERVER_URL set to external domain
SERVER_URL=https://analytics.brainforge.ai
After:
SERVER_URL=http://0.0.0.0:8000
FRONTEND_URL=https://analytics.brainforge.ai
Symptom: Traces accepted but “0 results” in UI
Before (in HyperDX UI data source):
Database: default
Table: otel_traces
After:
Database: hyperdx
Table: otel_traces
Symptom: OTel traces silently failing despite collector accepting them
Before:
OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.up.railway.app/v1/traces # SDK doubles this
After:
OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.up.railway.app # SDK appends /v1/traces
Related
- knowledge/standards/03-knowledge/engineering/runbooks/hyperdx-operations.md: ongoing operations (health checks, key rotation, retention, troubleshooting, alerting)
- PR #913: productionize HyperDX observability across all services
- PR #918: complete verification, deployment, cleanup
- PR #881: initial PoC
- apps/hyperdx/Dockerfile: HyperDX unified image configuration
- apps/platform/deploy/hyperdx-otel-collector/custom.config.yaml: OTel collector config
- apps/platform/docs/analytics/hyperdx-validation.md: end-to-end validation procedures