Comprehensive MCP server for Grafana, Prometheus, Kafka UI, and Datadog with a secure "Bring Your Own Key" or BYOK model.
Query your observability stack from Claude Code, Codex, or any MCP client — no data leaves your machine.
- **Bring Your Own Keys** — credentials stay in env vars on your machine. No clone, no build; runs via `npx`.
- **Partial setup** — configure only the backends you use. Tools for unconfigured backends are never exposed.
Quick Start · Tools · Credentials · Configuration · Scheduled Reports · Examples · Security · Development
```
🤖 Claude Code / Codex CLI
        │
        ▼
⚡ byok-observability-mcp
  (Local npx process)
        │
  🔒 env vars never leave your machine
        │
        ▼
┌─────────────────────────────────┐
│ 📊 Grafana      🔥 Prometheus   │
│ 🛶 Kafka UI     🐶 Datadog      │
└─────────────────────────────────┘
```
Run once, answer a few questions, get a ready-made `.mcp.json`:

```sh
npx byok-observability-mcp --init
```
The wizard writes `.mcp.json` to your project root or `~/.claude/` — your choice. Then just start Claude Code:

```sh
claude
```
> [!TIP]
> That's it. No clone, no build, no env file. Works in under 60 seconds.
Create `.mcp.json` in your project root. Include only the backends you need.
```json
{
  "mcpServers": {
    "observability-mcp": {
      "command": "npx",
      "args": ["-y", "byok-observability-mcp"],
      "env": {
        "GRAFANA_URL": "https://grafana.mycompany.internal",
        "GRAFANA_TOKEN": "glsa_...",
        "PROMETHEUS_URL": "https://prometheus.mycompany.internal",
        "KAFKA_UI_URL": "https://kafka-ui.mycompany.internal",
        "DD_API_KEY": "your-datadog-api-key",
        "DD_APP_KEY": "your-datadog-app-key"
      }
    }
  }
}
```
> Credentials in git? Use the `${VAR}` approach instead — see Configuration → Method B.
Start Claude Code:

```sh
claude
```
Claude Code reads `.mcp.json` automatically. No `claude mcp add`, no build step.
Verify by asking Claude:
What observability tools do you have available?
| Client | Configuration |
|---|---|
| Claude Code | .mcp.json in project root (recommended) or claude mcp add CLI |
| OpenAI Codex CLI | .mcp.json in project root — same format as Claude Code |
Both clients read .mcp.json automatically. The Quick Start above works for either.
```sh
# Same .mcp.json as above works out of the box
codex
```
Or add via CLI:

```sh
codex mcp add --transport stdio observability-mcp -- npx -y byok-observability-mcp
```
Always available. Checks connectivity across all configured backends.
| Tool | Description |
|---|---|
| `obs_health_check` | Unified health check. Runs a parallel check on all backends and returns a status table. |
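The parallel fan-out can be sketched as follows (a minimal TypeScript sketch — `healthCheckAll` and the probe map are illustrative, not the server's actual code):

```typescript
// Hypothetical sketch of obs_health_check's fan-out: probe every
// configured backend in parallel so one slow or dead backend never
// hides the status of the others.
type HealthStatus = { backend: string; ok: boolean; detail: string };

async function healthCheckAll(
  probes: Record<string, () => Promise<string>>,
): Promise<HealthStatus[]> {
  const names = Object.keys(probes);
  // Promise.allSettled resolves even when individual probes reject,
  // so unreachable backends become rows instead of thrown errors.
  const settled = await Promise.allSettled(names.map((name) => probes[name]()));
  return settled.map((result, i) => ({
    backend: names[i],
    ok: result.status === "fulfilled",
    detail: result.status === "fulfilled" ? result.value : String(result.reason),
  }));
}
```

Only backends with configured env vars would appear in the probe map, which is why tools for unconfigured backends are never exposed.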
Enabled when `GRAFANA_URL` + `GRAFANA_TOKEN` are set.
| Tool | Description |
|---|---|
| `grafana_health` | Check connectivity, version, and database status |
| `grafana_list_datasources` | List all datasources (name, type, UID) |
| `grafana_query_metrics` | Run a PromQL expression via a Grafana datasource |
| `grafana_list_dashboards` | Search and list dashboards by name or tag |
| `grafana_get_dashboard` | Get panels and metadata for a dashboard by UID |
| `grafana_list_alerts` | List active alerts from Alertmanager (firing/pending) |
| `grafana_get_alert_rules` | List all configured alert rules across all folders |
Enabled when `PROMETHEUS_URL` is set.
| Tool | Description |
|---|---|
| `prometheus_health` | Check connectivity |
| `prometheus_query` | Instant PromQL query — current value of a metric |
| `prometheus_query_range` | Range PromQL query — metric values over time |
| `prometheus_list_metrics` | List all available metric names |
| `prometheus_metric_metadata` | Get help text and type for a specific metric |
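The two query tools map naturally onto Prometheus's documented HTTP API (`GET /api/v1/query` and `GET /api/v1/query_range`). A sketch of the URL construction — the endpoints and parameters are Prometheus's own, but the helper names are hypothetical and the server's real request code may differ:

```typescript
// Instant query: current value of an expression.
//   GET {PROMETHEUS_URL}/api/v1/query?query=<expr>
function instantQueryUrl(base: string, expr: string): string {
  const u = new URL("/api/v1/query", base);
  u.searchParams.set("query", expr);
  return u.toString();
}

// Range query: expression evaluated over [start, end] at `step` resolution.
//   GET {PROMETHEUS_URL}/api/v1/query_range?query=<expr>&start&end&step
function rangeQueryUrl(
  base: string,
  expr: string,
  start: number, // unix seconds
  end: number,   // unix seconds
  step: string,  // e.g. "15s", "1m"
): string {
  const u = new URL("/api/v1/query_range", base);
  u.searchParams.set("query", expr);
  u.searchParams.set("start", String(start));
  u.searchParams.set("end", String(end));
  u.searchParams.set("step", step);
  return u.toString();
}
```

Note this sketch drops any path prefix on the base URL; a real client would need to preserve it.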
Enabled when `KAFKA_UI_URL` is set.
| Tool | Description |
|---|---|
| `kafka_list_clusters` | List configured Kafka clusters and their status |
| `kafka_list_topics` | List topics in a cluster |
| `kafka_describe_topic` | Get partition count, replication factor, and config |
| `kafka_list_consumer_groups` | List consumer groups and their state |
| `kafka_consumer_group_lag` | Get per-partition lag for a consumer group |
| `kafka_broker_health` | Broker count and disk usage per broker |
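Per-partition consumer lag is the partition's end offset minus the group's committed offset. A minimal sketch of that arithmetic (field names are illustrative, not Kafka UI's API shape):

```typescript
// Lag = how many messages the consumer group still has to read on
// each partition. Sorted highest-first, matching triage questions like
// "which partitions have the highest lag?"
type PartitionOffsets = {
  partition: number;
  endOffset: number;       // latest offset produced to the partition
  committedOffset: number; // last offset the group has committed
};

function lagByPartition(
  parts: PartitionOffsets[],
): { partition: number; lag: number }[] {
  return parts
    .map((p) => ({
      partition: p.partition,
      // Clamp at 0: a committed offset can briefly exceed a cached end offset.
      lag: Math.max(0, p.endOffset - p.committedOffset),
    }))
    .sort((a, b) => b.lag - a.lag);
}
```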
Enabled when both `DD_API_KEY` and `DD_APP_KEY` are set. Proxies the official Datadog MCP server.

Default toolsets: `core`, `apm`, `alerting`. Set `DD_TOOLSETS=all` to load everything.
| Toolset | Covers |
|---|---|
| `core` | Metrics, dashboards, monitors, infrastructure |
| `apm` | APM services, traces, service map |
| `alerting` | Monitors, downtimes, alerts |
| `logs` | Log search and analytics |
| `incidents` | Incident management |
| `ddsql` | SQL-style metric queries |
| `security` | Cloud security posture |
| `synthetics` | Synthetic test results |
| `networks` | Network performance monitoring |
| `dbm` | Database monitoring |
| `software-delivery` | CI/CD pipelines |
| `llm-obs` | LLM observability |
| `cases` | Case management |
| `feature-flags` | Feature flag tracking |
Create a Grafana service account with the **Viewer** role and generate a token. Copy the token (it starts with `glsa_`) — you won't see it again.

```
GRAFANA_URL=https://grafana.mycompany.internal
GRAFANA_TOKEN=glsa_xxxxxxxxxxxxxxxx
```
If your Grafana uses a self-signed certificate:

```
GRAFANA_VERIFY_SSL=false
```
If Prometheus has no authentication:

```
PROMETHEUS_URL=https://prometheus.mycompany.internal
```

If Prometheus uses basic auth:

```
PROMETHEUS_URL=https://prometheus.mycompany.internal
PROMETHEUS_USERNAME=your-username
PROMETHEUS_PASSWORD=your-password
```
If Kafka UI has no authentication:

```
KAFKA_UI_URL=https://kafka-ui.mycompany.internal
```

If Kafka UI requires a login:

```
KAFKA_UI_URL=https://kafka-ui.mycompany.internal
KAFKA_UI_USERNAME=admin
KAFKA_UI_PASSWORD=your-password
```
- **API key:** Datadog → Organization Settings → API Keys → New Key
- **Application key:** Datadog → Organization Settings → Application Keys → New Key
`DD_SITE` — match your Datadog login URL:
| Login URL | `DD_SITE` |
|---|---|
| app.datadoghq.com | `datadoghq.com` (default) |
| app.us3.datadoghq.com | `us3.datadoghq.com` |
| app.us5.datadoghq.com | `us5.datadoghq.com` |
| app.datadoghq.eu | `datadoghq.eu` |
| app.ap1.datadoghq.com | `ap1.datadoghq.com` |
```
DD_API_KEY=your-api-key
DD_APP_KEY=your-application-key
DD_SITE=datadoghq.com
DD_TOOLSETS=core,apm,alerting
```
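By Datadog's documented convention, `DD_SITE` selects the API host: requests go to `https://api.<DD_SITE>`. A one-line sketch of that mapping (the helper is hypothetical — the proxied Datadog MCP server may build the URL differently):

```typescript
// Maps a DD_SITE value to the corresponding Datadog API base URL,
// e.g. "datadoghq.eu" → "https://api.datadoghq.eu".
function datadogApiBase(site: string = "datadoghq.com"): string {
  return `https://api.${site}`;
}
```

This is why a key from `app.datadoghq.eu` fails with the default site — the request goes to the wrong host, not just the wrong credentials.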
**Method A — `.mcp.json` (simplest).** Put credentials directly in `.mcp.json`. Works everywhere, no extra steps. Add `.mcp.json` to your `.gitignore` if the repo is shared.
**Method B — `${VAR}` placeholders.** Use `${VAR}` placeholders in `.mcp.json` and put the real values in `.env`.

`.mcp.json` (safe to commit — contains no secrets):
```json
{
  "mcpServers": {
    "observability-mcp": {
      "command": "npx",
      "args": ["-y", "byok-observability-mcp"],
      "env": {
        "GRAFANA_URL": "${GRAFANA_URL}",
        "GRAFANA_TOKEN": "${GRAFANA_TOKEN}",
        "PROMETHEUS_URL": "${PROMETHEUS_URL}",
        "KAFKA_UI_URL": "${KAFKA_UI_URL}",
        "DD_API_KEY": "${DD_API_KEY}",
        "DD_APP_KEY": "${DD_APP_KEY}"
      }
    }
  }
}
```
`.env` (add to `.gitignore`):

```
GRAFANA_URL=https://grafana.mycompany.internal
GRAFANA_TOKEN=glsa_...
```
Start Claude with the env loaded:

```sh
set -a && source .env && set +a && claude
```

A ready-made helper script is included:

```sh
./scripts/run-claude-with-env.sh
```
A template .mcp.json with all variables is available as .mcp.json.example.
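The `${VAR}` placeholders resolve against the shell environment when the client launches the server. For illustration only, the substitution rule can be sketched as below (a hypothetical helper — the client performs this expansion itself; it is not code you write):

```typescript
// Expands ${VAR} placeholders in an env map using values from the
// shell environment. Unset variables expand to the empty string here;
// a real client might error instead.
function expandPlaceholders(
  env: Record<string, string>,
  shellEnv: Record<string, string | undefined>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    out[key] = value.replace(/\$\{(\w+)\}/g, (_, name) => shellEnv[name] ?? "");
  }
  return out;
}
```

This is why `source .env` must happen *before* `claude` starts: expansion reads the environment at launch time.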
Add to `~/.claude.json`:
```json
{
  "mcpServers": {
    "observability-mcp": {
      "command": "npx",
      "args": ["-y", "byok-observability-mcp"],
      "env": {
        "GRAFANA_URL": "https://grafana.mycompany.internal",
        "GRAFANA_TOKEN": "glsa_..."
      }
    }
  }
}
```
| Variable | Backend | Required | Description |
|---|---|---|---|
| `GRAFANA_URL` | Grafana | ✅ | Base URL of your Grafana instance |
| `GRAFANA_TOKEN` | Grafana | ✅ | Service account token (Viewer role) |
| `GRAFANA_VERIFY_SSL` | Grafana | — | Set to `false` to skip TLS verification |
| `PROMETHEUS_URL` | Prometheus | ✅ | Base URL of your Prometheus instance |
| `PROMETHEUS_USERNAME` | Prometheus | — | Basic auth username |
| `PROMETHEUS_PASSWORD` | Prometheus | — | Basic auth password |
| `KAFKA_UI_URL` | Kafka UI | ✅ | Base URL of your Kafka UI instance |
| `KAFKA_UI_USERNAME` | Kafka UI | — | Login username |
| `KAFKA_UI_PASSWORD` | Kafka UI | — | Login password |
| `DD_API_KEY` | Datadog | ✅ | Datadog API key |
| `DD_APP_KEY` | Datadog | ✅ | Datadog Application key |
| `DD_SITE` | Datadog | — | Datadog site (default: `datadoghq.com`) |
| `DD_TOOLSETS` | Datadog | — | Tool groups to load (default: `core,apm,alerting`) |
| `SLACK_WEBHOOK_URL` | Reports | ✅\* | Slack Incoming Webhook URL for scheduled reports |
| `REPORT_BACKENDS` | Reports | — | Comma-separated backends to include in reports (default: all configured) |

\* Required only for scheduled reports (`--report` mode).
Send an automated observability digest to Slack on a schedule — no Claude or Codex instance needs to be running.
```
cron / launchd
      │ fires every N minutes
      ▼
npx byok-observability-mcp --report
      │
      │ reads env vars, connects directly to backends
      ▼
Grafana · Prometheus · Kafka UI
      │
      │ categorizes findings → P0 / P1 / P2 / P3
      ▼
Slack Incoming Webhook → #your-channel
```
The command collects data, categorizes every finding by severity, formats a Slack message, sends it, and exits. It is completely stateless.
| Level | Meaning | Examples |
|---|---|---|
| 🔴 P0 — CRITICAL | Service down or unreachable | Grafana alert firing (critical), Kafka cluster offline, backend unreachable |
| 🟠 P1 — HIGH | Degraded, action needed soon | Grafana alert firing (non-critical), Kafka consumer lag > 10 000 |
| 🟡 P2 — MEDIUM | Warning, monitor closely | Grafana alert pending, Kafka consumer lag > 1 000 |
| 🟢 P3 — INFO | Informational, all normal | Healthy backends, silenced alerts |
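The thresholds in the severity table can be sketched as a simple mapping (illustrative only — it mirrors the table, not necessarily the server's exact logic):

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";

// Kafka consumer lag buckets from the table: >10 000 → P1, >1 000 → P2.
function lagSeverity(lag: number): Severity {
  if (lag > 10_000) return "P1";
  if (lag > 1_000) return "P2";
  return "P3";
}

// Grafana alert states from the table: firing is P0 when critical,
// otherwise P1; pending is P2; silenced/healthy is informational.
function alertSeverity(
  state: "firing" | "pending" | "silenced",
  critical: boolean,
): Severity {
  if (state === "firing") return critical ? "P0" : "P1";
  if (state === "pending") return "P2";
  return "P3";
}
```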
**Step 1 — Get a Slack Incoming Webhook URL**

**Step 2 — Set environment variables**

```sh
export SLACK_WEBHOOK_URL=https://hooks.slack.com/services/XXX/YYY/ZZZ
# Optional: restrict which backends are included (default: all configured)
export REPORT_BACKENDS=grafana,prometheus,kafka
```
**Step 3 — Run a one-off report to verify**

```sh
npx byok-observability-mcp --report
```

You should see a message in your Slack channel within seconds.
**Step 4 — Schedule with cron**

Open your crontab:

```sh
crontab -e
```

Add a line. Examples:

```sh
# Every hour at minute 0
0 * * * * SLACK_WEBHOOK_URL=https://hooks.slack.com/... GRAFANA_URL=... GRAFANA_TOKEN=... npx byok-observability-mcp --report >> /tmp/obs-report.log 2>&1

# Every 30 minutes
*/30 * * * * SLACK_WEBHOOK_URL=https://hooks.slack.com/... npx byok-observability-mcp --report >> /tmp/obs-report.log 2>&1
```
> [!TIP]
> Put all env vars in a `.env` file and source it inside the cron command to keep the crontab clean:
>
> ```sh
> 0 * * * * bash -c 'source /path/to/.env && npx byok-observability-mcp --report' >> /tmp/obs-report.log 2>&1
> ```
**Alternative: macOS launchd** (runs on login, survives reboots)

Create `~/Library/LaunchAgents/com.observability-mcp.report.plist`:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.observability-mcp.report</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/npx</string>
    <string>byok-observability-mcp</string>
    <string>--report</string>
  </array>
  <key>EnvironmentVariables</key>
  <dict>
    <key>SLACK_WEBHOOK_URL</key>
    <string>https://hooks.slack.com/services/XXX/YYY/ZZZ</string>
    <key>GRAFANA_URL</key>
    <string>https://grafana.mycompany.internal</string>
    <key>GRAFANA_TOKEN</key>
    <string>glsa_...</string>
  </dict>
  <key>StartInterval</key>
  <integer>3600</integer>
  <key>StandardOutPath</key>
  <string>/tmp/obs-report.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/obs-report.log</string>
</dict>
</plist>
```
Load it:

```sh
launchctl load ~/Library/LaunchAgents/com.observability-mcp.report.plist
```

To stop: `launchctl unload ~/Library/LaunchAgents/com.observability-mcp.report.plist`
| Backend | Try asking Claude... |
|---|---|
| Grafana | "List all datasources and tell me which ones are Prometheus type." |
| Grafana | "Search for dashboards related to 'kubernetes' — list names and UIDs." |
| Grafana | "Query http_requests_total rate over the last hour via the default Prometheus datasource." |
| Prometheus | "What is the current value of the up metric? Which targets are down?" |
| Prometheus | "Show CPU usage (node_cpu_seconds_total rate) over the past hour, by instance." |
| Prometheus | "List all available metrics that start with http_." |
| Kafka UI | "List all Kafka clusters. Are there any with offline brokers?" |
| Kafka UI | "Describe the topic 'orders' in cluster 'production' — partitions and replication factor?" |
| Kafka UI | "Check consumer lag for group 'order-processor'. Which partitions have the highest lag?" |
| Datadog | "List all Datadog monitors currently in Alert state." |
| Datadog | "Show APM service performance for the past hour. Which services have the highest error rate?" |
| Datadog | "Query aws.ec2.cpuutilization for the last 30 minutes. Which hosts are above 80%?" |
| Goal | Try asking Claude... |
|---|---|
| Health | "Run a health check on all systems." |
| Alerts | "Are there any firing alerts in Grafana right now?" |
| Triage | "Show me the alert rules for the 'Production' folder." |
```
Check the health of all configured observability backends and give me a summary.
```

```
I'm seeing high error rates. Check Prometheus for http_requests_total with status=500,
then look for related Datadog monitors that might be alerting.
```
> [!NOTE]
> All tools are read-only. No write operations are performed on any backend.

> [!IMPORTANT]
> Credentials are read from environment variables and never logged or sent to Anthropic. Tokens are redacted in all error messages.
Least-privilege recommendations:
| Backend | Recommended role |
|---|---|
| Grafana | Service account with Viewer role |
| Prometheus | Network-level read-only access |
| Kafka UI | Read-only UI user |
| Datadog | API key + Application key with read scopes |
```sh
git clone https://github.com/alimuratkuslu/byok-observability-mcp
cd byok-observability-mcp
npm install

npm run dev        # run with tsx (no build step)
npm run build      # compile to dist/
npm run typecheck  # TypeScript check without emitting
```
| Backend | Tested version |
|---|---|
| Grafana | v9.x, v10.x, v11.x |
| Prometheus | v2.x |
| Kafka UI | provectus/kafka-ui:v0.7.2 |
MIT
Add this to `claude_desktop_config.json` and restart Claude Desktop:

```json
{
  "mcpServers": {
    "observability-mcp": {
      "command": "npx",
      "args": ["-y", "byok-observability-mcp"],
      "env": {
        "GRAFANA_URL": "https://grafana.mycompany.internal",
        "GRAFANA_TOKEN": "glsa_..."
      }
    }
  }
}
```