Playwright-based testing and evaluation framework for MCP servers
A testing and evaluation framework for Model Context Protocol (MCP) servers. Write deterministic Playwright tests against your MCP tools, or run data-driven eval datasets — including LLM-based evaluation of tool discoverability.
The mcp Playwright fixture connects to your MCP server (stdio or HTTP) and exposes a high-level API for calling tools and asserting responses. Custom matchers keep assertions readable.
```typescript
import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';

test('read_file returns file contents', async ({ mcp }) => {
  const result = await mcp.callTool('read_file', { path: '/tmp/test.txt' });
  expect(result).toContainToolText('Hello, world');
  expect(result).not.toBeToolError();
});

test('server exposes required tools', async ({ mcp }) => {
  const tools = await mcp.listTools();
  expect(tools.map((t) => t.name)).toContain('read_file');
});
```
Playwright tests are fast, deterministic, and designed for CI. Use them for regression testing, schema validation, and protocol conformance. The framework includes built-in conformance checks for the MCP spec.
Available matchers:
| Matcher | Description |
|---|---|
| `toMatchToolResponse` | Response exactly matches expected value (deep equal) |
| `toContainToolText` | Response contains expected substrings |
| `toMatchToolSchema` | Response validates against a Zod schema |
| `toMatchToolPattern` | Response matches a regex pattern |
| `toMatchToolSnapshot` | Response matches a saved baseline |
| `toBeToolError` | Response is (or is not) an error |
| `toHaveToolResponseSize` | Response size is within bounds |
| `toSatisfyToolPredicate` | Response satisfies a custom function |
| `toHaveToolCalls` | LLM called the expected tools |
| `toHaveToolCallCount` | LLM made N tool calls |
| `toPassToolJudge` | LLM evaluates response quality against a rubric |
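To make the text and error matchers more concrete, here is a minimal sketch of the kind of check they describe, written against the MCP tool result shape (a `content` array plus an optional `isError` flag). The helper names `toolText`, `containsToolText`, and `isToolError` are hypothetical and are not part of this package; the real matchers may normalize responses differently.

```typescript
// Simplified MCP tool result: content parts plus an optional error flag.
interface ToolContentPart {
  type: string;
  text?: string;
}

interface ToolResult {
  content: ToolContentPart[];
  isError?: boolean;
}

// Hypothetical helper: join every text part into one searchable string.
function toolText(result: ToolResult): string {
  return result.content
    .filter((part) => part.type === 'text' && typeof part.text === 'string')
    .map((part) => part.text as string)
    .join('\n');
}

// Hypothetical helper: true only if every expected substring is present.
function containsToolText(result: ToolResult, expected: string[]): boolean {
  const text = toolText(result);
  return expected.every((s) => text.includes(s));
}

// Hypothetical helper: a result counts as an error only when isError is true.
function isToolError(result: ToolResult): boolean {
  return result.isError === true;
}

const sample: ToolResult = { content: [{ type: 'text', text: 'Hello, world' }] };
console.log(containsToolText(sample, ['Hello']), isToolError(sample)); // prints: true false
```

In a real test you would keep using the matchers from the table; the sketch only shows what "contains expected substrings" and "is an error" plausibly mean for a tool result.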
Eval datasets let you define test cases as JSON files and run them with runEvalDataset(). Each case specifies a tool call and one or more assertions.
```json
{
  "name": "file-ops",
  "cases": [
    {
      "id": "read-config",
      "toolName": "read_file",
      "args": { "path": "/tmp/config.json" },
      "expect": {
        "schema": "file-content",
        "containsText": ["version", "name"]
      }
    },
    {
      "id": "read-readme",
      "toolName": "read_file",
      "args": { "path": "/tmp/README.md" },
      "expect": {
        "snapshot": "readme-snapshot"
      }
    }
  ]
}
```
```typescript
import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';
import { loadEvalDataset, runEvalDataset } from '@gleanwork/mcp-server-tester';
import { z } from 'zod';

test('file operations eval', async ({ mcp }, testInfo) => {
  const dataset = await loadEvalDataset('./data/evals.json', {
    schemas: { 'file-content': z.object({ content: z.string() }) },
  });
  const result = await runEvalDataset({ dataset }, { mcp, testInfo });
  expect(result.passed).toBe(result.total);
});
```
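For intuition about what a dataset run does, the following is a rough, self-contained sketch of a loop that evaluates only `containsText` expectations against a stubbed tool caller. It is not the implementation of `runEvalDataset`; the types, the `runContainsTextEval` function, and the stub are assumptions made for illustration, and only the `passed` and `total` fields mirror the example above.

```typescript
// Assumed, simplified shapes modeled on the JSON dataset shown earlier.
interface EvalCase {
  id: string;
  toolName: string;
  args: Record<string, unknown>;
  expect: { containsText?: string[] };
}

interface EvalDataset {
  name: string;
  cases: EvalCase[];
}

// Hypothetical tool caller that returns the response text directly.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<string>;

// Run each case in order and record which expected substrings are missing.
async function runContainsTextEval(dataset: EvalDataset, callTool: CallTool) {
  const failures: string[] = [];
  for (const c of dataset.cases) {
    const text = await callTool(c.toolName, c.args);
    const missing = (c.expect.containsText ?? []).filter((s) => !text.includes(s));
    if (missing.length > 0) failures.push(`${c.id}: missing ${missing.join(', ')}`);
  }
  const total = dataset.cases.length;
  return { total, passed: total - failures.length, failures };
}

// Stub server: only one path returns content, so the second case fails.
const stubCallTool: CallTool = async (_name, args) =>
  args['path'] === '/tmp/config.json' ? '{"name":"demo","version":"1.0.0"}' : '';

const dataset: EvalDataset = {
  name: 'file-ops',
  cases: [
    { id: 'read-config', toolName: 'read_file', args: { path: '/tmp/config.json' }, expect: { containsText: ['version', 'name'] } },
    { id: 'read-missing', toolName: 'read_file', args: { path: '/tmp/missing.json' }, expect: { containsText: ['version'] } },
  ],
};

runContainsTextEval(dataset, stubCallTool).then((r) => console.log(r.passed, r.total)); // prints: 1 2
```

Collecting failures per case id, rather than stopping at the first mismatch, is what makes a dataset run useful as a report and not only as a pass/fail gate.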
Supported assertion types:
| Type | Description |
|---|---|
| `containsText` | Response includes expected substrings |
| `schema` | Response validates against a Zod schema |
| `regex` | Response matches a pattern |
| `snapshot` | Response matches a saved baseline |
| `judge` | LLM evaluates response quality against a rubric |
| `toolsTriggered` | LLM called the expected tools (LLM host mode) |
In LLM host mode, a real LLM receives your server's tool list and a natural language prompt, then decides which tools to call. This tests whether your tool names, descriptions, and input schemas are clear enough for autonomous use — a different question from whether the tools return correct output.
```json
{
  "id": "find-config",
  "mode": "mcp_host",
  "scenario": "Find the application config file and return its contents",
  "mcpHostConfig": {
    "provider": "anthropic",
    "model": "claude-opus-4-20250514"
  },
  "expect": {
    "toolsTriggered": {
      "calls": [{ "name": "read_file", "required": true }]
    }
  }
}
```
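One plausible reading of the `toolsTriggered` expectation above is "every call marked `required` must appear among the tools the LLM actually invoked". The sketch below implements that reading only; the function name `missingRequiredTools` and the exact semantics of `required` are assumptions, so check the LLM Host Guide for the framework's actual behavior.

```typescript
// Shape of one entry in expect.toolsTriggered.calls, as in the JSON above.
interface ExpectedCall {
  name: string;
  required?: boolean;
}

// Hypothetical check: list the required tools that the LLM never called.
function missingRequiredTools(calledTools: string[], expected: ExpectedCall[]): string[] {
  const called = new Set(calledTools);
  return expected
    .filter((e) => e.required === true && !called.has(e.name))
    .map((e) => e.name);
}

// The LLM listed a directory and then read a file, so nothing required is missing.
console.log(missingRequiredTools(['list_dir', 'read_file'], [{ name: 'read_file', required: true }])); // prints: []
```

An empty result means the case passes under this reading; extra calls such as the hypothetical `list_dir` are tolerated because only required tools are checked.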
LLM host mode makes real API calls and produces non-deterministic results. Use iterations to run a case multiple times and measure pass rate rather than expecting 100% on a single run. See the LLM Host Guide for configuration and cost management.
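The idea of measuring a pass rate over repeated runs can be sketched independently of the framework. The `passRate` helper below is hypothetical and does not show how the `iterations` option is implemented; it only illustrates the arithmetic, using a deterministic stub in place of a real LLM call.

```typescript
// Hypothetical helper: run a check several times and return the pass fraction.
async function passRate(runOnce: () => Promise<boolean>, iterations: number): Promise<number> {
  if (!Number.isInteger(iterations) || iterations < 1) {
    throw new RangeError('iterations must be a positive integer');
  }
  let passes = 0;
  for (let i = 0; i < iterations; i++) {
    // Runs are sequential so that each one is an independent attempt.
    if (await runOnce()) passes++;
  }
  return passes / iterations;
}

// Deterministic stand-in for a flaky LLM-driven case: passes 3 times out of 4.
const outcomes = [true, true, false, true];
let next = 0;
const stubRun = async (): Promise<boolean> => outcomes[next++ % outcomes.length] === true;

passRate(stubRun, 4).then((rate) => console.log(rate)); // prints: 0.75
```

A test would then assert on a threshold you choose for the case (for example, a rate of at least 0.75) instead of requiring every run to pass.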
Requires Node.js 22+.
```bash
npm install --save-dev @gleanwork/mcp-server-tester @playwright/test
```
The Anthropic SDK is only needed for LLM-as-judge assertions or LLM host mode with the Anthropic provider:
```bash
npm install --save-dev @anthropic-ai/sdk
```
```bash
npx mcp-server-tester init
```
The CLI wizard creates a playwright.config.ts, example tests, and a sample eval dataset configured for your server. See the CLI Guide for all options.
Point the framework at your MCP server in playwright.config.ts:
```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  reporter: [['list'], ['@gleanwork/mcp-server-tester/reporters/mcpReporter']],
  projects: [
    {
      name: 'my-server',
      use: {
        mcpConfig: {
          transport: 'stdio',
          command: 'node',
          args: ['server.js'],
        },
      },
    },
  ],
});
```
For HTTP servers, set transport: 'http' and serverUrl. For servers that require OAuth, see the Transports Guide and CLI Guide for authentication setup, including CI/CD token management.
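As a sketch of the HTTP variant, a project entry might look like the object below, which would go in the `projects` array of `playwright.config.ts`. Only `transport: 'http'` and the `serverUrl` key come from the text above; the project name and the URL are placeholders, and any further options (such as authentication) are covered in the Transports Guide.

```typescript
// Hypothetical project entry for an MCP server reachable over HTTP.
const httpProject = {
  name: 'my-http-server', // placeholder project name
  use: {
    mcpConfig: {
      transport: 'http',
      serverUrl: 'http://localhost:3000/mcp', // assumed URL; replace with your server's endpoint
    },
  },
};

console.log(httpProject.use.mcpConfig.transport); // prints: http
```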
Install AI skills to help your coding assistant generate tests, eval datasets, and MCP host evals:
```bash
npx skills add -g gleanwork/mcp-server-tester
```
This installs skills globally so they're available across all your projects. Four skills are included:
| Skill | Description |
|---|---|
| `mcp-tester-guide` | Framework reference: matchers, config, auth, anti-patterns |
| `write-mcp-test` | Generate direct-mode Playwright tests |
| `write-mcp-eval` | Generate data-driven eval datasets |
| `write-mcp-host-eval` | Generate LLM host simulation evals |
Compatible with Claude Code, Cursor, Windsurf, Copilot, and 40+ other AI agents.
The `examples/` directory contains complete working examples.
These MCP protocol features are not currently supported. These are deliberate scope decisions, not bugs:

- Resources (`listResources`, `readResource`)
- Prompts (`listPrompts`, `getPrompt`)
- Streaming responses (`callTool` waits for the complete response)

If any of these affect your use case, please open an issue.
MIT