Tester

Free · Maintained

Playwright-based testing and evaluation framework for MCP servers

by chenhao-yang-glean

README

A testing and evaluation framework for Model Context Protocol (MCP) servers. Write deterministic Playwright tests against your MCP tools, or run data-driven eval datasets — including LLM-based evaluation of tool discoverability.

Playwright Tests

The mcp Playwright fixture connects to your MCP server (stdio or HTTP) and exposes a high-level API for calling tools and asserting responses. Custom matchers keep assertions readable.

import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';

test('read_file returns file contents', async ({ mcp }) => {
  const result = await mcp.callTool('read_file', { path: '/tmp/test.txt' });
  expect(result).toContainToolText('Hello, world');
  expect(result).not.toBeToolError();
});

test('server exposes required tools', async ({ mcp }) => {
  const tools = await mcp.listTools();
  expect(tools.map((t) => t.name)).toContain('read_file');
});

Playwright tests are fast, deterministic, and designed for CI. Use them for regression testing, schema validation, and protocol conformance. The framework includes built-in conformance checks for the MCP spec.

Available matchers:

Matcher	Description
`toMatchToolResponse`	Response exactly matches expected value (deep equal)
`toContainToolText`	Response contains expected substrings
`toMatchToolSchema`	Response validates against a Zod schema
`toMatchToolPattern`	Response matches a regex pattern
`toMatchToolSnapshot`	Response matches a saved baseline
`toBeToolError`	Response is (or is not) an error
`toHaveToolResponseSize`	Response size is within bounds
`toSatisfyToolPredicate`	Response satisfies a custom function
`toHaveToolCalls`	LLM called the expected tools
`toHaveToolCallCount`	LLM made N tool calls
`toPassToolJudge`	LLM evaluates response quality against a rubric
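
To make the first two rows of the table concrete, here is a rough sketch of the kind of check that `toContainToolText` and `toBeToolError` describe. It is a conceptual stand-in, not the package's implementation: the `ToolResult` type and the helper names `toolText`, `containsAllText`, and `isToolError` are hypothetical and defined locally, assuming an MCP-style tool result with a `content` array of blocks and an optional `isError` flag.

```typescript
// Hypothetical local types modelled on an MCP tool result.
// They are NOT imported from @gleanwork/mcp-server-tester.
type ContentBlock = { type: string; text?: string };
type ToolResult = { content: ContentBlock[]; isError?: boolean };

// Join every text block in the result into a single string.
function toolText(result: ToolResult): string {
  return result.content
    .filter((block) => block.type === 'text' && typeof block.text === 'string')
    .map((block) => block.text as string)
    .join('\n');
}

// True when every expected substring appears somewhere in the text blocks.
function containsAllText(result: ToolResult, expected: string[]): boolean {
  const text = toolText(result);
  return expected.every((s) => text.includes(s));
}

// True only when the result is explicitly flagged as an error.
function isToolError(result: ToolResult): boolean {
  return result.isError === true;
}

const sample: ToolResult = { content: [{ type: 'text', text: 'Hello, world' }] };
console.log(containsAllText(sample, ['Hello']), isToolError(sample));
```

In real tests the matchers do this work for you; the sketch only illustrates what "contains expected substrings" and "is an error" might mean for a tool result.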

Eval Datasets

Eval datasets let you define test cases as JSON files and run them with runEvalDataset(). Each case specifies a tool call and one or more assertions.

{
  "name": "file-ops",
  "cases": [
    {
      "id": "read-config",
      "toolName": "read_file",
      "args": { "path": "/tmp/config.json" },
      "expect": {
        "schema": "file-content",
        "containsText": ["version", "name"]
      }
    },
    {
      "id": "read-readme",
      "toolName": "read_file",
      "args": { "path": "/tmp/README.md" },
      "expect": {
        "snapshot": "readme-snapshot"
      }
    }
  ]
}

import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';
import { loadEvalDataset, runEvalDataset } from '@gleanwork/mcp-server-tester';
import { z } from 'zod';

test('file operations eval', async ({ mcp }, testInfo) => {
  const dataset = await loadEvalDataset('./data/evals.json', {
    schemas: { 'file-content': z.object({ content: z.string() }) },
  });
  const result = await runEvalDataset({ dataset }, { mcp, testInfo });
  expect(result.passed).toBe(result.total);
});

Supported assertion types:

Type	Description
`containsText`	Response includes expected substrings
`schema`	Response validates against a Zod schema
`regex`	Response matches a pattern
`snapshot`	Response matches a saved baseline
`judge`	LLM evaluates response quality against a rubric
`toolsTriggered`	LLM called the expected tools (LLM host mode)
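
Because assertion names are plain JSON keys, a misspelled key is easy to miss. The sketch below is one possible pre-flight check that flags `expect` keys not listed in the table above. The `EvalCase` and `EvalDataset` types and the `unknownAssertions` helper are hypothetical and defined locally to mirror the JSON example; they are not part of the package, and the full Expectations guide may document keys beyond these six.

```typescript
// Hypothetical local types mirroring the JSON example in this section.
// The package's own dataset types may differ.
type EvalCase = {
  id: string;
  toolName?: string;
  args?: Record<string, unknown>;
  expect: Record<string, unknown>;
};
type EvalDataset = { name: string; cases: EvalCase[] };

// Assertion types listed in the table above.
const KNOWN_TYPES = ['containsText', 'schema', 'regex', 'snapshot', 'judge', 'toolsTriggered'];

// Return one "caseId: key" entry for every expect key that is not a known type.
function unknownAssertions(dataset: EvalDataset): string[] {
  const problems: string[] = [];
  for (const evalCase of dataset.cases) {
    for (const key of Object.keys(evalCase.expect)) {
      if (!KNOWN_TYPES.includes(key)) problems.push(`${evalCase.id}: ${key}`);
    }
  }
  return problems;
}

const dataset: EvalDataset = {
  name: 'file-ops',
  cases: [
    {
      id: 'read-config',
      toolName: 'read_file',
      args: { path: '/tmp/config.json' },
      expect: { schema: 'file-content', containsText: ['version', 'name'] },
    },
    {
      // Deliberate typo: "containText" instead of "containsText".
      id: 'typo-case',
      toolName: 'read_file',
      args: { path: '/tmp/config.json' },
      expect: { containText: ['version'] },
    },
  ],
};

console.log(unknownAssertions(dataset));
```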

LLM host mode

In LLM host mode, a real LLM receives your server's tool list and a natural language prompt, then decides which tools to call. This tests whether your tool names, descriptions, and input schemas are clear enough for autonomous use — a different question from whether the tools return correct output.

{
  "id": "find-config",
  "mode": "mcp_host",
  "scenario": "Find the application config file and return its contents",
  "mcpHostConfig": {
    "provider": "anthropic",
    "model": "claude-opus-4-20250514"
  },
  "expect": {
    "toolsTriggered": {
      "calls": [{ "name": "read_file", "required": true }]
    }
  }
}

LLM host mode makes real API calls and produces non-deterministic results. Use iterations to run a case multiple times and measure pass rate rather than expecting 100% on a single run. See the LLM Host Guide for configuration and cost management.
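
The pass-rate idea can be sketched in a few lines. This README does not show how the framework reports per-iteration outcomes, so the boolean array, the `passRate` and `meetsThreshold` helpers, and the 0.75 threshold are all illustrative assumptions.

```typescript
// Fraction of iterations that passed, in the range 0..1.
// An empty run is treated as a 0 pass rate rather than dividing by zero.
function passRate(outcomes: boolean[]): number {
  if (outcomes.length === 0) return 0;
  return outcomes.filter(Boolean).length / outcomes.length;
}

// True when the observed pass rate is at least the required minimum.
function meetsThreshold(outcomes: boolean[], minRate: number): boolean {
  return passRate(outcomes) >= minRate;
}

// Hypothetical outcomes of one case run five times.
const outcomes = [true, true, false, true, true];
console.log(passRate(outcomes)); // 0.8
console.log(meetsThreshold(outcomes, 0.75)); // true
```

Asserting on a threshold like this tolerates an occasional miss by the model while still failing when tool descriptions are unclear enough to cause frequent misses.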

Installation

Requires Node.js 22+.

npm install --save-dev @gleanwork/mcp-server-tester @playwright/test

The Anthropic SDK is only needed for LLM-as-judge assertions or LLM host mode with the Anthropic provider:

npm install --save-dev @anthropic-ai/sdk

Quick Start

npx mcp-server-tester init

The CLI wizard creates a playwright.config.ts, example tests, and a sample eval dataset configured for your server. See the CLI Guide for all options.

Configuration

Point the framework at your MCP server in playwright.config.ts:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  reporter: [['list'], ['@gleanwork/mcp-server-tester/reporters/mcpReporter']],
  projects: [
    {
      name: 'my-server',
      use: {
        mcpConfig: {
          transport: 'stdio',
          command: 'node',
          args: ['server.js'],
        },
      },
    },
  ],
});

For HTTP servers, set transport: 'http' and serverUrl. For servers that require OAuth, see the Transports Guide and CLI Guide for authentication setup, including CI/CD token management.
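
As an illustration of the two transports side by side, the sketch below models project entries for a stdio server and an HTTP server. The `McpConfig` and `Project` types are hypothetical local types covering only the fields named in this README, and the URL is a placeholder; the package's real configuration type may accept more options, including the OAuth settings covered in the Transports Guide.

```typescript
// Hypothetical local type covering only the fields named in this README.
// The package's real mcpConfig type may have additional options.
type McpConfig =
  | { transport: 'stdio'; command: string; args: string[] }
  | { transport: 'http'; serverUrl: string };

type Project = { name: string; use: { mcpConfig: McpConfig } };

// Launches the server as a child process and talks to it over stdio.
const stdioProject: Project = {
  name: 'local-stdio',
  use: { mcpConfig: { transport: 'stdio', command: 'node', args: ['server.js'] } },
};

// Connects to an already running server; the URL is a placeholder.
const httpProject: Project = {
  name: 'local-http',
  use: { mcpConfig: { transport: 'http', serverUrl: 'http://localhost:3000/mcp' } },
};

const projects: Project[] = [stdioProject, httpProject];
console.log(projects.map((p) => `${p.name}: ${p.use.mcpConfig.transport}`));
```

In an actual playwright.config.ts, entries shaped like these would go in the `projects` array passed to `defineConfig`, so the same tests can run against both transports.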

Documentation

Quick Start — detailed setup and configuration
Expectations — all assertion types including snapshot sanitizers
LLM Host Simulation — tool discoverability testing
API Reference
Transports — stdio and HTTP configuration, OAuth
CLI Commands — init, generate, login, token
UI Reporter — interactive web UI for test results
Development — contributing and building
Migration Guide (v0.12 → v1.0) — upgrading from pre-1.0 releases

AI Skills

Install AI skills to help your coding assistant generate tests, eval datasets, and MCP host evals:

npx skills add -g gleanwork/mcp-server-tester

This installs skills globally so they're available across all your projects. Four skills are included:

Skill	Description
`mcp-tester-guide`	Framework reference — matchers, config, auth, anti-patterns
`write-mcp-test`	Generate direct-mode Playwright tests
`write-mcp-eval`	Generate data-driven eval datasets
`write-mcp-host-eval`	Generate LLM host simulation evals

Compatible with Claude Code, Cursor, Windsurf, Copilot, and 40+ other AI agents.

Examples

The examples/ directory contains complete working examples:

filesystem-server/ — Test suite for Anthropic's Filesystem MCP server: 5 Playwright tests, 11 eval dataset cases, Zod schema validation.
sqlite-server/ — Test suite for a SQLite MCP server: 11 Playwright tests, 14 eval dataset cases.
basic-playwright-usage/ — Minimal Playwright patterns.

Known Limitations

The following MCP protocol features are not currently supported. These are deliberate scope decisions, not bugs:

MCP resources (listResources, readResource)
MCP prompts (listPrompts, getPrompt)
Server-to-client notifications
Streaming tool responses (callTool waits for the complete response)

If any of these affect your use case, please open an issue.

License

MIT

from github.com/gleanwork/mcp-server-tester

Install Tester in Claude Desktop, Claude Code & Cursor

Run in your terminal:

claude mcp add tester -- npx -y @gleanwork/mcp-server-tester

FAQ

Is Tester MCP free?

Yes, Tester MCP is free — one-click install via Unyly at no cost.

Does Tester need an API key?

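Not for the core features: Playwright tests and direct tool-call eval cases run without API keys or environment variables. LLM-as-judge assertions and LLM host mode make real API calls to an LLM provider, so those features need that provider's credentials.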

Is Tester hosted or self-hosted?

Self-hosted: the server runs locally on your machine via the install command above.

How do I install Tester in Claude Desktop, Claude Code or Cursor?

Open Tester on unyly.org, pick your client tab (Claude Desktop, Claude Code, Cursor) and press Install — the config is generated automatically, no JSON editing.

Related MCPs

Playwright

Browser automation, scraping, screenshots

by Microsoft

4.9 · 15.2K

Puppeteer

Browser automation and web scraping.

by modelcontextprotocol

opentabs-dev/opentabs

Plugin-based MCP server + Chrome extension that gives AI agents access to web applications through the user's authenticated browser session. 100+ plugins with a

by opentabs-dev

robhunter/agentdeals

1,500+ developer infrastructure deals, free tiers, and startup programs across 54 categories. Search deals, compare vendors, plan stacks, and track pricing chan

by robhunter
