loading…
Search for a command to run...
loading…
Provides a suite of deterministic math tools powered by SymPy to handle algebra, calculus, linear algebra, and statistics via the Model Context Protocol. It ena
Provides a suite of deterministic math tools powered by SymPy to handle algebra, calculus, linear algebra, and statistics via the Model Context Protocol. It enables smaller language models to delegate complex computations to a verified symbolic backend for accurate and reliable results.
Deterministic math tools for small language models.
ReasonForge gives small LLMs (8B–32B) access to a verified SymPy computation backend via tool calling. Instead of relying on the model to compute, all math is delegated to deterministic tools — the model only reasons about what to compute and how to present results.
User Question → LLM (Qwen3) → Tool Calls → SymPy Backend → Verified Results → LLM → Final Answer
Multi-Turn Agentic Loop:
<think> tags to analyze the problem and decide on a strategy.MAX_ROUNDS).| Tool | Operations | Backend |
|---|---|---|
math_tool |
compute, solve, simplify, factor, expand, gcd, lcm, prime_factors, divisors, mod_inverse, nsolve, crt + SymPy builtins (totient, fibonacci, isprime...) | SymPy |
calculus_tool |
differentiate, integrate, limit, series, summation, partial_fraction, trigsimp, ode_solve, laplace | SymPy |
matrix_tool |
determinant, inverse, eigenvalues, eigenvectors, rank, rref, transpose, multiply, add, trace, nullspace, columnspace, charpoly, norm, adjugate, solve (Ax=b) | SymPy |
statistics_tool |
describe, mean, median, mode, std, variance, correlation, regression, percentile, zscore, skewness, kurtosis, geometric_mean, harmonic_mean | Python stdlib |
code_tool |
run, check, ast_inspect — sandboxed Python code execution, syntax checking, and structure analysis | subprocess |
MCP/
├── core.py # Shared LLM request logic, expert definitions, tool schemas
├── experts/
│ ├── math/
│ │ ├── server.py # MCP server entry point (math tools)
│ │ └── tools/
│ │ ├── preprocess.py # Expression parser (^ → **, implicit multiplication)
│ │ ├── algebra.py # algebra + number theory
│ │ ├── calculus.py # derivatives, integrals, ODEs
│ │ ├── matrix.py # linear algebra
│ │ └── statistics.py # descriptive & inferential stats
│ └── code/
│ ├── server.py # MCP server entry point (code execution)
│ └── tools/
│ └── code.py # Sandboxed Python runner & syntax checker
├── tests/
│ ├── sanity.py # Tool unit tests (16 checks)
│ ├── math_benchmark.py # A/B math benchmark (MATH-500 dataset)
│ ├── code_benchmark.py # A/B code benchmark (HumanEval)
│ └── results/ # Local benchmark outputs
├── ui/
│ ├── app.py # Gradio chat interface with intermediate thinking steps
│ └── style.css # Custom UI styles (dark mode, thinking blocks)
├── ReasonForge_Colab.ipynb # One-click Colab deployment notebook
├── pyproject.toml
├── requirements.txt
├── run_tests.bat # Local tests launcher (Windows)
└── run_ui.bat # Local UI launcher (Windows)
# Requires: Ollama running with a supported model (qwen3:8b, qwen3:32b, etc.)
uv sync
uv run python -m ui.app
# Open at http://localhost:7861
RF_ALLOW_REMOTE_ENDPOINTS=1.RF_ENDPOINT_ALLOWLIST.Examples:
export RF_ENDPOINT_ALLOWLIST="localhost,127.0.0.1,::1,api.mycompany.com"
export RF_ALLOW_REMOTE_ENDPOINTS=1
code_tool supports optional Docker isolation with safe fallback:
RF_CODE_TOOL_ISOLATION=auto (default): use Docker if available, else process modeRF_CODE_TOOL_ISOLATION=docker: prefer Docker, fallback to process if unavailableRF_CODE_TOOL_ISOLATION=process: force process modeOptional image override:
export RF_CODE_TOOL_DOCKER_IMAGE=python:3.11-alpine
Open ReasonForge_Colab.ipynb in Google Colab Pro with an A100 GPU.
It clones this repo, installs Ollama + qwen3:32b, and launches the UI with a public Gradio link.
# Math benchmark — MATH-500 (requires Ollama running)
uv run python -m tests.math_benchmark --model llama3.2:3b --n 10
uv run python -m tests.math_benchmark --model qwen3:32b --n 50 --think
# Code benchmark — HumanEval (requires Ollama running)
uv run python -m tests.code_benchmark --model qwen3:8b --n 20
uv run python -m tests.code_benchmark --model qwen3:32b --n 164 --think
uv run python -m tests.sanity
uv run python -m tests.test_all
uv run python -m tests.release_gate
qwen3:8b, 50 problems)| Metric | Baseline | ReasonForge |
|---|---|---|
| Correct | 43/50 | 45/50 |
| Uniform Accuracy | 86.0% | 90.0% (▲ +4.0%) |
| Weighted Score | 144/176 | 154/176 |
| Weighted Accuracy | 81.8% | 87.5% (▲ +5.7%) |
Level 1 5/5 100% ████████████████████
Level 2 7/7 100% ████████████████████
Level 3 8/9 89% █████████████████
Level 4 14/15 93% ██████████████████
Level 5 11/14 79% ███████████████ (+14%)
Algebra 10/12 83% ████████████████
Counting & Probability 4/4 100% ████████████████████
Geometry 4/4 100% ████████████████████
Intermediate Algebra 11/13 85% ████████████████ (+8%)
Number Theory 2/2 100% ████████████████████
Prealgebra 7/7 100% ████████████████████
Precalculus 7/8 88% █████████████████ (+12%)
qwen3:8b, 160 problems)| Metric | Baseline | ReasonForge |
|---|---|---|
| Pass@1 | 4/160 | 102/160 |
| Accuracy | 2.5% | 63.7% (▲ +61.2%) |
Testing the 8-billion parameter qwen3 model reveals exactly why deterministic tool-delegation is crucial for smaller models:
46.3s down to 31.0s), all while squeezing out an extra ~5% in weighted grading accuracy.4/160 (2.5%) of the problems. However, the simple addition of the ReasonForge Python runtime tools allowed the exact same model to safely hypothesize, test, and iteratably structure its code, propelling its accuracy to 102/160 (63.7%)—a gigantic +61.2% improvement with zero fine-tuning required.Run in your terminal:
claude mcp add reasonforge -- npx Yes, ReasonForge MCP is free — one-click install via Unyly at no cost.
No, ReasonForge runs without API keys or environment variables.
Self-hosted: the server runs locally on your machine via the install command above.
Open ReasonForge on unyly.org, pick your client tab (Claude Desktop, Claude Code, Cursor) and press Install — the config is generated automatically, no JSON editing.
Read and write pages in your workspace
by NotionIssues, cycles, triage — from Claude
by LinearSearch and read your Drive files
by GoogleConnect and unify data across various platforms and databases with [MindsDB as a single MCP server](https://docs.mindsdb.com/mcp/overview).
by mindsdbNot sure what to pick?
Find your stack in 60 seconds
Author?
Embed badge for your README
Browse similar
All productivity MCPs