loading…
Search for a command to run...
loading…
A very streamlined mcp client that supports calling and monitoring stdio/sse/streamableHttp, and can also view request responses through the /logs page. It also
A very streamlined mcp client that supports calling and monitoring stdio/sse/streamableHttp, and can also view request responses through the /logs page. It also supports monitoring and simulation of ollama/openai interface.
Through this proxy service, we can easily record the parameters and return results of the interaction with the big model, so as to conveniently analyze the logic of the client calling the big model and deeply understand the phenomenon and its essence. This project is not for optimizing the big model, but it can help you uncover the mystery of the big model, understand and achieve product market fit (PMF).
MCP is also an important part of LLM, so this project can also be used as an mcp client and supports detection of sse/mcp-streamable-http mode.
Before the arrival of true AGI, we will have to go through a long journey, during which we will have to face constant challenges. Whether ordinary people or professionals, their lives will be changed.
However, for the use of large models, both ordinary users and developers often indirectly contact them through various clients. But the client often blocks the process of interacting with the large model, and can directly give results based on the user's simple input, giving people a feeling that the large model is mysterious, like a black box. In fact, this is not the case. When using a large model, we simply understand that we are calling an interface with input and output. It should be noted that although many inference platforms provide OpenAI format interfaces, their actual support varies. Simply put, the request parameters and return parameters of the API are not exactly the same.
For detailed parameter support, please see
Please check for other platforms
# clone git
git clone https://github.com/xuzexin-hz/llm-analysis-assistant.git
cd llm-analysis-assistant
# Install the extension
uv sync
Enter the root directory, then the bin directory Click run-server.cmd to start the service Click run-build.cmd to package the service into an executable file (in the dist directory) Or run the following command directly in the root directory:
#Default port 8000
python server.py
#You can also specify the port
python server.py --port=8001
#You can also specify the openai address, the default is the ollama address: http://127.0.0.1:11434/v1/
python server.py --base_url=https://api.openai.com
#If you configure other api addresses, remember to fill in the correct api_key, ollama does not need api_key by default
#--is_mock=true Turn on mock and return mock data
python server.py --is_mock=true
#--mock_string, you can customize the returned mock data, if you do not set this item, the default mock data will be returned. This parameter also applies to non-streaming output
python server.py --is_mock=true --mock_string=Hello
#--mock_count, the number of times the mock returns data when streaming output, the default is 3 times
python server.py --is_mock=true --mock_string=Hello --mock_count=10
#--single_word, mock streaming output return effect, the default is to divide a sentence into 3 parts according to [2:5:3] and return them in sequence, after setting the second parameter, it will be a word-by-word streaming output effect
python server.py --is_mock=true --mock_string=你好啊 --single_word=true
#--looptime, mock streaming output return data interval, the default is 0.35 seconds, set looptime=1 when streaming output display data speed will be slow
python server.py --is_mock=true --mock_string=你好啊 --looptime=1
When using uv no specific installation is needed. We will use uvx to directly run llm-analysis-assistant.
uvx llm_analysis_assistant
Alternatively you can install llm-analysis-assistant via pip:
pip install llm-analysis-assistant
After installation, you can run it as a script using:
python -m llm_analysis_assistant
http://127.0.0.1:8000/logs View logs in real time
The implementation logic of mcp client technology is as follows. The interface log seems to be a sequential request, but it is not actually a simple request-response mode. This is easier for users to understand

mcp-sse logic details (for similarities and differences with stdio/streamableHttp, please refer to other materials)

Open the following address in the browser. In the command line, ++user=xxx means that the system variable is user and the value is xxx
http://127.0.0.1:8000/mcp?url=stdio
Or use Cherry Studio to add the stdio service

Open the following address in the browser, the url is the sse service address
http://127.0.0.1:8000/mcp?url=http://127.0.0.1:8001/sse
http://127.0.0.1:8000/mcp?url=http://127.0.0.1:8002/sse?++user=xxx # ++user=xxx in the url means the HTTP request header user value is xxx
Or use Cherry Studio to add the mcp service

Open the following address in the browser, the url is the streamableHttp service address
http://127.0.0.1:8000/mcp?url=http://127.0.0.1:8001/mcp
http://127.0.0.1:8000/mcp?url=http://127.0.0.1:8001/mcp?++user=xxx # ++user=xxx in the url means the HTTP request header user value is xxx
Or use Cherry Studio to add the mcp service

When using Cherry Studio, you can http://127.0.0.1:8000/logs View the logs in real time to analyze the calling logic of sse/mcp-streamable-http
Change the base_url of openai to the address of the service: http://127.0.0.1:8000
pip install langchain langchain-openai
from langchain.chat_models import init_chat_model
model = init_chat_model("qwen2.5-coder:1.5b", model_provider="openai",base_url='http://127.0.0.1:8000',api_key='ollama')
model.invoke("Hello, world!")
####### Agent is a node, agent is a tool, leader mode


####### Every inch has its own strengths and weaknesses (it is said that CodeAct will greatly improve accuracy and efficiency in some scenarios)

Add this to claude_desktop_config.json and restart Claude Desktop.
{
"mcpServers": {
"llm-analysis-assistant": {
"command": "npx",
"args": []
}
}
}Web content fetching and conversion for efficient LLM usage.
Retrieval from AWS Knowledge Base using Bedrock Agent Runtime.
Provides auto-configuration for setting up an MCP server in Spring Boot applications.
A simple, composable framework to build agents using Model Context Protocol by [LastMile AI](https://www.lastmileai.dev)