Originally published on Medium.
If you are following any news related to LLMs, you have surely seen the Model Context Protocol (MCP) getting a lot of attention lately. It also made it to ASSESS in the recently released Tech Radar Vol. 32 by Thoughtworks, which will further increase its visibility. In your feeds, MCP is typically mentioned in conjunction with IDEs such as Cursor, Windsurf, or VSCode with GitHub Copilot Chat, by users who gain access to new capabilities or extend the context of LLMs using MCPs. It is often showcased by non-engineers who achieve results that were previously difficult for them (such as generating scenes in Blender), and their excitement adds to the hype around MCP. Yet, when one explores the actual codebases of MCP servers, one finds early-stage, incomplete API wrappers that are far away from the existing SDKs and libraries available to software engineers. Add the non-determinism coming from LLMs, and we’re in for a treat! 🍿
Having tried out the Blender integration via VSCode and GitHub Copilot, which far too frequently ran into rate limits, I was looking for a simple setup to try out MCP servers and understand what happens under the hood. I found openai-agents-python to provide a very simple experimentation harness against a chosen set of MCP servers. Its default integration with OpenAI’s Traces for observability provides a detailed look at what happens behind the scenes (it also saves me from running the OpenTelemetry stack I used previously).
In this post, I will try out three MCP servers, calling them through openai-agents-python and using gpt-4o as the model (the default). The convenience of the framework is that, given a configuration of MCP servers, it automatically handles their lifecycle (incl. download) without additional actions from the user.
MCP and LLM
The interplay between the LLM and MCP is quite simple. The orchestrator takes the user query and passes it, together with the list of available MCP functions (aka tools) with their descriptions and parameter specifications, to the LLM. The LLM then decides which function is most appropriate to call in the scenario defined by the user and returns the function call, incl. parameters, as output. The workflow orchestrator (e.g. IDE or agent framework) executes the function call using the MCP protocol and afterwards passes the input and output back to the LLM for it to determine the next action.
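The loop above can be sketched in a few lines of Python. Everything here is a stand-in: the fake_llm stub always picks git_log, whereas a real orchestrator sends the tool list with every completion request and lets the model choose.

```python
# Minimal mock of the orchestrator loop; all names and values are illustrative.

# Tool specs as an MCP server would advertise them (name, description, parameters).
TOOLS = [{
    "name": "git_log",
    "description": "Show the commit logs of a repository",
    "parameters": {"repo_path": "string", "max_count": "integer"},
}]

def fake_llm(query: str, tools: list) -> dict:
    # Stand-in for the completion call: the model returns a tool call with arguments.
    return {"tool": "git_log",
            "arguments": {"repo_path": "/tmp/openai-agents-python", "max_count": 1000}}

def execute_tool(call: dict) -> str:
    # Stand-in for the MCP round-trip performed by the orchestrator.
    return "064e25b add links and mcp + voice examples (#438)"

def run(query: str) -> str:
    call = fake_llm(query, TOOLS)      # 1. LLM turns the query into a tool call
    result = execute_tool(call)        # 2. orchestrator executes it via MCP
    # 3. tool input and output go back to the LLM to produce the final answer
    return f"Based on {call['tool']}: {result}"

print(run("Summarize the last change in the repository."))
```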
Trying out mcp-server-git: Git repo access via local filesystem
Our starting point is the git_example from openai-agents-python, which we will use as the basis for future experiments as well. The code sets up an agent with tools from mcp-server-git for accessing any chosen repository on our local file system. The agent requests the most frequent contributor and asks for a summary of the latest change in the repo. After setting the OPENAI_API_KEY environment variable, we can test it out.
$ export OPENAI_API_KEY=sk-...
$ uv run python main.py
Please enter the path to the git repository: /tmp/openai-agents-python
----------------------------------------
Running: Who's the most frequent contributor?
The most frequent contributor to the repository is **Rohan Mehta**.
----------------------------------------
Running: Summarize the last change in the repository.
The last change in the repository was made by James Hills on 2025-04-04. The commit hash is `064e25b01b5c82c08aea66ff898ff27adbb013d8`, and the message was: "add links and mcp + voice examples (#438)".
The first answer could be more comprehensive, but at least it worked out of the box. Let’s take a look at what happened under the hood:

OpenAI Trace view for the main.py program. Shows MCP Tool list, LLM completion request to generate tool call, the git_log tool call, and LLM completion request to generate the final answer.
We can see the agent fetching the list of MCP tools and calling the LLM that is then converting the user query to a tool call for git_log:
git_log({
"repo_path": "/tmp/openai-agents-python",
"max_count": 1000
})
As described before, the MCP server performs the tool call and passes the result back to the LLM. Since the git log contains user and commit data (commit_id, author, message), we end up with 29k tokens passed to the LLM, after which we get the final response shown to the user.

Trace with git_log function call with input and output.
mcp-server-git also provides operations to work on commits, which will definitely be interesting to try out another time.
Trying out github-mcp-server: remote Github access via API
Now, let’s try the official GitHub MCP server, first released on April 4th, 2025. We will use the same queries as in the prior example, but specify the repo names directly in the prompts so the LLM knows which repository we want to query:
At the time of writing, the MCP server provides 29 functions (e.g. issue, commit, pull request operations, and search) which wrap the GitHub APIs, called using a GitHub token defined in the environment variable GITHUB_PERSONAL_ACCESS_TOKEN. For our experiment, in theory, just the function list_commits() for listing commits would be sufficient to answer the posed questions. Let’s see how successful the agent will be with this (simple?) task.
$ export GITHUB_PERSONAL_ACCESS_TOKEN=...
$ uv run python github-mcp-server.py
GitHub MCP Server running on stdio
----------------------------------------
> Input: Who's the most frequent contributor to zalando/skipper?
> Running: Who's the most frequent contributor to zalando/skipper?
Error invoking MCP tool search_users: failed to search users: GET <https://api.github.com/search/users?order=desc&page=1&per_page=1&q=repo%3Azalando%2Fskipper&sort=repositories:> 422 Validation Failed [{Resource:Search Field:q Code:invalid Message:None of the search qualifiers apply to this search type.}]
Looks like we hit a bug: the query constructed by the LLM is invalid (reported as github/github-mcp-server#135). The LLM opted for search_users() instead of using the git commit log as in our prior experiment.
Let’s take a look at the second query:
$ uv run python github-mcp-server.py
GitHub MCP Server running on stdio
----------------------------------------
> Input: Summarize the last change in the repository zalando/skipper
> Running: Summarize the last change in the repository zalando/skipper
Error getting response: Error code: 429 - {'error': {'message': 'Request too large for gpt-4o in organization org-XYZ on tokens per min (TPM): Limit 30000, Requested 68487. The input or output tokens must be reduced in order to run successfully. Visit <https://platform.openai.com/account/rate-limits> to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}. (request_id: req_abcabcabc...)

Error message from OpenAI: rate limit of 30000 tokens exceeded, requested 68487 tokens.
Another error. This time we hit a rate limit, as the request to the LLM is too large (68487 tokens vs. the limit of 30000). It took the agent only 19.40s to realize this… Something must have inflated the response from the MCP server. A closer look at the traces reveals that the tool call to list_commits with perPage = 1 resulted in a long response (containing 30 commits, which is the default setting) with a whopping size of 180KB. Bug number two, filed as github/github-mcp-server#136.
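A quick back-of-the-envelope calculation shows why a 180KB tool response cannot fit under a 30k-token limit. The ~4 characters per token figure is a common rough heuristic, not the model’s actual tokenizer:

```python
# Rough token estimate for a tool response; ~4 chars/token is a heuristic,
# not the exact tokenization used by gpt-4o.
def estimate_tokens(payload_bytes: int, chars_per_token: float = 4.0) -> int:
    return int(payload_bytes / chars_per_token)

response_size = 180 * 1024                  # the ~180KB list_commits response
print(estimate_tokens(response_size))       # → 46080, well above the 30000 TPM limit
```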
A closer look at the MCP server response also shows that the excessive payload originates from the GitHub API response. A single commit object weighs 5–6 KB, and beyond information on the commit it includes goodies such as the PGP signature of the author, hardly needed for the task at hand:
{
"node_id": "C_kwDOAlA7-NoAKDdlMmNhM2JmZDI2NzNiNTFkYjRhYmNmYmQ1OWRlYTMzYTk0YzIwMzE",
"sha": "7e2ca3bfd2673b51db4abcfbd59dea33a94c2031",
"commit": {
"author": {
"date": "2025-04-02T09:43:35Z",
"name": "... ...",
"email": "..."
},
"committer": {
"date": "2025-04-02T09:43:35Z",
"name": "GitHub",
"email": "noreply@github.com"
},
"message": "Add context to log entry (#3466)\n\nThis change enriches log entry with request context to be used by custom log formatter.\n\nSigned-off-by: ... ... <....>",
"tree": {
"sha": "cff0e8e12633313bff1291df86f65f3f81acfd66"
},
"url": "<https://api.github.com/repos/zalando/skipper/git/commits/7e2ca3bfd2673b51db4abcfbd59dea33a94c2031>",
"verification": {
"verified": true,
"reason": "valid",
"signature": "-----BEGIN PGP SIGNATURE-----\n\n[...]\n-----END PGP SIGNATURE-----\n",
"payload": "tree cff0e8e12633313bff1291df86f65f3f81acfd66\nparent 985da0b03d2fa499e4868a22d85426ed387b4a98\nauthor ... ... <...> 1743587015 +0200\ncommitter GitHub <noreply@github.com> 1743587015 +0200\n\nAdd context to log entry (#3466)\n\nThis change enriches log entry with request context to be used by custom log formatter.\n\nSigned-off-by: ... ... <...>"
},
"comment_count": 0
},
"author": {
"login": "...",
[...]
"subscriptions_url": "<https://api.github.com/users/.../subscriptions>"
},
[...]
}
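Field filtering would shrink this dramatically. A hypothetical sketch: keep only what the task needs from each commit object. The input field names follow the GitHub commits API as shown above; the selection of fields (and the sample author name) is our own invention.

```python
# Hypothetical field filter: reduce a ~5-6KB GitHub commit object to the
# handful of fields needed to summarize or count commits.
def slim_commit(commit: dict) -> dict:
    c = commit.get("commit", {})
    author = c.get("author", {})
    return {
        "sha": commit.get("sha"),
        "author": author.get("name"),
        "date": author.get("date"),
        "message": c.get("message", "").split("\n", 1)[0],  # subject line only
    }

# Sample input in the shape of the API response above (name is made up).
full = {
    "sha": "7e2ca3bfd2673b51db4abcfbd59dea33a94c2031",
    "commit": {
        "author": {"date": "2025-04-02T09:43:35Z", "name": "Jane Doe"},
        "message": "Add context to log entry (#3466)\n\nLong body...",
        "verification": {"signature": "-----BEGIN PGP SIGNATURE-----..."},
    },
}
print(slim_commit(full))
```

The signature, verification payload, and the various `*_url` fields all disappear, leaving a few short strings per commit.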
BTW, if we adjust the query for our most frequent committer to Who's the most frequent contributor to zalando/skipper? Use list_commits tool to answer this question. we will, of course, run into the same error. And even if it succeeded, fetching 30 commits is hardly enough to determine the right answer.
Trying out aws-documentation-mcp: search for public AWS docs
Let’s take a look at another freshly released MCP server, this time from AWS. I chose the simplest one to run: aws-documentation-mcp provides access to the public AWS docs and does not require any AWS account access.
A closer look at the server code reveals the magic behind this server. It runs a search against the AWS documentation search endpoint (https://proxy.search.docs.aws.amazon.com/search) with a dedicated user agent [...] ModelContextProtocol/1.0 (AWS Documentation Server), allowing AWS to track MCP server usage (and likely rate limit it). The system prompt also mentions a recommendation tool for retrieving related content. The MCP server can perform a search using the user’s input, then use the search results to fetch the content, parse it using beautifulsoup4, and convert it into markdown using markdownify.
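The fetch-and-convert step can be approximated locally. The sketch below uses only the standard library as a stand-in for beautifulsoup4 and markdownify, and runs on a made-up HTML snippet rather than fetching a real AWS page:

```python
# Stdlib-only stand-in for the server's parse step: strip an HTML page down
# to its visible text, skipping <script>/<style> contents.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0          # depth inside <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)

# Made-up sample page; the real server fetches AWS documentation URLs.
page = ("<html><body><h1>Transfer Acceleration</h1>"
        "<script>track()</script>"
        "<p>Enable it per bucket.</p></body></html>")
print(to_text(page))
```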
So far so good. Let’s try it out with two questions about S3:
- “What are the file size limits for AWS S3?”
- “How does one enable S3 transfer acceleration?”
The task took about 24 seconds, but it looks like a success:
$ uv run python aws-docs.py
----------------------------------------
> Running: What are the file size limits for AWS S3?
Here are the file size limits for uploading objects to Amazon S3:
- **Console Upload**: Up to 160 GB per file using the Amazon S3 console.
- **Single PUT Operation**: Up to 5 GB per file using AWS SDKs, REST API, or AWS CLI.
- **Multipart Upload**: Up to 5 TB per file using AWS SDKs, REST API, or AWS CLI. This option is for larger files, allowing uploads in parts ranging from 5 MB to 5 TB.
You can use multipart uploads for efficient handling of larger objects.
----------------------------------------
> Running: How does one enable S3 transfer acceleration?
To enable S3 Transfer Acceleration, you have several options including using the AWS Management Console, AWS CLI, or API. Here's a concise guide:
### Using the AWS Management Console
1. **Sign In**: Log into the AWS Management Console and open the S3 interface.
2. **Select Bucket**: In the left navigation pane, choose **General purpose buckets**. Then, select the bucket you want to enable transfer acceleration for.
3. **Access Properties**: Click on **Properties**.
4. **Edit Transfer Acceleration**: Under **Transfer acceleration**, click **Edit**.
5. **Enable**: Choose **Enable**, then save changes.
### Using the AWS CLI
To enable Transfer Acceleration using AWS CLI, run the following command:
```bash
aws s3api put-bucket-accelerate-configuration --bucket [bucket-name] --accelerate-configuration Status=Enabled
```
### Using the Accelerated Endpoint
Once enabled, you can use the accelerated endpoint for faster data transfers:
- Find the **Accelerated endpoint** under the bucket's **Properties** tab.
- Use `s3-accelerate.amazonaws.com` to direct requests through the accelerated endpoint.
For more detailed instructions and examples of using the AWS CLI and SDKs, visit the [Amazon S3 Transfer Acceleration documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration-examples.html).
The traces reveal what happened behind the scenes. For each of the inputs, we get search_documentation followed by read_documentation.

Trace for MCP AWS Documentation Server: two tool calls and two LLM calls.
This time, the limits are correctly respected: by default, the search gets the top 5 results.

Trace for a search_documentation tool call with limit of 5 and ranked results with rank_order, url, and title.
Also, we get valid (though simplified) answers straight from the docs. The queries took 14.88s and 10.61s respectively, which is on the slow end; yet if the MCP server is embedded into the IDE, it is likely convenient to access without resorting to the browser.
Closing words: to MCP or not to MCP?
What did we learn from this exploration? MCP servers are easy to set up and are a promising way to equip agentic flows and assistants with access to additional information and capabilities. They also provide us with a way to bring local data into the context of LLMs, especially in cases where indexing this data would be impractical. We also learned that the generated LLM calls are rather costly to execute due to excessive context size (if they get executed at all). Before using MCP servers, it’s important to verify the cost footprint for the expected tasks, and to add monitoring and spend limits accordingly.
Building a good MCP server is far from easy: getting LLMs to generate the right, syntactically correct function calls is difficult and becomes more complex the more functions are available to choose from. We saw in the AWS example how detailed instructions help guide the model to generate the multi-step call flow required for the job. Without these instructions, we can’t magically expect good results. It is often easier to ask models to generate code that includes API calls instead.
In the git examples, to calculate the most frequent committer, one would expect an iteration over the commit list spanning at least a few pages. Yet currently, due to the excessive size of the response, we cannot even process 30 commits without hitting the model’s context size (or model rate limits). This makes the approach rather impractical. A simple prompt, Generate python code to interact with the Github API to determine the most frequent committer for a chosen repository., results in a code snippet that does the job in a more predictable way. Similarly, Generate git command to calculate the most frequent committer for a chosen repository. returns git shortlog -sne which, paired with | head -n 1, returns the result for our local git repo. Needless to say how cheap these LLM calls are compared to the MCP approach. Let’s see how the mentioned git MCP servers evolve and how long they stay around. They’re early on their journey and require more work to be really useful and to give users confidence about when and how they will actually work. Field filtering and evals come to mind as first extensions that would be a major step change in quality.
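For completeness, here is a sketch of what such generated code might look like. The pagination and HTTP calls against GET /repos/{owner}/{repo}/commits are omitted, so the counting runs here on hand-made sample pages in the shape of that API’s response:

```python
# Deterministic counting of commit authors, as generated code would do it.
# Fetching the pages from the GitHub API is left out; sample data stands in.
from collections import Counter

def most_frequent_committer(commit_pages):
    # commit_pages: iterable of pages, each a list of commit objects
    # as returned by the GitHub "list commits" endpoint.
    counts = Counter(
        c["commit"]["author"]["name"]
        for page in commit_pages
        for c in page
    )
    return counts.most_common(1)[0]

# Two hand-made pages with made-up author names.
pages = [
    [{"commit": {"author": {"name": "alice"}}},
     {"commit": {"author": {"name": "bob"}}}],
    [{"commit": {"author": {"name": "alice"}}}],
]
print(most_frequent_committer(pages))  # → ('alice', 2)
```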
I will certainly continue my experiments, as the level of hype is too high for MCPs to disappear quickly.