MCP Overload: Why Your LLM Agent Doesn't Need 20 Tools
MCP — the Model Context Protocol — promised a lot. A unified standard for tool use across AI agents. A way to wire up everything: web search, database access, APIs, file systems, you name it.
Just connect your agent to a few dozen tools, and voilà — superintelligence! At least, that was the promise.
If you’re wiring MCP into agents in Claude Desktop, LangChain, or your own framework and find yourself adding “just one more tool,” this post is for you.
But here’s what many developers are discovering the hard way:
The more tools you give an agent, the worse it performs.
Slower. Dumber. More expensive. In this post, we’ll walk through why MCP overload is a real problem — and what better agent design looks like in the real world.
1. More Tools, More Tokens, More Problems
Each MCP tool comes with a schema — a description of what it does and how it should be used. These schemas are injected into the system prompt. Add 10 or 20 tools, and suddenly your prompt is bloated with several thousand extra tokens.
That means:
- Higher latency on every request
- More input tokens billed on every call
- Less context left for your actual task
Add 10 tools with ~300 tokens of schema each and you’ve burned ~3,000 tokens before the user says anything.
As @thdxr pointed out on X:
“It’s funny how many people wrote huge predictions for MCP without checking how LLM performance degrades when you add even 10 tools.”
How much context do tools crowd out?
If each tool definition averages ~250 input tokens:
| Tools | Overhead tokens | Share of 8k window | Share of 128k window | Share of 200k window |
|---|---|---|---|---|
| 5 | ~1,250 | ~16% | ~1.0% | ~0.6% |
| 10 | ~2,500 | ~31% | ~2.0% | ~1.3% |
| 20 | ~5,000 | ~62% | ~3.9% | ~2.5% |
Why it matters even on large windows:
- Tool text occupies the most “privileged” space (system prompt), competing directly with instructions, policies, and task framing.
- Bigger up‑front scaffolding reduces how much code, logs, or examples you can include before the model starts dropping useful context during long chains.
- With MCP, these definitions are typically present every turn; with alternatives like Claude Code Skills, most detail is loaded only when a specific skill is active.
If your prompts include long code diffs, stack traces, or policy text, 2–5k tokens of tools can be the difference between “the model saw it” and “the model didn’t.”
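You don’t have to take the table’s averages on faith: tokenize the schemas you actually ship. Below is a minimal sketch, assuming OpenAI-style JSON function schemas and the tiktoken library; the `edit_file` tool and the `cl100k_base` encoding are stand-ins, and real overhead varies with how your provider serializes tool definitions.

```python
import json
import tiktoken  # pip install tiktoken

# Illustrative tool schema in the OpenAI-style function format; swap in your real list.
TOOLS = [
    {
        "name": "edit_file",
        "description": "Replace a range of lines in a text file with new content.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path to the file to edit."},
                "start_line": {"type": "integer", "description": "First line to replace (1-indexed)."},
                "end_line": {"type": "integer", "description": "Last line to replace (inclusive)."},
                "content": {"type": "string", "description": "Replacement text."},
            },
            "required": ["path", "start_line", "end_line", "content"],
        },
    },
    # ...the rest of your tools
]

enc = tiktoken.get_encoding("cl100k_base")  # rough proxy; each provider tokenizes differently

overhead = sum(len(enc.encode(json.dumps(tool, indent=2))) for tool in TOOLS)
print(f"{len(TOOLS)} tools ≈ {overhead} prompt tokens spent before the user types a word")
```

If the number lands anywhere near the 8k column above, trimming tools is the cheapest optimization you have.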
2. Your Agent Becomes Dumber, Not Smarter
Too many tools create cognitive overload: not for you, but for the model.
Instead of focusing on the user, the model is juggling a dozen nearly identical options:
"Should I use edit_tool_v1, edit_tool_v2, or replace_line_with_regex?"
One engineer watched their agent try 18 different edit-related tools before finally giving up and writing a shell command:
```bash
sed -i 's/foo/bar/g' file.txt
```
If that’s the fallback, why not just do it first?
3. Your Bills Start Creeping Up
All that tool overhead? You’re paying for it — literally.
Longer prompts, more calls, and more steps all mean higher API costs. One developer described their experiment with a large MCP agent like this:
“It worked… but the performance/price cost was crazy.”
You’re not just slowing down. You’re spending more for worse results.
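To see how fast that adds up, multiply the overhead by your traffic. Every number in this sketch (tool count, schema size, request volume, and price per million input tokens) is an assumption; substitute your own.

```python
# Back-of-the-envelope cost of always-present tool schemas.
TOOLS = 20
TOKENS_PER_SCHEMA = 250        # average from the table in section 1
REQUESTS_PER_DAY = 50_000
PRICE_PER_MTOK_INPUT = 3.00    # USD per 1M input tokens; use your model's actual rate

overhead_per_request = TOOLS * TOKENS_PER_SCHEMA
monthly_tokens = overhead_per_request * REQUESTS_PER_DAY * 30
monthly_cost = monthly_tokens / 1_000_000 * PRICE_PER_MTOK_INPUT

print(f"{overhead_per_request:,} overhead tokens per request "
      f"≈ ${monthly_cost:,.0f}/month just for tool definitions")
```

And that’s before counting the extra round trips a confused agent makes while deciding which of its near-identical tools to call.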
4. Bigger Prompts, Weaker Alignment
There’s another cost: accuracy.
The more tools you cram into the system prompt, the harder it is to steer the model toward the actual task. The LLM starts focusing on tool selection logic instead of the user’s intent.
“People who don’t build with AI don’t realize how hard it is to steer the model when your system prompt is 90% tool definitions.” — @prashant_hq
If your agent stops following directions, it might not be the model’s fault. It’s your toolset’s.
5. This Isn’t How Developers Use Tools
When you want to rename a file, you don’t open 5 apps or click through a menu of tools.
You type: `mv file1.txt file2.txt`.
LLMs can do the same — if you let them.
Some engineers are ditching heavy MCP setups entirely and just telling their models to use bash or Python:
```python
import os

os.rename("file1.txt", "file2.txt")
```
It’s faster, simpler, and far more natural for both developers and LLMs.
6. So What Should You Do Instead?
Here’s what experienced builders are doing:
- Give the model 1–5 well-scoped tools
- Let it use general-purpose interfaces (e.g., shell, Python eval); see the sketch at the end of this section
- Use dynamic tool loading based on the current task
- Avoid overlapping or redundant tool functions
- Keep your system prompt small and focused
“Moving from MCPs to libraries and giving the LLM a simple eval() tool solves so many of these issues.” — @ProgramWithAI
Let the model write code instead of playing multiple-choice.
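Here’s the shape of that approach: one general-purpose tool instead of twenty overlapping ones. A minimal sketch against the Anthropic Messages API; the `run_bash` tool name, its schema, the model string, and the unsandboxed `subprocess` call are all illustrative choices, not a production recipe.

```python
import subprocess
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# One general-purpose tool instead of 20 overlapping ones.
TOOLS = [{
    "name": "run_bash",
    "description": "Run a bash command in the project directory and return stdout/stderr.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string", "description": "The command to run."}},
        "required": ["command"],
    },
}]

messages = [{"role": "user", "content": "Rename file1.txt to file2.txt"}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model name; use whatever you run
        max_tokens=1024,
        tools=TOOLS,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # the model answered without needing the tool

    # Execute each requested command and feed the result back.
    messages.append({"role": "assistant", "content": response.content})
    results = []
    for block in response.content:
        if block.type == "tool_use":
            proc = subprocess.run(block.input["command"], shell=True,
                                  capture_output=True, text=True, timeout=30)
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": ((proc.stdout + proc.stderr) or "(no output)")[:2000],
            })
    messages.append({"role": "user", "content": results})

print("".join(b.text for b in response.content if b.type == "text"))
```

The entire tool surface is a single short schema, and the model decides what to do by writing commands rather than picking from a menu.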
7. Agents Should Be Reliable, Not Overengineered
At the end of the day, what matters isn’t how many tools your agent could use. It’s whether it actually works — under real-world input, with real-world constraints.
That’s why prompt testing platforms like PromptForward exist: not to give you more tools, but to make sure the few tools and prompts you do have actually work — on real datasets, not just vibes.
You’re not trying to win a benchmark. You’re trying not to break production.
If you want to see what this looks like in practice, with tiny, composable tools that don’t blow up your context, the follow-up post walks through it.
Next up: Claude Code: Skills Beats MCP (One Tiny File at a Time).
TL;DR
MCP is powerful — but with great power comes great token bloat.
If you’re building LLM agents, keep your tools focused. Start small. Favor code execution or shell access over schema overload. And test your prompt behavior before your users do.
In the end, an agent with 4 good tools will outperform one with 40 mediocre ones — every time.
8. MCP vs Claude Code Skills: Context footprint
Claude Code’s Skills take a different approach: keep the entry point tiny, load depth only when needed.
| Topic | MCP tools (typical) | Claude Code Skills (typical) |
|---|---|---|
| Up‑front context | JSON schemas + parameter docs injected into the system prompt every turn | Short SKILL.md frontmatter (name, description) scanned; details loaded on demand |
| Average size at discovery | ~200–800 tokens per tool (varies by schema/examples) | ~20–60 tokens for frontmatter; instructions in SKILL.md read only when the skill is active |
| Redundancy risk | Overlapping tools force the model to choose among many similar options | One skill per job reduces “tool choice” overhead |
| Depth | Encoded in schema/validation; often requires more formal upfront text | Markdown instructions reference deeper files (reference.md, examples.md) only when needed |
| Bash/CLI leverage | Through explicit tools or servers | Native: skills call scripts and send back small snippets, keeping chat context small |
In our Skills post, the sample SKILL.md frontmatter plus a short flow fits comfortably under a few hundred tokens and is only read when invoked — whereas MCP tool definitions consume tokens on every turn whether or not that tool is used. If your work involves long code diffs, logs, or policy text, this difference directly translates to how much “real work” can fit in context.
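For a concrete sense of scale, here is what that always-loaded surface looks like. This SKILL.md is illustrative (the skill name, description, and body are made up), but the shape matches what the Skills post describes: a few lines of YAML frontmatter up front, with the full instructions read only when the skill is invoked.

```markdown
---
name: rename-files
description: Rename or move files in the repo, preserving git history where possible.
---

# Rename files

Prefer `git mv <old> <new>` for tracked files; fall back to `mv` otherwise.
For bulk renames, see reference.md.
```

Only the frontmatter rides along in every conversation; the rest costs nothing until the skill is actually used.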
See “Claude Code: Skills Beats MCP (One Tiny File at a Time)” for concrete examples of file layouts and why progressive disclosure keeps context lean.
Sources & further reading
- Model Context Protocol — modelcontextprotocol.io
- OpenAI tooling guidance — platform.openai.com/docs/guides/tools
- Anthropic tool use guide — docs.anthropic.com/claude/docs/tool-use
- Anthropic Skill Creator — github.com/anthropics/skills