Model Context Protocol (MCP) is the go-to standard for building sophisticated AI agent tools. As we touched on in our last post, MCP is all about empowering AI agents with greater capabilities, enabling them to act truly autonomously.
As more and more companies bring AI agents into production, the big questions now are about managing costs, optimising performance, and speed to market.
We engaged with a customer recently with this exact problem statement. This post explores how you can run your MCP-enabled tools on serverless platforms such as AWS Lambda, and why it's a smart move for your organisation.
Why Serverless?
Serverless removes the heavy lifting of scaling and maintenance, letting your teams focus on shipping features faster.
AI agents rely on a diverse set of tools, invoking them based on the query and context. This makes their usage quite unpredictable - some tools will be in constant demand, while others may only be used for specific cases. As your AI systems grow, this variability affects both cost and stability during sudden spikes.
This is where a serverless platform like AWS Lambda offers distinct advantages:
- Eliminate Infrastructure Waste: Lambda's "scale-to-zero" model means you only pay for compute time consumed. If a tool isn't active, there are no idle servers racking up costs.
- Automated, Effortless Scaling: Lambda automatically scales to meet demand as requests for a tool increase, and scales back down when traffic subsides. This elasticity ensures consistent performance without manual intervention or pre-provisioning for peak capacity.
- Lower Operational Burden: By abstracting away the underlying server management, your team is freed up to focus on building better tools and delivering business value, not on patching and maintaining infrastructure.
This built-in agility and cost efficiency make serverless an ideal fit for hosting MCP tools.
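To make the scale-to-zero point concrete, here is a back-of-the-envelope cost sketch for a sporadically used tool. The per-request and per-GB-second prices below are illustrative placeholders, not current AWS list prices - check the Lambda pricing page for your region:

```python
# Rough Lambda cost model: you pay per request plus per GB-second of compute.
# Prices are illustrative placeholders, not current AWS list prices.
PRICE_PER_MILLION_REQUESTS = 0.20   # USD, illustrative
PRICE_PER_GB_SECOND = 0.0000166667  # USD, illustrative

def monthly_cost(invocations: int, avg_duration_s: float, memory_gb: float) -> float:
    """Estimate monthly cost; with zero invocations the cost is zero."""
    request_cost = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    compute_cost = invocations * avg_duration_s * memory_gb * PRICE_PER_GB_SECOND
    return request_cost + compute_cost

# A tool invoked 50,000 times a month, 200 ms per call, 512 MB of memory:
print(round(monthly_cost(50_000, 0.2, 0.5), 4))
```

The key contrast with an always-on server: an idle tool costs exactly nothing, and a busy one scales linearly with actual use.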
Key Challenges of Deploying MCP on AWS Lambda
Deploying MCP tools to AWS Lambda isn't as straightforward as it could be. Clear guidance is hard to find, and searches often lead to AWS resources for the "serverless MCP tool" or "Lambda MCP tool"; these are tools for building serverless applications, not for hosting MCP tools. I expect AWS to improve on this in the future.
Furthermore, AWS Lambda has some inherent limitations that don't quite align with the needs of MCP tools. We’ve highlighted some of these points in our previous MCP discussion:
- Statefulness: Lambda is stateless, which is great for simple functions. However, MCP tools often need to maintain state – think multi-step operations or continuing a conversation where it left off. This is a key hurdle.
- Execution Time: Lambda functions have a maximum run time of 15 minutes. For complex, long-running MCP tool operations, this limit can be an issue, potentially breaking a workflow if not handled correctly.
- Streaming Responses: MCP often requires streaming responses back to the agent, a capability that Lambda doesn't natively support without some creative workarounds.
So, what's the best approach to hosting MCP on Lambda then?
Overcoming Challenges
The good news is that things move fast! The new MCP Streamable HTTP transport greatly simplifies MCP communication, eliminating the need for the server to remember client-specific endpoints, as the older "HTTP+SSE" transport required. This enables "sessionless" connections, which in turn makes serverless implementations easier.
Building on this, AWS has introduced a pure Lambda-based Python MCP handler. This handler addresses many of the challenges we just talked about, making it much easier to get MCP tools running smoothly on AWS Lambda.
With the new MCP Lambda handler, deploying AI tools serverlessly is no longer an experiment – it’s production-ready.
How It Works
At a high level, here's what a serverless MCP solution would look like:

- AI Agent: Configured with the API Gateway endpoint as a Streamable HTTP MCP tool.
- API Gateway: Serves as the secure front door, handling authentication, routing, and traffic management. With a private endpoint, it can also keep all communication inside the VPC.
- AWS Lambda with MCP Handler: The Lambda function code that executes the tool's logic.
- DynamoDB for Session Management: To solve the statefulness problem, we use an external database like DynamoDB to store session information. The MCP session ID is used to retrieve and persist state between Lambda invocations, effectively making the stateless function stateful. Other storage providers, such as Redis or Postgres, can be plugged in as well.
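As a minimal sketch of that session pattern: the `table` argument below would be a boto3 DynamoDB Table resource, e.g. `boto3.resource("dynamodb").Table("mcp-sessions")` - the table name and attribute layout are assumptions, and the handler's built-in session support may differ:

```python
def load_state(table, session_id: str) -> dict:
    """Fetch any state persisted by earlier invocations of this session.

    `table` is a DynamoDB Table resource keyed on `session_id`.
    """
    item = table.get_item(Key={"session_id": session_id}).get("Item")
    return item.get("state", {}) if item else {}

def save_state(table, session_id: str, state: dict) -> None:
    """Persist state so the next (possibly cold) invocation can resume it."""
    table.put_item(Item={"session_id": session_id, "state": state})
```

Each invocation loads state at the start, does its work, and saves state at the end, so a multi-step conversation survives across (and between) Lambda instances.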
The tool code in the Lambda function is quite simple, as shown below:
```python
from awslabs.mcp_lambda_handler import MCPLambdaHandler

mcp = MCPLambdaHandler(name="my-tool", version="1.0.0")

@mcp.tool()
def add_two_numbers(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b

def lambda_handler(event, context):
    """AWS Lambda handler function."""
    return mcp.handle_request(event, context)
```
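For reference, the agent-side invocation of this tool is just a JSON-RPC `tools/call` message POSTed to the API Gateway endpoint. A sketch of the payload:

```python
import json

# JSON-RPC 2.0 message an MCP client sends to invoke the tool above.
call = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "add_two_numbers",
        "arguments": {"a": 2, "b": 3},
    },
}
payload = json.dumps(call)
```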
This simplicity means your AI capabilities can scale globally, cost-efficiently, and without complex infrastructure planning.
What About Streaming?
While the Lambda MCP handler is excellent for most use cases, there's still a point to consider if your tools require streaming responses: Lambda doesn't natively support them. There is a workaround using the Lambda Web Adapter, which lets you run a standard FastMCP server on Lambda with streaming support.
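As a rough sketch of that workaround, the Web Adapter is attached to the container image as a Lambda extension and proxies invocations to a regular HTTP server. The adapter version tag, port, and `server.py` below are illustrative assumptions; the function URL must also be set to response streaming mode:

```dockerfile
# Illustrative image: run a FastMCP HTTP server on Lambda via the Web Adapter.
FROM public.ecr.aws/docker/library/python:3.12-slim
# Attach the Lambda Web Adapter as an extension (pin a real release tag).
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.4 /lambda-adapter /opt/extensions/lambda-adapter
# Stream responses back instead of buffering them; serve on port 8000.
ENV AWS_LWA_INVOKE_MODE=response_stream \
    PORT=8000
COPY server.py ./
RUN pip install fastmcp
# Assumes server.py starts a FastMCP Streamable HTTP server on $PORT.
CMD ["python", "server.py"]
```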
However, this adds considerably more complexity to the deployment, with more moving parts and a heavier runtime than the simpler, native Lambda handler. For instance, cold starts can reach as much as 5 seconds, hurting the end-user experience. Another option is to run the MCP servers on ECS containers for streaming support; but then it's not serverless!
In the end, it's a trade-off to evaluate based on the specific needs of your tools and the importance of streaming for your use case.
Wrapping Up
AWS has done some great work here, making hosting AI agent tools on serverless platforms a practical and highly attractive option. By leveraging AWS Lambda, you can build an AI infrastructure that is not only powerful and scalable but also remarkably cost-efficient, ultimately driving both technical efficiency and tangible business results.