Building a Massively Concurrent SMS Infrastructure on AWS: A Serverless Architecture Pattern Breakdown

Kevin Mesiab · Published in Stackademic · Jan 13, 2024

In a recent project, I needed to build an SMS system that could handle extremely high volumes of inbound and outbound web requests, as well as large unpredictable bursts in traffic. In this article, I’ll outline how I achieved this, using Golang and AWS.

A simplified flow chart illustrating a basic actor pattern for handling SMS conversations in a serverless environment

The Case Study

Handling SMS messages, like any other HTTP interaction, follows a typical request/response lifecycle. However, SMS processing carries inherently higher latency, so the traditional synchronous approach suffers.

Consider this example:

In a standard process, a client’s message hits an API Gateway, proceeds to a Controller, passes through business logic, data processing, and response generation stages. Then, the system delivers the response to the client and disconnects.

A serial processing pipeline

This sequence of steps, while necessary, contributes to the time it takes for a message to be processed and replied to. In this setup, the client is awaiting our response for the entire lifecycle, and our compute environment is active and chomping up dollars the whole time. But does every step really need to complete before we respond? Let’s decompose our application.
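To make that baseline concrete, here’s a minimal sketch of the serial pipeline as a single synchronous Go handler. The names saveMessage and buildReply are hypothetical stand-ins for the data processing and response generation stages; the point is that everything happens while the client waits.

```go
// A sketch of the serial baseline: every step runs while the client is on the line.
package main

import (
	"encoding/json"
	"net/http"
)

type InboundSMS struct {
	From string `json:"from"`
	Body string `json:"body"`
}

func handleInbound(w http.ResponseWriter, r *http.Request) {
	var msg InboundSMS
	if err := json.NewDecoder(r.Body).Decode(&msg); err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}

	// Business logic, data processing, and response generation all happen
	// in-line, before the client hears back from us.
	if err := saveMessage(msg); err != nil { // e.g. a database write
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	reply := buildReply(msg) // e.g. text processing + external lookups

	w.WriteHeader(http.StatusOK)
	json.NewEncoder(w).Encode(map[string]string{"reply": reply})
}

func saveMessage(m InboundSMS) error { return nil }  // placeholder
func buildReply(m InboundSMS) string { return "OK" } // placeholder

func main() {
	http.HandleFunc("/sms", handleInbound)
	http.ListenAndServe(":8080", nil)
}
```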

Doing Better

We start by decoupling as much as possible.

Starting at the beginning, when a client hits our endpoint, what does it need from us? In our case, it only needs to know that we received the POST. So why do we perform all sorts of computation unrelated to this singular stateless POST while the client is hanging on the line?

Here is where our first major decoupling happens.

Knowing our client only needs to know we received their valid message (not that we handled it correctly), we need to respond as quickly as possible. Instead of blocking the client while we process the message, we introduce an SQS queue.

The SQS queue provides the internal pipeline that we will build our scalability around.

Once we receive the input from the client, we package the data according to our business domain (that usually means deserializing a payload and fiddling with it some), place it on the SQS queue, then immediately respond to the client with an HTTP 200 OK.

Flow chart showing a serial process broken into a parallel one

Because this task is singular and minimal, taking well under a second to execute, and because our goal is to keep the process as short-lived as possible, it makes sense to use an AWS Lambda.
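Here’s a minimal sketch of what that receiver Lambda might look like in Go, assuming the aws-lambda-go and AWS SDK v2 libraries and a hypothetical INBOUND_QUEUE_URL environment variable. It does one thing: forward the payload to SQS and answer 200.

```go
package main

import (
	"context"
	"os"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/sqs"
)

var (
	sqsClient *sqs.Client
	queueURL  = os.Getenv("INBOUND_QUEUE_URL") // hypothetical env var
)

func init() {
	cfg, err := config.LoadDefaultConfig(context.Background())
	if err != nil {
		panic(err)
	}
	sqsClient = sqs.NewFromConfig(cfg)
}

// handler acknowledges the webhook as fast as possible: enqueue, then 200.
func handler(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	// In the real system the body is reshaped to the business domain first;
	// here we simply forward it onto the queue.
	_, err := sqsClient.SendMessage(ctx, &sqs.SendMessageInput{
		QueueUrl:    aws.String(queueURL),
		MessageBody: aws.String(req.Body),
	})
	if err != nil {
		return events.APIGatewayProxyResponse{StatusCode: 500}, err
	}
	return events.APIGatewayProxyResponse{StatusCode: 200, Body: "OK"}, nil
}

func main() {
	lambda.Start(handler)
}
```

Everything slow now lives behind the queue; the only failure the client can ever see is a failure to enqueue.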

This is our second major scalability win. Our compute environment becomes ephemeral, which gives us rapid horizontal scalability with no extra effort.

AWS diagram demonstrating Lambda Scaling

The Logical Split

Just receiving data from a client and responding with lightning speed doesn’t make for a well-architected app. We need to do something useful with that data, and in most cases that means some sort of database operation, maybe a few calls with a REST client.

After that, our system has one more big job: responding to the inbound SMS with an outbound SMS. But in our case, the outbound SMS doesn’t actually need any output from the database or web operations, so we shouldn’t block it just because we need to process and store some data.

To achieve this level of concurrency, we will modify our initial design, and add a second SQS queue.

We now have two distinct pipelines, one for data processing and storage, and one for responding to SMS messages.
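A sketch of that fan-out, assuming two hypothetical queue URL environment variables; inside the receiver, this would replace the single SendMessage call from the sketch above.

```go
package receiver

import (
	"context"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/sqs"
)

// fanOut publishes the domain payload to both pipelines so they can proceed
// independently. The env var names are hypothetical.
func fanOut(ctx context.Context, client *sqs.Client, body string) error {
	for _, url := range []string{
		os.Getenv("DATA_PROCESSING_QUEUE_URL"), // slow pipeline: validate + store
		os.Getenv("OUTBOUND_SMS_QUEUE_URL"),    // fast pipeline: reply to the sender
	} {
		if _, err := client.SendMessage(ctx, &sqs.SendMessageInput{
			QueueUrl:    aws.String(url),
			MessageBody: aws.String(body),
		}); err != nil {
			return err
		}
	}
	return nil
}
```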

Strong Moves Slow

We’ve accomplished a few things here. First, we’ve unblocked the outbound SMS messages; second, we’ve given our data processing a relief valve.

What I mean by relief valve is that this portion of our app’s lifecycle is almost always the slowest. In our case, we have to do some text processing, fetch data from a few external sources, do some validation, then store that data in two locations. That’s a lot of work, and it takes time. It’s also work that can’t be allowed to fail, so there need to be facilities for retries, backoffs, and the like.
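One way to get those retries (a sketch, not the only option) is to let the data-processing consumer report per-record failures back to SQS, which redelivers them and, after enough attempts, shunts them to a dead-letter queue via the queue’s redrive policy. This assumes ReportBatchItemFailures is enabled on the event source mapping; processRecord is a hypothetical stand-in for the real work.

```go
package processor

import (
	"context"

	"github.com/aws/aws-lambda-go/events"
)

// handleBatch is the data-processing consumer. Records that fail are reported
// back to SQS so only they are retried; the rest of the batch is not replayed.
func handleBatch(ctx context.Context, event events.SQSEvent) (events.SQSEventResponse, error) {
	var failures []events.SQSBatchItemFailure
	for _, record := range event.Records {
		if err := processRecord(ctx, record.Body); err != nil {
			failures = append(failures, events.SQSBatchItemFailure{
				ItemIdentifier: record.MessageId,
			})
		}
	}
	return events.SQSEventResponse{BatchItemFailures: failures}, nil
}

// processRecord stands in for the text processing, external fetches,
// validation, and the two storage writes described above.
func processRecord(ctx context.Context, body string) error { return nil }
```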

Since the work of processing this data is decoupled from the request/response lifecycle of the SMS, we can let it be slow without worrying that it will degrade the experience.

This allows us to build in reliability from the start, without competing with performance.

The Last Leg

As we’ve seen, we’ve completely freed up the process required to send a response SMS. While our data processing SQS queue is busy firing off data processing lambdas, our outbound SMS Queue is firing up our SMS sender lambdas!

The sender lambda’s job is straightforward: it makes a call to an internal API, packages up a JSON payload, and fires it off to a telephony service’s API. There’s nothing fancy happening here; it’s simply the last step in the app’s lifecycle.
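A minimal sketch of that sender in Go. The fetchReplyText helper, the payload shape, and the TELEPHONY_API_URL variable are all hypothetical; a real provider’s API will look different.

```go
package sender

import (
	"bytes"
	"context"
	"encoding/json"
	"net/http"
	"os"

	"github.com/aws/aws-lambda-go/events"
)

type outboundSMS struct {
	To   string `json:"to"`
	Text string `json:"text"`
}

// handleOutbound drains the outbound SMS queue and delivers each reply.
func handleOutbound(ctx context.Context, event events.SQSEvent) error {
	for _, record := range event.Records {
		var msg outboundSMS
		if err := json.Unmarshal([]byte(record.Body), &msg); err != nil {
			return err
		}
		reply, err := fetchReplyText(ctx, msg) // the internal API call
		if err != nil {
			return err // returning an error lets SQS redeliver and retry
		}
		msg.Text = reply

		payload, err := json.Marshal(msg)
		if err != nil {
			return err
		}
		resp, err := http.Post(os.Getenv("TELEPHONY_API_URL"), "application/json", bytes.NewReader(payload))
		if err != nil {
			return err
		}
		resp.Body.Close()
	}
	return nil
}

// fetchReplyText stands in for the call to our internal API.
func fetchReplyText(ctx context.Context, msg outboundSMS) (string, error) { return "Thanks!", nil }
```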

But here’s what I want to highlight: the moment the SMS Receiver lambda placed the inbound message on the SQS queues, what was once a long pipeline of processing was split in half and became asynchronous.

Independent Scaling

As a result of being logically split apart, each process can run in an entirely separate compute environment that need only exist for the life of the computation. Even better, these environments are spawned in response to the SQS queues without us having to intervene, which creates an immediate, robust auto-scaling solution. The scaling is also independent: if the outbound SMS sender gets backed up, that lambda can scale up while the other lambdas remain unchanged.

The value of this sort of self-balancing workload pattern is self-evident.

Fault Tolerant

By breaking our system apart like this, each piece has a super simple job. Each piece is hard to break, and when it does break, it’s easy to predict how. That means we can write very reliable, fault-tolerant systems! There’s nothing better than sleeping soundly knowing your application can handle bursts of tens of thousands of inbound messages overnight.

Conclusion

Wrapping up, we’ve seen how building a high-capacity SMS system on AWS using serverless tech can really change the game when it comes to handling loads of messages without breaking a sweat. The key moves? Using AWS Lambda, SQS queues, and smartly splitting up the work. This setup doesn’t just handle the traffic well; it’s also tough as nails and smart with resources.

By breaking the tasks into their own little worlds that can grow or shrink on their own, the system can effortlessly juggle thousands of messages, making it a powerhouse solution for any traffic scenario.

For all the bright-eyed developers out there looking to create systems that can take a punch and keep on ticking, this is the kind of approach that makes a world of difference.

Bonus Section! Taking It One Step Further

Let’s expand on our solution and eke out even more performance. Consider some of the work that has to be performed when an SMS message is received.

First, we need to be sure the client is valid using some kind of authentication mechanism. Then we need to be sure what they’re sending us is valid. We have to perform all this computation before we can start the real work. If a request fails authentication or validation, we’ve wasted compute resources simply vetting input. Let’s improve on that.

Gateway with Authorizer

Using AWS API Gateway integrated with a Lambda authorizer, we offload authorization token decryption, API key validation, and so on, ensuring the Receiver Lambda processes only authenticated requests.
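A minimal sketch of a token-based Lambda authorizer in Go, using the aws-lambda-go events types. The shared-secret comparison against a hypothetical EXPECTED_TOKEN variable is just a stand-in for real token decryption or API key validation.

```go
package authorizer

import (
	"context"
	"errors"
	"os"

	"github.com/aws/aws-lambda-go/events"
)

// handleAuth either denies the request outright or returns an IAM policy
// allowing API Gateway to invoke the backing integration.
func handleAuth(ctx context.Context, req events.APIGatewayCustomAuthorizerRequest) (events.APIGatewayCustomAuthorizerResponse, error) {
	if req.AuthorizationToken != os.Getenv("EXPECTED_TOKEN") { // stand-in for real validation
		// API Gateway turns this error into a 401 before the Receiver Lambda is ever invoked.
		return events.APIGatewayCustomAuthorizerResponse{}, errors.New("Unauthorized")
	}
	return events.APIGatewayCustomAuthorizerResponse{
		PrincipalID: "sms-webhook-client",
		PolicyDocument: events.APIGatewayCustomAuthorizerPolicy{
			Version: "2012-10-17",
			Statement: []events.IAMPolicyStatement{{
				Action:   []string{"execute-api:Invoke"},
				Effect:   "Allow",
				Resource: []string{req.MethodArn},
			}},
		},
	}, nil
}
```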

An intermediary Step Function further validates incoming requests before they reach the Receiver Lambda. Its job is simply to analyze the payload and either accept or reject the request.

Diagram shows the request filtering using an authorization and validation pipeline

By performing this intermediate validation step separately from our internal workflow, we further distribute the load and keep our lambdas from ever having to deal with garbage input or erroneous requests!
