Build a Cloud-Native Multiprocessing Framework | by Dylan Cunningham | Jul, 2022

How to convert a local multiprocessing framework to the cloud

Photo by Luca Bravo on Unsplash

Cloud technology continues to boggle my mind. The power, the ease, and the cost are all favorable — workloads complete 100 times faster, web interfaces and programmatic interaction provide means for the novice and the expert, and free tiers, training, and examples flatten up-front costs and learning curves. There are many cloud platforms; Amazon Web Services (AWS) is likely the best place to start.

  1. Review
  2. Background
  3. The Two Services
  4. Getting Started with AWS
  5. Setup SQS
  6. Setup Lambda
  7. Summation

In this article, we are going to walk through how to create a framework for multiprocessing in the cloud. I wish to extend the content from a previous article I wrote where I discuss multiprocessing on a PC or laptop. You may find that reading that article first gives you added context; it is called A Simple Multiprocessing Framework Within Python.

AWS provides two services that, when used in tandem, act as a multiprocessing engine which can be fully automated — and if your jobs are small enough, then you won’t pay a dime because of their free tier.

Multiprocessing on a PC

Let’s say your laptop or PC has, at best, 4 cores and 8 logical processors. This would allow you (if you don’t do anything else on your computer at the same time) to run 8 tasks simultaneously. You are therefore limited by your logical processors. You may also run into memory (RAM) constraints, depending on what you are trying to achieve. Let’s assume your PC also has 8 GB of memory.

Multiprocessing in the Cloud

Regarding “processors” and RAM in the cloud, you are limited to 1K “processors” running simultaneously and 10 GB of memory per “processor” — that is 10K GB in total. You cannot match that on a PC.

Therefore, the cloud version of multiprocessing is favorable from a speed point of view: if you had to process 100K unique tasks and each took two minutes to complete, the PC process would take ~17 days, while the cloud process would take a little over three hours.
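To make that estimate concrete, here is the back-of-the-envelope math (8 local logical processors versus 1,000 concurrent cloud “processors”):

```python
# 100K tasks at two minutes each, split across the available processors.
TASKS = 100_000
MINUTES_PER_TASK = 2

# PC: 8 logical processors working in parallel.
pc_days = TASKS * MINUTES_PER_TASK / 8 / 60 / 24

# Cloud: 1,000 Lambdas running simultaneously.
cloud_hours = TASKS * MINUTES_PER_TASK / 1_000 / 60

print(f"PC: ~{pc_days:.0f} days, cloud: ~{cloud_hours:.1f} hours")
```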

A Queue

Have you heard of a queue service? If not, think of a queue service like a card dealer in a casino. The dealer receives a new deck of cards to start the game. S/he deals a predefined subset of cards to each player. The players do what they wish with the cards to increase their likelihood of success, as defined by the rules. Once all the necessary cards are dealt and all the players have played their hands, the players show their hands to the dealer, who finally determines who won and lost.

A queue service works similarly to our analogy. The queue service sits idle until messages are received. The queue service then passes messages over to a “processor,” just like the dealer passes out the deck of cards. Based on predefined logical steps, the processors do only what they are allowed to do with the message(s) they receive, then they report back to the queue whether processing was successful. If successful, great. If it failed, there may be other steps to retry or to alert a user.

The first service we will discuss is a queue service called Simple Queue Service (SQS for short).

A Lambda

A processor, from a PC’s point of view, is a piece of hardware designed to run code as directed by the application or programming language sending start, run, and stop signals to the hardware. More generally, a processor is a service or technology that simply processes data/information as defined by a user: you and me.

In the AWS cloud world, processors are called Lambdas. They do just what a PC’s processor is designed to do: run code defined by the application or programming language.

The second service we will discuss is a processor service within AWS called Lambda.

A Visual

Visuals help me, likely you too. Here’s a flow diagram to show what we will work on going forward (we will discuss these components in bite sizes):

📸 by Me

You will need an AWS account. Follow these steps:

  1. Open Chrome, or some other web browser, and go to this URL: https://aws.amazon.com/console/.
  2. Click “Sign In to the Console” (top right).
  3. Click “Create a new AWS account.”
  4. Complete all necessary steps from there to sign up, verify email, all of that.

Run into snags, or were those steps sufficient? Let me know in the comments.

There are multiple ways to browse to the SQS setup page. Here’s what mine looks like:

📸 by Me

Do you not see the Simple Queue Service tab, button, or link? When you do, click it. You should see an option to “Create queue”; click that button. Now, follow these steps:

  1. Under Details and Type: Leave the “Standard” option selected (we won’t worry about the FIFO option).
  2. Under Details and Name: Name your queue something unique which relates to the unique messages it’s going to send to the Lambdas (i.e., the “processors”).
  3. Under Configuration and Visibility timeout: Change options to 15 Minutes. (For simplicity, set this to the max time you want your Lambdas to run. Max time for a Lambda is 15 minutes, FYI.)
  4. Under Configuration and Message retention period: Change options to 1 Hour. (This tells the queue to retain the message for a set amount of time; you will need time for all your messages to get processed.)
  5. Scroll to the bottom of the page and click “Create queue”.

How to send a message to SQS

Let’s assume you wish to send a message to SQS from your laptop or PC.

First, import the Python library boto3.

import boto3

Next, I always have a settings.json file, which I import as a dictionary in Python. It looks something like this:

{
  "local": true,
  "account": "123456789012",
  "role": "developer"
}

The local key tells my process I am running from my machine; if I ever want to convert this local process to AWS, then I am already prepared to do so. The account key is found in the top right corner of your AWS page. The role is another animal which could be an article on its own: here’s one for reference. (If you like how I explain things and want an article walking through that process, then let me know in the comments.)

📸 by Me
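Loading that settings.json into a dictionary takes a couple of lines; the helper name here is mine, not the author’s:

```python
import json
from pathlib import Path

def load_settings(path: str = "settings.json") -> dict:
    """Read the settings file into a plain Python dictionary."""
    return json.loads(Path(path).read_text())
```

After settings = load_settings(), checking settings["local"] tells the process where it is running.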

Now, we need a way to authenticate locally (to show AWS we ought to be sending messages to our queue). Here’s a function to help with that.

FYI, AWS has a feature where you can create resources in a given region. In the image just above, you’ll see the word Ohio. Ohio is the us-east-2 region, which you’ll see referenced in the code. Adjust as necessary for your case.

Now we need functionality to send messages to our queue. Here are two which work hand-in-hand:

The send_message_to_sqs function is used by the create_messages_and_send_to_sqs function. You can also see that the send_message_to_sqs function uses the get_resource function we discussed moments ago.

This function sends messages in batches of 10 or less. I use this function in my process to query stock ticker data from platforms like Public, Web, or Robinhood; however, I use Tiingo for all my stock price/fundamental data (end of day and intraday). Therefore, this function takes a list of tickers, which is why you see that as a parameter to the function. Change as necessary for your process, but the list is a list of dictionaries, in my case. Each dictionary contains the core information SQS will send to Lambda in order for the Lambda to process the ticker data (query from Tiingo and save to another AWS service called Simple Storage Service, or S3 for short).

Lastly, your final step is to use the create_messages_and_send_to_sqs function, as explained, to send your messages to your queue.

Unless you connect your Lambda function to SQS, nothing will happen. Keep reading to learn how you can access Part II of this article, which covers configuring your Lambda.

Getting your AWS Lambda ready for deployment can take a lot of time, or it can be quick. Depending on your desire to use containers, CDK, CI/CD, and other automated deployment methods, you will need to explore further. In this article, I will show you how to run through the important pieces via AWS’s online portal/GUI.

There are multiple ways to browse to the Lambda setup page. Here’s what mine looks like:

📸 by Me

Do you not see the Lambda tab, button, or link? When you do, click it. You should see an option to “Create function”; click that button. Now, follow these steps:

  1. Keep the Author from scratch option chosen (we won’t worry about the other options).
  2. Under Basic information: Name your function.
  3. Under Basic information: Select the Runtime dropdown and choose Python 3.7.
  4. Scroll to the bottom of the page and click “Create function”.
  5. On the next window, scroll all the way to the bottom, to the Layers section, and click “Add a layer”.
  6. Under Choose a layer: Select Specify an ARN.
  7. Under Choose a layer: Paste this ARN in the text box: arn:aws:lambda:us-east-2:113088814899:layer:Klayers-python37-pandas:22.
  8. Scroll to the bottom of the page and click “Add”.

How to connect SQS to Lambda

In order for your lambda to receive and process messages, you must add a trigger by selecting “+ Add trigger” under the Function overview section. Once you do, follow these steps:

  1. Under Trigger configuration: Select the dropdown, type in SQS, then click “SQS”.
  2. After more options appear, under SQS queue: Select the queue name you created earlier.
  3. Under Batch size: Type the number one. (We will assume each lambda will only process one message. You could increase this if desired.)
  4. Scroll to the bottom of the page and click “Add”.

How to configure your Lambda

I only adjust two settings within the Configuration tab of my lambdas: General configuration and Environment variables. Under General configuration, I edit the memory size and timeout. For both, I overestimate my lambda’s needs, but only a little. Under Environment variables, I usually only add API keys and the like. Use your judgment here.

How to set up your code

When your lambda executes after receiving an SQS message, it will run a lambda_handler method. Make sure you have this method defined. Here’s mine for a lambda I called worker_lambda_daily_eod_historical. This function iterates through messages from SQS — as such, it iterates through tickers — retrieves data from Tiingo’s API for each ticker, writes that data to AWS’s Simple Storage Service (S3), and, finally, returns a 200 or 400 as a response.

The only other method within my function you should see is the one that saves data back to S3. See below:
