<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ AWS SQS - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ AWS SQS - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 25 Jun 2026 04:44:33 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/aws-sqs/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How Message Queues Help Make Distributed Systems More Reliable ]]>
                </title>
                <description>
                    <![CDATA[ Reliable systems consistently perform their intended functions under various conditions while minimizing downtime and failures. As internet users, we tend to take for granted that the systems that we use daily will operate reliably. In this article, ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-message-queues-make-distributed-systems-more-reliable/</link>
                <guid isPermaLink="false">671f94819cd90a859ae00b55</guid>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS SQS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ distributed system ]]>
                    </category>
                
                    <category>
                        <![CDATA[ message queue ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Anant Chowdhary ]]>
                </dc:creator>
                <pubDate>Mon, 28 Oct 2024 13:41:21 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1729895479626/5d476c5d-9749-4c2a-977b-bcdd8b2b8199.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Reliable systems consistently perform their intended functions under various conditions while minimizing downtime and failures.</p>
<p>As internet users, we tend to take for granted that the systems that we use daily will operate reliably. In this article, we’ll explore how message queues enhance flexibility and fault tolerance. We’ll also discuss some challenges that we may face while using them.</p>
<p>After reading through, you’ll know how to implement reliable systems and what key performance factors to keep in mind.</p>
<h3 id="heading-prerequisites"><strong>Prerequisites</strong></h3>
<p>Before diving into this article, you should have a foundational understanding of cloud computing. Here are the key concepts:</p>
<ol>
<li><p>Basic principles of Cloud Computing</p>
</li>
<li><p>Availability in Distributed Systems</p>
</li>
<li><p>The CAP theorem</p>
</li>
</ol>
<h3 id="heading-table-of-contents"><strong>Table of Contents</strong></h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-does-reliability-mean-in-the-context-of-distributed-systems">Reliability in Distributed Systems</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-makes-software-reliable">What Makes Software Reliable?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-a-message-queue">What is a Message Queue?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-message-queues-help-make-distributed-systems-more-reliable">How Message Queues Help Make Distributed Systems More Reliable</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-challenges-with-message-queues">Challenges with Message Queues</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-summary">Summary</a></p>
</li>
</ol>
<h2 id="heading-what-does-reliability-mean-in-the-context-of-distributed-systems"><strong>What Does Reliability Mean in the Context of Distributed Systems?</strong></h2>
<p>Reliability, according to the OED, is “the quality of being trustworthy or of performing consistently well”. We can translate this definition to the following in the context of distributed systems:</p>
<ol>
<li><p>The ability of a technological system, device, or component to consistently and dependably perform its intended functions under various conditions over time. For instance, in the context of online banking, reliability refers to the consistent and secure processing of transactions. Users expect to complete transfers and access their accounts without errors or outages.</p>
</li>
<li><p>The system's resilience to unexpected or erroneous interactions from users or other systems. For instance, if a user tries to access a deleted file on a cloud storage system, the system can gracefully notify them and suggest alternatives, rather than crashing.</p>
</li>
<li><p>The system performs satisfactorily under its expected conditions of operation, as well as in the case of unexpected load and/or disruptions. An example of this is a video streaming service during a major sporting event. The system is designed to perform well under normal traffic but must also handle sudden spikes in users when a popular game starts.</p>
</li>
</ol>
<p>This is quite a general view of what reliability is, and the definition changes with time, as systems change with changing technology.</p>
<h2 id="heading-what-makes-software-reliable"><strong>What Makes Software Reliable?</strong></h2>
<p>Several techniques are used across the industry to make large-scale distributed software reliable.</p>
<h3 id="heading-data-replication"><strong>Data Replication</strong></h3>
<p>Data replication is a fundamental concept in system design where data is intentionally duplicated and stored in multiple locations or servers.</p>
<p>This redundancy serves several critical purposes, including enhancing data availability, improving fault tolerance, and enabling load balancing.</p>
<p>By replicating data across different nodes or data centers, we may be able to ensure that, in the event of a hardware failure or network issue, the data remains accessible. This reduces downtime and enhances system reliability.</p>
<p>It's essential to implement replication strategies carefully, considering factors like consistency, synchronization, and conflict resolution to maintain data integrity and reliability in distributed systems.</p>
<p>Let’s look at a concrete example. With a primary-secondary database model such as one used with e-commerce websites, we may have the following:</p>
<ol>
<li><p>Replication: The primary database handles all the write operations, whereas the secondary databases handle the reads. This spreads reads across multiple databases, improving performance and lowering the probability of a crash.</p>
</li>
<li><p>Consistency: The system may use eventual consistency to maintain integrity, ensuring that all replicas eventually reflect the same data. But during high-traffic periods, the website may temporarily allow for slight inconsistencies, such as showing outdated inventory levels.</p>
</li>
<li><p>Conflict Resolution: If two users attempt to buy a single available item at the same time, a conflict resolution strategy may be used. For instance, the system could use timestamps to decide which customer gets the product, and that decision then determines the updates that are eventually applied to the databases.</p>
</li>
</ol>
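<p>The replication and eventual consistency points above can be illustrated with a minimal in-memory sketch. The <code>PrimaryReplicaStore</code> class and its method names are hypothetical and exist only for this illustration. A real database replicates over the network and handles failures, which this toy version does not.</p>

```python
class PrimaryReplicaStore:
    """Toy primary/secondary store: writes go to the primary, reads to a replica."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []          # writes not yet copied to the replicas
        self._next_replica = 0

    def write(self, key, value):
        # All writes are handled by the primary.
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key):
        # Reads are spread across the replicas in round-robin order.
        replica = self.replicas[self._next_replica]
        self._next_replica = (self._next_replica + 1) % len(self.replicas)
        return replica.get(key)

    def replicate(self):
        # Copy pending writes to every replica, in the order they were made.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()


store = PrimaryReplicaStore()
store.write("stock:abc123", 5)
stale = store.read("stock:abc123")   # the replica has not caught up yet
store.replicate()
fresh = store.read("stock:abc123")   # replicas now match the primary
print(stale, fresh)
```

<p>The first read returns nothing because the write has only reached the primary. This is the kind of temporary inconsistency (such as outdated inventory levels) that eventual consistency permits.</p>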
<h3 id="heading-load-distribution-across-machines"><strong>Load Distribution Across Machines</strong></h3>
<p>Load distribution involves distributing computational tasks and network traffic across multiple servers or resources to optimize performance and ensure system scalability.</p>
<p>By intelligently spreading workloads, load distribution prevents any single server from becoming overwhelmed, reducing the risk of bottlenecks and downtime.</p>
<p>Some very commonly used load distribution mechanisms are:</p>
<ol>
<li><p>Using Load Balancers: A load balancer can evenly distribute incoming traffic across multiple servers, preventing any single server from becoming a bottleneck.</p>
</li>
<li><p>Dynamic Scaling: Dynamic or auto-scaling can be used to automatically adjust the number of active servers based on current demand, adding more resources during peak times and scaling down during low traffic.</p>
</li>
<li><p>Caching: Caching layers can be used to store frequently accessed data, reducing the load on backend servers by serving requests directly from the cache.</p>
</li>
</ol>
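<p>As a rough illustration of the first mechanism, here is a sketch of round-robin load balancing, one of the simplest distribution strategies. The <code>RoundRobinBalancer</code> class and the server names are made up for this example. Production load balancers also account for server health and current load.</p>

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def route(self, request):
        # The request content is ignored: only the rotation decides the target.
        return next(self._servers)


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments = Counter(balancer.route(f"request-{i}") for i in range(9))
print(dict(assignments))
```

<p>With nine requests and three servers, each server ends up with three requests, so no single machine takes the whole load.</p>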
<h3 id="heading-capacity-planning"><strong>Capacity Planning</strong></h3>
<p>Capacity planning entails analyzing factors such as expected user growth, data storage requirements, and processing capabilities to ensure that the system can handle increased loads without performance degradation or downtime.</p>
<p>By accurately forecasting resource needs and scaling infrastructure accordingly, such planning helps optimize costs, maintain reliability, and provide a seamless user experience. Being proactive can help ensure a system is well-prepared to adapt to changing requirements and remains robust and efficient throughout its lifecycle.</p>
<p>A lot of modern systems can scale automatically with projected loads. When traffic or processing requirements increase, such auto scaling automatically provisions additional resources to handle the load. Conversely, when demand decreases, it scales down resources to optimize cost efficiency.</p>
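<p>The core of such a scaling decision can be sketched in a few lines. The function name, the per-server capacity, and the minimum and maximum bounds below are all assumptions chosen for illustration. Real auto scaling services use richer signals and cooldown periods.</p>

```python
import math


def desired_servers(requests_per_second, capacity_per_server,
                    min_servers=2, max_servers=20):
    """Return how many servers are needed for the given load, within bounds."""
    needed = math.ceil(requests_per_second / capacity_per_server)
    # Never drop below the minimum (for redundancy) or exceed the budgeted maximum.
    return max(min_servers, min(max_servers, needed))


for load in (150, 1200, 5000):
    print(load, desired_servers(load, capacity_per_server=100))
```

<p>The upper bound matters for capacity planning: at 5000 requests per second this sketch would need 50 servers but is capped at 20, which is exactly the kind of gap that forecasting is meant to reveal ahead of time.</p>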
<h3 id="heading-metrics-and-automated-alerting"><strong>Metrics and Automated Alerting</strong></h3>
<p>Metrics involve collecting and analyzing data points that provide insights into various aspects of system behavior, such as resource utilization, response times, error rates, and more.</p>
<p>Automated alerting complements metrics by enabling proactive monitoring. This involves setting predefined thresholds or conditions based on metrics. When a metric crosses one of these thresholds, an automated alert is triggered. These alerts can notify system administrators or operators, allowing them to take immediate action to address potential issues before they impact the system or users.</p>
<p>When used together, metrics and automated alerting create a robust monitoring and troubleshooting system, helping ensure that anomalies or problems are quickly detected and resolved.</p>
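<p>A threshold check of this kind might look like the following sketch. The metric names and limits are invented for the example, and a real setup would send notifications through a monitoring service rather than return strings.</p>

```python
def check_thresholds(metrics, thresholds):
    """Return an alert message for every metric that is above its threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"ALERT: {name}={value} exceeds {limit}")
    return alerts


# Hypothetical thresholds and a snapshot of current metric values.
thresholds = {"error_rate": 0.05, "p99_latency_ms": 800, "cpu_utilization": 0.9}
metrics = {"error_rate": 0.08, "p99_latency_ms": 430, "cpu_utilization": 0.95}

alerts = check_thresholds(metrics, thresholds)
for alert in alerts:
    print(alert)
```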
<p>Now that you know a bit about what reliability means in the context of Distributed Systems, we can move on to Message Queues.</p>
<h2 id="heading-what-is-a-message-queue"><strong>What is a Message Queue?</strong></h2>
<p>A message queue is a communication mechanism used in distributed systems to enable asynchronous communication between different components or services. It acts as an intermediary that allows one component to send a message to another without the need for direct, synchronous communication.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701696280571/8697cb07-c765-4f9e-b709-a7a03adf3e11.png?auto=compress,format&amp;format=webp" alt="Multiple producers adding messages to a message queue that in turn are consumed by a consumer." class="image--center mx-auto" width="480" height="370" loading="lazy"></p>
<p>Above, you can see that there are multiple nodes (called Producers) that create messages that are sent to a message queue. These messages are processed by a node called the Consumer node, which may perform a series of actions (for instance database reads, or writes) as a part of each message being processed.</p>
<p>Now let’s look at an actual example where a message queue may be useful. Let’s assume we have an e-commerce website that allows millions of orders to be processed.</p>
<p>Processing an order may take place in the following steps:</p>
<ol>
<li><p>A user creates an order. This sends a request to a web server, which in turn creates a message that is placed in the orders queue.</p>
</li>
<li><p>A consumer reads the message and, while processing it, calls different services (for instance inventory checks, the payment service, and the shipping service).</p>
</li>
<li><p>Once all processing steps have completed, the consumer removes the message from the queue.</p>
</li>
</ol>
<p>Note that if part of the system fails while a message is being processed, the message can be left in the queue to be reprocessed.</p>
<p>Even in cases where there is a total outage on the processing side of things, messages can simply pile up in the queue and be consumed once services are functional again. This is an example of a queue being useful in multiple failure scenarios.</p>
<p>Let’s look at some code for this scenario using AWS SQS, which is a popular message queue service that allows users to create queues, send messages to the queue, and also consume messages from queues for processing.</p>
<p>The example below uses <a target="_blank" href="http://boto3.amazonaws.com">Boto3</a>, the AWS SDK for Python, which includes a client for SQS.</p>
<p>First, we’ll place an order, assuming we already have an SQS queue called OrderQueue in place.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> boto3
<span class="hljs-keyword">import</span> json

<span class="hljs-comment"># Create an SQS client</span>
sqs = boto3.client(<span class="hljs-string">'sqs'</span>)

<span class="hljs-comment"># Let's assume the queue is called OrderQueue</span>
<span class="hljs-comment"># This is the queue in which orders are placed</span>
queue_url = <span class="hljs-string">'https://sqs.us-east-1.amazonaws.com/2233334/OrderQueue'</span>

<span class="hljs-comment"># Function to send an order message</span>
<span class="hljs-comment"># This places an order in the queue, which can at any time be</span>
<span class="hljs-comment"># picked up by a consumer and then processed</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">send_order</span>(<span class="hljs-params">order_details</span>):</span>
    message_body = json.dumps(order_details)
    response = sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=message_body
    )
    print(<span class="hljs-string">f'Order sent with ID: <span class="hljs-subst">{response[<span class="hljs-string">"MessageId"</span>]}</span>'</span>)

<span class="hljs-comment"># Using the queue to place an order</span>
<span class="hljs-comment"># Defining a sample order</span>

order = {
    <span class="hljs-string">'order_id'</span>: <span class="hljs-string">'12345'</span>,
    <span class="hljs-string">'customer_id'</span>: <span class="hljs-string">'67890'</span>,
    <span class="hljs-string">'items'</span>: [
        {<span class="hljs-string">'product_id'</span>: <span class="hljs-string">'abc123'</span>, <span class="hljs-string">'quantity'</span>: <span class="hljs-number">2</span>},
        {<span class="hljs-string">'product_id'</span>: <span class="hljs-string">'xyz456'</span>, <span class="hljs-string">'quantity'</span>: <span class="hljs-number">1</span>}
    ],
    <span class="hljs-string">'total_price'</span>: <span class="hljs-number">59.99</span>
}

<span class="hljs-comment"># Sending the order to the queue which is expected to be picked up </span>
<span class="hljs-comment"># by a consumer and processed eventually.</span>
send_order(order)
</code></pre>
<p>Then once the order has been placed, here’s some code that illustrates how it’ll be picked up for processing:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> boto3
<span class="hljs-keyword">import</span> json

<span class="hljs-comment"># Create an SQS client</span>
sqs = boto3.client(<span class="hljs-string">'sqs'</span>)

<span class="hljs-comment"># Processing orders from the same queue defined above</span>
queue_url = <span class="hljs-string">'https://sqs.us-east-1.amazonaws.com/2233334/OrderQueue'</span>

<span class="hljs-comment"># Function to receive and process orders</span>
<span class="hljs-comment"># Picking up a maximum of 10 messages at a time to process</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">receive_orders</span>():</span>
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=<span class="hljs-number">10</span>,  <span class="hljs-comment"># Up to 10 messages</span>
        WaitTimeSeconds=<span class="hljs-number">10</span>  <span class="hljs-comment"># Long polling: wait up to 10 seconds for messages to arrive</span>
    )

    messages = response.get(<span class="hljs-string">'Messages'</span>, [])

    <span class="hljs-keyword">for</span> message <span class="hljs-keyword">in</span> messages:
        order_details = json.loads(message[<span class="hljs-string">'Body'</span>])
        print(<span class="hljs-string">f'Processing order: <span class="hljs-subst">{order_details}</span>'</span>)

        <span class="hljs-comment"># Processing the order with details such as </span>
        <span class="hljs-comment"># processing payments, updating the inventory levels,</span>
        <span class="hljs-comment"># processing shipping etc.</span>

        <span class="hljs-comment"># Delete the message after processing</span>
        <span class="hljs-comment"># This is important since we don't want an</span>
        <span class="hljs-comment"># order to be processed multiple times.</span>
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message[<span class="hljs-string">'ReceiptHandle'</span>]
        )

<span class="hljs-comment"># Receive a batch of orders</span>
receive_orders()
</code></pre>
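<p>The consumer above has no processing step that can fail. As a hedged sketch of how failures could be handled, the variant below deletes a message only when its handler succeeds. The names <code>process_batch</code> and <code>handle_order</code> are hypothetical. The SQS client is passed in as a parameter (for example <code>boto3.client('sqs')</code>) so the function can also be exercised with a stand-in object.</p>

```python
import json


def process_batch(sqs, queue_url, handle_order):
    """Receive one batch and delete only the messages that were handled successfully.

    Returns a (processed, failed) tuple of counts.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=10,
    )
    processed, failed = 0, 0
    for message in response.get("Messages", []):
        try:
            handle_order(json.loads(message["Body"]))
        except Exception as error:
            # No delete here: SQS makes the message visible again once its
            # visibility timeout expires, so a consumer can retry it later.
            print(f"Leaving message {message['MessageId']} in the queue: {error}")
            failed += 1
            continue
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        processed += 1
    return processed, failed
```

<p>Because a failed message is retried, the same order may be handled more than once, so the handler should be safe to run repeatedly for the same order.</p>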
<h3 id="heading-what-is-an-intermediary-in-a-distributed-system"><strong>What is an Intermediary in a Distributed System?</strong></h3>
<p>In the context of what we’re discussing here, a message queue is an intermediary. Quoting Amazon AWS’ definition of a message queue:</p>
<blockquote>
<p>“<a target="_blank" href="https://aws.amazon.com/sqs/"><strong><em>Amazon Simple Queue Service (Amazon SQS)</em></strong></a> <em>lets you send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be available</em>.”</p>
</blockquote>
<p>This is a wonderfully succinct and accurate description of why a message queue (an intermediary) is important.</p>
<p>In a message queue, messages are placed in a queue data structure, which you can think of as a temporary storage area. The producer places messages in the queue, and the consumer retrieves and processes them at its own pace. This decoupling of producers and consumers allows for greater flexibility, scalability, and fault tolerance in distributed systems.</p>
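<p>To make the decoupling concrete, here is a small in-memory sketch that uses Python's <code>collections.deque</code> as a stand-in for a real queue service. The function names and batch sizes are assumptions for the example. The producer adds a burst of orders, and the consumer drains them in smaller batches at its own pace.</p>

```python
from collections import deque

order_queue = deque()   # stand-in for a real message queue


def produce(order_ids):
    # The producer only appends: it never waits for the consumer.
    for order_id in order_ids:
        order_queue.append(order_id)


def consume(max_messages):
    # The consumer takes at most max_messages per pass, oldest first.
    batch_size = min(max_messages, len(order_queue))
    return [order_queue.popleft() for _ in range(batch_size)]


produce(range(1, 8))        # a burst of 7 orders arrives at once
first_pass = consume(3)
second_pass = consume(3)
remaining = len(order_queue)
print(first_pass, second_pass, remaining)
```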
<h2 id="heading-how-message-queues-help-make-distributed-systems-more-reliable"><strong>How Message Queues Help Make Distributed Systems More Reliable</strong></h2>
<p>Now let's discuss how Message Queues help make Distributed Systems more reliable.</p>
<h3 id="heading-1-message-queues-provide-flexibility"><strong>1. Message Queues Provide Flexibility</strong></h3>
<p>Message queues allow for <a target="_blank" href="https://en.wikipedia.org/wiki/Asynchrony_(computer_programming)"><strong>asynchronous communication</strong></a> between components. This means that producers can send messages to the queue without waiting for immediate processing by consumers. Components can therefore work independently and at their own pace, which keeps designs flexible and as self-contained as possible.</p>
<h3 id="heading-2-message-queues-make-systems-scalable"><strong>2. Message Queues Make Systems Scalable</strong></h3>
<p>Message queues are often the bread and butter of scalable distributed systems for the following reasons:</p>
<ol>
<li><p>Multiple producers can add messages to a message queue. This raises the ceiling and allows us to easily horizontally scale applications.</p>
</li>
<li><p>Multiple consumers can read from a message queue. This lets us increase processing throughput by adding more consumers.</p>
</li>
</ol>
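<p>The second point can be sketched with Python's standard <code>queue</code> and <code>threading</code> modules, where several consumer threads share one queue. This is a local illustration, not how SQS consumers are deployed, and the <code>run_workers</code> function and worker names are hypothetical.</p>

```python
import queue
import threading


def run_workers(orders, worker_count):
    """Process orders with several consumer threads that share one queue."""
    order_queue = queue.Queue()
    handled = []                 # (order_id, worker name) pairs
    lock = threading.Lock()

    def worker(name):
        while True:
            order_id = order_queue.get()
            if order_id is None:          # sentinel: no more work for this worker
                order_queue.task_done()
                return
            with lock:
                handled.append((order_id, name))
            order_queue.task_done()

    threads = [
        threading.Thread(target=worker, args=(f"consumer-{i}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for order_id in orders:
        order_queue.put(order_id)
    for _ in threads:
        order_queue.put(None)             # one sentinel per worker
    order_queue.join()
    for thread in threads:
        thread.join()
    return handled


handled = run_workers(range(20), worker_count=4)
print(len(handled))
```

<p>Each order is taken off the queue by exactly one worker, so adding workers increases throughput without changing the producer.</p>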
<h3 id="heading-3-message-queues-make-systems-fault-tolerant"><strong>3. Message Queues Make Systems Fault Tolerant</strong></h3>
<p>What happens if a distributed system is overwhelmed? We sometimes need to have the ability to <em>cut the cord</em> in order to get the system back to a working state. We’d ideally want the ability to process requests that weren’t processed when the system was down.</p>
<p>This is exactly what a message queue can help us with. We may have hundreds of thousands of requests that weren’t processed, but are still in the queue. These can be processed once our system is back online.</p>
<h2 id="heading-challenges-with-message-queues"><strong>Challenges with Message Queues</strong></h2>
<p>Message queues aren’t a silver bullet for scaling problems in distributed systems.</p>
<p>Here are some situations where message queues may be useful:</p>
<ol>
<li><p>Asynchronous Processing: Message queues are generally an excellent choice wherever asynchronous processing is required. In workflows such as sending confirmation emails or generating reports after an order is placed, message queues can decouple these tasks from the primary application flow.</p>
</li>
<li><p>Load Balancing: As we saw in our example for message queues, in scenarios where traffic spikes occur, message queues can buffer incoming requests, allowing multiple consumers to process messages concurrently. This helps distribute the load evenly across available resources.</p>
</li>
<li><p>Fault Tolerance: In systems where reliability is crucial, message queues provide a mechanism for handling failures. If a service is temporarily down, messages can be retained in the queue until the service is available again, ensuring that no data is lost unless intended.</p>
</li>
</ol>
<p>Here are some situations where message queues may not be useful:</p>
<ol>
<li><p>Message queues work well when the order of messages does not matter. When order does matter, strict ordering usually comes at a cost: for example, SQS FIFO queues preserve message order but typically offer lower throughput than standard queues and cost more per request.</p>
</li>
<li><p>Designing systems with queues that have multiple consumers isn’t trivial. What happens if a message is processed twice? Is <a target="_blank" href="https://en.wikipedia.org/wiki/Idempotence"><strong>idempotency</strong></a> a requirement? Or does it break our use case? These complexities can often lead us to situations where message queues may not be the best solution.</p>
</li>
</ol>
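<p>One common way to cope with duplicate deliveries is to make the consumer idempotent. SQS standard queues, for instance, deliver each message at least once, so duplicates are possible. The sketch below remembers which order IDs have already been handled. The <code>IdempotentOrderProcessor</code> class is hypothetical, and the in-memory set stands in for durable storage such as a database table.</p>

```python
class IdempotentOrderProcessor:
    """Handles each order at most once, even if its message is delivered twice."""

    def __init__(self):
        self.processed_ids = set()   # assumption: would be durable storage in practice
        self.charges = []

    def handle(self, order):
        order_id = order["order_id"]
        if order_id in self.processed_ids:
            # Duplicate delivery: skip the side effects and report it.
            return False
        self.charges.append((order_id, order["total_price"]))
        self.processed_ids.add(order_id)
        return True


processor = IdempotentOrderProcessor()
order = {"order_id": "12345", "total_price": 59.99}

first = processor.handle(order)
second = processor.handle(order)     # the same message delivered again
print(first, second, len(processor.charges))
```

<p>The customer is charged once even though the message was handled twice. Note that this simple version is not safe if two consumers process the same message at the same moment. That case needs an atomic check, such as a unique constraint in the database.</p>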
<h2 id="heading-summary"><strong>Summary</strong></h2>
<p>In this article, you learned about reliability in distributed systems, and how message queues can help make such systems more reliable. Here’s a summary of the key takeaways:</p>
<ol>
<li><p>Reliability is central to distributed systems and there are a few common ways this is handled across the tech industry. Data replication, load distribution, and capacity planning are some ways that can improve the reliability of a system.</p>
</li>
<li><p>Message Queues are intermediaries that store messages from producers. Consumers can pick up these messages at a rate that's generally independent of the rate of production.</p>
</li>
<li><p>Queues are flexible, allowing us to immediately stem the flow of unwanted event processing in case of an unforeseen event.</p>
</li>
<li><p>Despite the versatility of message queues, they're not a panacea for reliability issues. There are often multiple considerations to be kept in mind while processing messages in a message queue.</p>
</li>
</ol>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
