EC2 Placement Groups: Master Cluster, Spread & Partition Hands-On Lab 2026
Lab | Estimated Time: 45 minutes | Difficulty: Intermediate
Introduction
Here’s something I wish someone had told me years ago: EC2 placement groups don’t automatically make your instances faster. I’ve watched junior engineers spin up a cluster placement group, launch two t3.micro instances, and wonder why their application isn’t suddenly performing like it’s on bare metal. That’s not how any of this works.
EC2 placement groups are about influencing where AWS physically places your instances within their data centers. That’s it. They’re a hint to the hypervisor scheduler, not a magic performance button. But when you understand what each type actually does — and more importantly, when to use each one — you unlock serious control over latency, fault tolerance, and blast radius management.
This lab teaches you to create all three placement group types, launch instances into each, and actually measure the difference using iperf3. We’ll run real network benchmarks and observe how physical proximity (or deliberate separation) affects your workloads. By the end, you’ll understand why a genomics pipeline architect chooses differently than someone running a distributed database.
This lab is designed for DevOps engineers and cloud practitioners who already know how to launch EC2 instances but want to understand the why behind infrastructure placement decisions. If you’ve ever wondered why your cross-AZ replication lag spikes during peak hours, or why your HPC job runs 3x slower than expected, you’re in the right place.
Lab Overview
In this hands-on AWS tutorial, you’ll build three distinct placement groups and observe their behavior firsthand. If you’ve completed Lab — Instance Store vs EBS Deep, this lab builds directly on that performance-focused mindset. You’ll launch instances into a cluster placement group (tight physical proximity), a spread placement group (maximum hardware separation), and a partition placement group (logical failure domain isolation). Then you’ll install iperf3 and actually measure network throughput and latency between instances.
The skills you’ll gain extend far beyond clicking through the console. You’ll develop intuition for when physical placement matters — and when it doesn’t. You’ll understand why a cluster placement group can become a trap during capacity constraints, and why spread placement groups have hard limits that surprise people in production.
Real-world workloads that depend on correct placement decisions include high-frequency trading systems (where microseconds matter), distributed databases like Cassandra or CockroachDB (where partition tolerance is everything), and tightly-coupled HPC workloads like weather modeling or financial risk calculations.
I’ve seen incorrect placement group usage cause outages. One team launched their entire Kafka cluster in a single cluster placement group, then couldn’t add brokers during a traffic spike because AWS didn’t have contiguous capacity. Another team ignored placement entirely for their Elasticsearch cluster and couldn’t figure out why shard rebalancing took hours instead of minutes. These are expensive lessons you don’t need to learn the hard way.
Quick Comparison: Placement Group Types
Before diving into the hands-on steps, here’s a quick reference comparing all three strategies. This table captures the essential trade-offs you’ll observe throughout the lab.
| Placement Group | Primary Goal | Scale Capacity | Fault Isolation | Best Use Cases |
|---|---|---|---|---|
| Cluster | Lowest network latency | Limited by rack capacity | Low (single rack) | HPC, ML training, real-time analytics |
| Spread | Maximum hardware isolation | 7 instances per AZ | Very high (separate racks) | etcd, ZooKeeper, critical quorums |
| Partition | Controlled blast radius | Hundreds of instances | Medium–High (partition-level) | Cassandra, HDFS, Kafka, HBase |
Keep this mental model as you work through the lab. Each strategy optimizes for a different dimension, and choosing wrong can hurt you in production.
Prerequisites
Before starting this step-by-step lab, ensure you have the following ready.
You need an active AWS account with permissions to create placement groups and launch EC2 instances. Your IAM user or role should have the ec2:CreatePlacementGroup, ec2:DeletePlacementGroup, ec2:DescribePlacementGroups, and standard EC2 launch permissions.
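As a rough sketch, a minimal IAM policy covering those actions might look like the following (the launch-related actions are assumptions about what "standard EC2 launch permissions" means in your account; scope `Resource` down as your environment requires):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreatePlacementGroup",
        "ec2:DeletePlacementGroup",
        "ec2:DescribePlacementGroups",
        "ec2:RunInstances",
        "ec2:DescribeInstances",
        "ec2:TerminateInstances"
      ],
      "Resource": "*"
    }
  ]
}
```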
Basic familiarity with the EC2 console and Linux command line is expected. You should know how to SSH into an instance or connect via Session Manager. If you’re not comfortable with SSH yet, check out our EC2 SSH Connection Guide.
You’ll need to install iperf3 on your instances, which requires either outbound internet access (via NAT Gateway or public subnet) or pre-baked AMIs. We’ll cover the installation commands, but your instances need a path to package repositories.
A default VPC works fine for this lab, though I recommend using a dedicated VPC for learning. At minimum, you need one subnet, and we’ll discuss why multi-AZ setups matter for spread and partition groups.
Step-by-Step Hands-On Lab
Step 1: Navigate to Placement Groups
Open the AWS Console and navigate to EC2 → Network & Security → Placement Groups. This section is often overlooked because it’s tucked away from the main EC2 dashboard. You should see an empty list if this is a fresh account or region.
What you’re looking at is the control plane for influencing instance physical placement. Each placement group you create here becomes a target you can specify when launching instances. The placement group itself doesn’t cost anything — you only pay for the instances you launch into it.
Step 2: Create a Cluster Placement Group
Click Create placement group. Name it lab-cluster-pg and select Cluster as the strategy. Leave the default settings and click Create.
A cluster placement group tells AWS to pack your instances as close together as physically possible — ideally on the same rack, potentially on the same physical host. This minimizes network hops and delivers the lowest possible latency between instances. AWS documentation quotes up to 10 Gbps of single-flow bandwidth between instances in a cluster placement group using enhanced networking, compared with 5 Gbps for single flows outside one.
Important: Cluster placement groups deliver the biggest performance gains with instances that support enhanced networking (ENA), such as c5, m5, r5, and newer instance families. Testing with t3 instances will show improvements, but don’t expect the dramatic gains you’d see with compute-optimized types designed for high-throughput workloads.
The catch? All instances must be in the same Availability Zone. You’re trading fault tolerance for performance. If that rack loses power, every instance in the group goes down together.
Common misconfiguration: launching instances in a cluster placement group across multiple AZs. This will fail. The console will reject it, but I’ve seen Terraform scripts silently place instances in different AZs when the placement group constraint couldn’t be satisfied.
Step 3: Create a Spread Placement Group
Create another placement group named lab-spread-pg with the Spread strategy.
Spread placement groups are the opposite of cluster groups. AWS guarantees that each instance lands on distinct underlying hardware. Different racks, different power sources, different network switches. Maximum hardware fault isolation.
The hard limit here surprises people: you can only have seven instances per Availability Zone in a spread placement group. That’s it. This isn’t a soft limit you can request an increase for — it’s a physical constraint of how AWS manages hardware separation.
Use spread placement groups for small, critical clusters where each node must survive independent hardware failures. Think of a three-node etcd cluster or a ZooKeeper quorum. You don’t need many instances, but losing two to the same hardware fault would be catastrophic.
Step 4: Create a Partition Placement Group
Create a third placement group named lab-partition-pg with the Partition strategy. You’ll be asked for the number of partitions — set it to 3.
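If you prefer the CLI, all three groups from Steps 2–4 can be created in one go (add `--region` if your CLI default differs from the lab region):

```shell
# Create the three placement groups used in this lab
aws ec2 create-placement-group --group-name lab-cluster-pg --strategy cluster
aws ec2 create-placement-group --group-name lab-spread-pg --strategy spread
aws ec2 create-placement-group --group-name lab-partition-pg --strategy partition --partition-count 3

# Confirm all three exist and are in the "available" state
aws ec2 describe-placement-groups \
  --query 'PlacementGroups[].[GroupName,Strategy,State]' --output table
```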
Partition placement groups are the hybrid approach. Instances within a partition can share hardware, but partitions themselves are isolated from each other. Think of partitions as failure domains you define yourself.
This is ideal for large distributed systems like HDFS, HBase, or Cassandra where you want to align your application’s replication strategy with physical fault domains. If you configure Cassandra to place replicas across different partitions, a hardware failure takes out at most one replica.
Unlike spread groups, partition groups can scale to hundreds of instances. You’re limited to seven partitions per AZ, but each partition can hold as many instances as capacity allows.
Step 5: Launch Instances into Each Placement Group
Launch two t3.medium instances into each placement group. When configuring the instance, expand Advanced details and find the Placement group dropdown. Select the appropriate group for each launch.
For the cluster placement group, ensure both instances are in the same AZ. For spread and partition groups, you can distribute across AZs if desired.
Use Amazon Linux 2023 as your AMI. Enable Auto-assign public IP if you’re using a public subnet, or ensure Session Manager access if you’re in a private subnet.
Tag your instances clearly: Name: cluster-instance-1, Name: spread-instance-1, etc. You’ll thank yourself during testing.
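The same launch can be scripted. A hedged CLI sketch (the AMI ID, subnet ID, and key name below are placeholders — substitute your own):

```shell
# Launch two instances into the cluster placement group
# (AMI, subnet, and key name are placeholders - substitute your own values)
aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type t3.medium \
  --count 2 \
  --subnet-id subnet-xxxxxxxx \
  --key-name my-key \
  --placement "GroupName=lab-cluster-pg" \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=cluster-instance}]'

# For the partition group you can also pin a specific partition:
#   --placement "GroupName=lab-partition-pg,PartitionNumber=1"
```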
Step 6: Install iperf3 on All Instances
SSH or use Session Manager to connect to each instance. Install iperf3 with the following commands:
```shell
sudo dnf update -y
sudo dnf install iperf3 -y
iperf3 --version
```
If you’re using Amazon Linux 2 (not 2023), replace dnf with yum. The version check confirms installation succeeded.
Step 7: Run Network Benchmarks
Before testing: Ensure your security groups allow TCP port 5201 between instances, or iperf3 tests will fail with connection refused errors. Add an inbound rule for TCP 5201 from your VPC CIDR block.
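If you'd rather add the rule from the CLI, a sketch like this works (the security group ID and CIDR are placeholders for your own values):

```shell
# Allow iperf3 (TCP 5201) from within the VPC; remove this rule after the lab
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol tcp \
  --port 5201 \
  --cidr 10.0.0.0/16
```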
On one instance in your cluster placement group, start an iperf3 server:
```shell
iperf3 -s
```
On the second cluster instance, run the client test using the first instance’s private IP:
```shell
iperf3 -c 10.0.1.45 -t 30
```
Replace 10.0.1.45 with your actual private IP. The -t 30 flag runs the test for 30 seconds, giving you stable averages.
Record the bandwidth (Gbits/sec) and note any retransmits. Repeat this process for your spread and partition placement group instances.
For latency testing, simple ping works:
```shell
ping -c 100 10.0.1.45
```
The average RTT in your cluster group should be noticeably lower than cross-AZ tests.
Real Lab Experiences: Architect Insights
Let me share what AWS documentation won’t tell you.
Cluster placement groups can become a trap during scale-out events. I once worked with a team running a machine learning training cluster. They launched 50 instances into a cluster placement group, and everything was perfect. Two months later, they tried to add 10 more instances for a larger training job. AWS couldn’t satisfy the request — there wasn’t enough contiguous capacity in that specific rack location. They had to terminate the entire cluster and relaunch with 60 instances from scratch, during business hours, with a deadline looming.
The lesson? If you’re using cluster placement groups, launch all your instances together. Don’t assume you can incrementally add capacity later.
Spread placement groups have their own gotcha. The seven-instance limit is per AZ, but people forget this when building HA architectures. If you need 15 instances with hardware-level isolation, you need at least three AZs. In regions with only two AZs, you’re capped at 14 instances total. This has killed production deployments I’ve reviewed.
Partition placement groups require application awareness to be useful. Simply launching instances into partitions doesn’t magically give you fault tolerance. Your application needs to know which partition each instance belongs to and align its replication accordingly. AWS exposes partition information via instance metadata — your deployment automation should query it and configure your distributed system’s topology accordingly.
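A minimal sketch of that metadata lookup from inside an instance, using IMDSv2 session-token requests (this only works on an EC2 instance that belongs to a partition placement group):

```shell
# Request an IMDSv2 session token, then read this instance's partition number
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/placement/partition-number"
```

Your deployment automation can feed that number into, say, Cassandra's rack configuration so replicas land in different partitions.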
One more thing: cluster placement groups deliver the biggest wins with enhanced networking–capable instances like c5, m5, r5, and newer families. I’ve seen teams benchmark t3 instances in cluster groups and conclude “placement groups don’t help much.” They do help — but you need instances designed for high-throughput networking to see the dramatic improvements AWS advertises.
Validation & Testing
Successful iperf3 output from your cluster placement group should show bandwidth approaching the instance type’s maximum network capacity. For t3.medium, expect around 5 Gbps with bursting.
```text
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  17.4 GBytes  4.98 Gbits/sec   12   sender
```
Low retransmit counts indicate a healthy network path. If you see hundreds of retransmits, check your security groups and NACLs.
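If you want to record the throughput number for comparison across the three groups, a small shell sketch like this can pull it out of a captured summary line (the sample line below is illustrative, not from a live run):

```shell
# Extract the Gbits/sec figure from an iperf3 summary line with awk
line='[  5]   0.00-30.00  sec  17.4 GBytes  4.98 Gbits/sec   12   sender'
rate=$(printf '%s\n' "$line" | awk '{for (i = 2; i <= NF; i++) if ($i == "Gbits/sec") print $(i-1)}')
echo "measured throughput: $rate Gbits/sec"
```

iperf3 also supports a `--json` flag if you'd rather parse structured output instead of scraping the human-readable summary.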
Compare this against your spread placement group instances in different AZs. You’ll likely see slightly lower throughput and measurably higher latency — typically 1-2ms additional RTT for cross-AZ traffic versus sub-millisecond within an AZ.
The partition placement group results depend on partition distribution. Same-partition instances should perform similarly to cluster groups. Cross-partition, same-AZ traffic should match normal intra-AZ performance.
Troubleshooting Guide
Instance launch fails with “Insufficient capacity”: The most common issue with cluster placement groups. AWS can’t find enough contiguous capacity. Solutions include trying a different AZ, using a smaller instance type, or launching during off-peak hours.
Spread placement group rejects instance launch: You’ve hit the seven-instance-per-AZ limit. Either use a different AZ or switch to partition placement groups for larger deployments.
iperf3 connection refused: Security groups aren’t allowing port 5201 (iperf3 default). Add an inbound rule for TCP 5201 from your VPC CIDR.
Zero bandwidth results: Confirm both instances can reach each other with ping. Check that the iperf3 server is actually running with ss -tulpn | grep 5201.
Useful debugging commands:
```shell
aws ec2 describe-placement-groups --region us-east-1
ping -c 5 <private-ip>
ss -tulpn | grep 5201
iperf3 --version
```
AWS Best Practices: Solutions Architect Perspective
From a security standpoint, placement groups don’t affect your security posture directly, but remember that cluster placement groups create a single failure domain. Ensure your security group rules are least-privilege, especially for iperf testing ports which should be removed after validation.
For reliability, spread placement groups provide the strongest hardware fault isolation. Use them for stateful workloads where losing multiple nodes to a single hardware event would cause data loss. Partition placement groups give you reliability if your application understands partition topology.
Performance efficiency is where cluster placement groups shine. If your workload involves tight inter-node communication — MPI jobs, distributed training, real-time analytics — the latency reduction is measurable and meaningful. Don’t use cluster groups if your instances don’t communicate heavily with each other.
Cost optimization isn’t directly affected by placement groups, but there’s an indirect impact. Cluster placement groups that can’t scale force you into larger instance types earlier. Spread placement limits can push you toward multi-region architectures sooner than planned. Factor these constraints into capacity planning.
For operational excellence, tag your placement groups with purpose and ownership. Include them in your infrastructure-as-code templates. Document which applications depend on specific placement strategies so on-call engineers understand the constraints.
Don’t use placement groups when they don’t add value. General-purpose web servers behind a load balancer? Probably don’t need placement constraints. Stateless API containers? Default placement is fine. Save placement groups for workloads where physical topology actually matters.
Real AWS Interview Questions: EC2 Placement Groups
If you’re preparing for AWS Solutions Architect or DevOps Engineer interviews, expect placement group questions. Here are real questions I’ve encountered or asked during interviews, along with how a strong candidate would approach them.
Question 1: Your company runs a tightly-coupled HPC workload for financial risk modeling. The job takes 4 hours but network latency between nodes is causing stragglers that extend runtime to 6 hours. How would you address this?
A strong answer would identify cluster placement groups as the solution, explain that they minimize inter-node latency by placing instances on the same rack, and mention the requirement for instances that support enhanced networking (ENA). The candidate should also note the trade-off: all instances in the same failure domain. A great candidate would add that launching all instances simultaneously is critical because incremental scaling often fails due to capacity constraints.
Question 2: You’re architecting a three-node etcd cluster for Kubernetes. What placement strategy would you recommend and why?
Spread placement group is the correct answer here. The candidate should explain that etcd requires quorum (2 of 3 nodes) and losing two nodes to a single hardware failure would be catastrophic. Spread placement guarantees each instance runs on separate hardware. A strong candidate would mention the seven-instance-per-AZ limit and note that for a three-node cluster, this isn’t a concern.
Question 3: A team wants to deploy a 200-node Cassandra cluster with hardware fault isolation. They tried spread placement groups but hit scaling limits. What would you recommend?
Partition placement groups solve this problem. The candidate should explain that partitions can hold hundreds of instances while maintaining isolation between partitions. The key insight is that Cassandra’s rack-aware replication should be configured to align with AWS partitions — each partition becomes a Cassandra “rack” from a topology perspective. Mention that partition metadata is available via instance metadata service.
Question 4: An application team launched instances in a cluster placement group but sees no performance improvement. What questions would you ask?
Strong diagnostic questions include: What instance types are you using? (t-series won’t show dramatic gains.) Are the instances actually communicating with each other heavily? (Cluster groups only help inter-instance traffic.) Did you verify all instances are in the same placement group? (Terraform can silently skip placement constraints.) Are you using enhanced networking? (Check the ENA driver.) This question tests troubleshooting methodology, not just knowledge.
Question 5: Can you explain a scenario where placement groups caused a production incident?
This is a behavioral/experience question. A good answer describes a real scenario: cluster placement group preventing scale-out during traffic spikes, spread limits blocking new instance launches, or partition groups providing false confidence because the application wasn’t partition-aware. The interviewer wants to see you’ve actually worked with these constraints, not just read documentation.
Frequently Asked Questions (FAQs)
What is an EC2 placement group in AWS?
An EC2 placement group is a logical grouping that influences how AWS physically places your instances within their data centers. When you launch instances into a placement group, you’re telling AWS to apply specific placement rules — either packing instances close together for low latency, spreading them across distinct hardware for fault isolation, or organizing them into partitions for controlled failure domains. Placement groups themselves are free; you only pay for the instances you launch into them.
What is the difference between cluster, spread, and partition placement groups?
Cluster placement groups pack instances as close together as physically possible, minimizing network latency but creating a single point of failure. Spread placement groups guarantee each instance runs on separate hardware, maximizing fault isolation but limiting you to seven instances per Availability Zone. Partition placement groups divide instances into logical partitions where each partition is isolated from others, allowing hundreds of instances while maintaining failure domain boundaries. Choose cluster for HPC and ML training, spread for critical small clusters like etcd, and partition for large distributed databases like Cassandra or HDFS.
Can EC2 placement groups span multiple Availability Zones?
It depends on the type. Cluster placement groups cannot span multiple AZs — all instances must be in the same AZ because the goal is physical proximity. Spread and partition placement groups can span multiple AZs, which actually increases their fault isolation benefits since you’re distributing across both hardware and geographic boundaries. When designing for high availability, multi-AZ spread or partition groups are common patterns.
Why does my cluster placement group launch fail with insufficient capacity?
Cluster placement groups require AWS to find contiguous physical capacity — enough space on the same rack or nearby racks to place all your instances together. If that capacity doesn’t exist, the launch fails. This commonly happens when adding instances to an existing cluster group or launching during peak demand periods. Solutions include launching all instances simultaneously, trying a different AZ, using smaller instance types, or launching during off-peak hours. This is a fundamental limitation of how cluster placement works.
What is the maximum number of instances in a spread placement group?
Spread placement groups have a hard limit of seven instances per Availability Zone. This isn’t a soft limit you can request increases for — it’s a physical constraint of how AWS guarantees hardware separation. In a region with three AZs, you can have a maximum of 21 instances in a spread placement group. If you need hardware isolation for more instances, use partition placement groups instead, which can scale to hundreds of instances while still providing partition-level fault isolation.
Do EC2 placement groups cost extra money?
No, placement groups themselves are completely free. You only pay for the EC2 instances you launch into them at standard instance pricing. However, there are indirect cost implications: cluster placement group capacity constraints might force you to use larger instance types, and spread placement limits might push you toward multi-region architectures. Factor these architectural constraints into your cost planning.
How do I check which placement group an instance belongs to?
You can check via the AWS Console by selecting the instance and viewing the Details tab, which shows the placement group name. Programmatically, use the AWS CLI command aws ec2 describe-instances --instance-ids i-xxxxx and look for the Placement.GroupName field. For partition placement groups, you can also query the instance metadata service from within the instance to get the partition number (on IMDSv2-only instances you’ll need a session token first): curl http://169.254.169.254/latest/meta-data/placement/partition-number.
When should I NOT use EC2 placement groups?
Don’t use placement groups when your instances don’t communicate heavily with each other or when default AWS placement provides sufficient fault isolation. Stateless web servers behind a load balancer, containerized microservices with no inter-node traffic, and general-purpose application servers typically don’t benefit from placement constraints. Adding placement groups in these scenarios just introduces capacity constraints without corresponding benefits. Reserve placement groups for workloads where physical topology genuinely matters to performance or reliability.
Can I move an existing EC2 instance into a placement group?
You cannot move a running instance into a placement group. The instance must be stopped first. Once stopped, you can modify the placement group setting and restart the instance, but AWS doesn’t guarantee it can satisfy the placement constraint — if capacity isn’t available in the target placement group’s location, the start will fail. For production systems, it’s safer to launch new instances directly into the placement group and migrate your workload, rather than trying to move existing instances.
Do placement groups work with Auto Scaling groups?
Yes, you can specify a placement group in your Auto Scaling group’s launch template or launch configuration. However, be aware of the implications: cluster placement groups may cause scaling failures if AWS can’t find contiguous capacity, and spread placement groups will fail once you hit the seven-instance-per-AZ limit. Partition placement groups work best with Auto Scaling because they can accommodate growth. Configure your Auto Scaling group’s placement strategy thoughtfully based on your placement group type.
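As a hedged sketch, a launch template can carry the placement constraint so every scaled-out instance inherits it (the template name here is a placeholder):

```shell
# Create a launch template that pins instances to the partition group
# (template name is a placeholder - choose your own)
aws ec2 create-launch-template \
  --launch-template-name lab-partition-lt \
  --launch-template-data '{
    "InstanceType": "t3.medium",
    "Placement": { "GroupName": "lab-partition-pg" }
  }'
```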
Conclusion & Next Steps
You’ve now created all three EC2 placement group types and measured their real-world impact on network performance. More importantly, you understand when each strategy applies and the hidden constraints that trip up production deployments.
Cluster placement groups minimize latency but create capacity and fault tolerance risks. Spread placement groups maximize hardware isolation but cap at seven instances per AZ. Partition placement groups offer flexible fault domain control but require application awareness to deliver value.
This knowledge matters because placement decisions are often invisible until they cause problems. The team that can’t scale their ML cluster, the database that loses quorum to a single rack failure, the Kafka deployment that can’t rebalance — these are placement group problems disguised as something else.
In our next lab, Lab 2.5 — Elastic Network Interfaces (ENIs) & Multi-NIC Architectures, we’ll explore how to attach multiple network interfaces to instances and build sophisticated network topologies. This pairs naturally with placement group knowledge when designing high-throughput, fault-tolerant architectures.