EC2 Private Subnet Setup: VPC, NAT Gateway & SSM Access Lab

Introduction

Here’s something I learned the hard way during my first year as an AWS architect: just because you can assign a public IP to an EC2 instance doesn’t mean you should.

I once inherited a production environment where every single instance—including database servers—had public IPs and SSH wide open on port 22. The security team nearly had a collective heart attack during the audit. That experience fundamentally changed how I approach AWS networking, and it’s exactly why this lab exists.

In Lab 2.1, you’re going to build something that actually resembles production infrastructure. We’ll create a custom VPC from scratch, carve it into public and private subnets, wire up proper routing with an Internet Gateway and NAT Gateway, and then do something that trips up a lot of beginners: connect to an EC2 instance in a private subnet without SSH.

No public IP. No open port 22. No security group rules allowing inbound traffic from the internet. Just AWS Systems Manager Session Manager doing what it does best.

This lab is designed for engineers who’ve launched a few EC2 instances but haven’t yet built proper network isolation. If you’re preparing for the Solutions Architect exam or transitioning into a DevOps role, this is foundational knowledge you’ll use constantly.

The most common mistakes I see from junior engineers? Putting everything in public subnets because “it’s easier.” Using SSH keys that get shared across the team (or worse, committed to Git). Forgetting that private subnets need a path to the internet for updates. We’re going to fix all of that today.

Why EC2 Private Subnets Matter in AWS VPC Design

Before we touch the console, let’s establish why this architecture pattern exists and why every serious AWS deployment uses it.

An EC2 private subnet is a network segment within your VPC that has no direct route to the internet. Instances here cannot be reached from outside your AWS environment, and they cannot reach the internet without assistance from a NAT Gateway or NAT instance. This isolation is the foundation of defense in depth.

Think about what actually needs internet exposure in a typical application. Load balancers need public access so users can reach your app. Maybe a bastion host if you’re still using traditional SSH (though we’ll eliminate that need today). But your application servers? Your databases? Your cache layers? None of these should be directly accessible from the internet.

The public vs private subnet distinction isn’t just a security checkbox—it fundamentally changes your attack surface. A misconfigured security group on a public instance can expose it to the entire internet. The same misconfiguration on a private instance? The blast radius is limited to resources within your VPC. That’s the difference between a minor incident and a breach that makes headlines.

In my AWS VPC tutorial work with enterprise clients, I’ve found that teams who understand this distinction early build significantly more secure infrastructure than those who learn it after their first security incident.

Lab Overview

By the end of this lab, you’ll have built a production-style VPC architecture that includes a public subnet for internet-facing resources, a private subnet for protected workloads, proper routing through an Internet Gateway and NAT Gateway, and secure, auditable access to your private instance using SSM Session Manager.

The skills you’ll gain here translate directly to real-world scenarios. Every three-tier web application I’ve deployed in the last five years follows this exact pattern: load balancers in public subnets, application servers in private subnets, databases in isolated subnets. Financial services, healthcare, SaaS platforms—they all use this architecture because it provides defense in depth without sacrificing operability.

You’ll also learn why SSH-less access isn’t just a security nicety anymore. In regulated environments (think SOC 2, HIPAA, PCI-DSS), auditors love Session Manager because every session is tied to an IAM identity and, once you enable session logging in the Session Manager preferences, session activity is captured in CloudWatch Logs or S3. Try getting that level of audit trail with traditional SSH.

What We Are Intentionally NOT Doing

This lab takes a modern, security-first approach. Here’s what we’re deliberately avoiding and why it matters.

We’re not creating a bastion host. Bastion hosts were the standard pattern for years, but they introduce key management overhead, require patching, and create an additional attack vector. Session Manager eliminates the need entirely.

We’re not using SSH keys. SSH key management is a security nightmare at scale. Keys get shared, lost, or compromised. With SSM, authentication happens through IAM—the same identity system you’re already managing.

We’re not adding any inbound security group rules on our private instance. Zero. None. This might feel wrong if you’re used to opening port 22, but SSM works entirely over outbound HTTPS connections initiated by the agent.

We’re not assigning a public IP to our private EC2 instance. This seems obvious, but I’ve reviewed architectures where “private” instances had public IPs “just in case.” That defeats the entire purpose of network isolation.

Prerequisites

Before we dive in, make sure you have an AWS account with permissions to create VPCs, subnets, Internet Gateways, NAT Gateways, EC2 instances, and IAM roles. If you’re using an IAM user rather than root (which you should be), verify you have the AmazonVPCFullAccess and AmazonEC2FullAccess managed policies attached, plus IAMFullAccess for creating the SSM role.

You should also be comfortable navigating the AWS Console and have a basic understanding of what EC2 instances are. If terms like “CIDR block” or “route table” sound completely foreign, consider reviewing the VPC fundamentals documentation first—but honestly, I’ll explain everything as we go.

One more thing: NAT Gateways cost money. We’re talking roughly $0.045 per hour plus data processing charges. For a quick lab, you’re looking at maybe a dollar or two, but don’t forget to tear everything down when you’re done. I’ve seen engineers leave NAT Gateways running for months and rack up hundreds in unexpected charges.

Step-by-Step Hands-On Lab

Step 1: Create Your VPC

Navigate to VPC → Your VPCs → Create VPC.

Select “VPC only” (not “VPC and more”—we want to understand each component individually). Name it lab-vpc and set the IPv4 CIDR block to 10.0.0.0/16.

Why /16? Because it gives you 65,536 IP addresses to work with, which is plenty of room to carve out multiple subnets later. In production, I’ve seen teams start with /24 and then realize they need more subnets than they have address space for. Always plan bigger than you think you need.

Leave IPv6 disabled for now and keep tenancy as “Default.” Click Create VPC.

You should see your new VPC appear in the list with a State of “Available.” Notice that AWS automatically creates a main route table and a default network ACL—we’ll be working with custom route tables instead.

Step 2: Create the Public Subnet

Go to VPC → Subnets → Create subnet.

Select your lab-vpc, name the subnet public-subnet-1a, and choose an availability zone. I’ll use us-east-1a in this example, but keep in mind that AZ names are mapped differently per AWS account—your us-east-1a might be a physically different data center than mine. Set the CIDR block to 10.0.1.0/24.

This gives us 256 addresses in our public subnet (251 usable, because AWS reserves five addresses in every subnet), which is more than enough for bastion hosts, NAT Gateways, and load balancers.

After creating, select the subnet and click Actions → Edit subnet settings. Enable “Auto-assign public IPv4 address.” This means any instance launched here automatically gets a public IP, which is exactly what we want for public-facing resources.

Step 3: Create the Private Subnet

Create another subnet in lab-vpc. Name it private-subnet-1a, use the same availability zone (us-east-1a for consistency), and set the CIDR to 10.0.2.0/24.

Do NOT enable auto-assign public IP here. That’s the whole point—instances in this subnet should not be directly reachable from the internet.
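
If you want to sanity-check the address math before clicking through the console, here is a minimal Python sketch using only the standard library. The CIDR blocks and subnet names come from Steps 1 to 3; the helper name usable_hosts is my own, and the sketch only does arithmetic, it does not talk to AWS.

```python
import ipaddress

# CIDR blocks taken from Steps 1-3 of this lab.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = {
    "public-subnet-1a": ipaddress.ip_network("10.0.1.0/24"),
    "private-subnet-1a": ipaddress.ip_network("10.0.2.0/24"),
}

# AWS reserves five addresses in every subnet: the network address,
# the VPC router, the DNS resolver, one held for future use, and broadcast.
AWS_RESERVED_PER_SUBNET = 5


def usable_hosts(subnet):
    """Addresses you can actually assign to resources in an AWS subnet."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


print(f"{vpc} holds {vpc.num_addresses} addresses")
for name, net in subnets.items():
    # Every subnet must sit inside the VPC range.
    assert net.subnet_of(vpc), f"{name} falls outside the VPC range"
    print(f"{name}: {net.num_addresses} total, {usable_hosts(net)} usable")

# Subnets in the same VPC must not overlap each other.
public, private = subnets.values()
assert not public.overlaps(private)

# A /16 split into /24 blocks leaves plenty of room for more subnets.
print(f"/24 blocks in the VPC: {len(list(vpc.subnets(new_prefix=24)))}")
```

The same check is handy later when you add subnets in a second availability zone: add them to the dictionary and the assertions will catch an out-of-range or overlapping block.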

Step 4: Create and Attach the Internet Gateway

Head to VPC → Internet Gateways → Create internet gateway.

Name it lab-igw and create it. You’ll notice its state shows “Detached.” An Internet Gateway does nothing until you attach it to a VPC.

Select your new IGW, click Actions → Attach to VPC, and choose lab-vpc.

Here’s a common gotcha: a VPC can have only one Internet Gateway attached at a time. You can create a second IGW, but attaching it to a VPC that already has one fails with an error. This trips people up when they’re troubleshooting and think “maybe I need another IGW.” You don’t.

Step 5: Configure the Public Route Table

Go to VPC → Route Tables → Create route table.

Name it public-rt and associate it with lab-vpc. After creation, select it and go to the Routes tab. Click Edit routes → Add route.

Set the destination to 0.0.0.0/0 (meaning “all traffic not matching other routes”) and the target to your Internet Gateway (lab-igw).

Now associate this route table with your public subnet. Go to Subnet associations → Edit subnet associations, check public-subnet-1a, and save.

This is what makes a subnet “public”—it has a route to an Internet Gateway. Without this route, even instances with public IPs can’t reach the internet.

Step 6: Create the NAT Gateway

Navigate to VPC → NAT Gateways → Create NAT gateway.

Name it lab-nat, place it in public-subnet-1a (yes, the NAT Gateway lives in the public subnet), and allocate an Elastic IP by clicking “Allocate Elastic IP.”

Click Create. The NAT Gateway takes 2-3 minutes to become available—grab some coffee.

Why does the NAT Gateway go in the public subnet? Because it needs internet access to forward traffic from your private instances. It acts as a translator, allowing outbound connections from private resources while blocking unsolicited inbound traffic.

Production Note: A NAT Gateway lives in a single availability zone. Subnets in other AZs can route through it, but they lose outbound connectivity if that AZ fails, and they pay cross-AZ data transfer charges on top. For high availability in production environments, you should deploy one NAT Gateway per availability zone and configure each private subnet to route through its local NAT Gateway. This prevents a single AZ failure from taking down outbound connectivity for your entire application. For this lab, a single NAT Gateway is sufficient, but remember this pattern for the Solutions Architect exam and real-world deployments.

Step 7: Configure the Private Route Table

Create another route table called private-rt in lab-vpc.

Edit its routes and add 0.0.0.0/0 pointing to your NAT Gateway (not the Internet Gateway).

Associate this route table with private-subnet-1a.

Now your private instances can reach the internet for updates and API calls, but nothing from the internet can initiate a connection to them. This is exactly the behavior we want.
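
To make the routing behavior concrete, here is a small Python model of the two route tables. It is a sketch of the longest-prefix-match rule that VPC route tables follow, not a call into AWS. The function name resolve_target is my own, and 203.0.113.10 is a documentation-range address standing in for “some host on the internet.”

```python
import ipaddress

# Route tables as (destination CIDR, target) pairs. The "local" route exists
# in every VPC route table; the default routes come from Steps 5 and 7.
PUBLIC_RT = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "lab-igw")]
PRIVATE_RT = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "lab-nat")]


def resolve_target(route_table, destination_ip):
    """Return the target of the most specific route matching destination_ip."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the packet goes nowhere
    # Longest prefix wins, so 10.0.0.0/16 beats 0.0.0.0/0 for in-VPC traffic.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(resolve_target(PRIVATE_RT, "10.0.1.25"))     # local
print(resolve_target(PRIVATE_RT, "203.0.113.10"))  # lab-nat
print(resolve_target(PUBLIC_RT, "203.0.113.10"))   # lab-igw
```

Traffic between the two subnets always takes the local route, which is why the private instance can talk to the NAT Gateway in the public subnet without any extra configuration.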

Step 8: Create the IAM Role for SSM

Go to IAM → Roles → Create role.

Select “AWS service” as the trusted entity and choose “EC2” as the use case. Click Next.

Search for and attach the AmazonSSMManagedInstanceCore managed policy. This is the minimum permission set needed for Session Manager.

Name the role EC2-SSM-Role and create it.

I’ve seen engineers attach the full AmazonSSMFullAccess policy here—don’t do that. It grants way more permissions than needed, including the ability to modify SSM resources. Least privilege isn’t just a buzzword; it limits blast radius when things go wrong.
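
For reference, this is roughly what the console builds for you in this step, written out as a Python sketch. The trust policy is the standard one for letting EC2 assume a role; the variable names are my own, and the script only prints JSON, it does not create anything in your account.

```python
import json

# Trust policy equivalent to choosing "AWS service" -> "EC2" in the console:
# it lets the EC2 service assume the role on behalf of your instance.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Role name from Step 8, plus the ARN of the AWS managed policy attached to it.
ROLE_NAME = "EC2-SSM-Role"
MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"

print(f"Role: {ROLE_NAME}")
print(f"Attached policy: {MANAGED_POLICY_ARN}")
print(json.dumps(TRUST_POLICY, indent=2))
```

Seeing the two pieces side by side helps when troubleshooting later: the trust policy controls who can assume the role, while the managed policy controls what the role can do.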

Step 9: Launch EC2 in the Private Subnet

Go to EC2 → Instances → Launch instances.

Name your instance private-instance, select Amazon Linux 2023 AMI, and choose t3.micro (look for the “Free tier eligible” label in the console, since eligibility depends on your account and region).

Under Network settings, click Edit. Select lab-vpc and private-subnet-1a. Confirm that “Auto-assign public IP” is disabled.

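For the security group, create a new one called private-sg. Here’s the key insight: you don’t need ANY inbound rules. SSM Session Manager works over outbound HTTPS (port 443), not inbound SSH. If the launch wizard pre-populates an inbound SSH rule, remove it. A new security group allows all outbound traffic by default, which already covers Session Manager; if you tighten the outbound rules, keep HTTPS (443) to 0.0.0.0/0 open.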

Under Advanced details, find “IAM instance profile” and select your EC2-SSM-Role.

Security Note: Ensure IMDSv2 is set to “Required” under Metadata settings (this is the default for Amazon Linux 2023). IMDSv2 prevents credential theft attacks via SSRF vulnerabilities by requiring session-authenticated requests to the instance metadata service. This small setting has prevented countless real-world attacks where compromised web applications attempted to steal IAM role credentials.
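
To show what “session-authenticated” means in practice, here is a minimal Python sketch of the two-step IMDSv2 exchange. The function names are my own. The fetch_instance_id helper only works when run on an EC2 instance, so the example itself just builds the requests and prints the first one without touching the network.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"
TOKEN_TTL_SECONDS = "21600"  # six hours, the longest lifetime IMDSv2 accepts


def build_token_request():
    """Step 1: a PUT that asks the metadata service for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": TOKEN_TTL_SECONDS},
    )


def build_metadata_request(path, token):
    """Step 2: a GET that presents the session token in a header."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_instance_id(timeout=2):
    """Run both steps. Only works on an EC2 instance; elsewhere it times out."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    request = build_metadata_request("instance-id", token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode()


if __name__ == "__main__":
    req = build_token_request()
    print(req.get_method(), req.full_url)
```

The PUT is the important part: a typical SSRF bug lets an attacker trigger a simple GET, but not a PUT with a custom header, so the token step is hard to reach through a compromised web application.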

Launch the instance.

Step 10: Connect Using SSM Session Manager

Wait for your instance to show “Running” and pass both status checks.

Select the instance and click Connect. Choose the “Session Manager” tab.

If everything is configured correctly, you’ll see an orange “Connect” button. Click it, and you’ll get a browser-based shell session into your private instance.

No SSH key. No public IP. No port 22. Just secure, encrypted, auditable access.

Step 11: Verify Outbound Internet Access

In your Session Manager shell, run:

curl -I https://aws.amazon.com

You should see HTTP response headers, typically a 200 status or a redirect. Any response at all confirms your private instance can reach the internet through the NAT Gateway.

Try updating the installed packages:

sudo yum update -y

If packages download successfully, your routing is working perfectly.

SSM Session Manager vs Bastion Host: Which Should You Use?

This is a question I get constantly from engineers transitioning from traditional infrastructure. Here’s my honest assessment after managing both approaches in production.

Bastion hosts made sense when they were invented. You needed a hardened jump box in a public subnet, configured SSH with key-based authentication, and hopped through it to reach private resources. It worked, but it introduced significant operational overhead.

Session Manager fundamentally changes the equation. There’s no infrastructure to manage because the service is fully managed by AWS. You don’t need to patch, monitor, or scale anything. Authentication happens through IAM, which you’re already managing for everything else in AWS. The audit trail takes little effort: session starts are recorded in CloudTrail, and once you enable logging in the Session Manager preferences, session activity is sent to CloudWatch Logs or S3. There’s no inbound network exposure since the SSM agent initiates outbound connections only.

The cost comparison favors Session Manager as well. A bastion host requires at least a t3.micro running 24/7, plus EBS storage, plus your time managing it. Session Manager costs nothing for basic usage (you pay only for optional logging to S3 or CloudWatch).

When would I still use a bastion host? Honestly, almost never in greenfield deployments. The only scenario is if you have legacy tools that absolutely require SSH protocol and can’t work with the SSM CLI or port forwarding capabilities. Even then, I’d explore Session Manager’s port forwarding feature first.

For new projects and the Solutions Architect exam, default to Session Manager. It’s the modern pattern that AWS actively recommends.

When NOT to Use NAT Gateway

NAT Gateways solve a specific problem: giving private resources outbound internet access. But they’re not always the right tool, and they can get expensive fast.

You should skip the NAT Gateway entirely if your private workloads only need to communicate with other AWS services. VPC endpoints provide private connectivity to services like S3, DynamoDB, ECR, and dozens of others without traversing the internet. Gateway endpoints (for S3 and DynamoDB) are free. Interface endpoints cost about $0.01 per hour per AZ plus a per-GB data processing charge, which is still usually cheaper than NAT Gateway for high-traffic scenarios.

Consider skipping NAT Gateway for fully isolated workloads that shouldn’t have any internet access. Some compliance frameworks actually require this—certain data processing environments must be completely air-gapped from the internet. In these cases, a NAT Gateway would be a compliance violation.

Be cautious with NAT Gateway for high-bandwidth data transfer scenarios. At $0.045 per GB processed, transferring 10 TB through NAT Gateway costs $450. If you’re pulling container images from ECR frequently, streaming logs to external services, or processing large datasets from public APIs, those charges compound quickly. I’ve seen monthly NAT Gateway bills exceed the cost of the EC2 instances they’re serving.

My general guidance: start with VPC endpoints for AWS service access, add NAT Gateway only for resources that genuinely need public internet connectivity, and monitor your NAT Gateway data processing metrics closely in the first month.

Real Lab Experiences from Production

Let me share some lessons that cost me sleep so they won’t cost you yours.

NAT Gateway costs sneak up on teams. I consulted for a startup that was processing large datasets through private EC2 instances. Their NAT Gateway data processing charges hit $3,000 in a single month. If you’re transferring significant data, consider VPC endpoints for AWS services—they’re cheaper and faster.

The “SSM agent not connecting” issue is almost always one of three things: missing IAM role, no outbound internet path (NAT Gateway or VPC endpoint), or IMDSv2 hop limit set to 1 on containerized workloads. I’ve probably troubleshot this a hundred times.

In regulated environments, Session Manager is a game-changer. I worked with a healthcare company where auditors specifically asked for proof that SSH was disabled across all instances. Session Manager gave them command-level logging in CloudWatch, user attribution through IAM, and zero network attack surface. The auditors were thrilled.

Validation and Testing

Verify SSM connectivity from your local machine using AWS CLI:

aws ssm describe-instance-information --query "InstanceInformationList[*].[InstanceId,PingStatus]"

Your instance should show “Online” status.

Confirm the IAM role is attached:

aws ec2 describe-instances --instance-ids i-xxxxx --query "Reservations[*].Instances[*].IamInstanceProfile"

Test outbound access from the instance:

curl https://checkip.amazonaws.com

This should return your NAT Gateway’s Elastic IP, proving traffic is routing correctly.

Troubleshooting Guide

SSM Session Manager shows “Start session failed”

Check that your instance has the SSM agent running:

sudo systemctl status amazon-ssm-agent

Verify the IAM role is attached and has AmazonSSMManagedInstanceCore. Check that your private subnet’s route table has a route to the NAT Gateway.

Instance can’t reach the internet

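Run ip route inside the instance. You should see a default route via the subnet’s router (10.0.2.1 in this lab). That route is present regardless of how the VPC is configured, so it only proves the instance’s own network stack is healthy. The VPC route table is not visible from inside the instance: in the console, confirm that private-subnet-1a is associated with private-rt and that private-rt sends 0.0.0.0/0 to the NAT Gateway.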

Check that the NAT Gateway is in “Available” state—it takes a few minutes after creation.

SSM agent logs show authentication errors

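Run aws sts get-caller-identity from the instance. If it reports that it cannot find credentials, the instance profile is missing or the instance metadata service is unreachable. If it times out instead, outbound HTTPS is blocked, so check the security group’s outbound rules and the NAT route. If the workload runs inside a container, also check that the IMDSv2 hop limit is at least 2.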

“No managed instances found” in SSM console

The SSM agent takes 2-5 minutes to register after instance launch. Wait and refresh. Also verify you’re looking at the correct region—I’ve been fooled by this more times than I’d like to admit.

Session Manager works but curl fails

This usually means SSM is using VPC endpoints (or direct routes to SSM endpoints exist) but your NAT Gateway isn’t configured correctly. Check that the private route table has 0.0.0.0/0 pointing to the NAT Gateway.

Common Interview Questions from This Lab

These questions come up constantly in AWS Solutions Architect and DevOps interviews. If you’ve completed this lab, you can answer all of them from direct experience.

“How does an EC2 instance in a private subnet access the internet?”

The private subnet’s route table contains a route sending 0.0.0.0/0 traffic to a NAT Gateway. The NAT Gateway, located in a public subnet with an Elastic IP, performs network address translation: it replaces the instance’s private IP with its own address for outbound traffic (the Internet Gateway then maps that address to the Elastic IP) and routes responses back. This allows outbound connections while preventing inbound connections from the internet.

“Why doesn’t SSM Session Manager need inbound security group rules?”

Session Manager’s architecture is fundamentally different from SSH. The SSM agent on the instance initiates an outbound HTTPS connection to the SSM service endpoint. Commands and responses flow over this agent-initiated channel. Since the instance initiates the connection (outbound), no inbound rules are needed. This is why Session Manager works even in completely locked-down environments with zero inbound access.
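
If the “no inbound rules” answer feels hand-wavy, a toy model helps. The Python sketch below imitates the stateful behavior of a security group in a very simplified way. The class and method names are my own, the two IP addresses are documentation-range placeholders, and real security groups track full connection tuples rather than just the remote address and port.

```python
class StatefulSecurityGroup:
    """Toy model of a stateful firewall, not the real AWS implementation."""

    def __init__(self, inbound_ports=(), outbound_ports=()):
        self.inbound_ports = set(inbound_ports)
        self.outbound_ports = set(outbound_ports)
        self._tracked = set()  # (remote_ip, remote_port) pairs we connected to

    def allow_outbound(self, remote_ip, remote_port):
        """The instance opens a connection; needs a matching outbound rule."""
        if remote_port not in self.outbound_ports:
            return False
        self._tracked.add((remote_ip, remote_port))
        return True

    def allow_inbound(self, remote_ip, remote_port, local_port):
        """A packet arrives; replies on tracked connections need no inbound rule."""
        if (remote_ip, remote_port) in self._tracked:
            return True
        return local_port in self.inbound_ports


# private-sg from Step 9: no inbound rules, outbound HTTPS only.
private_sg = StatefulSecurityGroup(inbound_ports=(), outbound_ports=(443,))

SSM_ENDPOINT = "203.0.113.10"  # placeholder standing in for the SSM service
INTERNET_HOST = "198.51.100.7"  # placeholder for an arbitrary outside host

print(private_sg.allow_outbound(SSM_ENDPOINT, 443))        # True
print(private_sg.allow_inbound(SSM_ENDPOINT, 443, 50000))  # True
print(private_sg.allow_inbound(INTERNET_HOST, 40000, 22))  # False
```

The agent’s outbound connection is allowed, the service’s replies ride back on that tracked connection, and an unsolicited SSH attempt is dropped because nothing inside ever asked for it.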

“What makes a subnet public versus private?”

The subnet itself doesn’t have a “public” or “private” property—it’s determined by the associated route table. A public subnet has a route table entry sending internet-bound traffic (0.0.0.0/0) to an Internet Gateway. A private subnet’s route table either has no internet route or routes to a NAT Gateway/NAT instance. The auto-assign public IP setting is secondary; an instance with a public IP in a subnet without an IGW route still cannot reach the internet.

“How would you make this architecture highly available?”

Deploy resources across multiple availability zones. Create public and private subnets in at least two AZs. Place a NAT Gateway in each public subnet and configure each private subnet to route through its local NAT Gateway. This ensures that a single AZ failure doesn’t take down internet connectivity for the entire application. For the EC2 instances themselves, use an Auto Scaling group spanning multiple AZs behind an Application Load Balancer.
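
As a planning aid, here is a short Python sketch that lays out the multi-AZ version of this lab. It extends the lab’s naming scheme, so names like lab-nat-1b and private-subnet-1b are my own assumptions, and the function only produces a plan; it creates nothing in AWS.

```python
import ipaddress


def plan_multi_az(vpc_cidr, azs):
    """Assign one public and one private /24 per AZ, plus a NAT Gateway per AZ."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=24)
    next(blocks)  # skip the first /24 so numbering starts at x.x.1.0/24, as in the lab
    plan = []
    for az in azs:
        suffix = az.rsplit("-", 1)[-1]  # "us-east-1a" -> "1a"
        public_cidr, private_cidr = next(blocks), next(blocks)
        nat_name = f"lab-nat-{suffix}"
        plan.append({
            "az": az,
            "public_subnet": (f"public-subnet-{suffix}", str(public_cidr)),
            "private_subnet": (f"private-subnet-{suffix}", str(private_cidr)),
            "nat_gateway": nat_name,
            # Each private subnet sends 0.0.0.0/0 to the NAT Gateway in its own AZ.
            "private_default_route": nat_name,
        })
    return plan


for entry in plan_multi_az("10.0.0.0/16", ["us-east-1a", "us-east-1b"]):
    print(entry)
```

The first entry reproduces the subnets you built in this lab, and the second shows what the matching pair in another AZ would look like. Each private subnet gets its own route table pointing at its local NAT Gateway.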

“What are the cost considerations for NAT Gateway?”

NAT Gateway charges hourly (approximately $0.045/hour) plus data processing ($0.045/GB). For applications with high outbound traffic, these costs add up quickly. Alternatives include NAT instances (cheaper but require management), VPC endpoints for AWS services (free for Gateway endpoints, cheaper than NAT for Interface endpoints), and accepting limited internet access for workloads that don’t truly need it.

AWS Best Practices

From a Solutions Architect perspective, this lab demonstrates several Well-Architected principles.

Security: No SSH means no SSH key management headaches, no port 22 exposure, and no risk of key theft. Combine this with IMDSv2 enforcement and you’ve eliminated two of the most common EC2 attack vectors.

Reliability: Placing the NAT Gateway in a public subnet with an Elastic IP ensures consistent outbound connectivity. For production, deploy NAT Gateways in multiple AZs to avoid single points of failure.

Cost Optimization: NAT Gateways aren’t cheap. For heavy AWS API usage, consider VPC endpoints for services like S3, DynamoDB, and SSM. Gateway endpoints are free, and Interface endpoints often cost less than NAT Gateway data processing for high-volume scenarios.

Operational Excellence: Implement consistent tagging from day one. Tag every resource with Environment, Project, and Owner at minimum. Your future self will thank you when trying to identify orphaned resources.

Performance: For latency-sensitive applications, keep resources that communicate frequently in the same AZ to avoid cross-AZ data transfer charges and latency.

Frequently Asked Questions

Can I connect to a private EC2 instance without a NAT Gateway?

Yes, but with limitations. You can use SSM Session Manager without a NAT Gateway if you create VPC endpoints for SSM services (ssm, ssmmessages, and ec2messages). This approach is actually more secure and can be cheaper for instances that don’t need general internet access—they only need connectivity to the SSM endpoints. However, the instance won’t be able to reach the public internet for tasks like downloading packages from public repositories.
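
The three endpoints follow the usual interface endpoint naming pattern. Here is a tiny Python sketch that builds the service names for a region; the function name is my own, and us-east-1 is simply the region used in this lab.

```python
# The three services Session Manager needs, as listed above.
SSM_ENDPOINT_SERVICES = ("ssm", "ssmmessages", "ec2messages")


def ssm_endpoint_service_names(region):
    """Build the interface endpoint service names for the given region."""
    return [f"com.amazonaws.{region}.{service}" for service in SSM_ENDPOINT_SERVICES]


for name in ssm_endpoint_service_names("us-east-1"):
    print(name)
```

You would create one interface endpoint per service name in the private subnet, with private DNS enabled and a security group that allows inbound HTTPS (443) from the instances.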

Why is my SSM Session Manager connection timing out?

The most common causes are missing or incorrect IAM role attachment, no network path to SSM endpoints (missing NAT Gateway or VPC endpoints), security group blocking outbound HTTPS (port 443), or the SSM agent not running on the instance. Check each systematically using the troubleshooting commands in this guide. Also verify IMDSv2 hop limit is at least 2 if you’re running containers.

How much does a NAT Gateway cost per month?

A NAT Gateway running 24/7 costs approximately $33 per month in hourly charges alone ($0.045 × 730 hours = $32.85). Data processing adds $0.045 per GB. A typical small application processing 100 GB/month would cost around $37 total. High-bandwidth applications can see bills exceeding $500/month just for NAT Gateway. Always monitor the BytesOutToDestination and BytesInFromDestination metrics in the AWS/NATGateway CloudWatch namespace.
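
The arithmetic is simple enough to script. This Python sketch uses the rates quoted in this lab, which are assumptions that vary by region and change over time, so check current pricing before relying on the output. The function name is my own.

```python
HOURLY_RATE_USD = 0.045  # per NAT Gateway hour, the figure quoted in this lab
PER_GB_RATE_USD = 0.045  # per GB processed, the figure quoted in this lab
HOURS_PER_MONTH = 730


def nat_monthly_cost(gb_processed, gateways=1):
    """Estimate one month of NAT Gateway charges in USD."""
    hourly_charges = HOURLY_RATE_USD * HOURS_PER_MONTH * gateways
    data_charges = PER_GB_RATE_USD * gb_processed
    return round(hourly_charges + data_charges, 2)


for gb in (0, 100, 10_000):
    print(f"{gb:>6} GB/month -> ${nat_monthly_cost(gb):,.2f}")
```

Passing gateways=2 shows what the multi-AZ setup costs before any data flows, which is a useful number to have when you pitch high availability to whoever pays the bill.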

Is it possible to use Session Manager with Windows EC2 instances?

Absolutely. The SSM agent is pre-installed on Windows AMIs provided by AWS, just like Amazon Linux. The setup process is identical: attach an IAM role with AmazonSSMManagedInstanceCore, ensure outbound HTTPS connectivity, and connect via the Session Manager console. Windows sessions provide PowerShell access by default instead of bash.

What is the difference between NAT Gateway and NAT instance?

NAT Gateway is a managed AWS service that handles scaling, availability, and patching automatically. NAT instances are EC2 instances you configure yourself to perform NAT. NAT Gateway is more expensive but requires zero maintenance. NAT instances are cheaper but require you to manage patching, monitoring, and failover. For most use cases, NAT Gateway’s operational simplicity outweighs the cost difference. NAT instances make sense primarily for very low-traffic environments or when you need features like port forwarding that NAT Gateway doesn’t support.

Can I restrict which IAM users can connect via Session Manager?

Yes. Session Manager access is controlled through IAM policies. You can create policies that allow or deny the ssm:StartSession action, optionally scoped to specific instances using resource tags or instance IDs. This gives you fine-grained control over who can access which instances—far more granular than SSH key distribution ever allowed.
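
Here is a sketch of what a tag-scoped policy might look like, built as a Python dictionary so you can generate variations. The tag key and value (Environment and lab) and the function name are hypothetical. The ${aws:username} variable resolves for IAM users only, so federated or role-based access needs a different variable. Treat this as a starting point and check it against the Session Manager IAM documentation before using it.

```python
import json


def start_session_policy(tag_key, tag_value):
    """Allow starting sessions only on instances carrying the given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ssm:StartSession",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringEquals": {f"ssm:resourceTag/{tag_key}": tag_value}
                },
            },
            {
                # Let users end or resume only the sessions they started.
                "Effect": "Allow",
                "Action": ["ssm:TerminateSession", "ssm:ResumeSession"],
                "Resource": "arn:aws:ssm:*:*:session/${aws:username}-*",
            },
        ],
    }


# Hypothetical tag: instances tagged Environment=lab.
print(json.dumps(start_session_policy("Environment", "lab"), indent=2))
```

Because the condition keys off a tag, granting access to a new instance is a tagging operation rather than a policy change, which pairs nicely with the tagging advice in the best practices section.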

Do I need a public subnet if I only use VPC endpoints?

Not necessarily. If your architecture uses exclusively VPC endpoints for all AWS service access and has no need for public internet connectivity, you can run a VPC with only private subnets. This pattern is common in highly regulated environments. However, most applications eventually need some public internet access (third-party APIs, package repositories, external integrations), so the public/private subnet pattern remains the standard approach.

Conclusion and Next Steps

You’ve just built something real. A VPC with proper network segmentation, secure routing through a NAT Gateway, and SSH-less access that would make any security auditor smile. This isn’t a toy architecture—Fortune 500 companies run variations of this exact design in production.

The key takeaways: private subnets protect your workloads from direct internet exposure, NAT Gateways provide controlled outbound access, and Session Manager eliminates the security risks of SSH while adding audit capabilities you didn’t know you needed.

In the next lab, Lab 2.2 — Security Groups vs NACLs: Defense in Depth, we’ll add another layer of protection to this architecture. You’ll learn when to use security groups versus network ACLs, how to implement proper ingress and egress controls, and why stateful versus stateless firewalling matters in practice.

Don’t forget to delete your NAT Gateway and release the Elastic IP when you’re done—those charges add up faster than you’d expect.

Related post: VPC Fundamentals

Related post: IAM Roles for EC2

External reference: AWS VPC Documentation

External reference: Session Manager Documentation