Designing a Multi-Tenant VPC Topology for a SaaS Platform
When people say multi-tenant they usually mean a tenant_id column. That is the application layer. The harder and more interesting work happens lower down, in the VPC and the IAM policies, where a mistake is not a bug but a data exposure.
Start with the blast radius
Before drawing a single subnet I asked one question. If one tenant's workload is compromised, what is the largest set of things the attacker can now reach? The whole topology is just the answer to that question made concrete.
- Public subnets hold only the load balancer. Nothing with data lives there.
- The application tier sits in private subnets whose only route to the internet is outbound through a NAT, used for patching.
- The data tier is in isolated subnets with no NAT and no internet route at all.
That three-tier split is unglamorous, and that is exactly why it works. Each tier accepts traffic only from the tier directly above it, through tightly scoped security groups that reference the source by security group, never by CIDR.
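The routing side of that split can be sketched in Terraform roughly as follows. This is a minimal illustration rather than the actual BuildTracker configuration: `aws_internet_gateway.main` and `aws_nat_gateway.main` are hypothetical names assumed to be defined elsewhere, alongside the `aws_vpc.main` used later in this post.

```hcl
# Public tier: default route to the internet gateway. Only the load balancer lives here.
# aws_internet_gateway.main is a hypothetical resource assumed to exist elsewhere.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

# Application tier: outbound-only internet access through the NAT gateway.
# aws_nat_gateway.main is a hypothetical resource assumed to exist elsewhere.
resource "aws_route_table" "app" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

# Data tier: no route block at all, so only the implicit local VPC route applies.
resource "aws_route_table" "data" {
  vpc_id = aws_vpc.main.id
}
```

The point of the sketch is that the data tier's isolation is visible as an absence: a route table with no default route has no path to the internet to misconfigure.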
Isolation model
For BuildTracker I used a shared-infrastructure, isolated-data approach. Tenants share the compute and network plane for cost reasons, but data isolation is enforced through scoped IAM policies and a per-tenant encryption context, rather than by trusting application code alone.
If isolation depends only on a WHERE clause, you do not have isolation. You have a convention.
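One way to make that enforcement concrete is an IAM policy that only allows KMS operations when the encryption context matches a tag on the calling principal. The sketch below is an assumption about how this could look, not the policy BuildTracker actually ships: the key name `tenant_data`, the context key `tenant_id`, and the principal tag `tenant_id` are all hypothetical.

```hcl
# Hypothetical KMS key protecting tenant data.
resource "aws_kms_key" "tenant_data" {
  description = "Key for tenant data (illustrative)"
}

# Allow decrypt and data-key generation only when the encryption context's
# tenant_id equals the tenant_id tag on the calling principal.
data "aws_iam_policy_document" "tenant_kms" {
  statement {
    sid       = "TenantScopedKms"
    effect    = "Allow"
    actions   = ["kms:Decrypt", "kms:GenerateDataKey"]
    resources = [aws_kms_key.tenant_data.arn]

    condition {
      test     = "StringEquals"
      variable = "kms:EncryptionContext:tenant_id"
      # "&{...}" is the data source's notation for an IAM policy variable,
      # so Terraform does not try to interpolate it.
      values = ["&{aws:PrincipalTag/tenant_id}"]
    }
  }
}
```

Under these assumptions, a request made on behalf of one tenant cannot decrypt another tenant's data even if the application builds the wrong query, because the mismatch is rejected by KMS rather than by application code.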
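```hcl
# Security group for the application tier.
resource "aws_security_group" "app" {
  name   = "app-tier"
  vpc_id = aws_vpc.main.id
}

# Allow PostgreSQL (5432) into the data tier only from the app tier's
# security group. aws_security_group.db is defined elsewhere.
resource "aws_security_group_rule" "db_from_app" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.db.id
  source_security_group_id = aws_security_group.app.id
}
```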
Why this scales
New tenants do not change the topology. They change data and identity scoping, which is exactly where you want change to live. The network is stable, auditable, and described entirely in Terraform, so a reviewer can read the isolation story instead of taking it on faith.
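As a rough illustration of change living in identity scoping rather than in the network, onboarding could amount to adding one entry to a set. Everything named here is hypothetical: the `tenants` and `app_role_arn` variables, the sample tenant names, and the `tenant_id` tag are assumptions made for the sketch.

```hcl
# Hypothetical list of tenants; onboarding a tenant means adding one entry.
variable "tenants" {
  type    = set(string)
  default = ["acme", "globex"]
}

# Hypothetical ARN of the shared application role that assumes tenant roles.
variable "app_role_arn" {
  type = string
}

# Trust policy: only the shared application role may assume a tenant role.
data "aws_iam_policy_document" "assume_from_app" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = [var.app_role_arn]
    }
  }
}

# One role per tenant, tagged so that policies can scope access by tenant.
resource "aws_iam_role" "tenant" {
  for_each           = var.tenants
  name               = "tenant-${each.key}"
  assume_role_policy = data.aws_iam_policy_document.assume_from_app.json

  tags = {
    tenant_id = each.key
  }
}
```

No subnet, route table, or security group appears in this diff, which is the property the design is after: a reviewer sees a new identity, not a new network path.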